Wing Lian
4d2e842e46
use recommended setting for use_reentrant w gradient checkpointing ( #1021 )
...
* use recommended setting for use_reentrant w gradient checkpointing
* add doc for gradient_checkpointing_kwargs
2024-01-01 22:17:27 -05:00
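HF Transformers emits a warning when `use_reentrant` is left unset for gradient checkpointing, and upstream recommends the non-reentrant implementation. A minimal sketch of the resulting config, with the `gradient_checkpointing_kwargs` key taken from this commit's description (exact nesting assumed):

```yaml
gradient_checkpointing: true
# passed through to the HF Transformers gradient-checkpointing call;
# use_reentrant: false selects the upstream-recommended non-reentrant path
gradient_checkpointing_kwargs:
  use_reentrant: false
```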
mhenrichsen
f8ae59b0a8
Adds chat templates ( #1022 )
2023-12-29 15:44:23 -06:00
NanoCode012
41353d2ea0
feat: expose bnb kwargs ( #1018 )
...
* feat: expose bnb kwargs
* chore: added examples and link per suggestion
* Uncomment defaults per suggestion for readability
Co-authored-by: Hamel Husain <hamel.husain@gmail.com>
---------
Co-authored-by: Hamel Husain <hamel.husain@gmail.com>
2023-12-29 18:16:26 +09:00
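For the exposed bitsandbytes kwargs, a plausible config fragment; the key name and values below are assumptions mirroring HF Transformers' `BitsAndBytesConfig` parameters, not confirmed by the commit itself:

```yaml
# forwarded to BitsAndBytesConfig when quantized loading is enabled
bnb_config_kwargs:
  llm_int8_has_fp16_weight: false
  bnb_4bit_quant_type: nf4
  bnb_4bit_use_double_quant: true
```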
NanoCode012
f6ecf14dd4
feat: remove need to add load_in* during merge ( #1017 )
2023-12-29 18:15:30 +09:00
Hamel Husain
dec66d7c53
[Docs] Nit: Remind people to auth to wandb if they are going to use it ( #1013 )
2023-12-28 18:00:16 -08:00
Hamel Husain
76357dc5da
Update README.md ( #1012 )
2023-12-28 18:00:02 -08:00
Wing Lian
70b46ca4f4
remove landmark attn and xpos rope implementations ( #1010 )
2023-12-27 21:07:27 -08:00
Ikko Eltociear Ashimine
d25c34caa6
Update README.md ( #966 )
2023-12-17 09:51:25 -05:00
Hamel Husain
712fd27b3f
Add docs ( #947 )
...
* move section
* update README
* update README
* update README
* update README
* update README
* Update README.md
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-13 14:22:52 -08:00
kallewoof
ef24342538
fix: switch to using the HuggingFace Transformers NEFT implementation ( #941 )
...
* fix: switch to using the HuggingFace Transformers NEFT implementation
* linter
* add support for noisy_embedding_alpha with a warning about it being renamed
* restore pre/posttrain_hooks
* move validation of NEFT noise alpha into validate_config()
* linter
2023-12-13 17:15:34 -05:00
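After this switch, NEFT noise is injected by Transformers' own trainer rather than a local patch. A sketch of the config, assuming the commit's rename warning points at the upstream option name (`noisy_embedding_alpha` → `neftune_noise_alpha`; the new name is an assumption):

```yaml
# enables NEFTune noisy-embedding training;
# 5 is one of the alpha values used in the NEFTune paper
neftune_noise_alpha: 5
```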
Juraj Bednar
b0cf397ecb
More hints on what to do with CUDA Out of memory errors ( #925 )
2023-12-13 16:38:38 +09:00
Wing Lian
5f79b8242f
new evals_per_epoch and saves_per_epoch to make things cleaner ( #944 )
...
* new evals_per_epoch and saves_per_epoch to make things cleaner
* update per PR feedback
2023-12-12 15:35:23 -05:00
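The two new options trade absolute step counts for per-epoch frequencies. A minimal sketch (the idea that they replace `eval_steps`/`save_steps` is inferred from "make things cleaner"):

```yaml
# evaluate 4 times and save a checkpoint once per epoch,
# instead of specifying absolute counts via eval_steps / save_steps
evals_per_epoch: 4
saves_per_epoch: 1
```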
Wing Lian
68b227a7d8
Mixtral multipack ( #928 )
...
* mixtral multipack
* use mixtral model
* sample yml
* calculate cu_seqlens properly
* use updated flash attention setting
* attn var checks
* force use of flash attention 2 for packing
* lint
* disable future fix for now
* update support table
2023-12-09 21:26:30 -05:00
NanoCode012
d339beb9d9
chore: clarify Readme on sharegpt system role
2023-12-08 11:35:53 +09:00
Bryan Thornbury
992e742cdc
Support device_map=sequential & max_memory config parameters ( #903 )
...
* Support device_map sequential (and others). Support max_memory in cfg.
* Update documentation in README accordingly.
* Update README.md
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-04 09:29:21 -05:00
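A hedged sketch of the two options this PR adds; `device_map` values and the `max_memory` mapping follow the HF Accelerate conventions, and the exact YAML shape here is an assumption:

```yaml
# fill GPU 0 first, then GPU 1, then CPU,
# instead of the default balanced placement
device_map: sequential
# max_memory maps device index (or "cpu") to a memory cap
max_memory:
  0: 20GiB
  cpu: 60GiB
```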
NanoCode012
a1da39cd48
Feat(wandb): Refactor to be more flexible ( #767 )
...
* Feat: Update to handle wandb env better
* chore: rename wandb_run_id to wandb_name
* feat: add new recommendation and update config
* fix: indent and pop disabled env if project passed
* feat: test env set for wandb and recommendation
* feat: update to use wandb_name and allow id
* chore: add info to readme
2023-12-04 22:17:25 +09:00
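The key names below come from the commit bullets (the `wandb_run_id` → `wandb_name` rename, with the id still allowed); the values are hypothetical:

```yaml
wandb_project: my-project      # hypothetical project name
wandb_name: llama-lora-run-1   # renamed from wandb_run_id
wandb_entity: my-team          # hypothetical team/entity
# wandb_run_id may still be set explicitly, e.g. to resume a run
```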
kallewoof
58ec8b1113
feature: loss watchdog for terminating training runs that are failing ( #899 )
...
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
2023-12-04 07:54:34 -05:00
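A sketch of what a loss-watchdog config might look like; the option names and semantics below are assumptions, since the commit message only names the feature:

```yaml
# abort the run if training loss stays above the threshold
# for several consecutive logged steps (names/semantics assumed)
loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3
```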
NanoCode012
1115c501b8
Feat: Add Qwen ( #894 )
...
* Feat: Add Qwen
* feat: add qwen lora example
* feat: update matrix
* fix: add trust_remote_code
* fix: disable gradient checkpointing
* chore: add warning about gradient checkpointing
* fix: config
* fix: turn off sample packing for this example and reduce seq len
* chore: add comment on seq len
2023-11-26 00:05:01 +09:00
NanoCode012
fb12895a17
Feat: Add warmup_ratio ( #893 )
...
* Feat: Add warmup_ratio
* fix: update readme with more details on conflict
2023-11-25 12:15:43 +09:00
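A minimal sketch of the new option; the conflict noted in the second bullet is with the existing `warmup_steps`:

```yaml
# warm the learning rate up over the first 10% of total training steps;
# conflicts with warmup_steps, so set only one of the two
warmup_ratio: 0.1
```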
NanoCode012
9fc29e082b
chore(doc): Add info on changing role in sharegpt ( #886 )
2023-11-22 15:32:50 +09:00
Mark Saroufim
ddf815022a
Install from git url ( #874 )
...
* Install from git url
* Update README.md
2023-11-17 12:50:51 -05:00
Wing Lian
0de1457189
try #2 : pin hf transformers and accelerate to latest release, don't reinstall pytorch ( #867 )
...
* isolate torch from the requirements.txt
* fix typo for removed line ending
* pin transformers and accelerate to latest releases
* try w auto-gptq==0.5.1
* update README to remove manual peft install
* pin xformers to 0.0.22
* bump flash-attn to 2.3.3
* pin flash attn to exact version
2023-11-16 10:42:36 -05:00
NanoCode012
3cc67d2cdd
Feat: Add dataset loading from S3, GCS ( #765 )
...
* Feat: Add dataset loading from S3, GCS
* chore: update docs
* chore: add more info on cloud loading
2023-11-16 14:33:58 +09:00
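Cloud-hosted datasets plug into the existing `datasets` list via URI schemes; bucket names and dataset types below are hypothetical:

```yaml
datasets:
  - path: s3://my-bucket/data/train.jsonl   # hypothetical S3 bucket
    type: alpaca
  - path: gs://my-bucket/data/extra.jsonl   # GCS uses the gs:// scheme
    type: alpaca
```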
Wing Lian
1bc11868eb
allow overriding of model_config parameters from the YML ( #853 )
...
* allow overriding of model_config parameters from the YML
* remove old logging, update readme
* move the updating of model config to the load_model_config function
* add warning for deprecated rope_scaling in the root of the YML config
2023-11-15 23:47:08 -05:00
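A sketch of the override mechanism, with `rope_scaling` as the example since this commit deprecates it at the YML root; the `model_config` key name is taken from the commit title and the nesting is an assumption:

```yaml
# entries are merged into the model's config.json before loading;
# rope_scaling at the root of the YML is deprecated by this commit
model_config:
  rope_scaling:
    type: linear
    factor: 2.0
```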
Wing Lian
8a8d1c4023
make docker command more robust ( #861 )
...
* make docker command more robust
* update readme with more info
2023-11-15 23:03:54 -05:00
Wing Lian
332984db18
lint fix that didn't get caught by linter ( #866 )
2023-11-15 14:36:40 -05:00
Zongheng Yang
b33c1d55a2
Docs: add instructions to 1-click launching on public clouds ( #862 )
...
* Update README.md
* Update ToC
2023-11-15 14:11:27 -05:00
NanoCode012
501b4d1379
chore(doc): Separate section on runpod ( #860 )
2023-11-16 01:06:51 +09:00
NanoCode012
306fe19c54
feat(doc): add more info on train_on_split ( #855 )
2023-11-15 23:42:26 +09:00
Jason Stillerman
738a057674
Feat: Added Gradio support ( #812 )
...
* Added gradio support
* queuing and title
* pre-commit run
2023-11-04 23:59:22 -04:00
Wing Lian
cdc71f73c8
update table for rwkv4 support, fix process count for dataset ( #822 )
2023-11-04 23:45:44 -04:00
Wing Lian
8b79ff0e94
fix eval_steps to be a sane default ( #797 )
...
* fix eval_steps to be a sane default
* update docs for fractional eval_steps
2023-10-27 22:36:30 -04:00
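The "fractional eval_steps" mentioned in the docs bullet leans on HF Transformers behavior, where values below 1.0 are read as a fraction of total training steps:

```yaml
# evaluate every 5% of total training steps, i.e. 20 evals per run
eval_steps: 0.05
```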
Aleksa Gordić
2e71ff03a6
Add docker advanced instruction to README ( #792 )
2023-10-27 09:24:04 -04:00
Casper
e50ab072e2
Create preprocess CLI ( #785 )
...
* Create preprocess CLI
* Print prompt template if debugging
* Add print for unsupported prompters
* Formatting
* Formatting
* Refactor variables
* Formatting
* Formatting
* Formatting
* Formatting
2023-10-26 09:35:42 -04:00
NanoCode012
20aa4b57d2
chore(readme): Improve documentation on conversation field ( #782 )
...
* chore(readme): Improve documentation on conversation field
* fix: clarify where the option is
2023-10-24 12:52:32 +09:00
NanoCode012
afedc470bd
Fix: Cannot tokenize with bf16 and on cpu ( #766 )
2023-10-23 01:32:26 +09:00
Casper
15d3a654bf
Implement fused modules ( #747 )
...
* MLP: Memory saving
* Remove RMSNorm restrictions
* Map packed weights to original
* FusedAttention module
* Simplify code
* Move fused modules
* Fix critical typo
* Split inplace
* Add FFT config
* Add validation of fused arguments
* Add fused arguments to config
* Update docs
* Fix validation logic
* Add fused modules to flash attn
* Only fuse during training
* Remove timing
* Formatting
* Formatting
* Formatting
* chore: lint
* chore: lint
* add e2e tests for fused llama
* no lora for tests
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-21 16:08:25 -04:00
Wing Lian
a21935f07a
add to docs ( #703 )
2023-10-19 21:32:30 -04:00
Casper
e1b214c62b
Clarify custom format example ( #729 )
...
* Clarify custom prompt format
* Simplify format
2023-10-14 09:28:12 -04:00
Maxime
3bd9528390
add noisy embedding ( #721 )
...
* add noisy embedding
* fix format
* Update README.md
* Update README.md
* linter issues
* caseus fixes
---------
Co-authored-by: Maxime <maxime@nope.no>
2023-10-13 10:00:42 -04:00
NanoCode012
5855dded3d
fix(doc): update default doc according to arg ( #714 )
2023-10-10 21:51:56 +09:00
NanoCode012
11c48c5e03
fix(doc): Add note on inference w sample packing ( #712 )
2023-10-10 21:08:17 +09:00
seungduk.kim.2304
77c84e02fd
Update README with some explanations ( #700 )
...
* Update README with some explanations
* revert commit-hook change
* add more explanation about batch size and gradient accum
* not use latex format
* decorate
* git hook again
* Attach a link that explains about LoRA hyperparameters
* update table of content
* Explanation about lora_modules_to_save
2023-10-08 13:37:54 -04:00
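The batch-size and gradient-accumulation explanation added here reduces to one product; a minimal sketch using the standard config keys (the two-GPU example is hypothetical):

```yaml
micro_batch_size: 2              # per-GPU batch per optimizer micro-step
gradient_accumulation_steps: 4
# effective (global) batch size =
#   micro_batch_size * gradient_accumulation_steps * number_of_gpus
# e.g. 2 * 4 * 2 GPUs = 16
```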
Wing Lian
2642caedf2
refactor to set eval_batch_size earlier if unset, so we can warn if mismatched ( #662 )
2023-10-02 21:08:07 -04:00
Kyle Corbitt
9ec20777ba
Make dataset_processes configurable ( #651 )
...
I'm using the Axolotl script to train models on https://modal.com serverless GPUs. Unfortunately, their environment seems to have some kind of bug where if I try to run `datasets.filter` with too high a `num_proc`, it throws an error and dies.
This PR adds a new configuration option `dataset_processes`, which lets you explicitly set the number of processes used to map/filter the dataset. If not included, this defaults to the current behavior of setting that to `os.cpu_count()`.
2023-09-29 00:22:22 -04:00
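Per the PR description above, the new option caps the worker count used for dataset map/filter and falls back to `os.cpu_count()` when omitted:

```yaml
# limit the number of processes used for datasets.map / datasets.filter;
# omit to keep the default of os.cpu_count()
dataset_processes: 8
```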
ich
590d6032fd
Fix bug when using pretokenized datasets ( #652 )
...
* fix pretokenized datasets readme
* check if dataset type is not set to handle pretokenized datasets
2023-09-28 22:54:10 -04:00
Wing Lian
409ca0f21c
add support for defined train split ( #654 )
2023-09-28 20:14:14 -04:00
NanoCode012
eb41f76f92
Feat: Add example for Mistral ( #644 )
...
* Feat: Add example for Mistral
* chore: turn off flash
* chore: add is_mistral_derived_model
* chore: update following PR
2023-09-28 20:15:00 +09:00
Napuh
85b0be2ba7
Warn users to login to HuggingFace ( #645 )
...
* added warning if user is not logged in HF
* updated doc to suggest logging in to HF
2023-09-27 17:43:35 -04:00
Wing Lian
895f0a0723
skip some flash attn patches unless explicitly enabled ( #643 )
...
* skip some flash attn patches if explicitly disabled
* make the other patches optional
2023-09-27 12:11:07 -04:00