NanoCode012
9fc29e082b
chore(doc): Add info on changing role in sharegpt (#886)
2023-11-22 15:32:50 +09:00
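A minimal sketch of the sharegpt stanza this doc change covers, in the YML config; the role-override keys are an assumption based on the doc's subject, not verbatim from the PR:

    datasets:
      - path: user/sharegpt-data   # placeholder dataset
        type: sharegpt
        conversation: chatml       # fastchat conversation template
        # assumed keys for remapping roles found in the source data:
        field_human: human
        field_model: gpt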
Mark Saroufim
ddf815022a
Install from git url (#874)
* Install from git url
* Update README.md
2023-11-17 12:50:51 -05:00
Wing Lian
0de1457189
try #2: pin hf transformers and accelerate to latest release, don't reinstall pytorch (#867)
* isolate torch from the requirements.txt
* fix typo for removed line ending
* pin transformers and accelerate to latest releases
* try w auto-gptq==0.5.1
* update README to remove manual peft install
* pin xformers to 0.0.22
* bump flash-attn to 2.3.3
* pin flash attn to exact version
2023-11-16 10:42:36 -05:00
NanoCode012
3cc67d2cdd
Feat: Add dataset loading from S3, GCS (#765)
* Feat: Add dataset loading from S3, GCS
* chore: update docs
* chore: add more info on cloud loading
2023-11-16 14:33:58 +09:00
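A sketch of what cloud loading looks like in the YML, assuming jsonl files; bucket paths are placeholders:

    datasets:
      - path: s3://my-bucket/train.jsonl   # loaded from S3
        type: alpaca
      - path: gs://my-bucket/train.jsonl   # loaded from GCS
        type: alpaca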
Wing Lian
1bc11868eb
allow overriding of model_config parameters from the YML (#853)
* allow overriding of model_config parameters from the YML
* remove old logging, update readme
* move the updating of model config to the load_model_config function
* add warning for deprecated rope_scaling in the root of the YML config
2023-11-15 23:47:08 -05:00
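A sketch of the override this enables, assuming rope_scaling now nests under a model_config block rather than the YML root:

    model_config:
      rope_scaling:
        type: linear   # illustrative values
        factor: 2.0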
Wing Lian
8a8d1c4023
make docker command more robust (#861)
* make docker command more robust
* update readme with more info
2023-11-15 23:03:54 -05:00
Wing Lian
332984db18
lint fix that didn't get caught by linter (#866)
2023-11-15 14:36:40 -05:00
Zongheng Yang
b33c1d55a2
Docs: add instructions to 1-click launching on public clouds (#862)
* Update README.md
* Update ToC
2023-11-15 14:11:27 -05:00
NanoCode012
501b4d1379
chore(doc): Separate section on runpod (#860)
2023-11-16 01:06:51 +09:00
NanoCode012
306fe19c54
feat(doc): add more info on train_on_split (#855)
2023-11-15 23:42:26 +09:00
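A sketch of the option, assuming train_on_split names which split of a dataset to train on:

    datasets:
      - path: user/dataset      # placeholder
        type: alpaca
        train_on_split: train   # e.g. train on the "train" split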
Jason Stillerman
738a057674
Feat: Added Gradio support (#812)
* Added gradio support
* queuing and title
* pre-commit run
2023-11-04 23:59:22 -04:00
Wing Lian
cdc71f73c8
update table for rwkv4 support, fix process count for dataset (#822)
2023-11-04 23:45:44 -04:00
Wing Lian
8b79ff0e94
fix eval_steps to be a sane default (#797)
* fix eval_steps to be a sane default
* update docs for fractional eval_steps
2023-10-27 22:36:30 -04:00
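The fractional form follows the HF Trainer convention: a float below 1 is read as a ratio of total training steps rather than a fixed interval. A sketch:

    eval_steps: 0.05    # evaluate every 5% of total training steps
    # eval_steps: 100   # integer form: evaluate every 100 steps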
Aleksa Gordić
2e71ff03a6
Add advanced docker instructions to README (#792)
2023-10-27 09:24:04 -04:00
Casper
e50ab072e2
Create preprocess CLI (#785)
* Create preprocess CLI
* Print prompt template if debugging
* Add print for unsupported prompters
* Formatting
* Formatting
* Refactor variables
* Formatting
* Formatting
* Formatting
* Formatting
2023-10-26 09:35:42 -04:00
NanoCode012
20aa4b57d2
chore(readme): Improve documentation on conversation field (#782)
* chore(readme): Improve documentation on conversation field
* fix: clarify where the option is
2023-10-24 12:52:32 +09:00
NanoCode012
afedc470bd
Fix: Cannot tokenize with bf16 and on cpu (#766)
2023-10-23 01:32:26 +09:00
Casper
15d3a654bf
Implement fused modules (#747)
* MLP: Memory saving
* Remove RMSNorm restrictions
* Map packed weights to original
* FusedAttention module
* Simplify code
* Move fused modules
* Fix critical typo
* Split inplace
* Add FFT config
* Add validation of fused arguments
* Add fused arguments to config
* Update docs
* Fix validation logic
* Add fused modules to flash attn
* Only fuse during training
* Remove timing
* Formatting
* Formatting
* Formatting
* chore: lint
* chore: lint
* add e2e tests for fused llama
* no lora for tests
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-21 16:08:25 -04:00
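A sketch of enabling the fused paths from the YML; flag names follow the PR's config additions but should be treated as an assumption:

    flash_attention: true
    flash_attn_fuse_qkv: true   # fuse the QKV projections
    flash_attn_fuse_mlp: true   # fuse the MLP layers
    # per the PR, fusing only applies during training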
Wing Lian
a21935f07a
add to docs (#703)
2023-10-19 21:32:30 -04:00
Casper
e1b214c62b
Clarify custom format example (#729)
* Clarify custom prompt format
* Simplify format
2023-10-14 09:28:12 -04:00
Maxime
3bd9528390
add noisy embedding (#721)
* add noisy embedding
* fix format
* Update README.md
* Update README.md
* linter issues
* caseus fixes
---------
Co-authored-by: Maxime <maxime@nope.no>
2023-10-13 10:00:42 -04:00
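A sketch of the knob this adds, assuming a single NEFT-style alpha in the YML (the key name is an assumption):

    noisy_embedding_alpha: 5   # noise scale added to embeddings; unset disables it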
NanoCode012
5855dded3d
fix(doc): update default doc according to arg (#714)
2023-10-10 21:51:56 +09:00
NanoCode012
11c48c5e03
fix(doc): Add note on inference w sample packing (#712)
2023-10-10 21:08:17 +09:00
seungduk.kim.2304
77c84e02fd
Update README with some explanations (#700)
* Update README with some explanations
* revert commit-hook change
* add more explanation about batch size and gradient accum
* not use latex format
* decorate
* git hook again
* Attach a link that explains about LoRA hyperparameters
* update table of content
* Explanation about lora_modules_to_save
2023-10-08 13:37:54 -04:00
Wing Lian
2642caedf2
refactor to set eval_batch_size earlier if unset, so we can warn if mismatched (#662)
2023-10-02 21:08:07 -04:00
Kyle Corbitt
9ec20777ba
Make dataset_processes configurable (#651)
I'm using the Axolotl script to train models on https://modal.com serverless GPUs. Unfortunately, their environment seems to have some kind of bug where if I try to run `datasets.filter` with too high a `num_proc`, it throws an error and dies.
This PR adds a new configuration option `dataset_processes`, which lets you explicitly set the number of processes used to map/filter the dataset. If not included, this defaults to the current behavior of setting that to `os.cpu_count()`.
2023-09-29 00:22:22 -04:00
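Concretely, per the PR description:

    dataset_processes: 8   # processes used for dataset map/filter; defaults to os.cpu_count()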
ich
590d6032fd
Fix bug when using pretokenized datasets (#652)
* fix pretokenized datasets readme
* check if dataset type is not set to handle pretokenized datasets
2023-09-28 22:54:10 -04:00
Wing Lian
409ca0f21c
add support for defined train split (#654)
2023-09-28 20:14:14 -04:00
NanoCode012
eb41f76f92
Feat: Add example for Mistral (#644)
* Feat: Add example for Mistral
* chore: turn off flash
* chore: add is_mistral_derived_model
* chore: update following PR
2023-09-28 20:15:00 +09:00
Napuh
85b0be2ba7
Warn users to log in to HuggingFace (#645)
* added warning if user is not logged in to HF
* updated doc to suggest logging in to HF
2023-09-27 17:43:35 -04:00
Wing Lian
895f0a0723
skip some flash attn patches unless explicitly enabled (#643)
* skip some flash attn patches if explicitly disabled
* make the other patches optional
2023-09-27 12:11:07 -04:00
Wing Lian
e7d3e2dbb6
use fastchat conversations template (#578)
* use fastchat conversations template
* require fastchat (fschat) pip install
* handle roles dynamically from conversation
* tweak fastchat conversation with a monkeypatch to get individual turns
* fix up so it works with multiple conversation styles, and don't strip the turns
* fix sharegpt fixture now that we're using a more correct tokenization
* use a new prompter and support fastchat conversation type
* use sharegpt from prompt strategies now
* update docs, add chatml template
* add a newline after im_end token
* ensure we correctly set system message
* update per PR feedback to handle deprecated sharegpt types
* don't add duplicate wandb req
* make sharegpt fields configurable from yml
* llama2 fixes
* don't fail fatally when turns are improper
2023-09-27 12:10:45 -04:00
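Given the "sharegpt fields configurable from yml" bullet, a hedged sketch of such a stanza (template name illustrative):

    datasets:
      - path: user/sharegpt-data   # placeholder
        type: sharegpt
        conversation: llama-2      # any fastchat template name; chatml was also added here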
NanoCode012
19a600a8b8
Feat: Add support for upstream FA2 (#626)
* Feat: Add support for upstream FA2
* chore: add is_falcon_derived_model: true to examples
* chore: add config to readme for documentation
* feat: add extra model types
* fix: remove old falcon flash patch
* chore: pin transformers and accelerate
2023-09-26 09:53:28 -04:00
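The example configs touched here imply a stanza like this (flag names taken from the PR bullets; values illustrative):

    flash_attention: true           # use upstream flash-attention-2
    is_falcon_derived_model: true   # model-family hint added to the examples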
Fernando Tarin Morales
5e5296a77c
Added quotes to the pip install -e command to fix an incompatibility with shells that do glob expansion like zsh (#632)
2023-09-25 11:50:14 -04:00
NanoCode012
67b9888630
Feat(doc): Add eval_sample_packing to doc (#625)
2023-09-23 13:11:27 +09:00
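A sketch of the documented option alongside its training counterpart:

    sample_packing: true
    eval_sample_packing: false   # disable packing during the eval pass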
Wing Lian
c25ba7939b
update README w deepspeed info (#605)
2023-09-22 00:15:52 -04:00
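For reference, a minimal sketch of wiring deepspeed in from the YML; the JSON path is illustrative:

    deepspeed: deepspeed/zero2.json   # path to a ZeRO stage config JSON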
NanoCode012
00dce35fb2
Feat(data): Allow loading local csv and text (#594)
* Feat(data): Allow loading local csv and text
* chore: update readme for loading data
2023-09-17 11:32:27 -04:00
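A sketch of local-file loading, assuming an explicit loader-hint key (ds_type here is an assumption):

    datasets:
      - path: ./data/train.csv
        ds_type: csv    # assumed hint for the local csv loader
        type: alpaca
      - path: ./data/notes.txt
        ds_type: text
        type: completion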
NanoCode012
3a2edc85c3
Feat(doc): Add features to doc (#583)
2023-09-16 01:14:15 +09:00
Wing Lian
f7a22632d7
support custom field for completion from yml (#580)
* support custom field for completion from yml
* remove legacy completion check and add doc
* update README docs
2023-09-15 07:48:21 -04:00
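A sketch of the custom field, assuming a completion dataset reads its text from a configurable column:

    datasets:
      - path: user/raw-corpus   # placeholder
        type: completion
        field: article          # assumed key naming the column to read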
Wing Lian
a5a625f47e
update support matrix with btlm and phi (#579)
2023-09-15 02:46:15 -04:00
Wing Lian
861cecac2a
refactor scripts/finetune.py into new cli modules (#550)
* refactor scripts/finetune.py into new cli modules
* continue to support scripts/finetune.py
* update readme with updated cli commands
* Update scripts/finetune.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2023-09-15 01:43:52 -04:00
Wing Lian
a4e1bb6606
let hf trainer handle torch compile (#516)
* let hf trainer handle torch compile
* remove torch compile checks, include option for backend
* suppress torch errors to get further
* require min torch version of 2.1.0 for torch compile to work
---------
Co-authored-by: Aman Karmani <aman@tmm1.net>
2023-09-13 11:42:12 -04:00
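A sketch of the resulting YML knobs; the backend key name is an assumption based on the "option for backend" bullet:

    torch_compile: true
    torch_compile_backend: inductor   # requires torch >= 2.1.0 per the PR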
Glavin Wiechert
5b67ea98a6
Add training callback to send predictions to WandB table (#521)
* WIP Add training callback to send predictions to WandB table
* WIP improve wandb table reporting callback
* WIP improve wandb table reporting callback (cont)
* Add VSCode launching for debugging
* Add tiny llama example
* WIP attempt to improve post-eval prediction generation for table
* WIP attempt to improve post-eval prediction generation for table - part 2
* WIP batch generation
* WIP attempt to handle sample_packing using position_ids for wandb prediction table
* WIP add code for debugging
* Fix sample_packing support for wandb prediction table
* Clean up code for PR review
* Add eval_table_size, eval_table_max_new_tokens configs & clean up code
* Clean up PR, delete VSCode config, add tiny-llama example
* Add eval_table_size, eval_table_max_new_tokens documentation. Fix linting/formatting
2023-09-13 09:51:08 -04:00
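The two options named in the commits, with illustrative values:

    eval_table_size: 5               # rows logged to the WandB prediction table; unset disables
    eval_table_max_new_tokens: 128   # generation cap per logged row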
Wing Lian
9845c5e12d
document that packaging needs to be installed before flash-attn (#559)
2023-09-12 12:18:30 -04:00
The Objective Dad
6d57f2f0f0
ergonomic update to optimizer config doc (#548)
2023-09-11 12:35:45 -04:00
Wing Lian
34c0a86a11
update readme to point to direct link to runpod template, cleanup install instructions (#532)
* update readme to point to direct link to runpod template, cleanup install instructions
* default install flash-attn and auto-gptq now too
* update readme w flash-attn extra
* fix version in setup
2023-09-08 11:58:54 -04:00
The Objective Dad
5e2d8a42d9
Adding NCCL Timeout Guide (#536)
* fixes NCCL_P2P_LEVEL=NVL #429
* adding more insights into various values of NCCL_P2P_LEVEL
2023-09-08 11:57:47 -04:00
NanoCode012
f51c9c56c6
Fix(doc): Inform Windows users to use WSL/docker (#518)
2023-09-01 00:08:21 -07:00
Jan Philipp Harries
396a7a74fc
Added advanced DDP args (#515)
* add ddp_config
* add advanced ddp config
---------
Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com>
2023-08-31 10:37:47 -07:00
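The bullets only say "add ddp_config"; a hedged sketch of the kind of advanced DDP knobs this suggests (every key name below is an assumption):

    ddp_timeout: 18000            # assumed: distributed timeout in seconds
    ddp_bucket_cap_mb: 25         # assumed: gradient bucket size
    ddp_broadcast_buffers: false  # assumed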
Wing Lian
5ac3392075
support for datasets with multiple names (#480)
* support for datasets with multiple names
* update docs
2023-08-29 06:18:17 -07:00
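A sketch, assuming name accepts a list so several named configs of one dataset can be mixed:

    datasets:
      - path: user/dataset   # placeholder
        name:                # multiple named subsets of the same dataset
          - config-a
          - config-b
        type: alpaca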