Wing Lian
cdc71f73c8
update table for rwkv4 support, fix process count for dataset ( #822 )
2023-11-04 23:45:44 -04:00
Wing Lian
8b79ff0e94
fix eval_steps to be a sane default ( #797 )
...
* fix eval_steps to be a sane default
* update docs for fractional eval_steps
2023-10-27 22:36:30 -04:00
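For context on the fractional `eval_steps` mentioned above: a minimal sketch of HF Trainer-style semantics, where a value in (0, 1) is treated as a fraction of total training steps. Names here are illustrative, not Axolotl's actual code.

```python
def resolve_eval_steps(eval_steps: float, total_steps: int) -> int:
    # Fractional values (0 < x < 1) are a ratio of total training steps;
    # values >= 1 are used as an absolute step count.
    if 0 < eval_steps < 1:
        return max(1, int(total_steps * eval_steps))
    return int(eval_steps)
```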
Aleksa Gordić
2e71ff03a6
Add docker advanced instruction to README ( #792 )
2023-10-27 09:24:04 -04:00
Casper
e50ab072e2
Create preprocess CLI ( #785 )
...
* Create preprocess CLI
* Print prompt template if debugging
* Add print for unsupported prompters
* Formatting
* Formatting
* Refactor variables
* Formatting
* Formatting
* Formatting
* Formatting
2023-10-26 09:35:42 -04:00
NanoCode012
20aa4b57d2
chore(readme): Improve documentation on conversation field ( #782 )
...
* chore(readme): Improve documentation on conversation field
* fix: clarify where the option is
2023-10-24 12:52:32 +09:00
NanoCode012
afedc470bd
Fix: Cannot tokenize with bf16 and on cpu ( #766 )
2023-10-23 01:32:26 +09:00
Casper
15d3a654bf
Implement fused modules ( #747 )
...
* MLP: Memory saving
* Remove RMSNorm restrictions
* Map packed weights to original
* FusedAttention module
* Simplify code
* Move fused modules
* Fix critical typo
* Split inplace
* Add FFT config
* Add validation of fused arguments
* Add fused arguments to config
* Update docs
* Fix validation logic
* Add fused modules to flash attn
* Only fuse during training
* Remove timing
* Formatting
* Formatting
* Formatting
* chore: lint
* chore: lint
* add e2e tests for fused llama
* no lora for tests
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-21 16:08:25 -04:00
Wing Lian
a21935f07a
add to docs ( #703 )
2023-10-19 21:32:30 -04:00
Casper
e1b214c62b
Clarify custom format example ( #729 )
...
* Clarify custom prompt format
* Simplify format
2023-10-14 09:28:12 -04:00
Maxime
3bd9528390
add noisy embedding ( #721 )
...
* add noisy embedding
* fix format
* Update README.md
* Update README.md
* linter issues
* caseus fixes
---------
Co-authored-by: Maxime <maxime@nope.no>
2023-10-13 10:00:42 -04:00
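The noisy-embedding change above follows the NEFTune recipe: uniform noise added to embedding outputs during training, scaled by alpha / sqrt(seq_len * hidden_dim). A minimal sketch of that scale factor (illustrative, not the actual implementation):

```python
import math

def neftune_noise_scale(alpha: float, seq_len: int, hidden_dim: int) -> float:
    # Noise drawn from Uniform(-1, 1) is multiplied by this factor
    # before being added to the embedding outputs.
    return alpha / math.sqrt(seq_len * hidden_dim)
```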
NanoCode012
5855dded3d
fix(doc): update default doc according to arg ( #714 )
2023-10-10 21:51:56 +09:00
NanoCode012
11c48c5e03
fix(doc): Add note on inference w sample packing ( #712 )
2023-10-10 21:08:17 +09:00
seungduk.kim.2304
77c84e02fd
Update README with some explanations ( #700 )
...
* Update README with some explanations
* revert commit-hook change
* add more explanation about batch size and gradient accum
* not use latex format
* decorate
* git hook again
* Attach a link that explains about LoRA hyperparameters
* update table of content
* Explanation about lora_modules_to_save
2023-10-08 13:37:54 -04:00
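The batch size and gradient accumulation relationship that the README explanation above covers reduces to a simple product; a sketch with illustrative names:

```python
def effective_batch_size(micro_batch_size: int,
                         gradient_accumulation_steps: int,
                         world_size: int = 1) -> int:
    # Each optimizer update effectively sees this many samples:
    # per-device batch x accumulation steps x number of devices.
    return micro_batch_size * gradient_accumulation_steps * world_size
```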
Wing Lian
2642caedf2
refactor to set eval_batch_size earlier if unset, so we can warn if mismatched ( #662 )
2023-10-02 21:08:07 -04:00
Kyle Corbitt
9ec20777ba
Make dataset_processes configurable ( #651 )
...
I'm using the Axolotl script to train models on https://modal.com serverless GPUs. Unfortunately, their environment seems to have some kind of bug where if I try to run `datasets.filter` with too high a `num_proc`, it throws an error and dies.
This PR adds a new configuration option, `dataset_processes`, which lets you explicitly set the number of processes used to map/filter the dataset. If not set, it defaults to the current behavior of using `os.cpu_count()` processes.
2023-09-29 00:22:22 -04:00
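The fallback described in the PR above can be sketched as follows (illustrative names, not the actual Axolotl code):

```python
import os

def get_num_proc(cfg: dict) -> int:
    # dataset_processes, when set, overrides the default of
    # one process per CPU core.
    return cfg.get("dataset_processes") or os.cpu_count()
```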
ich
590d6032fd
Fix bug when using pretokenized datasets ( #652 )
...
* fix pretokenized datasets readme
* check if dataset type is not set to handle pretokenized datasets
2023-09-28 22:54:10 -04:00
Wing Lian
409ca0f21c
add support for defined train split ( #654 )
2023-09-28 20:14:14 -04:00
NanoCode012
eb41f76f92
Feat: Add example for Mistral ( #644 )
...
* Feat: Add example for Mistral
* chore: turn off flash
* chore: add is_mistral_derived_model
* chore: update following PR
2023-09-28 20:15:00 +09:00
Napuh
85b0be2ba7
Warn users to login to HuggingFace ( #645 )
...
* added warning if user is not logged in to HF
* updated doc to suggest logging in to HF
2023-09-27 17:43:35 -04:00
Wing Lian
895f0a0723
skip some flash attn patches unless explicitly enabled ( #643 )
...
* skip some flash attn patches if explicitly disabled
* make the other patches optional
2023-09-27 12:11:07 -04:00
Wing Lian
e7d3e2dbb6
use fastchat conversations template ( #578 )
...
* use fastchat conversations template
* require fastchat (fschat) pip install
* handle roles dynamically from conversation
* tweak fastchat conversation with a monkeypatch to get individual turns
* fix up so it works with multiple conversation styles, and don't strip the turns
* fix sharegpt fixture now that we're using a more correct tokenization
* use a new prompter and support fastchat conversation type
* use sharegpt from prompt strategies now
* update docs, add chatml template
* add a newline after im_end token
* ensure we correctly set system message
* update per PR feedback to handle deprecated sharegpt types
* don't add duplicate wandb req
* make sharegpt fields configurable from yml
* llama2 fixes
* don't fail fatally when turns are improper
2023-09-27 12:10:45 -04:00
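For the chatml template and the "newline after im_end token" tweak mentioned above, a single ChatML turn renders roughly like this (illustrative sketch):

```python
def chatml_turn(role: str, content: str) -> str:
    # ChatML wraps each turn in im_start/im_end markers; the trailing
    # newline matches the "add a newline after im_end token" change.
    return f"<|im_start|>{role}\n{content}<|im_end|>\n"
```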
NanoCode012
19a600a8b8
Feat: Add support for upstream FA2 ( #626 )
...
* Feat: Add support for upstream FA2
* chore: add is_falcon_derived_model: true to examples
* chore: add config to readme for documentation
* feat: add extra model types
* fix: remove old falcon flash patch
* chore: pin transformers and accelerate
2023-09-26 09:53:28 -04:00
Fernando Tarin Morales
5e5296a77c
Added quotes to the pip install -e command to fix an incompatibility with shells that do glob expansion like zsh ( #632 )
2023-09-25 11:50:14 -04:00
NanoCode012
67b9888630
Feat(doc): Add eval_sample_packing to doc ( #625 )
2023-09-23 13:11:27 +09:00
Wing Lian
c25ba7939b
update README w deepspeed info ( #605 )
2023-09-22 00:15:52 -04:00
NanoCode012
00dce35fb2
Feat(data): Allow loading local csv and text ( #594 )
...
* Feat(data): Allow loading local csv and text
* chore: update readme for loading data
2023-09-17 11:32:27 -04:00
NanoCode012
3a2edc85c3
Feat(doc): Add features to doc ( #583 )
2023-09-16 01:14:15 +09:00
Wing Lian
f7a22632d7
support custom field for completion from yml ( #580 )
...
* support custom field for completion from yml
* remove legacy completion check and add doc
* update README docs
2023-09-15 07:48:21 -04:00
Wing Lian
a5a625f47e
update support matrix with btlm and phi ( #579 )
2023-09-15 02:46:15 -04:00
Wing Lian
861cecac2a
refactor scripts/finetune.py into new cli modules ( #550 )
...
* refactor scripts/finetune.py into new cli modules
* continue to support scripts/finetune.py
* update readme with updated cli commands
* Update scripts/finetune.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2023-09-15 01:43:52 -04:00
Wing Lian
a4e1bb6606
let hf trainer handle torch compile ( #516 )
...
* let hf trainer handle torch compile
* remove torch compile checks, include option for backend
* suppress torch errors to get further
* require min torch version of 2.1.0 for torch compile to work
---------
Co-authored-by: Aman Karmani <aman@tmm1.net>
2023-09-13 11:42:12 -04:00
Glavin Wiechert
5b67ea98a6
Add training callback to send predictions to WandB table ( #521 )
...
* WIP Add training callback to send predictions to WandB table
* WIP improve wandb table reporting callback
* WIP improve wandb table reporting callback (cont)
* Add VSCode launching for debugging
* Add tiny llama example
* WIP attempt to improve post-eval prediction generation for table
* WIP attempt to improve post-eval prediction generation for table - part 2
* WIP batch generation
* WIP attempt to handle sample_packing using position_ids for wandb prediction table
* WIP add code for debugging
* Fix sample_packing support for wandb prediction table
* Clean up code for PR review
* Add eval_table_size, eval_table_max_new_tokens configs & clean up code
* Clean up PR, delete VSCode config, add tiny-llama example
* Add eval_table_size, eval_table_max_new_tokens documentation. Fix linting/formatting
2023-09-13 09:51:08 -04:00
Wing Lian
9845c5e12d
document that packaging needs to be installed before flash-attn ( #559 )
2023-09-12 12:18:30 -04:00
The Objective Dad
6d57f2f0f0
ergonomic update to optimizer config doc ( #548 )
2023-09-11 12:35:45 -04:00
Wing Lian
34c0a86a11
update readme to point to direct link to runpod template, cleanup install instructions ( #532 )
...
* update readme to point to direct link to runpod template, cleanup install instructions
* default install flash-attn and auto-gptq now too
* update readme w flash-attn extra
* fix version in setup
2023-09-08 11:58:54 -04:00
The Objective Dad
5e2d8a42d9
Adding NCCL Timeout Guide ( #536 )
...
* fixes NCCL_P2P_LEVEL=NVL #429
* adding more insights into various values of NCCL_P2P_LEVEL
2023-09-08 11:57:47 -04:00
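The fix referenced in the guide above sets an NCCL environment variable before launching training; for example:

```shell
# Restrict NCCL peer-to-peer transfers to NVLink-connected GPUs,
# which the guide reports as a workaround for NCCL timeouts.
export NCCL_P2P_LEVEL=NVL
```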
NanoCode012
f51c9c56c6
Fix(doc): Inform Windows users to use WSL/docker ( #518 )
2023-09-01 00:08:21 -07:00
Jan Philipp Harries
396a7a74fc
Added advanced DDP args ( #515 )
...
* add ddp_config
* add advanced ddp config
* add ddp_config
* add advanced ddp config
---------
Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com>
2023-08-31 10:37:47 -07:00
Wing Lian
5ac3392075
support for datasets with multiple names ( #480 )
...
* support for datasets with multiple names
* update docs
2023-08-29 06:18:17 -07:00
NanoCode012
48c56470d0
Fix(doc): Clarify no amp to full yaml docs ( #496 )
2023-08-29 06:17:37 -07:00
Birch-san
8e197f6fb4
pad_to_worst_case_seq_len boolean, for testing memory limits ( #498 )
...
* pad_to_worst_case_seq_len boolean, for testing memory limits
* remove collator_pad_to_longest option since it does nothing
see docs: https://huggingface.co/docs/transformers/main_classes/data_collator#transformers.DataCollatorWithPadding.padding
True and "longest" mean the same thing
* rename to `pad_to_sequence_len`, and ensure 64 alignment
---------
Co-authored-by: Aman Karmani <aman@tmm1.net>
2023-08-28 18:47:16 -04:00
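The "64 alignment" in the `pad_to_sequence_len` change above rounds sequence lengths up to the next multiple of 64; a minimal sketch (illustrative name):

```python
def align_seq_len(n: int, multiple: int = 64) -> int:
    # Round up to the next multiple so padded tensors have
    # hardware-friendly shapes.
    return ((n + multiple - 1) // multiple) * multiple
```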
NanoCode012
ad8be435ad
Feat(doc): Update eval_steps doc ( #487 )
2023-08-27 10:09:09 +09:00
Charles O. Goddard
bde3c5a478
ReLoRA implementation (with quantization) ( #322 )
...
* Experimental ReLoRA (+qlora) implementation
* Add CPU offload
* Remove local config
* Fix saving logic
* Remove redundant assert
* Fix logic errors
* Move ReLoRA into its own trainer class with a method override to create the proper scheduler
* Formatting & typing fixes
* Use safe_serialization
* Don't allow fsdp/deepspeed with ReLoRA
* Fix cpu-offload logic, enable multi gpu
* Document parameters and add comment
* Fix merge issue
* Smooth over some sharp edges
* Implement resume from checkpoint for relora
* Address review comments
* Fix saving logic
* Add necessary metadata to safetensors
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-08-23 23:07:18 -04:00
NanoCode012
55c23c7bcb
Fix(doc): Clarify config ( #466 )
2023-08-23 11:56:01 -04:00
TearGosling
f4746507f6
feat: add Metharme prompt strategy ( #446 )
...
* Add Metharme tokenizing strategy
This strategy accounts for how the Metharme JSONLs are formatted as well as adds duplicated EOS tokens which can help trim model output length.
I haven't gotten the chance to test this yet, and probably won't have the chance for quite a bit, so I'm committing this now.
* Redo Metharme tokenizing strategy
lol
* fix: oops
* Rearrange a conditional
* chore: reformat code in accordance with linter
* chore: Make lint not freak out
* chore: fix lint
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2023-08-22 11:21:45 +09:00
NanoCode012
04a42b6db1
feat(docs): improve user customized prompts ( #443 )
...
* feat(docs): improve user customized prompts
* feat(doc): add custom pretokenized instructions
* chore: clean old data folder
* chore: add new line
2023-08-20 23:59:43 -04:00
NanoCode012
919f4cac90
feat(doc): add pillow to lambda instructions ( #445 )
2023-08-20 23:59:23 -04:00
Wing Lian
d2e7f27240
support user defined prompters, pretokenized datasets in config, local parquet, local arrow files ( #348 )
...
* support user defined prompters, pretokenized datasets in config, local parquet, local arrow files
* fix user defined dataset types
* fix for system prompts
* fix tests
* fix checks for parquet and arrow
* aha moment that d.data_files isn't used
* add documentation for ds_type to add support for parquet and arrow
2023-08-20 09:17:49 -04:00
Philpax
d21318dfb9
docs(readme): add cd axolotl ( #440 )
2023-08-19 19:14:05 -04:00
Wing Lian
b3f5e00ff5
use save_strategy from config if available ( #434 )
...
* use save_strategy from config if available
* update docs for save_strategy
2023-08-18 20:28:23 -04:00