Wing Lian
1a6309c8a6
cleanup the old multipack dataloader ( #841 )
2023-11-12 05:39:09 -05:00
Wing Lian
641e6f7e51
multipack w batch sampler ( #795 )
...
* test batch sampler w varying batch lens
* wip
* multipack batchsampler wip
* wip
* fix for prepare data loader to get correct # of steps based on gpus
* lint and clean up
* calculate len estimate
* fix total num steps calc
* add options for dataloader_num_workers and dataloader_pin_memory
* remove gitbook
* support prefetch_factor for dataloader optimization
* fix the kwarg
2023-11-07 20:27:40 -05:00
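The options added in #795 map onto standard `torch.utils.data.DataLoader` arguments. A minimal sketch of that wiring, assuming a dict-like `cfg` with the field names from the commit (the helper itself is illustrative, not the repo's code):

```python
from torch.utils.data import DataLoader

def build_loader(dataset, batch_sampler, cfg):
    kwargs = {
        "batch_sampler": batch_sampler,
        "num_workers": cfg.get("dataloader_num_workers", 0),
        "pin_memory": cfg.get("dataloader_pin_memory", False),
    }
    # prefetch_factor is only valid when worker processes are in use.
    if kwargs["num_workers"] > 0 and cfg.get("prefetch_factor"):
        kwargs["prefetch_factor"] = cfg["prefetch_factor"]
    return DataLoader(dataset, **kwargs)
```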
Wing Lian
b2430ce670
use accelerate logging for zero/main logging only
2023-11-06 18:32:26 -05:00
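The switch in b2430ce670 points at accelerate's rank-aware logger, which by default emits only on the main process. A minimal sketch using the public `accelerate.logging.get_logger` API:

```python
import logging
from accelerate.logging import get_logger

logging.basicConfig(level=logging.INFO)
logger = get_logger(__name__)

logger.info("printed once, on the main process only")
logger.info("printed on every rank", main_process_only=False)
```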
Wing Lian
4c834bf25d
cleanup verbosity a bit
2023-11-06 18:32:26 -05:00
Jason Stillerman
738a057674
Feat: Added Gradio support ( #812 )
...
* Added gradio support
* queuing and title
* pre-commit run
2023-11-04 23:59:22 -04:00
Wing Lian
cdc71f73c8
update table for rwkv4 support, fix process count for dataset ( #822 )
2023-11-04 23:45:44 -04:00
Wing Lian
964d858da0
fix model parallel ( #816 )
2023-11-02 21:34:22 -04:00
NanoCode012
10388a8daf
fix(tokenizer): update log order after update ( #806 )
2023-10-31 13:21:20 +09:00
NanoCode012
637ed095a0
fix(config): Set eos/bos to tokenizer if different ( #801 )
...
* fix(config): Set eos/bos to tokenizer if different
* chore: fix lint
2023-10-29 21:32:37 +09:00
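#801 reconciles special tokens declared in the config with what the tokenizer already carries. A hedged sketch of the idea; the `special_tokens` mapping stands in for the relevant YAML section and is an assumption about the config shape:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
special_tokens = {"bos_token": "<s>", "eos_token": "</s>"}  # e.g. from the config

for attr, token in special_tokens.items():
    # Only overwrite when the config disagrees with the tokenizer.
    if getattr(tokenizer, attr) != token:
        tokenizer.add_special_tokens({attr: token})
```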
Wing Lian
827ec3d274
refactor neft patch to be more re-usable similar to trl's impl ( #796 )
2023-10-29 04:33:13 -04:00
MilesQLi
0800885e2f
Update to adapt to sharegpt datasets with "assistant" rather than "gp… ( #774 )
...
* Update to adapt to sharegpt datasets with "assistant" rather than "gpt" as the machine answers.
* use a strict option for handling incorrect turn data
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-27 22:00:16 -04:00
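#774 accepts ShareGPT exports where the model turn is labeled "assistant" instead of "gpt", with a strict option for malformed turns. A minimal illustration of role normalization; the names here are hypothetical, not the actual prompter internals:

```python
ROLE_MAP = {"human": "human", "user": "human", "gpt": "gpt", "assistant": "gpt"}

def normalize_role(role: str, strict: bool = True) -> str:
    if role in ROLE_MAP:
        return ROLE_MAP[role]
    if strict:
        raise ValueError(f"unknown sharegpt role: {role}")
    return "gpt"  # lenient fallback when strict handling is disabled
```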
Casper
e50ab072e2
Create preprocess CLI ( #785 )
...
* Create preprocess CLI
* Print prompt template if debugging
* Add print for unsupported prompters
* Formatting
* Formatting
* Refactor variables
* Formatting
* Formatting
* Formatting
* Formatting
2023-10-26 09:35:42 -04:00
Casper
05bd6f1122
Threaded MultipackDistributedDataloader with prefetched samples ( #759 )
...
* Multithreading implementation [WIP]
* Added benchmarking
* 35% increased throughput
* Memory pinning
* Start threads in init
* Correct print of samples
* Sleep if queue is full
* Remove pin_memory (worse)
* Simplify logic to one thread
* Remove benchmark
* Use deque for constant speed
* Formatting
* Formatting
* Formatting
* Formatting
* Rollback to use queue
* Fix multi-epoch training
* Add num epochs arg
* Start thread in __iter__
* Formatting
* Use is_alive correctly
* Simplify loading thread
2023-10-26 07:49:52 +02:00
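#759 runs the multipack dataloader on a background thread that prefetches batches into a bounded queue, matching the bullets above (sleep when the queue is full, start the thread in `__iter__` so multi-epoch training works). A simplified, self-contained sketch of that producer/consumer pattern, not the actual class:

```python
import queue
import threading

class PrefetchingLoader:
    """Iterate a loader on a background thread, keeping a few batches ready."""

    def __init__(self, loader, prefetch: int = 4):
        self.loader = loader
        self.queue = queue.Queue(maxsize=prefetch)
        self._sentinel = object()

    def _produce(self):
        for batch in self.loader:
            self.queue.put(batch)  # blocks while the queue is full
        self.queue.put(self._sentinel)

    def __iter__(self):
        # Start the thread here so each epoch gets a fresh producer.
        threading.Thread(target=self._produce, daemon=True).start()
        while (batch := self.queue.get()) is not self._sentinel:
            yield batch
```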
NanoCode012
11d1d607db
chore: refactor truthy check and fix mypy ( #780 )
2023-10-24 12:28:40 +09:00
Wing Lian
6c81c61bc4
refactor setup trainer so we can add more hooks ( #773 )
...
* refactor setup trainer so we can add more hooks
* Remove stray comma
2023-10-23 17:38:41 -04:00
Wing Lian
2d8def68dc
simplify by removing duplicate base_model_config ( #772 )
2023-10-23 01:42:38 -04:00
NanoCode012
44c9d0151a
Fix: Warn when fullfinetune without adapter ( #770 )
2023-10-22 15:41:43 -04:00
Wing Lian
ca84cca2c0
convert exponential notation lr to floats ( #771 )
2023-10-22 15:37:03 -04:00
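#771 exists because YAML 1.1 parses bare scientific notation such as `1e-4` as a string (a float needs the `1.0e-4` form), so the learning rate can arrive as `str`. A minimal sketch of the coercion:

```python
def coerce_lr(value):
    # "1e-4" survives YAML parsing as a string; real floats pass through.
    return float(value) if isinstance(value, str) else value

assert coerce_lr("1e-4") == 1e-4
assert coerce_lr(0.0002) == 0.0002
```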
Casper
32eeeb5b64
Hotfix for not saving correctly ( #762 )
2023-10-22 13:22:32 -04:00
NanoCode012
9923b72649
Fix: eval table conflict with eval_sample_packing ( #769 )
2023-10-23 01:18:12 +09:00
Casper
15d3a654bf
Implement fused modules ( #747 )
...
* MLP: Memory saving
* Remove RMSNorm restrictions
* Map packed weights to original
* FusedAttention module
* Simplify code
* Move fused modules
* Fix critical typo
* Split inplace
* Add FFT config
* Add validation of fused arguments
* Add fused arguments to config
* Update docs
* Fix validation logic
* Add fused modules to flash attn
* Only fuse during training
* Remove timing
* Formatting
* Formatting
* Formatting
* chore: lint
* chore: lint
* add e2e tests for fused llama
* no lora for tests
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-21 16:08:25 -04:00
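#747 packs the separate q/k/v projections into a single matmul during training. A hedged sketch of the core weight-packing idea for bias-free projections, independent of the repo's actual `FusedAttention` module:

```python
import torch
import torch.nn as nn

def fuse_qkv(q: nn.Linear, k: nn.Linear, v: nn.Linear) -> nn.Linear:
    # One Linear whose output is the concatenation [Q; K; V].
    out_features = q.out_features + k.out_features + v.out_features
    fused = nn.Linear(q.in_features, out_features, bias=False)
    fused.weight.data = torch.cat([q.weight.data, k.weight.data, v.weight.data], dim=0)
    return fused

# Usage: split the fused output back into the three projections.
# q_out, k_out, v_out = fused(x).split([q_dim, k_dim, v_dim], dim=-1)
```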
Motoki Wu
e4d1585c4e
Fix DeepSpeed Zero 3 Saving ( #709 )
...
* Update train.py
* add zero3 check
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-19 19:18:24 -04:00
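#709 guards saving under DeepSpeed ZeRO-3, where each rank holds only a parameter shard and a plain `state_dict` is incomplete. A sketch of the kind of check involved; `is_deepspeed_zero3_enabled` is real transformers API of that era, while the surrounding flow is an assumption:

```python
from transformers.deepspeed import is_deepspeed_zero3_enabled

def save_model_safe(trainer, output_dir: str):
    if is_deepspeed_zero3_enabled():
        # Let the Trainer/DeepSpeed engine gather shards and write
        # consolidated weights instead of dumping a partial state_dict.
        trainer.save_model(output_dir)
    else:
        trainer.model.save_pretrained(output_dir)
```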
Wing Lian
70157ccb8f
add a latest tag for regular axolotl image, cleanup extraneous print statement ( #746 )
2023-10-19 12:28:29 -04:00
seungduk.kim.2304
3a99495b05
improve: Enhance code readability of prompt_tokenizers.py ( #707 )
2023-10-19 08:12:17 -04:00
NanoCode012
440c3ab527
Fix(model): Linear detected and added to target module with rope linear ( #738 )
...
* Fix(model): Linear detected and added to target module with rope linear
* fix: exclude layer instead
2023-10-18 22:13:20 -04:00
Napuh
992d57f20a
catch ConnectionError when checking dataset from HuggingFace ( #743 )
2023-10-18 22:11:54 -04:00
Casper
a045db0214
Mistral: Sliding Window Attention with Flash Attention and Sample Packing ( #732 )
...
* Implement Mistral FA + SWA + Sample Packing
* Handle unbroadcastable tensor
* chore: lint
* Simplify _prepare_decoder_attention_mask
* Uncomment window size
* Upgrade flash-attn to minimum of 2.3.0 to support SWA
* Add original condition to avoid error during inference
* chore: lint
* use torchscript to prevent oom
* chore: pylint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-16 15:13:46 -04:00
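#732 combines sample packing with Mistral's sliding-window attention, which flash-attn supports from 2.3.0 via a `window_size` argument. A minimal sketch of the call shape (tensors laid out `(batch, seqlen, nheads, headdim)`, CUDA and fp16 required); the PR's masking logic around it is omitted:

```python
import torch
from flash_attn import flash_attn_func

q = k = v = torch.randn(1, 8192, 32, 128, dtype=torch.float16, device="cuda")

# Causal attention restricted to a 4096-token window to the left.
out = flash_attn_func(q, k, v, causal=True, window_size=(4096, 0))
```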
Wing Lian
3553172e3c
fixes for alpaca w chatml, and don't include attention_mask w mistral for flash attention ( #728 )
2023-10-14 09:27:07 -04:00
Wing Lian
f30afe4544
misc sharegpt fixes ( #723 )
...
* support for sharegpt with assistant talking first, better masking of assistant token, allow remap of roles from dataset
* invalid role is actually not possible
* update tokenized fixture for corrected labels
2023-10-13 11:04:39 -04:00
Maxime
3bd9528390
add noisy embedding ( #721 )
...
* add noisy embedding
* fix format
* Update README.md
* Update README.md
* linter issues
* caseus fixes
---------
Co-authored-by: Maxime <maxime@nope.no>
2023-10-13 10:00:42 -04:00
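#721 adds NEFTune-style noisy embeddings: during training, uniform noise scaled by alpha over sqrt(L·d) is added to the embedding output. A self-contained sketch of the published formula (not the repo's patch):

```python
import torch
import torch.nn as nn

def neftune_forward(embed: nn.Embedding, input_ids: torch.Tensor, alpha: float = 5.0):
    embeddings = embed(input_ids)
    if embed.training:
        seq_len, dim = embeddings.shape[-2], embeddings.shape[-1]
        scale = alpha / (seq_len * dim) ** 0.5
        # Uniform(-scale, scale) noise, only applied during training.
        embeddings = embeddings + torch.zeros_like(embeddings).uniform_(-scale, scale)
    return embeddings
```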
Wing Lian
1c412c7e9d
improve handling of the prepared ds path and other cfg defaults ( #701 )
2023-10-13 07:46:07 -04:00
Jan Philipp Harries
490923fb78
Save Axolotl config as WandB artifact ( #716 )
2023-10-11 07:28:12 -04:00
NanoCode012
669f1d052c
Fix: Higher vram usage for mistral and sample_packing ( #691 )
...
* Fix: Higher vram usage for mistral and sample_packing
* chore: update comment
* chore: lint
2023-10-06 12:33:43 -04:00
Wing Lian
2d60ba3a6e
flash_attention + sample packing for stablelm 3b ( #671 )
...
* stablelm epoch fa patch
* is causal for fa
* working stablelm fa w packing
* chore: pre-commit linting
2023-10-05 16:03:43 -04:00
NanoCode012
eb480dfd68
Fix: ValueError when FA + Mistral when padding_side=right ( #681 )
...
* Fix: ValueError when FA + Mistral when padding_side=right
* fix: remove tokenizer class check
2023-10-06 04:12:54 +09:00
NanoCode012
69fac9a020
Fix: Future deprecation warning with use_auth_token ( #680 )
2023-10-06 03:56:18 +09:00
NanoCode012
e0b7eeabfd
Fix(tokenizer): Set rstrip,lstrip,norm to False ( #678 )
2023-10-06 03:50:49 +09:00
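#678 pins the stripping/normalization behavior of special tokens, which `AddedToken` exposes as explicit flags. A small sketch; the `<|im_end|>` token is just an example:

```python
from transformers import AddedToken, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
token = AddedToken("<|im_end|>", rstrip=False, lstrip=False, normalized=False)
tokenizer.add_special_tokens({"eos_token": token})
```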
NanoCode012
e62d5901b5
chore: Clean up repetitive model kwargs ( #670 )
2023-10-04 20:41:26 +09:00
NanoCode012
697c50d408
Feat: Allow usage of native Mistral FA when no sample_packing ( #669 )
...
* Allow usage of native Mistral FA when no sample_packing
* fix: do not apply custom patch when sample_packing off
* chore: lint
* chore: pin transformers to v4.35.0.dev0
* fix: split sample_packing to separate test
2023-10-04 20:40:47 +09:00
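#669 falls back to transformers' native Mistral flash-attention path when sample packing is off, instead of applying the custom patch. A hedged sketch of the branch; `use_flash_attention_2` was the loading flag in that transformers era, and the `cfg` shape is an assumption:

```python
from transformers import AutoModelForCausalLM

def load_model(cfg: dict):
    kwargs = {}
    if not cfg.get("sample_packing"):
        # No packing: rely on the native flash-attention implementation.
        kwargs["use_flash_attention_2"] = True
    # With sample_packing on, attention is monkeypatched instead (see #646).
    return AutoModelForCausalLM.from_pretrained(cfg["base_model"], **kwargs)
```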
Wing Lian
2642caedf2
refactor to set eval_batch_size earlier if unset, so we can warn if mismatched ( #662 )
2023-10-02 21:08:07 -04:00
Wing Lian
f34648c8b9
remove patch fix for phi ( #664 )
2023-10-02 21:07:41 -04:00
Wing Lian
e50a64e85e
prepared dataset caching, other misc fixes ( #665 )
...
* prepared dataset caching, other misc fixes
* also don't load from disk cache unless explicit
2023-10-02 21:07:24 -04:00
Kyle Corbitt
9ec20777ba
Make dataset_processes configurable ( #651 )
...
I'm using the Axolotl script to train models on https://modal.com serverless GPUs. Unfortunately, their environment seems to have some kind of bug where if I try to run `datasets.filter` with too high a `num_proc`, it throws an error and dies.
This PR adds a new configuration option `dataset_processes`, which lets you explicitly set the number of processes used to map/filter the dataset. If not included, this defaults to the current behavior of setting that to `os.cpu_count()`.
2023-09-29 00:22:22 -04:00
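Grounded directly in the PR description above, a sketch of the default-or-override pattern for `dataset_processes`:

```python
import os

def get_num_proc(cfg: dict) -> int:
    # An explicit dataset_processes wins; otherwise fall back to all cores.
    return cfg.get("dataset_processes") or os.cpu_count()

# dataset = dataset.filter(keep_fn, num_proc=get_num_proc(cfg))
```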
ich
590d6032fd
Fix bug when using pretokenized datasets ( #652 )
...
* fix pretokenized datasets readme
* check if dataset type is not set to handle pretokenized datasets
2023-09-28 22:54:10 -04:00
Wing Lian
409ca0f21c
add support for defined train split ( #654 )
2023-09-28 20:14:14 -04:00
Wing Lian
8662e8ffe8
don't strip the prompt for check since we don't strip to tokenize anymore ( #650 )
2023-09-28 12:21:51 -04:00
Wing Lian
b2edaaeff6
fix for flash attn w mistral w/o sample packing ( #648 )
2023-09-28 10:57:37 -04:00
NanoCode012
eb41f76f92
Feat: Add example for Mistral ( #644 )
...
* Feat: Add example for Mistral
* chore: turn off flash
* chore: add is_mistral_derived_model
* chore: update following PR
2023-09-28 20:15:00 +09:00
NanoCode012
383f88d7a7
Fix(cfg): Add validation for save_strategy and eval_strategy ( #633 )
...
* Fix(cfg): Check save_strategy cfg conflict with save_steps
* Fix(cfg): Check evaluation_strategy cfg conflict with eval_steps
* chore: add extra check for steps only
2023-09-28 10:14:41 +09:00
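#633 rejects configs where an explicit strategy contradicts a step interval (e.g. `save_strategy: epoch` together with `save_steps`). A hedged sketch of the validation shape, not the repo's exact checks:

```python
def validate_strategies(cfg: dict) -> None:
    if cfg.get("save_strategy") and cfg.get("save_steps") and cfg["save_strategy"] != "steps":
        raise ValueError("save_steps is only compatible with save_strategy: steps")
    if cfg.get("evaluation_strategy") and cfg.get("eval_steps") and cfg["evaluation_strategy"] != "steps":
        raise ValueError("eval_steps is only compatible with evaluation_strategy: steps")
```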
Wing Lian
b6ab8aad62
Mistral flash attn packing ( #646 )
...
* add mistral monkeypatch
* add arg for decoder attention mask
* fix lint for duplicate code
* make sure to update transformers too
* tweak install for e2e
* move mistral patch to conditional
2023-09-27 18:41:00 -04:00