Wing Lian
3e3229e2d9
fix for qwen w lora ( #906 )
2023-11-30 12:45:50 -05:00
Wing Lian
1d21aa6b0a
ensure merged model matches the training dtype ( #902 )
* ensure merged model matches the training dtype
* Update src/axolotl/cli/__init__.py
* Update src/axolotl/cli/__init__.py
2023-11-29 09:55:19 -05:00
kallewoof
71b7ea3c05
Determine FSDP/deepspeed settings on device select. ( #883 )
* Determine FSDP/deepspeed settings on device select.
Without this, the OS env check for accelerate will fail.
* rename and move env setup call
* chore: lint
---------
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-11-29 08:36:35 -05:00
NanoCode012
1115c501b8
Feat: Add Qwen ( #894 )
* Feat: Add Qwen
* feat: add qwen lora example
* feat: update matrix
* fix: add trust_remote_code
* fix: disable gradient checkpointing
* chore: add warning about gradient checkpointing
* fix: config
* fix: turn off sample packing for this example and reduce seq len
* chore: add comment on seq len
2023-11-26 00:05:01 +09:00
NanoCode012
7ee3c4cacb
fix: warning should not show if eval_batch_size not provided ( #896 )
2023-11-25 16:04:00 +09:00
NanoCode012
fb12895a17
Feat: Add warmup_ratio ( #893 )
* Feat: Add warmup_ratio
* fix: update readme with more details on conflict
2023-11-25 12:15:43 +09:00
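The warmup_ratio feature derives warmup steps from the total step count, and the README note above flags a conflict with setting warmup_steps explicitly. A minimal sketch of the idea (function name and exact conflict handling are assumptions, not axolotl's actual code):

```python
def resolve_warmup_steps(total_steps, warmup_steps=None, warmup_ratio=None):
    """Derive the number of warmup steps, preferring an explicit value.

    Setting both options is treated as a conflict, mirroring the README note.
    """
    if warmup_steps is not None and warmup_ratio is not None:
        raise ValueError("set only one of warmup_steps or warmup_ratio")
    if warmup_ratio is not None:
        # e.g. ratio 0.1 over 1000 steps -> 100 warmup steps
        return int(total_steps * warmup_ratio)
    return warmup_steps if warmup_steps is not None else 0
```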
NanoCode012
575a082aae
fix: revert local dir dataset load ( #878 )
2023-11-18 22:50:41 +09:00
Wing Lian
9bf854e59c
Phi update 202311 ( #876 )
* add phi modeling from hf
* update for packing and use new modeling class for phi
* update e2e tests for phi to use new model name
* update example phi to also use new phi model name
* use AutoModelForCausalLM for phi lora since sample packing isn't supported
2023-11-17 12:47:17 -05:00
Wing Lian
797f3dd1de
don't train if eval split is too small ( #873 )
* allow zero len dataset
* better handling and warning of small eval splits
* raise error if eval split is too small
* don't mess with calculating total num steps in distributed context
* fix eval_sample_packing training args logic
2023-11-16 11:35:42 -05:00
NanoCode012
3cc67d2cdd
Feat: Add dataset loading from S3, GCS ( #765 )
* Feat: Add dataset loading from S3, GCS
* chore: update docs
* chore: add more info on cloud loading
2023-11-16 14:33:58 +09:00
Wing Lian
1bc11868eb
allow overriding of model_config parameters from the YML ( #853 )
* allow overriding of model_config parameters from the YML
* remove old logging, update readme
* move the updating of model config to the load_model_config function
* add warning for deprecated rope_scaling in the root of the YML config
2023-11-15 23:47:08 -05:00
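Overriding model_config parameters from the YML amounts to copying user-supplied keys onto the loaded HF config object before the model is built. A rough sketch of that idea (helper name is illustrative, not axolotl's actual function):

```python
def apply_model_config_overrides(model_config, overrides):
    """Copy user-supplied model_config keys from the YML onto the loaded config.

    `overrides` is the dict parsed from the YML's model_config section; each
    key is set as an attribute on the HF-style config object.
    """
    for key, value in (overrides or {}).items():
        setattr(model_config, key, value)
    return model_config
```

Typical use would be setting values such as `max_position_embeddings` or `rope_scaling` from the YML instead of patching the config in code.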
MilesQLi
48630f5b34
Update data.py for signature generation ( #851 )
* Update data.py
Change of conversation formatting type should also trigger updating the preprocessed dataset, so it should be part of the signature.
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-11-15 14:12:32 -05:00
Wing Lian
0c2a630326
multipack len should use max, not min ( #863 )
2023-11-15 12:52:32 -05:00
Wing Lian
db8a8afcba
adds llama and mistral dropout support ( #858 )
* adds llama and mistral dropout support
* gracefully handle attention dropout if not available yet
2023-11-15 12:28:50 -05:00
Wing Lian
14706504e3
various bugfixes ( #856 )
* various bugfixes
use latest tinyllama release
check if val_set_size is empty first
update sdp and xformers llama patches for updated upstream transformers
fix system prompt when no input
calculate total and total supervised tokens even when not sample packing
* add fix for when eval size is estimated to be too small
* should be len 1 for dataset length
* add catchall kwargs
2023-11-15 12:23:18 -05:00
Fabian Preiß
614cff4107
include the suffix modified string in ascii art ( #852 )
2023-11-15 07:12:28 -05:00
Wing Lian
1a6309c8a6
cleanup the old multipack dataloader ( #841 )
2023-11-12 05:39:09 -05:00
Wing Lian
641e6f7e51
multipack w batch sampler ( #795 )
* test batch sampler w varying batch lens
* wip
* multipack batchsampler wip
* wip
* fix for prepare data loader to get correct # of steps based on gpus
* lint and clean up
* calculate len estimate
* fix total num steps calc
* add options for dataloader_num_workers and dataloader_pin_memory
* remove gitbook
* support prefetch_factor for dataloader optimization
* fix the kwarg
2023-11-07 20:27:40 -05:00
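At its core, multipack batching groups variable-length sequences into fixed token budgets so each batch carries little padding. A simplified first-fit packer conveying the idea (illustrative only; the real batch sampler also handles distributed sharding, shuffling, and step estimation):

```python
def pack_lengths(lengths, max_tokens):
    """Greedy first-fit: assign each sequence index to the first bin with room."""
    bins = []  # each bin: {"ids": [...indices...], "used": total tokens}
    for idx, n in enumerate(lengths):
        for b in bins:
            if b["used"] + n <= max_tokens:
                b["ids"].append(idx)
                b["used"] += n
                break
        else:
            # no existing bin has room; open a new one
            bins.append({"ids": [idx], "used": n})
    return [b["ids"] for b in bins]
```

The number of bins produced is what drives the "calculate len estimate" and total-step fixes above: the step count per epoch is the packed-bin count, not the raw sample count.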
Wing Lian
b2430ce670
use accelerate logging for zero/main logging only
2023-11-06 18:32:26 -05:00
Wing Lian
4c834bf25d
cleanup verbosity a bit
2023-11-06 18:32:26 -05:00
Jason Stillerman
738a057674
Feat: Added Gradio support ( #812 )
* Added gradio support
* queuing and title
* pre-commit run
2023-11-04 23:59:22 -04:00
Wing Lian
cdc71f73c8
update table for rwkv4 support, fix process count for dataset ( #822 )
2023-11-04 23:45:44 -04:00
Wing Lian
964d858da0
fix model parallel ( #816 )
2023-11-02 21:34:22 -04:00
NanoCode012
10388a8daf
fix(tokenizer): update log order after update ( #806 )
2023-10-31 13:21:20 +09:00
NanoCode012
637ed095a0
fix(config): Set eos/bos to tokenizer if different ( #801 )
* fix(config): Set eos/bos to tokenizer if different
* chore: fix lint
2023-10-29 21:32:37 +09:00
Wing Lian
827ec3d274
refactor neft patch to be more re-usable similar to trl's impl ( #796 )
2023-10-29 04:33:13 -04:00
MilesQLi
0800885e2f
Update to adapt to sharegpt datasets with "assistant" rather than "gp… ( #774 )
* Update to adapt to sharegpt datasets with "assistant" rather than "gpt" as the machine answers.
* use a strict option for handling incorrect turn data
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-27 22:00:16 -04:00
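The fix above remaps dataset roles such as "assistant" onto the "gpt" role the sharegpt prompter expects, with a strict option controlling how unrecognized turns are handled. A hedged sketch of that remapping (the mapping table and names are illustrative, not axolotl's exact code):

```python
ROLE_MAP = {"human": "human", "user": "human", "gpt": "gpt", "assistant": "gpt"}

def remap_roles(turns, strict=True):
    """Normalize sharegpt-style turns; unknown roles raise (strict) or are dropped."""
    out = []
    for turn in turns:
        role = ROLE_MAP.get(turn["from"])
        if role is None:
            if strict:
                raise ValueError(f"unknown role: {turn['from']}")
            continue  # non-strict: skip the malformed turn
        out.append({**turn, "from": role})
    return out
```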
Casper
e50ab072e2
Create preprocess CLI ( #785 )
* Create preprocess CLI
* Print prompt template if debugging
* Add print for unsupported prompters
* Formatting
* Formatting
* Refactor variables
* Formatting
* Formatting
* Formatting
* Formatting
2023-10-26 09:35:42 -04:00
Casper
05bd6f1122
Threaded MultipackDistributedDataloader with prefetched samples ( #759 )
* Multithreading implementation [WIP]
* Added benchmarking
* 35% increased throughput
* Memory pinning
* Start threads in init
* Correct print of samples
* Sleep if queue is full
* Remove pin_memory (worse)
* Simplify logic to one thread
* Remove benchmark
* Use deque for constant speed
* Formatting
* Formatting
* Formatting
* Formatting
* Rollback to use queue
* Fix multi-epoch training
* Add num epochs arg
* Start thread in __iter__
* Formatting
* Use is_alive correctly
* Simplify loading thread
2023-10-26 07:49:52 +02:00
NanoCode012
11d1d607db
chore: refactor truthy check and fix mypy ( #780 )
2023-10-24 12:28:40 +09:00
Wing Lian
6c81c61bc4
refactor setup trainer so we can add more hooks ( #773 )
* refactor setup trainer so we can add more hooks
* Remove stray comma
2023-10-23 17:38:41 -04:00
Wing Lian
2d8def68dc
simplify by removing duplicate base_model_config ( #772 )
2023-10-23 01:42:38 -04:00
NanoCode012
44c9d0151a
Fix: Warn when fullfinetune without adapter ( #770 )
2023-10-22 15:41:43 -04:00
Wing Lian
ca84cca2c0
convert exponential notation lr to floats ( #771 )
2023-10-22 15:37:03 -04:00
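Background on the lr fix: YAML 1.1 parsers such as PyYAML read a bare `2e-5` as a string rather than a float, because the spec's float form requires a dot and a signed exponent (`2.0e-5`), so a learning rate written in bare exponential notation needs coercion. A minimal sketch of that coercion:

```python
def normalize_lr(value):
    """Coerce a learning rate parsed as a string (e.g. "2e-5") into a float."""
    return float(value) if isinstance(value, str) else value
```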
Casper
32eeeb5b64
Hotfix for not saving correctly ( #762 )
2023-10-22 13:22:32 -04:00
NanoCode012
9923b72649
Fix: eval table conflict with eval_sample_packing ( #769 )
2023-10-23 01:18:12 +09:00
Casper
15d3a654bf
Implement fused modules ( #747 )
* MLP: Memory saving
* Remove RMSNorm restrictions
* Map packed weights to original
* FusedAttention module
* Simplify code
* Move fused modules
* Fix critical typo
* Split inplace
* Add FFT config
* Add validation of fused arguments
* Add fused arguments to config
* Update docs
* Fix validation logic
* Add fused modules to flash attn
* Only fuse during training
* Remove timing
* Formatting
* Formatting
* Formatting
* chore: lint
* chore: lint
* add e2e tests for fused llama
* no lora for tests
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-21 16:08:25 -04:00
Motoki Wu
e4d1585c4e
Fix DeepSpeed Zero 3 Saving ( #709 )
* Update train.py
* add zero3 check
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-19 19:18:24 -04:00
Wing Lian
70157ccb8f
add a latest tag for regular axolotl image, cleanup extraneous print statement ( #746 )
2023-10-19 12:28:29 -04:00
seungduk.kim.2304
3a99495b05
improve: Enhance code readability of prompt_tokenizers.py ( #707 )
2023-10-19 08:12:17 -04:00
NanoCode012
440c3ab527
Fix(model): Linear detected and added to target module with rope linear ( #738 )
* Fix(model): Linear detected and added to target module with rope linear
* fix: exclude layer instead
2023-10-18 22:13:20 -04:00
Napuh
992d57f20a
catch ConnectionError when checking dataset from HuggingFace ( #743 )
2023-10-18 22:11:54 -04:00
Casper
a045db0214
Mistral: Sliding Window Attention with Flash Attention and Sample Packing ( #732 )
* Implement Mistral FA + SWA + Sample Packing
* Handle unbroadcastable tensor
* chore: lint
* Simplify _prepare_decoder_attention_mask
* Uncomment window size
* Upgrade flash-attn to minimum of 2.3.0 to support SWA
* Add original condition to avoid error during inference
* chore: lint
* use torchscript to prevent oom
* chore: pylint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-16 15:13:46 -04:00
Wing Lian
3553172e3c
fixes for alpaca w chatml, and don't include attention_mask w mistral for flash attention ( #728 )
2023-10-14 09:27:07 -04:00
Wing Lian
f30afe4544
misc sharegpt fixes ( #723 )
* support for sharegpt with assistant talking first, better masking of assistant token, allow remap of roles from dataset
* invalid role is actually not possible
* update tokenized fixture for corrected labels
2023-10-13 11:04:39 -04:00
Maxime
3bd9528390
add noisy embedding ( #721 )
* add noisy embedding
* fix format
* Update README.md
* Update README.md
* linter issues
* caseus fixes
---------
Co-authored-by: Maxime <maxime@nope.no>
2023-10-13 10:00:42 -04:00
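Noisy embeddings (NEFTune) add uniform noise to the input embeddings during training, with magnitude bounded by alpha / sqrt(seq_len * hidden_dim). A dependency-free sketch of the scaling on plain nested lists (the real patch operates on torch tensors inside the embedding forward; names here are illustrative):

```python
import math
import random

def neft_noise(embeddings, alpha=5.0):
    """Return embeddings with NEFT-style uniform noise added.

    embeddings: seq_len x hidden_dim rows; each element is perturbed by
    noise drawn from Uniform(-scale, scale), scale = alpha / sqrt(L * d).
    """
    seq_len, dim = len(embeddings), len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [[x + random.uniform(-scale, scale) for x in row] for row in embeddings]
```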
Wing Lian
1c412c7e9d
improve handling of the prepared ds path and other cfg defaults ( #701 )
2023-10-13 07:46:07 -04:00
Jan Philipp Harries
490923fb78
Save Axolotl config as WandB artifact ( #716 )
2023-10-11 07:28:12 -04:00
NanoCode012
669f1d052c
Fix: Higher vram usage for mistral and sample_packing ( #691 )
* Fix: Higher vram usage for mistral and sample_packing
* chore: update comment
* chore: lint
2023-10-06 12:33:43 -04:00
Wing Lian
2d60ba3a6e
flash_attention + sample packing for stablelm 3b ( #671 )
* stablelm epoch fa patch
* is causal for fa
* working stablelm fa w packing
* chore: pre-commit linting
2023-10-05 16:03:43 -04:00