Wing Lian
4c834bf25d
cleanup verbosity a bit
2023-11-06 18:32:26 -05:00
Fabian Preiß
8056ecd30e
add deepspeed-kernels dependency for deepspeed>=0.12.0 ( #827 )
2023-11-05 07:52:56 -05:00
Jason Stillerman
738a057674
Feat: Added Gradio support ( #812 )
* Added gradio support
* queuing and title
* pre-commit run
2023-11-04 23:59:22 -04:00
Wing Lian
cdc71f73c8
update table for rwkv4 support, fix process count for dataset ( #822 )
2023-11-04 23:45:44 -04:00
NanoCode012
6459ac7357
fix: pin autogptq ( #818 )
2023-11-03 10:14:55 -04:00
Wing Lian
964d858da0
fix model parallel ( #816 )
2023-11-02 21:34:22 -04:00
NanoCode012
10388a8daf
fix(tokenizer): update log order after update ( #806 )
2023-10-31 13:21:20 +09:00
NanoCode012
9f7e8a971d
feat(doc): add dummyoptim faq fix ( #802 )
2023-10-29 23:06:06 +09:00
NanoCode012
637ed095a0
fix(config): Set eos/bos to tokenizer if different ( #801 )
* fix(config): Set eos/bos to tokenizer if different
* chore: fix lint
2023-10-29 21:32:37 +09:00
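A minimal sketch of the idea behind #801, assuming the standard Hugging Face tokenizer API; `sync_special_tokens` and its arguments are illustrative names, not the project's actual helper:

```python
def sync_special_tokens(tokenizer, special_tokens: dict):
    # push a configured bos/eos into the tokenizer only when it differs,
    # so prompt templating and generation agree with the training config
    for name in ("bos_token", "eos_token"):
        wanted = special_tokens.get(name)
        if wanted and getattr(tokenizer, name) != wanted:
            tokenizer.add_special_tokens({name: wanted})
```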
Wing Lian
827ec3d274
refactor neft patch to be more re-usable similar to trl's impl ( #796 )
2023-10-29 04:33:13 -04:00
Wing Lian
8b79ff0e94
fix eval_steps to be a sane default ( #797 )
* fix eval_steps to be a sane default
* update docs for fractional eval_steps
2023-10-27 22:36:30 -04:00
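A sketch of the fractional `eval_steps` convention referenced above, under the assumption that values below 1.0 are read as a fraction of total training steps; `resolve_eval_steps` is an illustrative name:

```python
def resolve_eval_steps(eval_steps, total_steps):
    # values in (0, 1) count as a fraction of total training steps,
    # anything else as an absolute step count
    if isinstance(eval_steps, float) and 0 < eval_steps < 1:
        return max(1, int(total_steps * eval_steps))
    return int(eval_steps)

assert resolve_eval_steps(0.05, 1000) == 50  # evaluate every 5% of training
assert resolve_eval_steps(500, 1000) == 500  # absolute counts still work
```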
MilesQLi
0800885e2f
Update to adapt to sharegpt datasets with "assistant" rather than "gpt" as the machine answers ( #774 )
* Update to adapt to sharegpt datasets with "assistant" rather than "gpt" as the machine answers.
* use a strict option for handling incorrect turn data
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-27 22:00:16 -04:00
Teknium
d3193beac3
Fix Deepspeed Zero3 Config ( #791 )
* Update zero3.json
Take away CPU offload by default (it slows things down horribly; better off reducing batch size), and change the LR scheduler to a properly decaying one
* Update zero3.json
fix something
2023-10-27 21:57:02 -04:00
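The shape of that zero3.json change, rendered as a Python dict for illustration; the key names follow DeepSpeed's documented config schema, but the exact values here are assumptions, and "auto" entries are filled in by the HF Trainer integration:

```python
zero3_sketch = {
    "zero_optimization": {
        "stage": 3,
        # no offload_optimizer / offload_param block: keep states on GPU
        # and reduce batch size instead when memory is tight
    },
    "scheduler": {
        "type": "WarmupDecayLR",  # warms up, then decays over training
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto",
            "total_num_steps": "auto",
        },
    },
}
```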
Aleksa Gordić
2e71ff03a6
Add docker advanced instruction to README ( #792 )
2023-10-27 09:24:04 -04:00
chanvichetvong
facc49f32b
GitBook: No commit message
2023-10-26 15:11:00 +00:00
Casper
e50ab072e2
Create preprocess CLI ( #785 )
* Create preprocess CLI
* Print prompt template if debugging
* Add print for unsupported prompters
* Formatting
* Formatting
* Refactor variables
* Formatting
* Formatting
* Formatting
* Formatting
2023-10-26 09:35:42 -04:00
Casper
05bd6f1122
Threaded MultipackDistributedDataloader with prefetched samples ( #759 )
* Multithreading implementation [WIP]
* Added benchmarking
* 35% increased throughput
* Memory pinning
* Start threads in init
* Correct print of samples
* Sleep if queue is full
* Remove pin_memory (worse)
* Simplify logic to one thread
* Remove benchmark
* Use deque for constant speed
* Formatting
* Formatting
* Formatting
* Formatting
* Rollback to use queue
* Fix multi-epoch training
* Add num epochs arg
* Start thread in __iter__
* Formatting
* Use is_alive correctly
* Simplify loading thread
2023-10-26 07:49:52 +02:00
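The design #759 converged on (one loading thread, a bounded queue, thread started in `__iter__`) reduces to a pattern like this sketch; the class and argument names are illustrative, not the actual `MultipackDistributedDataloader` code:

```python
import queue
import threading

class PrefetchingLoader:
    def __init__(self, loader, prefetch=8):
        self.loader = loader
        self.queue = queue.Queue(maxsize=prefetch)  # put() blocks when full

    def _fill(self):
        for batch in self.loader:
            self.queue.put(batch)
        self.queue.put(None)  # sentinel marks the end of the epoch

    def __iter__(self):
        # start the thread per iteration so every epoch gets a fresh pass
        threading.Thread(target=self._fill, daemon=True).start()
        while (batch := self.queue.get()) is not None:
            yield batch
```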
NanoCode012
20aa4b57d2
chore(readme): Improve documentation on conversation field ( #782 )
* chore(readme): Improve documentation on conversation field
* fix: clarify where the option is
2023-10-24 12:52:32 +09:00
NanoCode012
11d1d607db
chore: refactor truthy check and fix mypy ( #780 )
2023-10-24 12:28:40 +09:00
Wing Lian
6c81c61bc4
refactor setup trainer so we can add more hooks ( #773 )
* refactor setup trainer so we can add more hooks
* Remove stray comma
2023-10-23 17:38:41 -04:00
Wing Lian
9b43e7ea15
disable eval table w sample packing in examples ( #778 )
2023-10-23 09:18:44 -04:00
Wing Lian
2d8def68dc
simplify by removing duplicate base_model_config ( #772 )
2023-10-23 01:42:38 -04:00
NanoCode012
44c9d0151a
Fix: Warn when fullfinetune without adapter ( #770 )
2023-10-22 15:41:43 -04:00
Wing Lian
ca84cca2c0
convert exponential notation lr to floats ( #771 )
2023-10-22 15:37:03 -04:00
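Context for the commit above: PyYAML follows YAML 1.1, whose float pattern requires a decimal point, so a bare-exponent learning rate comes out of the config as a string. A minimal sketch of the gotcha and the coercion; `coerce_lr` is an illustrative name:

```python
import yaml

cfg = yaml.safe_load("learning_rate: 1e-5")
assert isinstance(cfg["learning_rate"], str)  # not a float under YAML 1.1 rules

def coerce_lr(value):
    # accept either a real float or scientific notation loaded as a string
    return float(value) if isinstance(value, str) else value

assert coerce_lr(cfg["learning_rate"]) == 1e-5
```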
Casper
32eeeb5b64
Hotfix for not saving correctly ( #762 )
2023-10-22 13:22:32 -04:00
NanoCode012
afedc470bd
Fix: Cannot tokenize with bf16 and on cpu ( #766 )
2023-10-23 01:32:26 +09:00
NanoCode012
9923b72649
Fix: eval table conflict with eval_sample_packing ( #769 )
2023-10-23 01:18:12 +09:00
Wing Lian
21cf09b608
remove lora fused packing test ( #758 )
2023-10-21 22:59:35 -04:00
Casper
15d3a654bf
Implement fused modules ( #747 )
* MLP: Memory saving
* Remove RMSNorm restrictions
* Map packed weights to original
* FusedAttention module
* Simplify code
* Move fused modules
* Fix critical typo
* Split inplace
* Add FFT config
* Add validation of fused arguments
* Add fused arguments to config
* Update docs
* Fix validation logic
* Add fused modules to flash attn
* Only fuse during training
* Remove timing
* Formatting
* Formatting
* Formatting
* chore: lint
* chore: lint
* add e2e tests for fused llama
* no lora for tests
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-21 16:08:25 -04:00
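The core trick in #747, sketched minimally: pack the separate q/k/v projections into one linear so attention does a single GEMM, while mapping the packed weight back to the originals. `FusedQKV` is an illustrative name; the real patch also covers MLP fusion, config validation, and unfusing before saving:

```python
import torch
import torch.nn as nn

class FusedQKV(nn.Module):
    def __init__(self, q: nn.Linear, k: nn.Linear, v: nn.Linear):
        super().__init__()
        self.splits = [q.out_features, k.out_features, v.out_features]
        self.proj = nn.Linear(q.in_features, sum(self.splits), bias=False)
        with torch.no_grad():
            # map the packed weight back to the original projections
            self.proj.weight.copy_(torch.cat([q.weight, k.weight, v.weight], dim=0))

    def forward(self, x):
        # one GEMM instead of three, then split back into q/k/v
        return torch.split(self.proj(x), self.splits, dim=-1)
```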
Wing Lian
a21935f07a
add to docs ( #703 )
2023-10-19 21:32:30 -04:00
NanoCode012
8966a6f566
chore: bump transformers to v4.34.1 to fix tokenizer issue ( #745 )
2023-10-19 20:18:22 -04:00
Motoki Wu
e4d1585c4e
Fix DeepSpeed Zero 3 Saving ( #709 )
* Update train.py
* add zero3 check
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-19 19:18:24 -04:00
Wing Lian
70157ccb8f
add a latest tag for regular axolotl image, cleanup extraneous print statement ( #746 )
2023-10-19 12:28:29 -04:00
seungduk.kim.2304
3a99495b05
improve: Enhance code readability of prompt_tokenizers.py ( #707 )
2023-10-19 08:12:17 -04:00
NanoCode012
440c3ab527
Fix(model): Linear detected and added to target module with rope linear ( #738 )
* Fix(model): Linear detected and added to target module with rope linear
* fix: exclude layer instead
2023-10-18 22:13:20 -04:00
Napuh
992d57f20a
catch ConnectionError when checking dataset from HuggingFace ( #743 )
2023-10-18 22:11:54 -04:00
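A sketch of the guard from #743, with illustrative names; the actual check may probe the Hub through a different `datasets` call, and both the builtin and `requests` flavors of `ConnectionError` are caught defensively:

```python
import requests
from datasets import load_dataset_builder

def dataset_reachable(name: str) -> bool:
    # probe the HuggingFace Hub without downloading the data; a network
    # failure should degrade gracefully instead of crashing the run
    try:
        load_dataset_builder(name)
        return True
    except (ConnectionError, requests.exceptions.ConnectionError):
        return False
```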
mhenrichsen
91a016f410
badge ( #739 )
* badge
* fixed text
2023-10-18 10:21:34 -04:00
Casper
a045db0214
Mistral: Sliding Window Attention with Flash Attention and Sample Packing ( #732 )
* Implement Mistral FA + SWA + Sample Packing
* Handle unbroadcastable tensor
* chore: lint
* Simplify _prepare_decoder_attention_mask
* Uncomment window size
* Upgrade flash-attn to minimum of 2.3.0 to support SWA
* Add original condition to avoid error during inference
* chore: lint
* use torchscript to prevent oom
* chore: pylint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-16 15:13:46 -04:00
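The key call behind #732, sketched under stated assumptions: flash-attn 2.3.0 added a `window_size` argument to its varlen kernels, and the `cu_seqlens` produced by sample packing already encodes sample boundaries. Everything here except `flash_attn_varlen_func` and its parameters is an illustrative name:

```python
from flash_attn import flash_attn_varlen_func  # requires flash-attn >= 2.3.0

def packed_swa_attention(q, k, v, cu_seqlens, max_seqlen, sliding_window=4096):
    # cu_seqlens keeps attention from crossing packed-sample boundaries;
    # window_size additionally caps how far back each token may attend
    return flash_attn_varlen_func(
        q, k, v,
        cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
        max_seqlen_q=max_seqlen, max_seqlen_k=max_seqlen,
        causal=True,
        window_size=(sliding_window, 0),  # (left, right); right is moot when causal
    )
```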
Casper
e1b214c62b
Clarify custom format example ( #729 )
* Clarify custom prompt format
* Simplify format
2023-10-14 09:28:12 -04:00
Wing Lian
3553172e3c
fixes for alpaca w chatml, and don't include attention_mask w mistral for flash attention ( #728 )
2023-10-14 09:27:07 -04:00
Wing Lian
7f2027d93f
tweak for xformers install w pytorch 2.1.0 ( #727 )
2023-10-13 15:21:17 -04:00
Wing Lian
8d288a2ad4
workaround for installing xformers w torch 2.1.0 ( #725 )
2023-10-13 11:19:30 -04:00
Wing Lian
f30afe4544
misc sharegpt fixes ( #723 )
* support for sharegpt with assistant talking first, better masking of assistant token, allow remap of roles from dataset
* invalid role is actually not possible
* update tokenized fixture for corrected labels
2023-10-13 11:04:39 -04:00
Wing Lian
bfbdba8614
pin xformers >= 0.0.22 ( #724 )
2023-10-13 10:27:56 -04:00
Maxime
3bd9528390
add noisy embedding ( #721 )
* add noisy embedding
* fix format
* Update README.md
* Update README.md
* linter issues
* caseus fixes
---------
Co-authored-by: Maxime <maxime@nope.no>
2023-10-13 10:00:42 -04:00
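The technique added in #721 (and refactored in #796, higher up this log, to mirror TRL) is NEFTune: during training, add uniform noise to the input embeddings with magnitude alpha / sqrt(seq_len * hidden_dim). A sketch as a PyTorch forward hook; `neftune_hook` is an illustrative name:

```python
import torch

def neftune_hook(module, _inputs, output, alpha=5.0):
    # NEFTune: noise ~ U(-mag, mag) with mag = alpha / sqrt(seq_len * hidden_dim)
    if module.training:
        mag = alpha / (output.size(1) * output.size(2)) ** 0.5
        output = output + torch.zeros_like(output).uniform_(-mag, mag)
    return output  # returning a value from a forward hook replaces the output

# usage sketch:
# model.get_input_embeddings().register_forward_hook(neftune_hook)
```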
Wing Lian
2aa1f71464
fix pytorch 2.1.0 build, add multipack docs ( #722 )
2023-10-13 08:57:28 -04:00
Wing Lian
1c412c7e9d
improve handling of the prepared ds path and other cfg defaults ( #701 )
2023-10-13 07:46:07 -04:00
Jan Philipp Harries
490923fb78
Save Axolotl config as WandB artifact ( #716 )
2023-10-11 07:28:12 -04:00
NanoCode012
5855dded3d
fix(doc): update default doc according to arg ( #714 )
2023-10-10 21:51:56 +09:00
atgctg
ace70b33c6
Fix: lowercase True values in config ( #713 )
* Fix: lowercase `True` values in config
2023-10-10 21:32:20 +09:00