Wing Lian
332984db18
lint fix that didn't get caught by linter ( #866 )
2023-11-15 14:36:40 -05:00
MilesQLi
48630f5b34
Update data.py for signature generation ( #851 )
* Update data.py
A change of the conversation formatting type should also trigger regeneration of the preprocessed dataset, so it should be part of the signature.
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-11-15 14:12:32 -05:00
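The reasoning above amounts to a cache-busting hash over every input that affects preprocessing. A minimal sketch of the idea (not axolotl's actual `data.py`; the function and field names are illustrative):

```python
import hashlib

def dataset_signature(dataset_path: str, prompt_style: str, conversation: str) -> str:
    # Hash every knob that changes the tokenized output; if `conversation`
    # were omitted, switching formats would silently reuse a stale cache.
    payload = "|".join([dataset_path, prompt_style, conversation or ""])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```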
Zongheng Yang
b33c1d55a2
Docs: add instructions to 1-click launching on public clouds ( #862 )
* Update README.md
* Update ToC
2023-11-15 14:11:27 -05:00
Wing Lian
0c2a630326
multipack len should use max, not min ( #863 )
2023-11-15 12:52:32 -05:00
Wing Lian
db8a8afcba
adds llama and mistral dropout support ( #858 )
* adds llama and mistral dropout support
* gracefully handle attention dropout if not available yet
2023-11-15 12:28:50 -05:00
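A hedged sketch of the second bullet: newer transformers releases expose an `attention_dropout` field on the Llama/Mistral configs, while older ones do not, so the patch only sets it when the attribute exists (the helper name here is hypothetical):

```python
def set_attention_dropout(model_config, dropout_p: float) -> None:
    # Older transformers versions predate the field; skip rather than crash.
    if hasattr(model_config, "attention_dropout"):
        model_config.attention_dropout = dropout_p
```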
Wing Lian
14706504e3
various bugfixes ( #856 )
* various bugfixes
use latest tinyllama release
check if val_set_size is empty first
update sdp and xformers llama patches for updated upstream transformers
fix system prompt when no input
calculate total tokens and total supervised tokens even when not sample packing
* add fix for when eval size is estimated to be too small
* should be len 1 for dataset length
* add catchall kwargs
2023-11-15 12:23:18 -05:00
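For the token-accounting bullet, a minimal sketch assuming the usual Hugging Face convention that unsupervised (prompt) positions are masked to -100 in `labels`:

```python
def count_tokens(example: dict) -> tuple[int, int]:
    """Return (total tokens, supervised tokens) for one tokenized example."""
    total = len(example["input_ids"])
    supervised = sum(1 for label in example["labels"] if label != -100)
    return total, supervised
```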
NanoCode012
501b4d1379
chore(doc): Separate section on runpod ( #860 )
2023-11-16 01:06:51 +09:00
NanoCode012
306fe19c54
feat(doc): add more info on train_on_split ( #855 )
2023-11-15 23:42:26 +09:00
Fabian Preiß
614cff4107
include the suffix modified string in ascii art ( #852 )
2023-11-15 07:12:28 -05:00
Wing Lian
1a6309c8a6
cleanup the old multipack dataloader ( #841 )
2023-11-12 05:39:09 -05:00
Bryan Thornbury
105d0b350b
Pin optimum package ( #838 )
2023-11-09 22:36:15 -05:00
Wing Lian
f544ab2bed
don't compile deepspeed or bitsandbytes from source ( #837 )
2023-11-08 19:49:55 -05:00
Wing Lian
641e6f7e51
multipack w batch sampler ( #795 )
* test batch sampler w varying batch lens
* wip
* multipack batchsampler wip
* wip
* fix for prepare data loader to get correct # of steps based on gpus
* lint and clean up
* calculate len estimate
* fix total num steps calc
* add options for dataloader_num_workers and dataloader_pin_memory
* remove gitbook
* support prefetch_factor for dataloader optimization
* fix the kwarg
2023-11-07 20:27:40 -05:00
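A hedged sketch of how the three new options plug into `torch.utils.data.DataLoader` (`cfg` is a stand-in for axolotl's config object; the kwarg names are PyTorch's own):

```python
from torch.utils.data import DataLoader

def build_dataloader(dataset, batch_sampler, cfg):
    kwargs = {"pin_memory": bool(cfg.dataloader_pin_memory)}
    if cfg.dataloader_num_workers:
        kwargs["num_workers"] = cfg.dataloader_num_workers
        if cfg.dataloader_prefetch_factor:
            # prefetch_factor is only valid when num_workers > 0
            kwargs["prefetch_factor"] = cfg.dataloader_prefetch_factor
    return DataLoader(dataset, batch_sampler=batch_sampler, **kwargs)
```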
Wing Lian
6dc68a653f
use temp_dir kwarg instead
2023-11-06 18:33:01 -05:00
Wing Lian
7de6a5639c
missing dunder-init
2023-11-06 18:33:01 -05:00
Wing Lian
c74f045ba7
chore: lint
2023-11-06 18:33:01 -05:00
Wing Lian
0402d19759
make sure to cleanup tmp output_dir for e2e tests
2023-11-06 18:33:01 -05:00
Wing Lian
b2430ce670
use accelerate logging for zero/main logging only
2023-11-06 18:32:26 -05:00
Wing Lian
4c834bf25d
cleanup verbosity a bit
2023-11-06 18:32:26 -05:00
Fabian Preiß
8056ecd30e
add deepspeed-kernels dependency for deepspeed>=0.12.0 ( #827 )
2023-11-05 07:52:56 -05:00
Jason Stillerman
738a057674
Feat: Added Gradio support ( #812 )
* Added gradio support
* queuing and title
* pre-commit run
2023-11-04 23:59:22 -04:00
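A minimal sketch of the Gradio piece, covering the "queuing and title" bullet; `generate` stands in for the real model call:

```python
import gradio as gr

def generate(prompt: str) -> str:
    return f"echo: {prompt}"  # placeholder for model inference

demo = gr.Interface(fn=generate, inputs="text", outputs="text",
                    title="Axolotl Inference")
demo.queue().launch()  # queue() serializes requests to the single model
```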
Wing Lian
cdc71f73c8
update table for rwkv4 support, fix process count for dataset ( #822 )
2023-11-04 23:45:44 -04:00
NanoCode012
6459ac7357
fix: pin autogptq ( #818 )
2023-11-03 10:14:55 -04:00
Wing Lian
964d858da0
fix model parallel ( #816 )
2023-11-02 21:34:22 -04:00
NanoCode012
10388a8daf
fix(tokenizer): update log order after update ( #806 )
2023-10-31 13:21:20 +09:00
NanoCode012
9f7e8a971d
feat(doc): add dummyoptim faq fix ( #802 )
2023-10-29 23:06:06 +09:00
NanoCode012
637ed095a0
fix(config): Set eos/bos to tokenizer if different ( #801 )
* fix(config): Set eos/bos to tokenizer if different
* chore: fix lint
2023-10-29 21:32:37 +09:00
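A hedged sketch of the fix (the helper name is hypothetical): when the config's special tokens differ from what the tokenizer already carries, push the config values in so prompts and labels stay consistent:

```python
def sync_special_tokens(tokenizer, special_tokens: dict) -> None:
    for name in ("bos_token", "eos_token"):
        want = special_tokens.get(name)
        if want and getattr(tokenizer, name) != want:
            tokenizer.add_special_tokens({name: want})
```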
Wing Lian
827ec3d274
refactor neft patch to be more re-usable similar to trl's impl ( #796 )
2023-10-29 04:33:13 -04:00
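For context, the NEFT(une) technique being patched adds uniform noise to the embedding output during training, scaled by alpha / sqrt(seq_len * hidden_dim) as in trl's implementation; a sketch of the reusable forward-hook shape:

```python
import torch

def neftune_hook(module, _inputs, output, alpha: float = 5.0):
    if module.training:
        dims = output.size(1) * output.size(2)  # seq_len * hidden_dim
        mag = alpha / dims ** 0.5
        output = output + torch.zeros_like(output).uniform_(-mag, mag)
    return output

# registered once on the embedding layer, e.g.:
# model.get_input_embeddings().register_forward_hook(neftune_hook)
```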
Wing Lian
8b79ff0e94
fix eval_steps to be a sane default ( #797 )
* fix eval_steps to be a sane default
* update docs for fractional eval_steps
2023-10-27 22:36:30 -04:00
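The "fractional" part leans on transformers' behavior that an `eval_steps` value below 1 is interpreted as a ratio of total training steps; for example:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./out",
    evaluation_strategy="steps",
    eval_steps=0.05,  # < 1, so: evaluate every 5% of total training steps
)
```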
MilesQLi
0800885e2f
Update to adapt to sharegpt datasets with "assistant" rather than "gpt" as the machine answers ( #774 )
* Update to adapt to sharegpt datasets with "assistant" rather than "gpt" as the machine answers.
* use a strict option for handling incorrect turn data
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-27 22:00:16 -04:00
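A hedged sketch of the two bullets together (names are illustrative, not axolotl's exact prompter code): normalize `assistant` to the `gpt` role the ShareGPT prompter expects, with `strict` deciding whether anything unrecognized raises:

```python
ROLE_MAP = {"user": "human", "assistant": "gpt"}

def normalize_role(role: str, strict: bool = True) -> str:
    if role in ("human", "gpt"):
        return role
    if strict and role not in ROLE_MAP:
        raise ValueError(f"unexpected ShareGPT role: {role}")
    return ROLE_MAP.get(role, role)
```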
Teknium
d3193beac3
Fix Deepspeed Zero3 Config ( #791 )
* Update zero3.json
Remove CPU offload by default (it slows things down horribly; better off reducing batch size) and change the LR scheduler to a properly decaying one
* Update zero3.json
fix something
2023-10-27 21:57:02 -04:00
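Rendered as a Python dict for illustration (the repo ships this as `zero3.json`, and the exact keys may differ), the two changes amount to dropping the offload blocks and using a decaying DeepSpeed scheduler such as `WarmupDecayLR`:

```python
zero3_config = {
    "zero_optimization": {
        "stage": 3,
        # no offload_optimizer / offload_param blocks: CPU offload slows
        # training badly; prefer a smaller batch size if memory is tight
    },
    "scheduler": {
        "type": "WarmupDecayLR",  # decays the LR instead of holding it flat
        "params": {"warmup_num_steps": "auto", "total_num_steps": "auto"},
    },
}
```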
Aleksa Gordić
2e71ff03a6
Add docker advanced instruction to README ( #792 )
2023-10-27 09:24:04 -04:00
chanvichetvong
facc49f32b
GitBook: No commit message
2023-10-26 15:11:00 +00:00
Casper
e50ab072e2
Create preprocess CLI ( #785 )
* Create preprocess CLI
* Print prompt template if debugging
* Add print for unsupported prompters
* Formatting
* Formatting
* Refactor variables
* Formatting
* Formatting
* Formatting
* Formatting
2023-10-26 09:35:42 -04:00
Casper
05bd6f1122
Threaded MultipackDistributedDataloader with prefetched samples ( #759 )
* Multithreading implementation [WIP]
* Added benchmarking
* 35% increased throughput
* Memory pinning
* Start threads in init
* Correct print of samples
* Sleep if queue is full
* Remove pin_memory (worse)
* Simplify logic to one thread
* Remove benchmark
* Use deque for constant speed
* Formatting
* Formatting
* Formatting
* Formatting
* Rollback to use queue
* Fix multi-epoch training
* Add num epochs arg
* Start thread in __iter__
* Formatting
* Use is_alive correctly
* Simplify loading thread
2023-10-26 07:49:52 +02:00
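The bullets trace the design down to a single producer thread feeding a bounded queue; a condensed sketch of that final shape (not the PR's exact class):

```python
import queue
import threading

_DONE = object()  # end-of-epoch sentinel

class PrefetchingLoader:
    def __init__(self, batches, prefetch: int = 8):
        self.batches = batches  # any iterable yielding ready batches
        self.prefetch = prefetch

    def _produce(self, q: queue.Queue) -> None:
        for batch in self.batches:
            q.put(batch)  # blocks ("sleeps") while the queue is full
        q.put(_DONE)

    def __iter__(self):
        # fresh queue + thread per epoch, so re-iteration works
        q = queue.Queue(maxsize=self.prefetch)
        threading.Thread(target=self._produce, args=(q,), daemon=True).start()
        while (batch := q.get()) is not _DONE:
            yield batch
```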
NanoCode012
20aa4b57d2
chore(readme): Improve documentation on conversation field ( #782 )
* chore(readme): Improve documentation on conversation field
* fix: clarify where the option is
2023-10-24 12:52:32 +09:00
NanoCode012
11d1d607db
chore: refactor truthy check and fix mypy ( #780 )
2023-10-24 12:28:40 +09:00
Wing Lian
6c81c61bc4
refactor setup trainer so we can add more hooks ( #773 )
* refactor setup trainer so we can add more hooks
* Remove stray comma
2023-10-23 17:38:41 -04:00
Wing Lian
9b43e7ea15
disable eval table w sample packing in examples ( #778 )
2023-10-23 09:18:44 -04:00
Wing Lian
2d8def68dc
simplify by removing duplicate base_model_config ( #772 )
2023-10-23 01:42:38 -04:00
NanoCode012
44c9d0151a
Fix: Warn when fullfinetune without adapter ( #770 )
2023-10-22 15:41:43 -04:00
Wing Lian
ca84cca2c0
convert exponential notation lr to floats ( #771 )
2023-10-22 15:37:03 -04:00
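The underlying pitfall: YAML 1.1 parsers such as PyYAML read `3e-4` (no decimal point) as a string, not a float, so the config value needs coercion before reaching the optimizer; a minimal sketch:

```python
def coerce_lr(value):
    # PyYAML yields the string "3e-4" for bare exponential notation
    return float(value) if isinstance(value, str) else value

assert coerce_lr("3e-4") == 3e-4
assert coerce_lr(0.0003) == 3e-4
```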
Casper
32eeeb5b64
Hotfix for not saving correctly ( #762 )
2023-10-22 13:22:32 -04:00
NanoCode012
afedc470bd
Fix: Cannot tokenize with bf16 and on cpu ( #766 )
2023-10-23 01:32:26 +09:00
NanoCode012
9923b72649
Fix: eval table conflict with eval_sample_packing ( #769 )
2023-10-23 01:18:12 +09:00
Wing Lian
21cf09b608
remove lora fused packing test ( #758 )
2023-10-21 22:59:35 -04:00
Casper
15d3a654bf
Implement fused modules ( #747 )
* MLP: Memory saving
* Remove RMSNorm restrictions
* Map packed weights to original
* FusedAttention module
* Simplify code
* Move fused modules
* Fix critical typo
* Split inplace
* Add FFT config
* Add validation of fused arguments
* Add fused arguments to config
* Update docs
* Fix validation logic
* Add fused modules to flash attn
* Only fuse during training
* Remove timing
* Formatting
* Formatting
* Formatting
* chore: lint
* chore: lint
* add e2e tests for fused llama
* no lora for tests
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-21 16:08:25 -04:00
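A hedged sketch of the core idea behind the fused MLP ("map packed weights to original"): concatenate the gate/up projection weights into one matmul and split the result, which cuts kernel launches without changing the math:

```python
import torch.nn as nn

class FusedSwiGLU(nn.Module):
    def __init__(self, gate_proj: nn.Linear, up_proj: nn.Linear, down_proj: nn.Linear):
        super().__init__()
        out = gate_proj.out_features
        self.gate_up = nn.Linear(gate_proj.in_features, 2 * out, bias=False)
        # pack the two original weight matrices into one
        self.gate_up.weight.data[:out] = gate_proj.weight.data
        self.gate_up.weight.data[out:] = up_proj.weight.data
        self.down_proj = down_proj
        self.act = nn.SiLU()

    def forward(self, x):
        gate, up = self.gate_up(x).chunk(2, dim=-1)
        return self.down_proj(self.act(gate) * up)
```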
Wing Lian
a21935f07a
add to docs ( #703 )
2023-10-19 21:32:30 -04:00
NanoCode012
8966a6f566
chore: bump transformers to v4.34.1 to fix tokenizer issue ( #745 )
2023-10-19 20:18:22 -04:00
Motoki Wu
e4d1585c4e
Fix DeepSpeed Zero 3 Saving ( #709 )
* Update train.py
* add zero3 check
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-19 19:18:24 -04:00
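A hedged sketch of the "zero3 check" (not the PR's exact code): under ZeRO stage 3 the weights live sharded across ranks, so every rank must go through the trainer's save path for the gather to happen, rather than rank 0 dumping an empty state dict:

```python
# import path for transformers >= 4.36; older releases: transformers.deepspeed
from transformers.integrations.deepspeed import is_deepspeed_zero3_enabled

def save_trained_model(trainer, cfg, output_dir: str) -> None:
    if cfg.deepspeed and is_deepspeed_zero3_enabled():
        trainer.save_model(output_dir)  # all ranks join the weight gather
    else:
        trainer.model.save_pretrained(output_dir)
```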