Wing Lian
ce33e1ed83
pin liger-kernel to latest 0.2.1 ( #1882 ) [skip ci]
2024-08-30 17:51:18 -04:00
Byron Hsu
e3a38450de
Add liger kernel to features ( #1881 ) [skip ci]
2024-08-29 08:19:18 -04:00
Aman Gupta Karmani
7037e3c836
deepseekv2 liger support ( #1878 )
...
* deepseekv2 liger support
* add comment
* add missing impl
2024-08-27 23:52:40 -04:00
Aman Gupta Karmani
c1a61ae23c
fix liger plugin load issues ( #1876 )
2024-08-27 23:08:26 -04:00
Aman Gupta Karmani
159b8b9a74
monkey-patch transformers to simplify monkey-patching modeling code ( #1877 )
...
* monkey-patch transformers so that monkey-patched modeling code doesn't get overwritten
* unnecessary now
* add comment
2024-08-27 17:22:26 -07:00
Wing Lian
1e43660701
Sample pack trust remote code v2 ( #1873 )
...
* fix the multipack patch for remote code models
* add deepseek v2 lite example w fsdp
2024-08-27 13:39:24 -04:00
Chiwan Park
f6362d2a05
Add Liger Kernel support for Qwen2 ( #1871 )
2024-08-27 13:03:16 -04:00
Wing Lian
17af1d7081
clear cuda cache to help with memory leak/creep ( #1858 )
...
* clear cuda cache to help with memory leak/creep
* reverse order of gc
2024-08-26 15:50:26 -04:00
Chiwan Park
2dac1edf72
Fix drop_long_seq bug due to truncation in prompt tokenization strategies when using chat_template ( #1867 )
2024-08-26 12:56:12 -04:00
Wing Lian
6819c12cee
update spectrum authors ( #1869 )
2024-08-26 12:00:36 -04:00
Wing Lian
8e29bdefdd
Spectrum plugin ( #1866 )
2024-08-25 17:54:02 -04:00
Wing Lian
f245964f22
better handling of llama-3 tool role ( #1782 )
2024-08-25 12:31:40 -04:00
Wing Lian
22f4eafa55
simplify logic ( #1856 )
2024-08-23 20:23:08 -04:00
Wing Lian
77a4b9cda2
change up import to prevent AttributeError ( #1863 )
...
* change up import to prevent AttributeError
* tweak patching check for updated upstream
2024-08-23 17:00:01 -04:00
Wing Lian
810ecd4e81
add liger to readme ( #1865 )
...
* add liger to readme
* updates from PR feedback
2024-08-23 14:34:03 -04:00
Wing Lian
da0d581a8c
add liger example ( #1864 )
2024-08-23 12:37:50 -04:00
Wing Lian
1f686c576c
Liger Kernel integration ( #1861 )
...
* add initial plugin support w Liger kernel patches
* integrate the input args classes
* fix liger plugin and dynamic configuration class
* drop untrainable samples and refactor config plugins integration
* fix incorrect inputs and circular imports
* fix bool comparison
* fix for dropping untrainable tokens
* fix licensing so liger integration is Apache 2.0
* add jamba support
* pylint ignore
2024-08-23 12:21:51 -04:00
Wing Lian
e8ff5d5738
don't mess with bnb since it needs compiled wheels ( #1859 )
2024-08-23 12:18:47 -04:00
Wing Lian
328fd4b3b7
add axolotl community license ( #1862 )
2024-08-23 11:40:21 -04:00
Wing Lian
fefa95e350
most model types now support flash attention 2 regardless of multipack support ( #1854 )
2024-08-22 16:39:23 -04:00
Wing Lian
b33dc07a77
rename nightly test and add badge ( #1853 )
2024-08-22 13:13:33 -04:00
Wing Lian
dcbff16983
run nightly ci builds against upstream main ( #1851 )
...
* run nightly ci builds against upstream main
* add test badges
* run the multigpu tests against nightly main builds too
2024-08-22 13:10:54 -04:00
Wing Lian
2f8037fee6
ensure that the hftrainer deepspeed config is set before the trainer class is ever init'ed ( #1850 ) [skip ci]
2024-08-22 13:10:40 -04:00
Aman Gupta Karmani
de4ea2d1f2
docs: minor syntax highlight fix ( #1839 )
2024-08-22 11:47:34 -04:00
JohanWork
7ed92e61c2
fix: prompt phi ( #1845 ) [skip ci]
...
* correcting phi system prompt
* phi test
* update
* add test
2024-08-22 11:46:57 -04:00
Wing Lian
9caa3eb699
make the train_on_eos default to turn so all eos tokens are treated the same ( #1847 ) [skip ci]
2024-08-22 11:45:37 -04:00
Wing Lian
5b0b774e38
ensure that the bias is also in the correct dtype ( #1848 ) [skip ci]
...
* ensure that the bias is also in the correct dtype
* add nightly for dpo-qlora-fsdp
2024-08-22 11:45:00 -04:00
Wing Lian
c3fc529bfc
numpy 2.1.0 was released, but incompatible with numba ( #1849 ) [skip ci]
2024-08-22 11:44:45 -04:00
Gal Cohen (galco)
957c956f89
rename jamba example ( #1846 ) [skip ci]
...
* rename jamba example
* feat: change readme
---------
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-22 09:22:55 -04:00
Aman Gupta Karmani
f07802f9fa
examples: fix tiny-llama pretrain yml syntax ( #1840 )
2024-08-21 13:37:51 -04:00
Gal Cohen (galco)
9f917245f6
feat: add jamba chat_template ( #1843 )
...
* feat: add jamba chat_template
* fix: black
* feat: jamba fsdp+qlora
---------
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-21 13:37:17 -04:00
Aman Gupta Karmani
649c19aba3
pretrain: fix with sample_packing=false ( #1841 )
2024-08-21 13:36:51 -04:00
Gal Cohen (galco)
5aac4bc284
fix: don't change quant storage dtype in case of fsdp ( #1837 )
...
* fix: don't change quant storage dtype in case of fsdp
* fix black
---------
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-20 12:41:48 -04:00
Wing Lian
e29931259b
optionally save the final FSDP model as a sharded state dict ( #1828 )
...
* efficiently save very large llms when using FSDP
* fix parsing and index of sharded chunks
* only save fsdp on main process
* debugging for rename
* save sharded state dict
* remove unused new param
* get state dict directly
* tweak acc merge fsdp to shard the weight files
* sharded_state_dict alongside save_safetensors seems to hang on checkpoint save
2024-08-19 14:59:24 -04:00
Wing Lian
b1d2921222
add validation to prevent 8bit lora finetuning on H100s ( #1827 )
2024-08-16 21:32:00 -04:00
Wing Lian
803fed3e90
update sklearn version, torch compile env vars, don't worry about failure on preprocess load model ( #1821 )
...
* update sklearn version, torch compile env vars, don't worry about failure on preprocess load model
* There is already a condition check within the function. This outer one is not necessary
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-08-16 10:41:51 -04:00
NanoCode012
68a3c7678a
fix: parse model_kwargs ( #1825 )
2024-08-16 07:51:19 -04:00
NanoCode012
f18925fb4b
fix: parse eager_attention ( #1824 )
2024-08-14 09:46:46 -04:00
Wing Lian
1853d6021d
bump hf dependencies ( #1823 )
...
* bump hf dependencies
* revert optimum version change
* don't bump tokenizers all the way to 0.20 yet since transformers doesn't support that
2024-08-11 16:27:41 -04:00
Chiwan Park
0801f239cc
fix the incorrect max_length for chat template ( #1818 )
2024-08-09 11:50:31 -04:00
Wing Lian
54392ac8a6
Attempt to run multigpu in PR CI for now to ensure it works ( #1815 ) [skip ci]
...
* Attempt to run multigpu in PR CI for now to ensure it works
* fix yaml file
* forgot to include multigpu tests
* fix call to cicd.multigpu
* dump dictdefault to dict for yaml conversion
* use to_dict instead of casting
* 16bit-lora w flash attention, 8bit lora seems problematic
* add llama fsdp test
* more tests
* Add test for qlora + fsdp with prequant
* limit accelerate to 2 processes and disable broken qlora+fsdp+bnb test
* move multigpu tests to biweekly
2024-08-09 11:50:13 -04:00
Wing Lian
3e2b269d06
update tinyllama to use final instead of checkpoints ( #1820 ) [skip ci]
2024-08-09 10:58:19 -04:00
Wing Lian
5ee4b7325f
fix z3 leaf configuration when not using lists ( #1817 ) [skip ci]
2024-08-09 10:54:52 -04:00
Wing Lian
70978467a0
skip no commit to main on ci ( #1814 )
2024-08-06 15:25:54 -04:00
Wing Lian
850f999a76
update peft and transformers ( #1811 )
2024-08-06 10:32:05 -04:00
Wing Lian
c56e0a79a5
logging improvements ( #1808 ) [skip ci]
...
* logging improvements
* fix sort
2024-08-06 10:31:50 -04:00
Wing Lian
35d5e59d78
set z3 leaf for deepseek v2 ( #1809 ) [skip ci]
...
* set z3 leaf for deepseek v2
* add deepseek v2 chat template
2024-08-06 09:30:46 -04:00
Wing Lian
fbbeb4fee0
remove unnecessary zero-first guard as it's already only called in a parent fn ( #1810 ) [skip ci]
2024-08-06 09:29:23 -04:00
Wing Lian
ecdda006de
One cycle lr ( #1803 )
...
* refactor one_cycle lr scheduler so it's reusable in more situations
* fix validation for lr_scheduler
* default to cosine anneal strategy
* one cycle lr expects cos
2024-08-05 13:12:05 -04:00
Ben Feuer
b7665c26c8
Update conversation.qmd ( #1788 ) [skip ci]
2024-08-05 12:44:26 -04:00