Keith Stevens
7b9f669a3a
Trigger the original tokenization behavior when no advanced turn settings are provided ( #1915 )
2024-09-14 08:22:54 -04:00
Wing Lian
5c42f11411
remove dynamic module loader monkeypatch as this was fixed upstream ( #1914 )
2024-09-13 22:19:54 -04:00
Wing Lian
3853ab7ae9
bump accelerate to 0.34.2 ( #1901 )
...
* bump accelerate
* add fixture to predownload the test model
* change fixture
2024-09-07 14:39:31 -04:00
Wing Lian
6e354682e3
fix zero3 integration ( #1897 )
...
* fix zero3 integration
* bump transformers and accelerate too
2024-09-05 10:58:50 -04:00
Alpay Ariyak
ab461d83c4
Fix documentation for pre-tokenized dataset ( #1894 )
...
The docs currently ask users not to add BOS and EOS tokens, stating that Axolotl adds them, but this is not true
2024-09-05 23:11:31 +09:00
Wing Lian
93b769a979
lint fix and update gha regex ( #1899 )
2024-09-05 09:58:21 -04:00
Tijmen de Haan
f18f4268b5
Docs for AMD-based HPC systems ( #1891 )
...
* Add documentation for installing on AMD-based HPC systems.
* Accept suggestion to add note about deepspeed
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update _quarto.yml with amd_hpc doc
---------
Co-authored-by: Tijmen de Haan <tijmen.dehaan@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-09-05 18:33:19 +09:00
Wing Lian
dca1fe47d4
fix optimizer + fsdp combination in example ( #1893 )
2024-09-04 11:28:47 -04:00
Wing Lian
4e5400c732
support for auto_find_batch_size when packing ( #1885 )
...
* support for auto_find_batch_size when packing
* make sure to return data from validation
* make sure to return data from validation
* actually expose multipack_real_batches in the config
* calculate gathered efficiency in sampler
* tweak to fix auto find and use actual sampler len for multipack
* uncomment
* use args for bsz when not available from auto find
2024-09-03 20:02:44 -04:00
Wing Lian
0aeb277456
add e2e smoke tests for llama liger integration ( #1884 )
...
* add e2e smoke tests for llama liger integration
* fix import
* don't use __main__ for test
* consolidate line
2024-09-01 19:29:37 -04:00
Chiwan Park
bdab3ec587
Fix RMSNorm monkey patch for Gemma models ( #1886 )
2024-09-01 18:34:24 -04:00
Wing Lian
3c6b9eda2e
run pytests with varied pytorch versions too ( #1883 )
2024-08-31 22:49:35 -04:00
DocShotgun
15408d0f09
Update supported models for Liger Kernel ( #1875 )
...
* Update supported models for Liger Kernel
Add Mistral LCE, Gemma LCE, Gemma 2 without LCE (softcapping is not yet implemented for Gemma in Liger Kernel LCE forward), Phi3 without LCE
* move import to their appropriate conditions
* Integrate Phi3 LCE support
https://github.com/linkedin/Liger-Kernel/pull/103/
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-08-31 21:59:48 -04:00
Wing Lian
ce33e1ed83
pin liger-kernel to latest 0.2.1 ( #1882 ) [skip ci]
2024-08-30 17:51:18 -04:00
Byron Hsu
e3a38450de
Add liger kernel to features ( #1881 ) [skip ci]
2024-08-29 08:19:18 -04:00
Aman Gupta Karmani
7037e3c836
deepseekv2 liger support ( #1878 )
...
* deepseekv2 liger support
* add comment
* add missing impl
2024-08-27 23:52:40 -04:00
Aman Gupta Karmani
c1a61ae23c
fix liger plugin load issues ( #1876 )
2024-08-27 23:08:26 -04:00
Aman Gupta Karmani
159b8b9a74
monkey-patch transformers to simplify monkey-patching modeling code ( #1877 )
...
* monkey-patch transformers so that monkey-patched modeling code doesn't get overwritten
* unnecessary now
* add comment
2024-08-27 17:22:26 -07:00
Wing Lian
1e43660701
Sample pack trust remote code v2 ( #1873 )
...
* fix the multipack patch for remote code models
* add deepseek v2 lite example w fsdp
2024-08-27 13:39:24 -04:00
Chiwan Park
f6362d2a05
Add Liger Kernel support for Qwen2 ( #1871 )
2024-08-27 13:03:16 -04:00
Wing Lian
17af1d7081
clear cuda cache to help with memory leak/creep ( #1858 )
...
* clear cuda cache to help with memory leak/creep
* reverse order of gc
2024-08-26 15:50:26 -04:00
Chiwan Park
2dac1edf72
Fix drop_long_seq bug due to truncation in prompt tokenization strategies when using chat_template ( #1867 )
2024-08-26 12:56:12 -04:00
Wing Lian
6819c12cee
update spectrum authors ( #1869 )
2024-08-26 12:00:36 -04:00
Wing Lian
8e29bdefdd
Spectrum plugin ( #1866 )
2024-08-25 17:54:02 -04:00
Wing Lian
f245964f22
better handling of llama-3 tool role ( #1782 )
2024-08-25 12:31:40 -04:00
Wing Lian
22f4eafa55
simplify logic ( #1856 )
2024-08-23 20:23:08 -04:00
Wing Lian
77a4b9cda2
change up import to prevent AttributeError ( #1863 )
...
* change up import to prevent AttributeError
* tweak patching check for updated upstream
2024-08-23 17:00:01 -04:00
Wing Lian
810ecd4e81
add liger to readme ( #1865 )
...
* add liger to readme
* updates from PR feedback
2024-08-23 14:34:03 -04:00
Wing Lian
da0d581a8c
add liger example ( #1864 )
2024-08-23 12:37:50 -04:00
Wing Lian
1f686c576c
Liger Kernel integration ( #1861 )
...
* add initial plugin support w Liger kernel patches
* integrate the input args classes
* fix liger plugin and dynamic configuration class
* drop untrainable samples and refactor config plugins integration
* fix incorrect inputs and circular imports
* fix bool comparison
* fix for dropping untrainable tokens
* fix licensing so liger integration is Apache 2.0
* add jamba support
* pylint ignore
2024-08-23 12:21:51 -04:00
Wing Lian
e8ff5d5738
don't mess with bnb since it needs compiled wheels ( #1859 )
2024-08-23 12:18:47 -04:00
Wing Lian
328fd4b3b7
add axolotl community license ( #1862 )
2024-08-23 11:40:21 -04:00
Wing Lian
fefa95e350
most model types now support flash attention 2 regardless of multipack support ( #1854 )
2024-08-22 16:39:23 -04:00
Wing Lian
b33dc07a77
rename nightly test and add badge ( #1853 )
2024-08-22 13:13:33 -04:00
Wing Lian
dcbff16983
run nightly ci builds against upstream main ( #1851 )
...
* run nightly ci builds against upstream main
* add test badges
* run the multigpu tests against nightly main builds too
2024-08-22 13:10:54 -04:00
Wing Lian
2f8037fee6
ensure that the hftrainer deepspeed config is set before the trainer class is ever init'ed ( #1850 ) [skip ci]
2024-08-22 13:10:40 -04:00
Aman Gupta Karmani
de4ea2d1f2
docs: minor syntax highlight fix ( #1839 )
2024-08-22 11:47:34 -04:00
JohanWork
7ed92e61c2
fix: prompt phi ( #1845 ) [skip ci]
...
* correcting phi system prompt
* phi test
* update
* add test
2024-08-22 11:46:57 -04:00
Wing Lian
9caa3eb699
make the train_on_eos default to turn so all eos tokens are treated the same ( #1847 ) [skip ci]
2024-08-22 11:45:37 -04:00
Wing Lian
5b0b774e38
ensure that the bias is also in the correct dtype ( #1848 ) [skip ci]
...
* ensure that the bias is also in the correct dtype
* add nightly for dpo-qlora-fsdp
2024-08-22 11:45:00 -04:00
Wing Lian
c3fc529bfc
numpy 2.1.0 was released but is incompatible with numba ( #1849 ) [skip ci]
2024-08-22 11:44:45 -04:00
Gal Cohen (galco)
957c956f89
rename jamba example ( #1846 ) [skip ci]
...
* rename jamba example
* feat: change readme
---------
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-22 09:22:55 -04:00
Aman Gupta Karmani
f07802f9fa
examples: fix tiny-llama pretrain yml syntax ( #1840 )
2024-08-21 13:37:51 -04:00
Gal Cohen (galco)
9f917245f6
feat: add jamba chat_template ( #1843 )
...
* feat: add jamba chat_template
* fix: black
* feat: jamba fsdp+qlora
---------
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-21 13:37:17 -04:00
Aman Gupta Karmani
649c19aba3
pretrain: fix with sample_packing=false ( #1841 )
2024-08-21 13:36:51 -04:00
Gal Cohen (galco)
5aac4bc284
fix: don't change quant storage dtype in case of fsdp ( #1837 )
...
* fix: don't change quant storage dtype in case of fsdp
* fix black
---------
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-20 12:41:48 -04:00
Wing Lian
e29931259b
optionally save the final FSDP model as a sharded state dict ( #1828 )
...
* efficiently save very large llms when using FSDP
* fix parsing and index of sharded chunks
* only save fsdp on main process
* debugging for rename
* save sharded state dict
* remove unused new param
* get state dict directly
* tweak acc merge fsdp to shard the weight files
* sharded_state_dict alongside save_safetensors seems to hang on checkpoint save
2024-08-19 14:59:24 -04:00
Wing Lian
b1d2921222
add validation to prevent 8bit lora finetuning on H100s ( #1827 )
2024-08-16 21:32:00 -04:00
Wing Lian
803fed3e90
update sklearn version, torch compile env vars, don't worry about failure on preprocess load model ( #1821 )
...
* update sklearn version, torch compile env vars, don't worry about failure on preprocess load model
* There is already a condition check within the function. This outer one is not necessary
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-08-16 10:41:51 -04:00
NanoCode012
68a3c7678a
fix: parse model_kwargs ( #1825 )
2024-08-16 07:51:19 -04:00