Wing Lian
f245964f22
better handling of llama-3 tool role ( #1782 )
2024-08-25 12:31:40 -04:00
Wing Lian
22f4eafa55
simplify logic ( #1856 )
2024-08-23 20:23:08 -04:00
Wing Lian
77a4b9cda2
change up import to prevent AttributeError ( #1863 )
* change up import to prevent AttributeError
* tweak patching check for updated upstream
2024-08-23 17:00:01 -04:00
Wing Lian
810ecd4e81
add liger to readme ( #1865 )
* add liger to readme
* updates from PR feedback
2024-08-23 14:34:03 -04:00
Wing Lian
da0d581a8c
add liger example ( #1864 )
2024-08-23 12:37:50 -04:00
Wing Lian
1f686c576c
Liger Kernel integration ( #1861 )
* add initial plugin support w Liger kernel patches
* integrate the input args classes
* fix liger plugin and dynamic configuration class
* drop untrainable samples and refactor config plugins integration
* fix incorrect inputs and circular imports
* fix bool comparison
* fix for dropping untrainable tokens
* fix licensing so liger integration is Apache 2.0
* add jamba support
* pylint ignore
2024-08-23 12:21:51 -04:00
Wing Lian
e8ff5d5738
don't mess with bnb since it needs compiled wheels ( #1859 )
2024-08-23 12:18:47 -04:00
Wing Lian
328fd4b3b7
add axolotl community license ( #1862 )
2024-08-23 11:40:21 -04:00
Wing Lian
fefa95e350
most model types now support flash attention 2 regardless of multipack support ( #1854 )
2024-08-22 16:39:23 -04:00
Wing Lian
b33dc07a77
rename nightly test and add badge ( #1853 )
2024-08-22 13:13:33 -04:00
Wing Lian
dcbff16983
run nightly ci builds against upstream main ( #1851 )
* run nightly ci builds against upstream main
* add test badges
* run the multigpu tests against nightly main builds too
2024-08-22 13:10:54 -04:00
Wing Lian
2f8037fee6
ensure that the hftrainer deepspeed config is set before the trainer class is ever init'ed ( #1850 ) [skip ci]
2024-08-22 13:10:40 -04:00
Aman Gupta Karmani
de4ea2d1f2
docs: minor syntax highlight fix ( #1839 )
2024-08-22 11:47:34 -04:00
JohanWork
7ed92e61c2
fix: prompt phi ( #1845 ) [skip ci]
* correcting phi system prompt
* phi test
* update
* add test
2024-08-22 11:46:57 -04:00
Wing Lian
9caa3eb699
make the train_on_eos default to turn so all eos tokens are treated the same ( #1847 ) [skip ci]
2024-08-22 11:45:37 -04:00
Wing Lian
5b0b774e38
ensure that the bias is also in the correct dtype ( #1848 ) [skip ci]
* ensure that the bias is also in the correct dtype
* add nightly for dpo-qlora-fsdp
2024-08-22 11:45:00 -04:00
Wing Lian
c3fc529bfc
numpy 2.1.0 was released, but incompatible with numba ( #1849 ) [skip ci]
2024-08-22 11:44:45 -04:00
Gal Cohen (galco)
957c956f89
rename jamba example ( #1846 ) [skip ci]
* rename jamba example
* feat: change readme
---------
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-22 09:22:55 -04:00
Aman Gupta Karmani
f07802f9fa
examples: fix tiny-llama pretrain yml syntax ( #1840 )
2024-08-21 13:37:51 -04:00
Gal Cohen (galco)
9f917245f6
feat: add jamba chat_template ( #1843 )
* feat: add jamba chat_template
* fix: black
* feat: jamba fsdp+qlora
---------
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-21 13:37:17 -04:00
Aman Gupta Karmani
649c19aba3
pretrain: fix with sample_packing=false ( #1841 )
2024-08-21 13:36:51 -04:00
Gal Cohen (galco)
5aac4bc284
fix: don't change quant storage dtype in case of fsdp ( #1837 )
* fix: don't change quant storage dtype in case of fsdp
* fix black
---------
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-20 12:41:48 -04:00
Wing Lian
e29931259b
optionally save the final FSDP model as a sharded state dict ( #1828 )
* efficiently save very large llms when using FSDP
* fix parsing and index of sharded chunks
* only save fsdp on main process
* debugging for rename
* save sharded state dict
* remove unused new param
* get state dict directly
* tweak acc merge fsdp to shard the weight files
* sharded_state_dict alongside save_safetensors seems to hang on checkpoint save
2024-08-19 14:59:24 -04:00
Wing Lian
b1d2921222
add validation to prevent 8bit lora finetuning on H100s ( #1827 )
2024-08-16 21:32:00 -04:00
Wing Lian
803fed3e90
update sklearn version, torch compile env vars, don't worry about failure on preprocess load model ( #1821 )
* update sklearn version, torch compile env vars, don't worry about failure on preprocess load model
* There is already a condition check within the function. This outer one is not necessary
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-08-16 10:41:51 -04:00
NanoCode012
68a3c7678a
fix: parse model_kwargs ( #1825 )
2024-08-16 07:51:19 -04:00
NanoCode012
f18925fb4b
fix: parse eager_attention ( #1824 )
2024-08-14 09:46:46 -04:00
Wing Lian
1853d6021d
bump hf dependencies ( #1823 )
* bump hf dependencies
* revert optimum version change
* don't bump tokenizers all the way to 0.20 yet since transformers doesn't support that
2024-08-11 16:27:41 -04:00
Chiwan Park
0801f239cc
fix the incorrect max_length for chat template ( #1818 )
2024-08-09 11:50:31 -04:00
Wing Lian
54392ac8a6
Attempt to run multigpu in PR CI for now to ensure it works ( #1815 ) [skip ci]
* Attempt to run multigpu in PR CI for now to ensure it works
* fix yaml file
* forgot to include multigpu tests
* fix call to cicd.multigpu
* dump dictdefault to dict for yaml conversion
* use to_dict instead of casting
* 16bit-lora w flash attention, 8bit lora seems problematic
* add llama fsdp test
* more tests
* Add test for qlora + fsdp with prequant
* limit accelerate to 2 processes and disable broken qlora+fsdp+bnb test
* move multigpu tests to biweekly
2024-08-09 11:50:13 -04:00
Wing Lian
3e2b269d06
update tinyllama to use final instead of checkpoints ( #1820 ) [skip ci]
2024-08-09 10:58:19 -04:00
Wing Lian
5ee4b7325f
fix z3 leaf configuration when not using lists ( #1817 ) [skip ci]
2024-08-09 10:54:52 -04:00
Wing Lian
70978467a0
skip no commit to main on ci ( #1814 )
2024-08-06 15:25:54 -04:00
Wing Lian
850f999a76
update peft and transformers ( #1811 )
2024-08-06 10:32:05 -04:00
Wing Lian
c56e0a79a5
logging improvements ( #1808 ) [skip ci]
* logging improvements
* fix sort
2024-08-06 10:31:50 -04:00
Wing Lian
35d5e59d78
set z3 leaf for deepseek v2 ( #1809 ) [skip ci]
* set z3 leaf for deepseek v2
* add deepseek v2 chat template
2024-08-06 09:30:46 -04:00
Wing Lian
fbbeb4fee0
remove unnecessary zero-first guard as it's already only called in a parent fn ( #1810 ) [skip ci]
2024-08-06 09:29:23 -04:00
Wing Lian
ecdda006de
One cycle lr ( #1803 )
* refactor one_cycle lr scheduler so it's reusable in more situations
* fix validation for lr_scheduler
* default to cosine anneal strategy
* one cycle lr expects cos
2024-08-05 13:12:05 -04:00
Ben Feuer
b7665c26c8
Update conversation.qmd ( #1788 ) [skip ci]
2024-08-05 12:44:26 -04:00
Aaditya Ura (looking for PhD Fall’24)
cb023c70db
Update instruct-lora-8b.yml ( #1789 ) [skip ci]
Config gives an error if the end-of-text token is not used while `pad_to_sequence_len` is true.
2024-08-05 12:43:20 -04:00
ripes
7402eb9dcb
Fix setting correct repo id when pushing dataset to hub ( #1657 )
* use the ds hash as the dataset's config_name
* improve logging for loading/pushing ds to hub
* fix missing f string
2024-08-05 12:42:15 -04:00
Sri Kainkaryam
203816f7b4
Fix colab example notebook ( #1805 ) [skip ci]
2024-08-04 13:24:26 -04:00
Wing Lian
78b42a3fe1
fix roles to train defaults and make logging less verbose ( #1801 )
2024-07-30 20:58:17 -04:00
Wing Lian
3ebf22464b
qlora-fsdp ram efficient loading with hf trainer ( #1791 )
* fix 405b with lower cpu ram requirements
* make sure to use double quant and only skip output embeddings
* set model attributes
* more fixes for sharded fsdp loading
* update the base model in example to use pre-quantized nf4-bf16 weights
* upstream fixes for qlora+fsdp
2024-07-30 19:21:38 -04:00
Wing Lian
dbf8fb549e
publish axolotl images without extras in the tag name ( #1798 )
2024-07-30 13:36:19 -04:00
Wing Lian
9a63884597
update test and main/nightly builds ( #1797 )
* update test and main/nightly builds
* don't install mamba-ssm on 2.4.0 since it has no wheels yet
2024-07-30 12:37:40 -04:00
Wing Lian
c5587b45ac
use 12.4.1 instead of 12.4 [skip-ci] ( #1796 )
2024-07-30 08:50:23 -04:00
Wing Lian
d4f6a6b103
fix dockerfile and base builder ( #1795 ) [skip-ci]
2024-07-30 08:34:37 -04:00
Wing Lian
d8d1788ffc
move to supporting mostly 12.1 w 2.3.1 and add new 12.4 with 2.4.0 ( #1793 )
2024-07-30 08:06:11 -04:00
mhenrichsen
3bc8e64557
Update README.md ( #1792 )
2024-07-30 07:59:53 +02:00