Commit Graph

1577 Commits

Author SHA1 Message Date
Wing Lian
f245964f22 better handling of llama-3 tool rolw (#1782) 2024-08-25 12:31:40 -04:00
Wing Lian
22f4eafa55 simplify logic (#1856) 2024-08-23 20:23:08 -04:00
Wing Lian
77a4b9cda2 change up import to prevent AttributeError (#1863)
* change up import to prevent AttributeError

* tweak patching check for updated upstream
2024-08-23 17:00:01 -04:00
Wing Lian
810ecd4e81 add liger to readme (#1865)
* add liger to readme

* updates from PR feedback
2024-08-23 14:34:03 -04:00
Wing Lian
da0d581a8c add liger example (#1864) 2024-08-23 12:37:50 -04:00
Wing Lian
1f686c576c Liger Kernel integration (#1861)
* add initial plugin support w Liger kernel patches

* integrate the input args classes

* fix liger plugin and dynamic configuration class

* drop untrainable samples and refactor config plugins integration

* fix incorrect inputs and circular imports

* fix bool comparison

* fix for dropping untraibable tokens

* fix licensing so liger integration is Apache 2.0

* add jamba support

* pylint ignore
2024-08-23 12:21:51 -04:00
Wing Lian
e8ff5d5738 don't mess with bnb since it needs compiled wheels (#1859) 2024-08-23 12:18:47 -04:00
Wing Lian
328fd4b3b7 add axolotl community license (#1862) 2024-08-23 11:40:21 -04:00
Wing Lian
fefa95e350 most model types now support flash attention 2 regardless of multipack support (#1854) 2024-08-22 16:39:23 -04:00
Wing Lian
b33dc07a77 rename nightly test and add badge (#1853) 2024-08-22 13:13:33 -04:00
Wing Lian
dcbff16983 run nightly ci builds against upstream main (#1851)
* run nightly ci builds against upstream main

* add test badges

* run the multigpu tests against nightly main builds too
2024-08-22 13:10:54 -04:00
Wing Lian
2f8037fee6 ensure that the hftrainer deepspeed config is set before the trainer class is ever init'ed (#1850) [skip ci] 2024-08-22 13:10:40 -04:00
Aman Gupta Karmani
de4ea2d1f2 docs: minor syntax highlight fix (#1839) 2024-08-22 11:47:34 -04:00
JohanWork
7ed92e61c2 fix: prompt phi (#1845) [skip ci]
* corecting phi system prompt

* phi test

* update

* add test
2024-08-22 11:46:57 -04:00
Wing Lian
9caa3eb699 make the train_on_eos default to turn so all eos tokens are treated the same (#1847) [skip ci] 2024-08-22 11:45:37 -04:00
Wing Lian
5b0b774e38 ensure that the bias is also in the correct dtype (#1848) [skip ci]
* ensure that the bias is also in the correct dtype

* add nightly for dpo-qlora-fsdp
2024-08-22 11:45:00 -04:00
Wing Lian
c3fc529bfc numpy 2.1.0 was released, but incompatible with numba (#1849) [skip ci] 2024-08-22 11:44:45 -04:00
Gal Cohen (galco)
957c956f89 rename jamba example (#1846) [skip ci]
* rename jamba example

* feat: change readme

---------

Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-22 09:22:55 -04:00
Aman Gupta Karmani
f07802f9fa examples: fix tiny-llama pretrain yml syntax (#1840) 2024-08-21 13:37:51 -04:00
Gal Cohen (galco)
9f917245f6 feat: add jamba chat_template (#1843)
* feat: add jamba chat_template

* fix: black

* feat: jamba fsdp+qlora

---------

Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-21 13:37:17 -04:00
Aman Gupta Karmani
649c19aba3 pretrain: fix with sample_packing=false (#1841) 2024-08-21 13:36:51 -04:00
Gal Cohen (galco)
5aac4bc284 fix: dont change quant storage dtype in case of fsdp (#1837)
* fix: dont change quant storage dtype in case of fsdp

* fix black

---------

Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-20 12:41:48 -04:00
Wing Lian
e29931259b optionally save the final FSDP model as a sharded state dict (#1828)
* efficiently save very large llms when using FSDP

* fix parsing and index of sharded chunks

* only save fsdp on main process

* debugging for rename

* save sharded state dict

* remove unused new param

* get state dict directly

* tweak acc merge fsdp to shard the weight files

* sharded_state_dict alongside save_safetensors seems to hang on checkpoint save
2024-08-19 14:59:24 -04:00
Wing Lian
b1d2921222 add validation to prevent 8bit lora finetuning on H100s (#1827) 2024-08-16 21:32:00 -04:00
Wing Lian
803fed3e90 update sklearn versrion, torch compile env vars, don't worry about failure on preprocess load model (#1821)
* update sklearn versrion, torch compile env vars, don't worry about failure on preprocess load model

* There is already a condition check within the function. This outer one is not necessary

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-08-16 10:41:51 -04:00
NanoCode012
68a3c7678a fix: parse model_kwargs (#1825) 2024-08-16 07:51:19 -04:00
NanoCode012
f18925fb4b fix: parse eager_attention (#1824) 2024-08-14 09:46:46 -04:00
Wing Lian
1853d6021d bump hf dependencies (#1823)
* bump hf dependencies

* revert optimum version change

* don't bump tokenizers all the way to 0.20 yet since transformers doesn't support that
2024-08-11 16:27:41 -04:00
Chiwan Park
0801f239cc fix the incorrect max_length for chat template (#1818) 2024-08-09 11:50:31 -04:00
Wing Lian
54392ac8a6 Attempt to run multigpu in PR CI for now to ensure it works (#1815) [skip ci]
* Attempt to run multigpu in PR CI for now to ensure it works

* fix yaml file

* forgot to include multigpu tests

* fix call to cicd.multigpu

* dump dictdefault to dict for yaml conversion

* use to_dict instead of casting

* 16bit-lora w flash attention, 8bit lora seems problematic

* add llama fsdp test

* more tests

* Add test for qlora + fsdp with prequant

* limit accelerate to 2 processes and disable broken qlora+fsdp+bnb test

* move multigpu tests to biweekly
2024-08-09 11:50:13 -04:00
Wing Lian
3e2b269d06 update tinyllama to use final instead of checkpoints (#1820) [skip ci] 2024-08-09 10:58:19 -04:00
Wing Lian
5ee4b7325f fix z3 leaf configuration when not using lists (#1817) [skip ci] 2024-08-09 10:54:52 -04:00
Wing Lian
70978467a0 skip no commit to main on ci (#1814) 2024-08-06 15:25:54 -04:00
Wing Lian
850f999a76 update peft and transformers (#1811) 2024-08-06 10:32:05 -04:00
Wing Lian
c56e0a79a5 logging improvements (#1808) [skip ci]
* logging improvements

* fix sort
2024-08-06 10:31:50 -04:00
Wing Lian
35d5e59d78 set z3 leaf for deepseek v2 (#1809) [skip ci]
* set z3 leaf for deepseek v2

* add deepseek v2 chat template
2024-08-06 09:30:46 -04:00
Wing Lian
fbbeb4fee0 remove un-necessary zero-first guard as it's already only called in a parent fn (#1810) [skip ci] 2024-08-06 09:29:23 -04:00
Wing Lian
ecdda006de One cycle lr (#1803)
* refactor one_cycle lr scheduler so it's reusable in more situations

* fix validation for lr_scheduler

* default to cosine anneal strategy

* one cycle lr exepects cos
2024-08-05 13:12:05 -04:00
Ben Feuer
b7665c26c8 Update conversation.qmd (#1788) [skip ci] 2024-08-05 12:44:26 -04:00
Aaditya Ura (looking for PhD Fall’24)
cb023c70db Update instruct-lora-8b.yml (#1789) [skip ci]
Config is giving an error if not using the end of the token as the `pad_to_sequence_len` is true.
2024-08-05 12:43:20 -04:00
ripes
7402eb9dcb Fix setting correct repo id when pushing dataset to hub (#1657)
* use the ds hash as the dataset's config_name

* improve logging for loading/pushing ds to hub

* fix missing f string
2024-08-05 12:42:15 -04:00
Sri Kainkaryam
203816f7b4 Fix colab example notebook (#1805) [skip ci] 2024-08-04 13:24:26 -04:00
Wing Lian
78b42a3fe1 fix roles to train defaults and make logging less verbose (#1801) 2024-07-30 20:58:17 -04:00
Wing Lian
3ebf22464b qlora-fsdp ram efficient loading with hf trainer (#1791)
* fix 405b with lower cpu ram requirements

* make sure to use doouble quant and only skip output embeddings

* set model attributes

* more fixes for sharded fsdp loading

* update the base model in example to use pre-quantized nf4-bf16 weights

* upstream fixes  for qlora+fsdp
2024-07-30 19:21:38 -04:00
Wing Lian
dbf8fb549e publish axolotl images without extras in the tag name (#1798) 2024-07-30 13:36:19 -04:00
Wing Lian
9a63884597 update test and main/nightly builds (#1797)
* update test and main/nightly builds

* don't install mamba-ssm on 2.4.0 since it has no wheels yet
2024-07-30 12:37:40 -04:00
Wing Lian
c5587b45ac use 12.4.1 instead of 12.4 [skip-ci] (#1796) 2024-07-30 08:50:23 -04:00
Wing Lian
d4f6a6b103 fix dockerfile and base builder (#1795) [skip-ci] 2024-07-30 08:34:37 -04:00
Wing Lian
d8d1788ffc move to supporting mostly 12.1 w 2.3.1 and add new 12.4 with 2.4.0 (#1793) 2024-07-30 08:06:11 -04:00
mhenrichsen
3bc8e64557 Update README.md (#1792) 2024-07-30 07:59:53 +02:00