axolotl

Author	SHA1	Message	Date
Wing Lian	ce33e1ed83	pin liger-kernel to latest 0.2.1 (#1882) [skip ci]	2024-08-30 17:51:18 -04:00
Byron Hsu	e3a38450de	Add liger kernel to features (#1881) [skip ci]	2024-08-29 08:19:18 -04:00
Aman Gupta Karmani	7037e3c836	deepseekv2 liger support (#1878) * deepseekv2 liger support * add comment * add missing impl	2024-08-27 23:52:40 -04:00
Aman Gupta Karmani	c1a61ae23c	fix liger plugin load issues (#1876)	2024-08-27 23:08:26 -04:00
Aman Gupta Karmani	159b8b9a74	monkey-patch transformers to simplify monkey-patching modeling code (#1877) * monkey-patch transformers so that monkey-patched modeling code doesnt get overwritten * unnecessary now * add comment	2024-08-27 17:22:26 -07:00
Wing Lian	1e43660701	Sample pack trust remote code v2 (#1873) * fix the multipack patch for remote code models * add deepseek v2 lite example w fsdp	2024-08-27 13:39:24 -04:00
Chiwan Park	f6362d2a05	Add Liger Kernal support for Qwen2 (#1871)	2024-08-27 13:03:16 -04:00
Wing Lian	17af1d7081	clear cuda cache to help with memory leak/creep (#1858) * clear cuda cache to help with memory leak/creep * reverse order of gc	2024-08-26 15:50:26 -04:00
Chiwan Park	2dac1edf72	Fix `drop_long_seq` bug due to truncation in prompt tokenization strategies when using `chat_template` (#1867)	2024-08-26 12:56:12 -04:00
Wing Lian	6819c12cee	update specturm authors (#1869)	2024-08-26 12:00:36 -04:00
Wing Lian	8e29bdefdd	Spectrum plugin (#1866)	2024-08-25 17:54:02 -04:00
Wing Lian	f245964f22	better handling of llama-3 tool rolw (#1782)	2024-08-25 12:31:40 -04:00
Wing Lian	22f4eafa55	simplify logic (#1856)	2024-08-23 20:23:08 -04:00
Wing Lian	77a4b9cda2	change up import to prevent AttributeError (#1863) * change up import to prevent AttributeError * tweak patching check for updated upstream	2024-08-23 17:00:01 -04:00
Wing Lian	810ecd4e81	add liger to readme (#1865) * add liger to readme * updates from PR feedback	2024-08-23 14:34:03 -04:00
Wing Lian	da0d581a8c	add liger example (#1864)	2024-08-23 12:37:50 -04:00
Wing Lian	1f686c576c	Liger Kernel integration (#1861) * add initial plugin support w Liger kernel patches * integrate the input args classes * fix liger plugin and dynamic configuration class * drop untrainable samples and refactor config plugins integration * fix incorrect inputs and circular imports * fix bool comparison * fix for dropping untraibable tokens * fix licensing so liger integration is Apache 2.0 * add jamba support * pylint ignore	2024-08-23 12:21:51 -04:00
Wing Lian	e8ff5d5738	don't mess with bnb since it needs compiled wheels (#1859)	2024-08-23 12:18:47 -04:00
Wing Lian	328fd4b3b7	add axolotl community license (#1862)	2024-08-23 11:40:21 -04:00
Wing Lian	fefa95e350	most model types now support flash attention 2 regardless of multipack support (#1854)	2024-08-22 16:39:23 -04:00
Wing Lian	b33dc07a77	rename nightly test and add badge (#1853)	2024-08-22 13:13:33 -04:00
Wing Lian	dcbff16983	run nightly ci builds against upstream main (#1851) * run nightly ci builds against upstream main * add test badges * run the multigpu tests against nightly main builds too	2024-08-22 13:10:54 -04:00
Wing Lian	2f8037fee6	ensure that the hftrainer deepspeed config is set before the trainer class is ever init'ed (#1850) [skip ci]	2024-08-22 13:10:40 -04:00
Aman Gupta Karmani	de4ea2d1f2	docs: minor syntax highlight fix (#1839)	2024-08-22 11:47:34 -04:00
JohanWork	7ed92e61c2	fix: prompt phi (#1845) [skip ci] * corecting phi system prompt * phi test * update * add test	2024-08-22 11:46:57 -04:00
Wing Lian	9caa3eb699	make the train_on_eos default to turn so all eos tokens are treated the same (#1847) [skip ci]	2024-08-22 11:45:37 -04:00
Wing Lian	5b0b774e38	ensure that the bias is also in the correct dtype (#1848) [skip ci] * ensure that the bias is also in the correct dtype * add nightly for dpo-qlora-fsdp	2024-08-22 11:45:00 -04:00
Wing Lian	c3fc529bfc	numpy 2.1.0 was released, but incompatible with numba (#1849) [skip ci]	2024-08-22 11:44:45 -04:00
Gal Cohen (galco)	957c956f89	rename jamba example (#1846) [skip ci] * rename jamba example * feat: change readme --------- Co-authored-by: Gal Cohen <galc@ai21.com>	2024-08-22 09:22:55 -04:00
Aman Gupta Karmani	f07802f9fa	examples: fix tiny-llama pretrain yml syntax (#1840)	2024-08-21 13:37:51 -04:00
Gal Cohen (galco)	9f917245f6	feat: add jamba chat_template (#1843) * feat: add jamba chat_template * fix: black * feat: jamba fsdp+qlora --------- Co-authored-by: Gal Cohen <galc@ai21.com>	2024-08-21 13:37:17 -04:00
Aman Gupta Karmani	649c19aba3	pretrain: fix with sample_packing=false (#1841)	2024-08-21 13:36:51 -04:00
Gal Cohen (galco)	5aac4bc284	fix: dont change quant storage dtype in case of fsdp (#1837) * fix: dont change quant storage dtype in case of fsdp * fix black --------- Co-authored-by: Gal Cohen <galc@ai21.com>	2024-08-20 12:41:48 -04:00
Wing Lian	e29931259b	optionally save the final FSDP model as a sharded state dict (#1828) * efficiently save very large llms when using FSDP * fix parsing and index of sharded chunks * only save fsdp on main process * debugging for rename * save sharded state dict * remove unused new param * get state dict directly * tweak acc merge fsdp to shard the weight files * sharded_state_dict alongside save_safetensors seems to hang on checkpoint save	2024-08-19 14:59:24 -04:00
Wing Lian	b1d2921222	add validation to prevent 8bit lora finetuning on H100s (#1827)	2024-08-16 21:32:00 -04:00
Wing Lian	803fed3e90	update sklearn versrion, torch compile env vars, don't worry about failure on preprocess load model (#1821) * update sklearn versrion, torch compile env vars, don't worry about failure on preprocess load model * There is already a condition check within the function. This outer one is not necessary Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2024-08-16 10:41:51 -04:00
NanoCode012	68a3c7678a	fix: parse model_kwargs (#1825)	2024-08-16 07:51:19 -04:00
NanoCode012	f18925fb4b	fix: parse eager_attention (#1824)	2024-08-14 09:46:46 -04:00
Wing Lian	1853d6021d	bump hf dependencies (#1823) * bump hf dependencies * revert optimum version change * don't bump tokenizers all the way to 0.20 yet since transformers doesn't support that	2024-08-11 16:27:41 -04:00
Chiwan Park	0801f239cc	fix the incorrect `max_length` for chat template (#1818)	2024-08-09 11:50:31 -04:00
Wing Lian	54392ac8a6	Attempt to run multigpu in PR CI for now to ensure it works (#1815) [skip ci] * Attempt to run multigpu in PR CI for now to ensure it works * fix yaml file * forgot to include multigpu tests * fix call to cicd.multigpu * dump dictdefault to dict for yaml conversion * use to_dict instead of casting * 16bit-lora w flash attention, 8bit lora seems problematic * add llama fsdp test * more tests * Add test for qlora + fsdp with prequant * limit accelerate to 2 processes and disable broken qlora+fsdp+bnb test * move multigpu tests to biweekly	2024-08-09 11:50:13 -04:00
Wing Lian	3e2b269d06	update tinyllama to use final instead of checkpoints (#1820) [skip ci]	2024-08-09 10:58:19 -04:00
Wing Lian	5ee4b7325f	fix z3 leaf configuration when not using lists (#1817) [skip ci]	2024-08-09 10:54:52 -04:00
Wing Lian	70978467a0	skip no commit to main on ci (#1814)	2024-08-06 15:25:54 -04:00
Wing Lian	850f999a76	update peft and transformers (#1811)	2024-08-06 10:32:05 -04:00
Wing Lian	c56e0a79a5	logging improvements (#1808) [skip ci] * logging improvements * fix sort	2024-08-06 10:31:50 -04:00
Wing Lian	35d5e59d78	set z3 leaf for deepseek v2 (#1809) [skip ci] * set z3 leaf for deepseek v2 * add deepseek v2 chat template	2024-08-06 09:30:46 -04:00
Wing Lian	fbbeb4fee0	remove un-necessary zero-first guard as it's already only called in a parent fn (#1810) [skip ci]	2024-08-06 09:29:23 -04:00
Wing Lian	ecdda006de	One cycle lr (#1803) * refactor one_cycle lr scheduler so it's reusable in more situations * fix validation for lr_scheduler * default to cosine anneal strategy * one cycle lr exepects cos	2024-08-05 13:12:05 -04:00
Ben Feuer	b7665c26c8	Update conversation.qmd (#1788) [skip ci]	2024-08-05 12:44:26 -04:00

1588 Commits