Keith Stevens
7b9f669a3a
Trigger the original tokenization behavior when no advanced turn settings are provided ( #1915 )
2024-09-14 08:22:54 -04:00
Wing Lian
5c42f11411
remove dynamic module loader monkeypatch as this was fixed upstream ( #1914 )
2024-09-13 22:19:54 -04:00
Wing Lian
3853ab7ae9
bump accelerate to 0.34.2 ( #1901 )
...
* bump accelerate
* add fixture to predownload the test model
* change fixture
2024-09-07 14:39:31 -04:00
Wing Lian
6e354682e3
fix zero3 integration ( #1897 )
...
* fix zero3 integration
* bump transformers and accelerate too
2024-09-05 10:58:50 -04:00
Alpay Ariyak
ab461d83c4
Fix documentation for pre-tokenized dataset ( #1894 )
...
The docs currently ask users not to add BOS and EOS tokens, stating that Axolotl adds them, but this is not true
2024-09-05 23:11:31 +09:00
Wing Lian
93b769a979
lint fix and update gha regex ( #1899 )
2024-09-05 09:58:21 -04:00
Tijmen de Haan
f18f4268b5
Docs for AMD-based HPC systems ( #1891 )
...
* Add documentation for installing on AMD-based HPC systems.
* Accept suggestion to add note about deepspeed
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update _quarto.yml with amd_hpc doc
---------
Co-authored-by: Tijmen de Haan <tijmen.dehaan@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-09-05 18:33:19 +09:00
Wing Lian
dca1fe47d4
fix optimizer + fsdp combination in example ( #1893 )
2024-09-04 11:28:47 -04:00
Wing Lian
4e5400c732
support for auto_find_batch_size when packing ( #1885 )
...
* support for auto_find_batch_size when packing
* make sure to return data from validation
* make sure to return data from validation
* actually expose multipack_real_batches in the config
* calculate gathered efficiency in sampler
* tweak to fix auto find and use actual sampler len for multipack
* uncomment
* use args for bsz when not available from auto find
2024-09-03 20:02:44 -04:00
Wing Lian
0aeb277456
add e2e smoke tests for llama liger integration ( #1884 )
...
* add e2e smoke tests for llama liger integration
* fix import
* don't use __main__ for test
* consolidate line
2024-09-01 19:29:37 -04:00
Chiwan Park
bdab3ec587
Fix RMSNorm monkey patch for Gemma models ( #1886 )
2024-09-01 18:34:24 -04:00
Wing Lian
3c6b9eda2e
run pytests with varied pytorch versions too ( #1883 )
2024-08-31 22:49:35 -04:00
DocShotgun
15408d0f09
Update supported models for Liger Kernel ( #1875 )
...
* Update supported models for Liger Kernel
Add Mistral LCE, Gemma LCE, Gemma 2 without LCE (softcapping is not yet implemented for Gemma in Liger Kernel LCE forward), Phi3 without LCE
* move import to their appropriate conditions
* Integrate Phi3 LCE support
https://github.com/linkedin/Liger-Kernel/pull/103/
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-08-31 21:59:48 -04:00
Wing Lian
ce33e1ed83
pin liger-kernel to latest 0.2.1 ( #1882 ) [skip ci]
2024-08-30 17:51:18 -04:00
Byron Hsu
e3a38450de
Add liger kernel to features ( #1881 ) [skip ci]
2024-08-29 08:19:18 -04:00
Aman Gupta Karmani
7037e3c836
deepseekv2 liger support ( #1878 )
...
* deepseekv2 liger support
* add comment
* add missing impl
2024-08-27 23:52:40 -04:00
Aman Gupta Karmani
c1a61ae23c
fix liger plugin load issues ( #1876 )
2024-08-27 23:08:26 -04:00
Aman Gupta Karmani
159b8b9a74
monkey-patch transformers to simplify monkey-patching modeling code ( #1877 )
...
* monkey-patch transformers so that monkey-patched modeling code doesn't get overwritten
* unnecessary now
* add comment
2024-08-27 17:22:26 -07:00
Wing Lian
1e43660701
Sample pack trust remote code v2 ( #1873 )
...
* fix the multipack patch for remote code models
* add deepseek v2 lite example w fsdp
2024-08-27 13:39:24 -04:00
Chiwan Park
f6362d2a05
Add Liger Kernel support for Qwen2 ( #1871 )
2024-08-27 13:03:16 -04:00
Wing Lian
17af1d7081
clear cuda cache to help with memory leak/creep ( #1858 )
...
* clear cuda cache to help with memory leak/creep
* reverse order of gc
2024-08-26 15:50:26 -04:00
Chiwan Park
2dac1edf72
Fix drop_long_seq bug due to truncation in prompt tokenization strategies when using chat_template ( #1867 )
2024-08-26 12:56:12 -04:00
Wing Lian
6819c12cee
update spectrum authors ( #1869 )
2024-08-26 12:00:36 -04:00
Wing Lian
8e29bdefdd
Spectrum plugin ( #1866 )
2024-08-25 17:54:02 -04:00
Wing Lian
f245964f22
better handling of llama-3 tool role ( #1782 )
2024-08-25 12:31:40 -04:00
Wing Lian
22f4eafa55
simplify logic ( #1856 )
2024-08-23 20:23:08 -04:00
Wing Lian
77a4b9cda2
change up import to prevent AttributeError ( #1863 )
...
* change up import to prevent AttributeError
* tweak patching check for updated upstream
2024-08-23 17:00:01 -04:00
Wing Lian
810ecd4e81
add liger to readme ( #1865 )
...
* add liger to readme
* updates from PR feedback
2024-08-23 14:34:03 -04:00
Wing Lian
da0d581a8c
add liger example ( #1864 )
2024-08-23 12:37:50 -04:00
Wing Lian
1f686c576c
Liger Kernel integration ( #1861 )
...
* add initial plugin support w Liger kernel patches
* integrate the input args classes
* fix liger plugin and dynamic configuration class
* drop untrainable samples and refactor config plugins integration
* fix incorrect inputs and circular imports
* fix bool comparison
* fix for dropping untrainable tokens
* fix licensing so liger integration is Apache 2.0
* add jamba support
* pylint ignore
2024-08-23 12:21:51 -04:00
Wing Lian
e8ff5d5738
don't mess with bnb since it needs compiled wheels ( #1859 )
2024-08-23 12:18:47 -04:00
Wing Lian
328fd4b3b7
add axolotl community license ( #1862 )
2024-08-23 11:40:21 -04:00
Wing Lian
fefa95e350
most model types now support flash attention 2 regardless of multipack support ( #1854 )
2024-08-22 16:39:23 -04:00
Wing Lian
b33dc07a77
rename nightly test and add badge ( #1853 )
2024-08-22 13:13:33 -04:00
Wing Lian
dcbff16983
run nightly ci builds against upstream main ( #1851 )
...
* run nightly ci builds against upstream main
* add test badges
* run the multigpu tests against nightly main builds too
2024-08-22 13:10:54 -04:00
Wing Lian
2f8037fee6
ensure that the hftrainer deepspeed config is set before the trainer class is ever init'ed ( #1850 ) [skip ci]
2024-08-22 13:10:40 -04:00
Aman Gupta Karmani
de4ea2d1f2
docs: minor syntax highlight fix ( #1839 )
2024-08-22 11:47:34 -04:00
JohanWork
7ed92e61c2
fix: prompt phi ( #1845 ) [skip ci]
...
* correcting phi system prompt
* phi test
* update
* add test
2024-08-22 11:46:57 -04:00
Wing Lian
9caa3eb699
make the train_on_eos default to turn so all eos tokens are treated the same ( #1847 ) [skip ci]
2024-08-22 11:45:37 -04:00
Wing Lian
5b0b774e38
ensure that the bias is also in the correct dtype ( #1848 ) [skip ci]
...
* ensure that the bias is also in the correct dtype
* add nightly for dpo-qlora-fsdp
2024-08-22 11:45:00 -04:00
Wing Lian
c3fc529bfc
numpy 2.1.0 was released but is incompatible with numba ( #1849 ) [skip ci]
2024-08-22 11:44:45 -04:00
Gal Cohen (galco)
957c956f89
rename jamba example ( #1846 ) [skip ci]
...
* rename jamba example
* feat: change readme
---------
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-22 09:22:55 -04:00
Aman Gupta Karmani
f07802f9fa
examples: fix tiny-llama pretrain yml syntax ( #1840 )
2024-08-21 13:37:51 -04:00
Gal Cohen (galco)
9f917245f6
feat: add jamba chat_template ( #1843 )
...
* feat: add jamba chat_template
* fix: black
* feat: jamba fsdp+qlora
---------
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-21 13:37:17 -04:00
Aman Gupta Karmani
649c19aba3
pretrain: fix with sample_packing=false ( #1841 )
2024-08-21 13:36:51 -04:00
Gal Cohen (galco)
5aac4bc284
fix: don't change quant storage dtype in case of fsdp ( #1837 )
...
* fix: don't change quant storage dtype in case of fsdp
* fix black
---------
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-20 12:41:48 -04:00
Wing Lian
e29931259b
optionally save the final FSDP model as a sharded state dict ( #1828 )
...
* efficiently save very large llms when using FSDP
* fix parsing and index of sharded chunks
* only save fsdp on main process
* debugging for rename
* save sharded state dict
* remove unused new param
* get state dict directly
* tweak acc merge fsdp to shard the weight files
* sharded_state_dict alongside save_safetensors seems to hang on checkpoint save
2024-08-19 14:59:24 -04:00
Wing Lian
b1d2921222
add validation to prevent 8bit lora finetuning on H100s ( #1827 )
2024-08-16 21:32:00 -04:00
Wing Lian
803fed3e90
update sklearn version, torch compile env vars, don't worry about failure on preprocess load model ( #1821 )
...
* update sklearn version, torch compile env vars, don't worry about failure on preprocess load model
* There is already a condition check within the function. This outer one is not necessary
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-08-16 10:41:51 -04:00
NanoCode012
68a3c7678a
fix: parse model_kwargs ( #1825 )
2024-08-16 07:51:19 -04:00