axolotl

Author	SHA1	Message	Date
Wing Lian	105c65390e	add q-galore optimizer	2024-07-14 19:28:13 -04:00
Wing Lian	78e12f8ca5	add basic support for the optimi adamw optimizer (#1727 ) * add support for optimi_adamw optimizer w kahan summation * pydantic validator for optimi_adamw * workaround for setting optimizer for fsdp * make sure to install optimizer packages * make sure to have parity for model parameters passed to optimizer * add smoke test for optimi_adamw optimizer * don't use foreach optimi by default	2024-07-14 19:12:57 -04:00
Wing Lian	98af5388ba	bump flash attention 2.5.8 -> 2.6.1 (#1738 ) * bump flash attention 2.5.8 -> 2.6.1 * use triton implementation of cross entropy from flash attn * add smoke test for flash attn cross entropy patch * fix args to xentropy.apply * handle tuple from triton loss fn * ensure the patch tests run independently * use the wrapper already built into flash attn for cross entropy * mark pytest as forked for patches * use pytest xdist instead of forked, since cuda doesn't like forking * limit to 1 process and use dist loadfile for pytest * change up pytest for fixture to reload transformers w monkeypathc	2024-07-14 19:11:31 -04:00
RodriMora	219cd0d3c5	Fix eval_sample_packing in llama-3 lora example (#1716 ) [skip ci] * Fix eval_sample_packing in llama-3 lora example * Update examples/llama-3/lora-8b.yml Co-authored-by: Wing Lian <wing.lian@gmail.com> --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-07-13 14:34:44 -04:00
David Meikle	634f384e06	Changed URL for dataset docs (#1744 )	2024-07-13 14:34:28 -04:00
Akshaya Shanbhogue	4512738a73	bump xformers to 0.0.27 (#1740 ) * Update requirements.txt Preserve compatibility with torch 2.3.1. [Reference](https://github.com/facebookresearch/xformers/issues/1052) * fix setup.py to extract the current xformers dep from requirements for replacement * xformers 0.0.27 wheels not built for torch 2.3.0 --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-07-13 14:04:31 -04:00
Wing Lian	1e57b4c562	update to pytorch 2.3.1 (#1746 ) [skip ci]	2024-07-13 13:28:17 -04:00
Wing Lian	a4a5bf057f	fixes to prevent vram spike when train starts (#1742 )	2024-07-13 09:53:13 -04:00
Wing Lian	137d84d1b4	add torch 2.3.1 base image (#1745 )	2024-07-13 09:41:51 -04:00
Oliver Klingefjord	18abdb447a	typo (#1685 ) [skip ci] * typo * typo 2 --------- Co-authored-by: mhenrichsen <mads.gade.henrichsen@live.dk>	2024-07-12 21:24:01 -04:00
Wing Lian	47e1916484	add tests so CI can catch updates where patches will break with unsloth (#1737 ) [skip ci]	2024-07-11 16:43:19 -04:00
mhenrichsen	1194c2e0b1	github urls (#1734 ) Co-authored-by: Henrichsen, Mads (ext) <mads.henrichsen.ext@siemens-energy.com>	2024-07-11 09:19:29 -04:00
Wing Lian	a159724e44	bump trl and accelerate for latest releases (#1730 ) * bump trl and accelerate for latest releases * ensure that the CI runs on new gh org * drop kto_pair support since removed upstream	2024-07-10 11:15:44 -04:00
Josh Bleecher Snyder	b3f680d305	sanity check ranges in freeze.py (#1686 ) * sanity check ranges in freeze.py this will catch problems earlier and more clearly. in my case, it appears that deepspeed zero3 sets layer tensor shapes to [0], which doesn't play well with automatically inferred ranges. through a bit of luck, inverting ranges still appears to work correctly. * simplify chained comparison	2024-07-05 09:24:07 -04:00
Wing Lian	c69b7eb2b5	full weights fsdp training seems broken with fsdp_cpu_ram_efficient_loading, disabling for now (#1726 )	2024-07-05 09:15:36 -04:00
Wing Lian	c6d83a87c4	add support for .env files for env vars (#1724 )	2024-07-02 13:17:40 -04:00
Wing Lian	5370cedf0c	support for gemma2 w sample packing (#1718 )	2024-06-29 01:38:55 -04:00
Josh Bleecher Snyder	f2480a1d91	improve Pre-Tokenized Dataset docs (#1684 ) [skip ci] Fixes #1661	2024-06-26 13:13:21 -07:00
DavidFarago	559562d790	Allow "weight: 0" in messages to mask them (#1703 ) Allow in message objects the additional key `weight`, which can be set to 0 (or 1) to cause that message to be masked out (or left unmasked) for training (similar to [1]). This is helpful for training the model to be robust and capable of error recovery upon a bad assistant message. A missing `weight` key defaults to weight 1, to guarantee downward compatibility. [1]: https://github.com/mistralai/mistral-finetune	2024-06-20 10:05:16 -04:00
Wing Lian	4de4b4089f	add support for multipack for deepseek_v2 (#1712 )	2024-06-20 10:02:55 -04:00
Wing Lian	3f1f5e3312	drop length column for issues with eval without packing (#1711 )	2024-06-18 23:32:29 -04:00
Wing Lian	5783839c6e	download model weights on preprocess step (#1693 )	2024-06-09 20:10:17 -04:00
Wing Lian	cbbf039a46	verbose failure message (#1694 )	2024-06-09 20:09:36 -04:00
Wing Lian	851ccb1237	bump deepspeed for fix for grad norm compute putting tensors on different devices (#1699 )	2024-06-09 17:13:28 -04:00
Wing Lian	18cabc0c46	fix for when sample_packing and eval_sample_packing are different (#1695 )	2024-06-08 09:48:30 -04:00
Wing Lian	ed8ef65371	add back packing efficiency estimate so epochs and multi-gpu works properly (#1697 )	2024-06-08 09:48:10 -04:00
Wing Lian	00ac3022a1	add qwen2-72b fsdp example (#1696 )	2024-06-07 16:38:29 -04:00
Wing Lian	9c1af1a9c0	ensure explicit eval_sample_packing to avoid mismatch issues (#1692 )	2024-06-07 11:28:43 -04:00
Aaditya Ura (looking for PhD Fall’24)	a82a711522	Create phi3-ft-fsdp.yml (#1580 ) rename to be fsdp specific and tweak settings a bit	2024-06-04 16:20:25 -04:00
Brian Fitzgerald	cf64284a04	Phi-3 conversation format, example training script and perplexity metric (#1582 ) * phi-3 support and perplexity metric * phi-3 chat template * metrics updates * chore: lint * fix assertion on Tensor * fix tests since tokenization happens in the metric * fix perplexity value of shorter passage --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-06-04 16:11:56 -04:00
Wing Lian	c996881ec2	add support for rpo_alpha (#1681 ) * add support for rpo_alpha * Add smoke test for dpo + nll loss	2024-06-04 16:09:51 -04:00
Wing Lian	1f151c0d52	re-enable DPO for tests in modal ci (#1374 ) * re-enable DPO for tests in modal ci * workaround for training args * don't mixin AxolotlTrainingArguments * fix mixin order so MRO doesn't result in TypeError: non-default argument follows default argument error * use smaller datasets for dpo tests	2024-06-03 12:50:44 -04:00
Saeed Esmaili	5cde06587a	Fix the broken link in README (#1678 ) [skip ci]	2024-06-03 09:38:44 -04:00
Wing Lian	05b0bd08d2	need to add back drop_last for sampler (#1676 )	2024-05-31 13:13:13 -04:00
Wing Lian	d4f6c65e4c	cleanup the deepspeed proxy model at the end of training (#1675 )	2024-05-30 13:40:35 -04:00
Wing Lian	a944f7b32b	load explicit splits on datasets (#1652 )	2024-05-29 22:27:59 -04:00
Wing Lian	9d4225a058	set chat_template in datasets config automatically (#1664 ) * set chat_template in datasets config automatically * dynamic chat_template, not jsut chatml	2024-05-29 22:27:26 -04:00
Wing Lian	f7332ac449	use mixins for orpo and kto configs so they work with axolotl customizations (#1674 )	2024-05-29 22:27:00 -04:00
Wing Lian	16d46b74e4	re-enable phi for tests in modal ci (#1373 )	2024-05-29 15:41:46 -04:00
Wing Lian	a6b37bdeb4	revert multipack batch sampler changes (#1672 ) * revert multipack batch sampler changes * fix default val for drop_last	2024-05-29 11:51:18 -04:00
Wing Lian	b7520801a3	handle the system role too for chat templates (#1671 )	2024-05-29 10:21:11 -04:00
Wing Lian	fe650dd326	make sure the CI fails when pytest script fails (#1669 ) * make sure the pytest script fails * make sure the defaults come through for tests * make sure tensorboard is loaded for test assertion	2024-05-29 10:12:11 -04:00
Abe Voelker	49b967b62f	Fix README quick start example usage model dirs (#1668 )	2024-05-28 18:10:40 -04:00
Seungduk Kim	65db903714	Correct name of MixtralBlockSparseTop2MLP (L -> l) (#1667 )	2024-05-28 18:10:29 -04:00
Davide Caroselli	6a5a725f10	Fix: ensure correct handling of `val_set_size` as `float` or `int` (#1655 ) * Fix: ensure correct handling of val_set_size as float or int * chore: lint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-05-28 12:00:32 -04:00
Wing Lian	f5febc729a	fix lint issue that snuck through (#1665 )	2024-05-28 11:36:50 -04:00
Faria Huq	230e0ac363	Fix Lora config error for Llama3 (#1659 ) The current yml code throws an error: ValueError: Please set lora_modules_to_save to [`embed_tokens`, `lm_head`] when using an adapter and changing the special tokens. I added the required changes to resolve it	2024-05-28 11:25:08 -04:00
Keith Stevens	cc11c6bce2	Generalizing the chat_template prompt strategy (#1660 ) [skip ci] The strategy now supports configuring several fields: * The data field holding message arrays * the role and content fields for each message * role mapping from source to target types additionally this adds a sample llama3-8b instruct template using the chat template	2024-05-28 11:24:13 -04:00
Maciek	5f91064040	Fix Google Colab notebook 2024-05 (#1662 ) [skip ci] * include mlflow installation in the colab notebook Without explicitly installing mlflow the `accelerate launch` command fails. * update the colab noteboko to use the latest tinyllama config	2024-05-28 11:23:52 -04:00
Wing Lian	ef223519c9	update deps (#1663 ) [skip ci] * update deps and tweak logic so axolotl is pip installable * use vcs url format * using dependency_links isn't supported per docs)	2024-05-28 11:23:34 -04:00

1 2 3 4 5 ...

1508 Commits