axolotl

Author	SHA1	Message	Date
Wing Lian	0542c7dd56	add muon optimizer * optimizer_cls_and_kwargs is on trainer_kwargs * only add adamw_kwargs if they're non-null * fix mocks * better handling of override and check the optimizer * unwrap optimizer	2025-03-05 10:47:22 -05:00
Wing Lian	9850f42204	bump liger to 0.5.3 (#2353 )	2025-02-24 12:40:54 -05:00
salman	29b366b2e1	Bumping 0.15.1 TRL version for GRPO+PEFT fix (#2344 ) * bumping TRL version * apply upstream fixes to our custom fix --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-02-20 22:56:04 -05:00
NanoCode012	b53a41372f	feat: update transformers version to 4.49.0 (#2340 )	2025-02-20 21:12:06 -05:00
Wing Lian	ffae8d6a95	GRPO (#2307 )	2025-02-13 16:01:01 -05:00
NanoCode012	e48e2df4dd	feat: update FA to 2.7.4.post1 which includes torch2.6 binary (#2315 )	2025-02-07 21:34:01 -05:00
Wing Lian	b7616022ab	bump transformers to 4.48.3 (#2318 )	2025-02-07 21:33:44 -05:00
NanoCode012	5bbad5ef93	feat: add torch2.6 to ci (#2311 )	2025-02-07 07:28:54 -05:00
Wing Lian	8779997ba5	native support for modal cloud from CLI (#2237 ) * native support for modal cloud from CLI * do lm_eval in cloud too * Fix the sub call to lm-eval * lm_eval option to not post eval, and append not extend * cache bust when using branch, grab sha of latest image tag, update lm-eval dep * allow minimal yaml for lm eval * include modal in requirements * update link in README to include utm * pr feedback * use chat template * revision support * apply chat template as arg * add wandb name support, allow explicit a100-40gb * cloud is optional * handle accidental setting of tasks with a single task str * document the modal cloud yaml for clarity [skip ci] * cli docs * support spawn vs remote for lm-eval * Add support for additional docker commands in modal image build * cloud config shouldn't be a dir * Update README.md Co-authored-by: Charles Frye <cfrye59@gmail.com> * fix annotation args --------- Co-authored-by: Charles Frye <cfrye59@gmail.com>	2025-01-30 11:34:02 -05:00
Wing Lian	0b52f06227	bump bnb to 0.45.1 (#2289 ) [skip ci]	2025-01-28 23:21:25 -05:00
Wing Lian	8a7a0b07dc	support for latest transformers release 4.48.1 (#2256 )	2025-01-23 21:17:57 -05:00
Wing Lian	fb3352e21c	rename liger test so it properly runs in ci (#2246 )	2025-01-09 17:31:43 -05:00
Wing Lian	7669a03fb4	update upstream HF deps (#2239 ) * bump axolotl contribs for upstream main conflicts: * bump datasets, tokenizer, trl * remove log workarounds in trl * bump lm-eval * remove unsloth_ import from critical path * remove llama fa2 from conftest * unsloth breaks with latest upstream	2025-01-09 21:01:59 +00:00
Wing Lian	e0a2eb2ebd	fix untrained tokens if specified explicitly from a list (#2210 )	2024-12-23 09:08:28 -05:00
Wing Lian	d91feaffc8	upgrade to liger 0.5.2 (#2181 ) [skip ci]	2024-12-17 13:58:21 -05:00
Wing Lian	e246ceffa4	use axolotl contribs for fix_untrained_tokens (#2194 ) [skip ci] * use axolotl contribs for fix_untrained_tokens * remove the module we're replacing * Add check for using fix_untrained_tokens	2024-12-17 13:57:16 -05:00
Wing Lian	1f623e6cc8	transformers 4.47.1 (#2187 ) * transformers 4.47.1 * drop monkeypatches * can't remove patches yet * make flash attention forward ignore the loss kwargs * patch the flash attention in the modeling arch too * remove fsdp and deepspeed patches * cleanup PR * bump accelerate and torchao, also logically reorder/group requirements * meant to include torchao * use official patch release	2024-12-17 11:01:21 -05:00
Wing Lian	effc4dc409	pin to 4.47.0 (#2180 )	2024-12-12 20:17:12 -05:00
Wing Lian	40907c6887	upgrade deepspeed to 0.16.1 (#2157 )	2024-12-09 07:25:10 -05:00
Wing Lian	1302e31049	Transformers version flexibility and FSDP optimizer patch (#2155 ) * allow flexibility in transformers version for FSDP * more flexibility with dev versions of 4.47.0.dev0 * add patch for fsdp * fix typo * correct fn name * stray character * fix patch * reset Trainer too * also reset Trainer.training_step * allow tests/patched to run more than one process on e2e runner * skip tests/patched in e2e for now since it's run in regular pytest	2024-12-08 14:50:40 -05:00
Wing Lian	be5f554a62	bump autoawq to 0.2.7.post3 (#2150 )	2024-12-07 22:24:09 -05:00
Wing Lian	743ba62bd5	Transformers 4.47.0 (#2138 ) * bump transformers and trl * fix: update trainer.log signature * fix trl trainer.log interfaces * broken 🦥 with latest transformers * skip parent, call grandparent - yeah, super janky * update HF HUB env var and fix reward trainer log since it doesn't directly override log * also bump accelerate * patches for llama ga * detab the code to check * fix whitespace for patch check * play nicely with CI tests since we patch everytime * fix pop default in case it doesn't exist * more tweaks to make patches nicer in CI * fix detab for when there are possibly multiple patches --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2024-12-07 05:03:01 -05:00
Wing Lian	6b3058b2dc	upgrade bnb 0.45.0 and peft 0.14.0 (#2126 ) * upgrade bnb to lastest release * update peft to working supporting commit * bump to latest release of peft==0.14.0	2024-12-06 09:08:55 -05:00
Wing Lian	ce5bcff750	various tests fixes for flakey tests (#2110 ) * add mhenrichsen/alpaca_2k_test with revision dataset download fixture for flaky tests * log slowest tests * pin pynvml==11.5.3 * fix load local hub path * optimize for speed w smaller models and val_set_size * replace pynvml * make the resume from checkpoint e2e faster * make tests smaller	2024-12-02 17:28:58 -05:00
Sunny Liu	bf416bdfd0	bump_liger_0.4.2 (#2096 )	2024-11-21 13:24:52 -05:00
Wing Lian	d9b71edf84	bump transformers for fsdp-grad-accum fix, remove patch (#2079 )	2024-11-19 02:23:09 -05:00
Wing Lian	70cf79ef52	upgrade autoawq==0.2.7.post2 for transformers fix (#2070 ) * point to upstream autoawq for transformers fix * use autoawq 0.2.7 release * test wheel for awq * try different format for wheel def * autoawq re-release * Add intel_extension_for_pytorch dep * ipex gte version * forcefully remove intel-extension-for-pytorch * add -y option to pip uninstall for ipex * use post2 release for autoawq and remove uninstall of ipex	2024-11-18 11:53:37 -05:00
Wing Lian	d42f202046	Fsdp grad accum monkeypatch (#2064 )	2024-11-15 19:11:04 -05:00
Wing Lian	0dabde1962	support for schedule free and e2e ci smoke test (#2066 ) [skip ci] * support for schedule free and e2e ci smoke test * set default lr scheduler to constant in test * ignore duplicate code * fix quotes for config/dict	2024-11-15 19:10:14 -05:00
Wing Lian	2f20cb7ebf	upgrade datasets==3.1.0 and add upstream check (#2067 ) [skip ci]	2024-11-15 19:08:38 -05:00
Wing Lian	2d7830fda6	upgrade to flash-attn 2.7.0 (#2048 )	2024-11-14 06:59:25 -05:00
NanoCode012	4e1891b12b	feat: upgrade to liger 0.4.1 (#2045 )	2024-11-13 10:07:24 -05:00
Wing Lian	fd3b80716a	remove fastchat and sharegpt (#2021 ) * remove fastchat and sharegpt * remove imports * remove more fastchat imports * chore: remove unused functions * feat: remove sharegpt and deprecate from docs * chore: remove unused sharegpt checks * fix: remove sharegpt type from tests * feat: add sharegpt deprecation error * feat: update readme --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2024-11-08 13:45:49 -05:00
Sunny Liu	3265b7095e	Add weighted optimisation support for trl DPO trainer integration (#2016 ) * trlv0.12.0 integration * update trl version requirements * linting * commenting out * trl version requirement	2024-11-08 11:29:11 -05:00
Wing Lian	02ce520b7e	upgrade liger to 0.4.0 (#1973 ) * upgrade liger to 0.3.1 * update docs and example * skip duplicate code check * Update src/axolotl/integrations/liger/args.py Co-authored-by: NanoCode012 <nano@axolotl.ai> * Update README.md Co-authored-by: NanoCode012 <nano@axolotl.ai> * add logging * chore: lint * add test case * upgrade liger and transformers * also upgrade accelerate * use kwargs to support patch release * make sure prepared path is empty for test * use transfromers 4.46.1 since 4.46.2 breaks fsdp --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2024-11-07 12:53:34 -05:00
Wing Lian	d3c45d27b5	fix zero3 (#1994 )	2024-10-28 07:32:49 -04:00
NanoCode012	2501c1a6a3	Fix: Gradient Accumulation issue (#1980 ) * feat: support new arg num_items_in_batch * use kwargs to manage extra unknown kwargs for now * upgrade against upstream transformers main * make sure trl is on latest too * fix for upgraded trl * fix: handle trl and transformer signature change * feat: update trl to handle transformer signature * RewardDataCollatorWithPadding no longer has max_length * handle updated signature for tokenizer vs processor class * invert logic for tokenizer vs processor class * processing_class, not processor class * also handle processing class in dpo * handle model name w model card creation * upgrade transformers and add a loss check test * fix install of tbparse requirements * make sure to add tbparse to req * feat: revert kwarg to positional kwarg to be explicit --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-10-25 11:28:23 -04:00
Wing Lian	955cca41fc	don't explicitly set cpu pytorch version (#1986 ) * use a constraint file * use min version of xformers * don't install autoawq with pytorch 2.5.0 * debugging for errors * upgrade pip first * fix action yml * add back try/except * retry w/o constraint * use --no-build-isolation * show torch version * install setuptools and wheel * add back try/except	2024-10-21 19:50:50 -04:00
Wing Lian	335027f155	upgrade accelerate to 1.0.1 (#1969 )	2024-10-13 20:04:30 -04:00
Wing Lian	ec4272c3a0	add ds zero3 to multigpu biweekly tests (#1900 ) * add ds zero3 to multigpu biweekly tests * fix for upstream api change * use updated accelerate and fix deepspeed tests * stringify the Path, and run multigpu tests if the multigpu tests change for a PR * use correct json rather than yaml * revert accelerate for deepspeed	2024-10-13 17:34:37 -04:00
Wing Lian	d20b48a61e	only install torchao for torch versions >= 2.4.0 (#1963 )	2024-10-12 20:53:48 -04:00
Wing Lian	09bf1ceacc	update hf deps (#1964 ) * update hf deps * remove deprecated set_caching_enabled	2024-10-12 18:19:48 -04:00
Wing Lian	8159cbd1ab	lm_eval harness post train (#1926 ) * wip, lm_eval harness post train * include latex parser * add dtype and doc * add validation when doing bench evals * automatically add test dataset when doing benches	2024-10-10 15:04:17 -04:00
Wing Lian	e8d3da0081	upgrade pytorch from 2.4.0 => 2.4.1 (#1950 ) * upgrade pytorch from 2.4.0 => 2.4.1 * update xformers for updated pytorch version * handle xformers version case for torch==2.3.1	2024-10-09 11:53:56 -04:00
Wing Lian	844331005c	bump transformers to 4.45.1 (#1936 )	2024-09-30 13:56:12 -04:00
Wing Lian	b98d7d7098	update upstream deps versions and replace lora+ (#1928 ) * update upstream deps versions and replace lora+ * typo transformers version	2024-09-26 11:33:41 -04:00
Wing Lian	5c42f11411	remove dynamic module loader monkeypatch as this was fixed upstream (#1914 )	2024-09-13 22:19:54 -04:00
Wing Lian	3853ab7ae9	bump accelerate to 0.34.2 (#1901 ) * bump accelerate * add fixture to predownload the test model * change fixture	2024-09-07 14:39:31 -04:00
Wing Lian	6e354682e3	fix zero3 integration (#1897 ) * fix zero3 integration * bump transformers and accelerate too	2024-09-05 10:58:50 -04:00
Wing Lian	ce33e1ed83	pin liger-kernel to latest 0.2.1 (#1882 ) [skip ci]	2024-08-30 17:51:18 -04:00

170 Commits