Wing Lian
f196941315
additional fixes for docker and saving compressed models
2025-04-28 13:16:29 -04:00
Rahul Tuli
5be047ac46
Fix: Test
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-04-28 13:16:29 -04:00
Rahul Tuli
758115b8c6
Apply patch from @winglian
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-04-28 13:16:29 -04:00
Rahul Tuli
0dc1da5876
Add: line about further optimizations using llmcompressor
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-04-28 13:16:29 -04:00
Rahul Tuli
f3e876dbfc
Address Review Comments:
...
* deleted redundant docs/llm_compressor.qmd
* incorporated feedback in integration README.md
* added llmcompressor integration to docs/custom_integrations.qmd
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-04-28 13:16:29 -04:00
Rahul Tuli
99c13ef60c
Add: .qmd file
2025-04-28 13:16:29 -04:00
Rahul Tuli
2c24434ee0
Tests, Style, Updates
2025-04-28 13:16:29 -04:00
Rahul Tuli
586268a0d7
Rebase and updates!
2025-04-28 13:16:29 -04:00
Rahul Tuli
b600e119b6
Add: llm_compressor integration documentation
2025-04-28 13:16:29 -04:00
Rahul Tuli
a8e5ba000e
Move: LLMCompressorPlugin into its own submodule
2025-04-28 13:16:29 -04:00
Rahul Tuli
bc3dfa666d
Update model config
2025-04-28 13:16:29 -04:00
Rahul Tuli
4371f3459e
Use: absolute import
2025-04-28 13:16:29 -04:00
Rahul Tuli
cc58d5e072
Rename: sft.yaml to sparse-finetuning.yaml
2025-04-28 13:16:29 -04:00
Rahul Tuli
d197b054e3
Add: llmcompressor installable
2025-04-28 13:16:29 -04:00
Rahul Tuli
7e1e153831
Address review comments from @markurtz
2025-04-28 13:16:29 -04:00
Rahul Tuli
42de3096cf
Apply suggestions from @markurtz
...
Co-authored-by: Mark Kurtz <mark.j.kurtz@gmail.com>
2025-04-28 13:16:29 -04:00
Rahul Tuli
27758840a1
Update llmcompressor version to latest
2025-04-28 13:16:29 -04:00
Rahul Tuli
8dbf5c215a
Revert: TODOs
2025-04-28 13:16:29 -04:00
Rahul Tuli
6411ca3fe1
Use: warning over warn
2025-04-28 13:16:29 -04:00
Rahul Tuli
813809c54d
pre-commit hooks
2025-04-28 13:16:29 -04:00
Rahul Tuli
af7cfdc30b
Add: llmcompressor installable
2025-04-28 13:16:29 -04:00
Rahul Tuli
b76d2d1130
Update: review comments!
2025-04-28 13:16:29 -04:00
Rahul Tuli
7946f89df4
Add: SFTPlugin with llmcompressor
2025-04-28 13:16:29 -04:00
Dhruv Mullick
8b33ae1c4f
Fix bug in grpo reward module import (#2571)
2025-04-28 00:31:56 -04:00
Wing Lian
dc4da4a7e2
update trl to 0.17.0 (#2560)
...
* update trl to 0.17.0
* grpo + vllm no longer supported with 2.5.1 due to vllm constraints
* disable VLLM_USE_V1 for ci
* improve handling of killing off the multiprocessing vllm service
* debug why this doesn't run in CI
* increase vllm wait time
* increase timeout to 5min
* upgrade to vllm 0.8.4
* dump out the vllm log for debugging
* use debug logging
* increase vllm start timeout
* use NVL instead
* disable torch compile cache
* revert some commented checks now that grpo tests are fixed
* increase vllm timeout back to 5min
2025-04-27 19:19:53 -04:00
Wing Lian
f9c7c3bb72
don't use is_main_process during config validation (#2569)
2025-04-26 14:14:52 -04:00
Wing Lian
caf5cb63ea
add e2e smoke test for using activation/gradient checkpointing with offload (#2565)
...
* add e2e smoke test for using activation/gradient checkpointing with offload
* disable duplicate code check for the test
* fix relative import
* seq len too small to test this dataset with packing
* fix checkpoint patching for tests
2025-04-25 21:11:17 -04:00
Wing Lian
5dba5c82a8
fix support for wandb run_name for rl trainers (#2566) [skip ci]
...
* fix support for wandb run_name for rl trainers
* prefer to use wandb random names for run_name
2025-04-25 21:10:54 -04:00
Chiwan Park
e3c9d541a7
fix: crash when pretraining_dataset is used with dispatch_batches set to false (#2558)
2025-04-25 17:15:03 -04:00
NanoCode012
9eba0ad118
chore(doc): update docker tags in docs (#2559) [skip ci]
2025-04-25 17:14:48 -04:00
Wing Lian
53dbf97d85
make cce default to true when using the plugin (#2562) [skip ci]
2025-04-25 17:14:26 -04:00
Eko Julianto Salim
2c2563bc34
fix: gradient checkpointing functools.partial object has no attribute __self__ (#2563) [skip ci]
...
* fix: gradient checkpointing causing functools.partial error
* lint
* chore: lint
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-04-25 17:02:37 -04:00
Wing Lian
5cb3398460
don't fail on codecov upload for external contributor PRs (#2564) [skip ci]
2025-04-25 15:10:55 -04:00
Dan Saunders
ae1c7ace63
Sequence parallel training context manager (#2553)
...
* ctx manager for SP
* updates
* update
* further simplifying
* accommodate both training context managers
* simplifying
* simplifying
* nit
* reorg
* tweak codecov yaml
* add gather post hook, simplify, fixes
* pytest
* pytest fix
2025-04-25 10:33:54 -04:00
Wing Lian
1447beb132
make sure to validate the config before normalizing so defaults get set (#2554)
...
* make sure to validate the config before normalizing so defaults get set
* validation not needed for particular test
* remove duplicate validations
* set qlora correctly
2025-04-24 13:01:43 -04:00
Dan Saunders
66f41ec6f1
disable codecov pr annotations (#2556)
2025-04-24 08:51:51 -04:00
NanoCode012
85053f4bd4
Fix(doc): add delinearize instruction (#2545)
...
* fix: mention installing pytorch before axolotl
* feat(doc): include instruction to delinearize
* fix: update instruction for delinearize with adapter
2025-04-24 01:03:43 -04:00
Wing Lian
a4d5112ae1
builds for torch 2.7.0 (#2552)
...
* builds for torch==2.7.0
* use xformers==0.0.29.post3
* no vllm support with torch 2.7
* update default, fix conditional
* no xformers for 2.7.0
* no vllm on 2.7.0 for multigpu test too
* remove deprecated verbose arg from scheduler
* 2.7.0 tests on cpu
2025-04-24 00:39:31 -04:00
Wing Lian
0d691cc2a7
add base docker image with pytorch 2.7.0 and variant for cuda 12.8 (#2551)
...
* add base docker image with pytorch 2.7.0 and variant for cuda 12.8
* my bash is terrible
2025-04-23 14:59:03 -04:00
Dan Saunders
c4053481ff
Codecov fixes / improvements (#2549)
...
* adding codecov reporting
* random change
* codecov fixes
* adding missing dependency
* fix
---------
Co-authored-by: Dan Saunders <dan@axolotl.ai>
2025-04-23 10:33:30 -04:00
NanoCode012
a6d28d19b1
feat: add glm and glm4 multipack and cce (#2546)
...
* feat: add glm and glm4 multipack
* feat: add glm4 example
* feat: add cce for glm
2025-04-23 10:27:51 -04:00
Wing Lian
32e335dd51
fix missing host/port for vllm (#2543)
...
* fix missing host/port for vllm
* set tensor parallel size so it doesn't always default to cli override
2025-04-22 10:16:48 -04:00
Wing Lian
7651550850
make sure to download fixtures for kd test (#2541)
...
* make sure to download fixtures for kd test
* use same alpaca dataset
2025-04-21 10:31:50 -04:00
Wing Lian
341e95aac9
prevent rate limiting to hf when using dispatch batches (#2536) [skip ci]
2025-04-21 10:31:35 -04:00
Catgat
b882dfb63f
Fixed Rex Scheduler Warm Up (#2535) [skip ci]
...
* Fixed Rex Scheduler Warm Up
* chore: lint
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-04-21 10:30:55 -04:00
Wing Lian
b640db1dbc
don't run multigpu tests twice, run SP in separate test (#2542)
...
* don't run multigpu tests twice, run SP in separate test
* fix multiline
2025-04-21 10:24:13 -04:00
Chiwan Park
4ce469d32e
fix: upgrade liger to 0.5.8 and use native Gemma3 patches (#2527)
...
* fix: upgrade liger to 0.5.8 and use native Gemma3 patches
* fix: make lint happy
* doc: update Liger Kernel FLCE support for Gemma 3
2025-04-18 09:57:40 -07:00
Wing Lian
60a8f0958d
zero val fix for beta (#2538)
2025-04-17 17:27:19 -07:00
NanoCode012
9da730d6a4
fix(doc): cut cross entropy installation instructions broken in qmd (#2532)
2025-04-16 15:02:51 -07:00
NanoCode012
32637fad00
fix: preprocess yielding whole dataset to each worker (#2503) [skip ci]
2025-04-16 15:02:35 -07:00