Commit Graph

2172 Commits

Author | SHA1 | Message | Date
Wing Lian
2c66483a47 default to dropping last batch in multipack batch sampler 2025-06-05 16:00:24 -07:00
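For context on what drop_last buys here, PyTorch's stock BatchSampler shows the same semantics the multipack sampler now defaults to (a minimal illustration, not axolotl's own sampler):

```python
from torch.utils.data import BatchSampler, SequentialSampler

# drop_last=True discards the trailing batch that can't be filled to batch_size,
# keeping per-step shapes uniform (which matters for packed/multipack batches
# split across ranks).
sampler = BatchSampler(SequentialSampler(range(10)), batch_size=4, drop_last=True)
print(list(sampler))  # [[0, 1, 2, 3], [4, 5, 6, 7]] -- the partial [8, 9] is dropped
```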
Wing Lian
01382b9a79 fix rebase issues 2025-06-05 15:31:28 -07:00
Wing Lian
cfcd69df0d rename vars for consistency 2025-06-05 15:29:21 -07:00
Wing Lian
2302b14a84 fix to remove attention_mask 2025-06-05 15:29:20 -07:00
Wing Lian
a8e2bddd19 increase hyperparams_count for gradients for added normalize_topk 2025-06-05 15:29:20 -07:00
Wing Lian
d55a51623f more KD updates 2025-06-05 15:29:20 -07:00
Wing Lian
73a84ad0dd post-rebase lint 2025-06-05 15:29:20 -07:00
Wing Lian
3cffe881bb accept compressed responses for smaller wire payload 2025-06-05 15:29:20 -07:00
Wing Lian
e77d62933d Fix decay 2025-06-05 15:29:19 -07:00
Wing Lian
3a0faa97ca fix trainer callback base class 2025-06-05 15:29:19 -07:00
Wing Lian
20602fd93f chore: lint 2025-06-05 15:29:17 -07:00
Wing Lian
770bb0605a support for dynamic plugin training args mixins and symmetric kl 2025-06-05 15:28:25 -07:00
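The "symmetric kl" in this commit is presumably the average of forward and reverse KL between teacher and student token distributions; a minimal PyTorch sketch under that assumption (function name illustrative, not axolotl's API):

```python
import torch
import torch.nn.functional as F

def symmetric_kl(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """0.5 * (KL(teacher || student) + KL(student || teacher))."""
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    # F.kl_div(input, target) computes KL(target || input) when input is log-probs
    forward_kl = F.kl_div(s_logp, t_logp, log_target=True, reduction="batchmean")
    reverse_kl = F.kl_div(t_logp, s_logp, log_target=True, reduction="batchmean")
    return 0.5 * (forward_kl + reverse_kl)
```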
Wing Lian
24b96b1c4f temp scale kd loss at end 2025-06-05 15:19:33 -07:00
Wing Lian
90c7228ff9 use max not min 2025-06-05 15:19:33 -07:00
Wing Lian
9eb53f5c9e fix length of padding 2025-06-05 15:19:33 -07:00
Wing Lian
225b420dc5 shift off the first empty token 2025-06-05 15:19:33 -07:00
Wing Lian
b75db13615 fix check 2025-06-05 15:19:33 -07:00
Wing Lian
c7b1db329e logsumexp trick: 2025-06-05 15:19:32 -07:00
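The "logsumexp trick" referenced above is the standard max-subtraction that keeps log-softmax numerically stable; a generic sketch, independent of how the KD code applies it:

```python
import torch

def stable_log_softmax(logits: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """log p_i = x_i - logsumexp(x); subtracting the row max first keeps
    exp() from overflowing for large logits."""
    m = logits.max(dim=dim, keepdim=True).values
    lse = m + (logits - m).exp().sum(dim=dim, keepdim=True).log()
    return logits - lse
```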
Wing Lian
a40e484803 handle when no custom collator is used in plugins 2025-06-05 15:19:32 -07:00
Wing Lian
9899c924f9 support sampling params/max new tokens 2025-06-05 15:19:32 -07:00
Wing Lian
505009b454 add close to comment block 2025-06-05 15:19:31 -07:00
Wing Lian
b4e96ef12c online kd wip 2025-06-05 15:19:04 -07:00
Wing Lian
a8d9fab635 don't need temp arg to distill method 2025-06-05 15:18:20 -07:00
Wing Lian
49e2fa825d additional plugin collator kwargs, don't scale up kd loss by t^2 2025-06-05 15:18:19 -07:00
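For reference, classic Hinton-style distillation multiplies the softened KL by temperature**2 so gradient magnitudes stay comparable across temperatures; the commit above drops that factor. A hedged sketch of the loss being adjusted (names illustrative, not the repo's code):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) on temperature-softened distributions.
    The classic formulation scales this by temperature**2; per the commit
    above, that scaling is omitted here."""
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_prob = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s_logp, t_prob, reduction="batchmean")
```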
Wing Lian
7263845207 remove debugging 2025-06-05 15:17:13 -07:00
Wing Lian
5ccfd225cb collator cls for plugins 2025-06-05 15:16:31 -07:00
Wing Lian
28eb8632a1 more fixes and liger-type chunked loss 2025-06-05 15:14:38 -07:00
Wing Lian
5cfaac3767 WIP chunked KD loss with autograd wrapper 2025-06-05 15:14:37 -07:00
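A drastically simplified view of the chunking idea: compute the loss over slices of flattened [tokens, vocab] logits so only one slice's softmax buffers are materialized at a time. Note the caveat in the docstring; it is why the commit pairs chunking with an autograd wrapper:

```python
import torch
import torch.nn.functional as F

def chunked_kd_loss(student_logits, teacher_logits, chunk_size: int = 1024) -> torch.Tensor:
    """Sum the per-chunk KD loss over sequence slices. Naive chunking alone only
    bounds peak memory at inference; during training autograd still retains each
    chunk's buffers for backward, which is why liger-style implementations wrap
    each chunk in a custom torch.autograd.Function that recomputes in backward."""
    n = student_logits.shape[0]
    total = student_logits.new_zeros(())
    for start in range(0, n, chunk_size):
        s = F.log_softmax(student_logits[start:start + chunk_size], dim=-1)
        t = F.softmax(teacher_logits[start:start + chunk_size], dim=-1)
        total = total + F.kl_div(s, t, reduction="sum")
    return total / n
```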
Wing Lian
ca70fb7cb0 simplify and remove zscore 2025-06-05 15:13:55 -07:00
Wing Lian
22b50d6619 drop top_k before softmax 2025-06-05 15:13:24 -07:00
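Dropping to top-k before the softmax (rather than after) renormalizes the kept logits into a proper distribution; a small sketch of that ordering:

```python
import torch
import torch.nn.functional as F

def topk_softmax(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Mask everything outside the top-k with -inf *before* softmax, so the
    surviving k logits renormalize to sum to 1. Zeroing probabilities after
    softmax would instead leave a distribution that no longer sums to 1."""
    topk_vals = logits.topk(k, dim=-1).values
    threshold = topk_vals[..., -1:]  # k-th largest logit per row
    masked = logits.masked_fill(logits < threshold, float("-inf"))
    return F.softmax(masked, dim=-1)
```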
Wing Lian
a2248673d8 kd trainer has kd temp as part of the init 2025-06-05 15:12:23 -07:00
Wing Lian
0399aefcb3 better handling to drop string fields for kd with raw dataset 2025-06-05 15:12:22 -07:00
Wing Lian
83ad248e5b fix input args 2025-06-05 15:12:22 -07:00
Wing Lian
6fafe46562 fix collator setup 2025-06-05 15:12:21 -07:00
Wing Lian
0e46367e01 kd fixes 2025-06-05 15:09:59 -07:00
Wing Lian
7909bfb076 add manual seed for flaky test_geglu_backward test (#2763) [skip ci] 2025-06-05 09:23:17 -07:00
Wing Lian
cb03c765a1 add uv tooling for e2e gpu tests (#2750)
* add uv tooling for e2e gpu tests

* fixes from PR feedback

* simplify check

* fix env var

* make sure to use uv for other install

* use raw_dockerfile_image

* Fix import

* fix args to experimental dockerfile image call

* use updated modal versions
2025-06-05 07:25:06 -07:00
Timofey Klyubin
4440b4a1ce remove unused field for chat_template.default for DPO training (#2755) [skip ci]
* remove unused field for chat_template.default

"messages" field present in final dataset causes issues with DPO
training otherwise

* lint and fix tests for new return value

* fix for updated expected fields for dpo

* fix test still expecting "messages" field

* chore: lint

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-05 07:22:58 -07:00
NanoCode012
e8e45b3441 fix: remove hqq (#2759) [skip ci] 2025-06-05 07:22:23 -07:00
Wing Lian
c67910fa6f bump hf deps (#2735) [skip ci]
* bump hf deps

* upgrade liger-kernel too

* install cce from fork for transformers fix

* fix reference to vocab size in gemma3 patch

* use padding_idx instead of pad_token_id

* remove fixed gemma3 patch

* use updated cce fork

* fix local mllama cce patches w docstring

* add test for multipack with trainer setup and fix trainer for trainer refactor upstream

* bump modal version

* guard for iterable datasets

* mllama model arch layout changed in latest transformers

* fix batch sampler with drop_last

* fix: address upstream vlm changes for lora

* fix: update references to old lora target path

* fix: remove mllama fa2 patch due to upstream fix

* fix: lora kernel patch path for multimodal models

* fix: removed mllama from quarto

* run test for came optim on 2.6.0+

* fix fsdp2 patch and remove deprecated patch

* make sure to set sequence_parallel_degree for grpo

* Add SP test for GRPO

* add sp to grpo config for trainer

* use reward_funcs as kwarg to grpo trainer

* fix the comprehension for reward funcs

* reward funcs already passed in as args

* init sp_group right before training

* fix check for adding models to SP context

* make sure to pass args to super

* upgrade deepspeed

* use updated trl and add reasoning flags for vllm

* patch the worker

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-06-05 07:20:33 -07:00
NanoCode012
787880215b fix(deepspeed): deepspeed config not being set for z3 (#2754)
* fix(deepspeed): deepspeed config not being set for z3

* fix: comments
2025-06-03 14:27:09 -07:00
NanoCode012
4b1a29c694 feat(modal): update docker tag to use torch2.6 from torch2.5 (#2749) [skip ci] 2025-06-03 14:26:07 -07:00
NanoCode012
d7fa60662e feat: add chat_template kwargs (#2694) [skip ci] 2025-06-03 14:25:26 -07:00
Dan Saunders
1d91d905c9 remove deprecated wandb env var (#2751)
* remove deprecated wandb env var

* remove os.environ wandb setting; unused loggers

* remove os.environ wandb setting; unused loggers
2025-06-03 14:04:15 -07:00
mhenrhcsen
2bf61d8e25 fix abbreviation spelling error 2025-06-03 21:30:40 +02:00
mhenrhcsen
68788e419e feat: add Group Relative Policy Optimization (GRPO) to RLHF documentation 2025-06-03 21:30:40 +02:00
github-actions[bot]
94219f6ee8 chore: update pre-commit hooks (#2745)
* chore: update pre-commit hooks

* trigger linter when pre commit hooks are updated

* fix type checks from upgraded pre-commit

---------

Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-02 15:54:29 -07:00
Wing Lian
ecc719f5c7 add support for base image with uv (#2691) 2025-06-02 12:48:55 -07:00
NanoCode012
d5d0dc5938 fix: suppress non-axolotl logs unless it's warning or higher (#2724)
* fix: increase log level for root loggers and axolotl's

* fix: BasePlugin using wrong logger

* fix: update logger to take name from module

* feat: change logger class to AxolotlLogger to filter non-axolotl infos or below

* fix: change behavior to not disable existing loggers

* fix: update logging to respect correct env

* chore: fix comment

* fix: suppress accelerate log to LOG_LEVEL if not set

---------

Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-05-31 12:13:43 +07:00
NanoCode012
5e86c35322 fix(log): remove duplicate merge_lora param (#2742) [skip ci] 2025-05-31 12:13:31 +07:00