Commit Graph

2172 Commits

Author | SHA1 | Message | Date
Wing Lian
2c66483a47 default to dropping last batch in multipack batch sampler 2025-06-05 16:00:24 -07:00
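For context on what drop_last buys here, PyTorch's stock BatchSampler shows the same semantics the multipack sampler now defaults to (a minimal illustration, not axolotl's own sampler):

```python
from torch.utils.data import BatchSampler, SequentialSampler

# drop_last=True discards the trailing batch that can't be filled to batch_size,
# keeping per-step shapes uniform (which matters for packed/multipack batches
# split across ranks).
sampler = BatchSampler(SequentialSampler(range(10)), batch_size=4, drop_last=True)
print(list(sampler))  # [[0, 1, 2, 3], [4, 5, 6, 7]] -- the partial [8, 9] is dropped
```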
Wing Lian
01382b9a79 fix rebase issues 2025-06-05 15:31:28 -07:00
Wing Lian
cfcd69df0d rename vars for consistency 2025-06-05 15:29:21 -07:00
Wing Lian
2302b14a84 fix to remove attention_mask 2025-06-05 15:29:20 -07:00
Wing Lian
a8e2bddd19 increase hyperparams_count for gradients for added normalize_topk 2025-06-05 15:29:20 -07:00
Wing Lian
d55a51623f more KD updates 2025-06-05 15:29:20 -07:00
Wing Lian
73a84ad0dd post-rebase lint 2025-06-05 15:29:20 -07:00
Wing Lian
3cffe881bb accept compressed responses for smaller wire payload 2025-06-05 15:29:20 -07:00
Wing Lian
e77d62933d Fix decay 2025-06-05 15:29:19 -07:00
Wing Lian
3a0faa97ca fix trainer callback base class 2025-06-05 15:29:19 -07:00
Wing Lian
20602fd93f chore: lint 2025-06-05 15:29:17 -07:00
Wing Lian
770bb0605a support for dynamic plugin training args mixins and symmetric kl 2025-06-05 15:28:25 -07:00
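The "symmetric kl" in this commit is presumably the average of forward and reverse KL between teacher and student token distributions; a minimal PyTorch sketch under that assumption (function name illustrative, not axolotl's API):

```python
import torch
import torch.nn.functional as F

def symmetric_kl(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """0.5 * (KL(teacher || student) + KL(student || teacher))."""
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    # F.kl_div(input, target) computes KL(target || input) when input is log-probs
    forward_kl = F.kl_div(s_logp, t_logp, log_target=True, reduction="batchmean")
    reverse_kl = F.kl_div(t_logp, s_logp, log_target=True, reduction="batchmean")
    return 0.5 * (forward_kl + reverse_kl)
```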
Wing Lian
24b96b1c4f temp scale kd loss at end 2025-06-05 15:19:33 -07:00
Wing Lian
90c7228ff9 use max not min 2025-06-05 15:19:33 -07:00
Wing Lian
9eb53f5c9e fix length of padding 2025-06-05 15:19:33 -07:00
Wing Lian
225b420dc5 shift off the first empty token 2025-06-05 15:19:33 -07:00
Wing Lian
b75db13615 fix check 2025-06-05 15:19:33 -07:00
Wing Lian
c7b1db329e logsumexp trick: 2025-06-05 15:19:32 -07:00
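The "logsumexp trick" referenced above is the standard max-subtraction that keeps log-softmax numerically stable; a generic sketch, independent of how the KD code applies it:

```python
import torch

def stable_log_softmax(logits: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """log p_i = x_i - logsumexp(x); subtracting the row max first keeps
    exp() from overflowing for large logits."""
    m = logits.max(dim=dim, keepdim=True).values
    lse = m + (logits - m).exp().sum(dim=dim, keepdim=True).log()
    return logits - lse
```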
Wing Lian
a40e484803 handle when no custom collator is used in plugins 2025-06-05 15:19:32 -07:00
Wing Lian
9899c924f9 support sampling params/max new tokens 2025-06-05 15:19:32 -07:00
Wing Lian
505009b454 add close to comment block 2025-06-05 15:19:31 -07:00
Wing Lian
b4e96ef12c online kd wip 2025-06-05 15:19:04 -07:00
Wing Lian
a8d9fab635 don't need temp arg to distill method 2025-06-05 15:18:20 -07:00
Wing Lian
49e2fa825d additional plugin collator kwargs, don't scale up kd loss by t^2 2025-06-05 15:18:19 -07:00
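For reference, classic Hinton-style distillation multiplies the softened KL by temperature**2 so gradient magnitudes stay comparable across temperatures; the commit above drops that factor. A hedged sketch of the loss being adjusted (names illustrative, not the repo's code):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) on temperature-softened distributions.
    The classic formulation scales this by temperature**2; per the commit
    above, that scaling is omitted here."""
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_prob = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s_logp, t_prob, reduction="batchmean")
```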
Wing Lian
7263845207 remove debugging 2025-06-05 15:17:13 -07:00
Wing Lian
5ccfd225cb collator cls for plugins 2025-06-05 15:16:31 -07:00
Wing Lian
28eb8632a1 more fixes and liger-type chunked loss 2025-06-05 15:14:38 -07:00
Wing Lian
5cfaac3767 WIP chunked KD loss with autograd wrapper 2025-06-05 15:14:37 -07:00
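A drastically simplified view of the chunking idea: compute the loss over slices of flattened [tokens, vocab] logits so only one slice's softmax buffers are materialized at a time. Note the caveat in the docstring; it is why the commit pairs chunking with an autograd wrapper:

```python
import torch
import torch.nn.functional as F

def chunked_kd_loss(student_logits, teacher_logits, chunk_size: int = 1024) -> torch.Tensor:
    """Sum the per-chunk KD loss over sequence slices. Naive chunking alone only
    bounds peak memory at inference; during training autograd still retains each
    chunk's buffers for backward, which is why liger-style implementations wrap
    each chunk in a custom torch.autograd.Function that recomputes in backward."""
    n = student_logits.shape[0]
    total = student_logits.new_zeros(())
    for start in range(0, n, chunk_size):
        s = F.log_softmax(student_logits[start:start + chunk_size], dim=-1)
        t = F.softmax(teacher_logits[start:start + chunk_size], dim=-1)
        total = total + F.kl_div(s, t, reduction="sum")
    return total / n
```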
Wing Lian
ca70fb7cb0 simplify and remove zscore 2025-06-05 15:13:55 -07:00
Wing Lian
22b50d6619 drop top_k before softmax 2025-06-05 15:13:24 -07:00
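Dropping to top-k before the softmax (rather than after) renormalizes the kept logits into a proper distribution; a small sketch of that ordering:

```python
import torch
import torch.nn.functional as F

def topk_softmax(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Mask everything outside the top-k with -inf *before* softmax, so the
    surviving k logits renormalize to sum to 1. Zeroing probabilities after
    softmax would instead leave a distribution that no longer sums to 1."""
    topk_vals = logits.topk(k, dim=-1).values
    threshold = topk_vals[..., -1:]  # k-th largest logit per row
    masked = logits.masked_fill(logits < threshold, float("-inf"))
    return F.softmax(masked, dim=-1)
```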
Wing Lian
a2248673d8 kd trainer has kd temp as part of the init 2025-06-05 15:12:23 -07:00
Wing Lian
0399aefcb3 better handling to drop string fields for kd with raw dataset 2025-06-05 15:12:22 -07:00
Wing Lian
83ad248e5b fix input args 2025-06-05 15:12:22 -07:00
Wing Lian
6fafe46562 fix collator setup 2025-06-05 15:12:21 -07:00
Wing Lian
0e46367e01 kd fixes 2025-06-05 15:09:59 -07:00
Wing Lian
7909bfb076 add manual seed for flaky test_geglu_backward test (#2763) [skip ci] 2025-06-05 09:23:17 -07:00
Wing Lian
cb03c765a1 add uv tooling for e2e gpu tests (#2750)
* add uv tooling for e2e gpu tests

* fixes from PR feedback

* simplify check

* fix env var

* make sure to use uv for other install

* use raw_dockerfile_image

* Fix import

* fix args to experimental dockerfile image call

* use updated modal versions
2025-06-05 07:25:06 -07:00
Timofey Klyubin
4440b4a1ce remove unused field for chat_template.default for DPO training (#2755) [skip ci]
* remove unused field for chat_template.default

"messages" field present in final dataset causes issues with DPO
training otherwise

* lint and fix tests for new return value

* fix for updated expected fields for dpo

* fix test still expecting "messages" field

* chore: lint

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-05 07:22:58 -07:00
NanoCode012
e8e45b3441 fix: remove hqq (#2759) [skip ci] 2025-06-05 07:22:23 -07:00
Wing Lian
c67910fa6f bump hf deps (#2735) [skip ci]
* bump hf deps

* upgrade liger-kernel too

* install cce from fork for transformers fix

* fix reference to vocab size in gemma3 patch

* use padding_idx instead of pad_token_id

* remove fixed gemma3 patch

* use updated cce fork

* fix local mllama cce patches w docstring

* add test for multipack with trainer setup and fix trainer for trainer refactor upstream

* bump modal version

* guard for iterable datasets

* mllama model arch layout changed in latest transformers

* fix batch sampler with drop_last

* fix: address upstream vlm changes for lora

* fix: update references to old lora target path

* fix: remove mllama fa2 patch due to upstream fix

* fix: lora kernel patch path for multimodal models

* fix: removed mllama from quarto

* run test for came optim on 2.6.0+

* fix fsdp2 patch and remove deprecated patch

* make sure to set sequence_parallel_degree for grpo

* Add SP test for GRPO

* add sp to grpo config for trainer

* use reward_funcs as kwarg to grpo trainer

* fix the comprehension for reward funcs

* reward funcs already passed in as args

* init sp_group right before training

* fix check for adding models to SP context

* make sure to pass args to super

* upgrade deepspeed

* use updated trl and add reasoning flags for vllm

* patch the worker

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-06-05 07:20:33 -07:00
NanoCode012
787880215b fix(deepspeed): deepspeed config not being set for z3 (#2754)
* fix(deepspeed): deepspeed config not being set for z3

* fix: comments
2025-06-03 14:27:09 -07:00
NanoCode012
4b1a29c694 feat(modal): update docker tag to use torch2.6 from torch2.5 (#2749) [skip ci] 2025-06-03 14:26:07 -07:00
NanoCode012
d7fa60662e feat: add chat_template kwargs (#2694) [skip ci] 2025-06-03 14:25:26 -07:00
Dan Saunders
1d91d905c9 remove deprecated wandb env var (#2751)
* remove deprecated wandb env var

* remove os.environ wandb setting; unused loggers

* remove os.environ wandb setting; unused loggers
2025-06-03 14:04:15 -07:00
mhenrhcsen
2bf61d8e25 fix abbreviation spelling error 2025-06-03 21:30:40 +02:00
mhenrhcsen
68788e419e feat: add Group Relative Policy Optimization (GRPO) to RLHF documentation 2025-06-03 21:30:40 +02:00
github-actions[bot]
94219f6ee8 chore: update pre-commit hooks (#2745)
* chore: update pre-commit hooks

* trigger linter when pre commit hooks are updated

* fix type checks from upgraded pre-commit

---------

Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-02 15:54:29 -07:00
Wing Lian
ecc719f5c7 add support for base image with uv (#2691) 2025-06-02 12:48:55 -07:00
NanoCode012
d5d0dc5938 fix: suppress non-axolotl logs unless it's warning or higher (#2724)
* fix: increase log level for root loggers and axolotl's

* fix: BasePlugin using wrong logger

* fix: update logger to take name from module

* feat: change logger class to AxolotlLogger to filter non-axolotl infos or below

* fix: change behavior to not disable existing loggers

* fix: update logging to respect correct env

* chore: fix comment

* fix: suppress accelerate log to LOG_LEVEL if not set

---------

Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-05-31 12:13:43 +07:00
NanoCode012
5e86c35322 fix(log): remove duplicate merge_lora param (#2742) [skip ci] 2025-05-31 12:13:31 +07:00