Commit Graph

2132 Commits

Author SHA1 Message Date
NanoCode012 cd31394e70 feat: move epoch setting to base 2025-05-22 18:11:17 +07:00
NanoCode012 66d4319d80 fix: optimizer cls not being popped 2025-05-22 18:07:07 +07:00
NanoCode012 c6e730df64 chore: remove unused importlib.util import 2025-05-22 18:00:00 +07:00
NanoCode012 e55d64f709 feat: moved torch compile to base and refactor collator setting 2025-05-22 17:56:58 +07:00
NanoCode012 0fc6499461 fix: return notimplemented for ppo 2025-05-22 17:48:08 +07:00
NanoCode012 24b61c1b67 chore: lint 2025-05-22 17:44:30 +07:00
NanoCode012 b87850e11b feat: call hook_pre_create_trainer for rl 2025-05-22 17:42:46 +07:00
NanoCode012 49888eccb9 fix: address pr feedback 2025-05-16 14:36:38 +07:00
NanoCode012 00bfdb6b2b fix: update handling of trainer_cls in RL 2025-05-16 14:23:28 +07:00
NanoCode012 0b40f2aaf6 fix: leftover bug from rebase 2025-05-16 14:13:50 +07:00
NanoCode012 5c40896d19 fix: pop optimizer cls in rl too 2025-05-16 13:39:41 +07:00
NanoCode012 336c5f9db9 fix: move pop optimizer_cls_and_kwargs 2025-05-16 13:38:11 +07:00
NanoCode012 ad229ffa91 fix: add missing rex from rebase 2025-05-16 13:36:11 +07:00
NanoCode012 7898f44e9b fix: optimizer_cls_and_kwargs to be passed into trainer_kwargs 2025-05-15 20:42:12 +07:00
NanoCode012 7fd05c19f7 chore: simplify dynamo check 2025-05-15 20:28:13 +07:00
NanoCode012 64a57ebb62 fix: remove redundant configs from rebase mistake 2025-05-15 16:07:37 +07:00
NanoCode012 555190868a fix: import path for trainer builder and submodules 2025-05-15 15:49:37 +07:00
NanoCode012 a1832953c4 fix: update quarto autodoc 2025-05-15 15:39:47 +07:00
NanoCode012 930472b7c7 chore: add missing config to doc 2025-05-14 16:56:34 +07:00
NanoCode012 8a336a2c33 fix: remove deprecated clause 2025-05-14 16:55:34 +07:00
NanoCode012 316b450a87 feat: split training builder into sub modules 2025-05-14 16:53:50 +07:00
NanoCode012 c281c6e519 fix(test): use RLType directly to skip needing to validate 2025-05-14 16:17:34 +07:00
NanoCode012 06fae0d34e chore: add back return typing from rebase 2025-05-14 10:39:46 +07:00
NanoCode012 67b1df21aa feat: add handling for seed and SP/ring-attn config 2025-05-14 09:49:46 +07:00
NanoCode012 9af4bffd5d fix(test): set sequence_parallel_degree default in base cfg 2025-05-14 09:36:20 +07:00
NanoCode012 7c91cbddd3 fix: duplicate optim setting 2025-05-14 09:36:20 +07:00
NanoCode012 427e612d5a feat: allow custom optim for rl methods 2025-05-14 09:36:20 +07:00
NanoCode012 b8025b34b9 fix: lint 2025-05-14 09:33:49 +07:00
NanoCode012 51c2adf3b1 fix: remove redundant override 2025-05-14 09:33:49 +07:00
Wing Lian cbcb7b081b use transformers default for logging steps, not None 2025-05-14 09:33:49 +07:00
Wing Lian 675561e745 improve handling of warmup/logging steps 2025-05-14 09:33:49 +07:00
NanoCode012 a6ce7d7522 fix: comments 2025-05-14 09:33:49 +07:00
NanoCode012 1ea6ce73ed feat: update CI on trainer_builder 2025-05-14 09:33:49 +07:00
NanoCode012 8aa722a140 fix: ignore max_length for grpo 2025-05-14 09:33:49 +07:00
NanoCode012 edaec9fe98 fix: add missing weight_decay handling 2025-05-14 09:33:28 +07:00
NanoCode012 8b6db0c72d fix: update default max_steps 2025-05-14 09:33:28 +07:00
NanoCode012 43f5373c79 fix: remove unnecessary datacollator kwarg insert and pop 2025-05-14 09:33:28 +07:00
NanoCode012 698268bc63 fix: max_steps incorrectly set 2025-05-14 09:33:28 +07:00
NanoCode012 9028eb2758 fix: adding missing Any 2025-05-14 09:33:28 +07:00
NanoCode012 077a54d2b1 fix: deprecate old types 2025-05-14 09:33:28 +07:00
NanoCode012 053e5fd7d1 chore: consolidate eval_strat, loraplus, lr sched, max_length 2025-05-14 09:33:28 +07:00
NanoCode012 fd271b2547 fix: consolidate handling of fp16, bf16, tf32 kwarg 2025-05-14 09:33:28 +07:00
NanoCode012 c268a0157a feat: add report_to to set run name 2025-05-14 09:33:28 +07:00
NanoCode012 6317945b67 fix: refactor sft and rl trainer to set same base args 2025-05-14 09:32:46 +07:00
NanoCode012 86ba574698 feat: add num_proc and load from cache for rl mapping 2025-05-14 09:32:04 +07:00
Wing Lian 7fa1089cea Atropos support (#2666) [skip ci] 2025-05-13 08:30:58 -04:00
* allow peft+liger+grpo and custom vllm serve for atropos support
* set trainer class for RL
Dan Saunders 80304c26a7 SP GRPO support + batch SP fixes (#2643) 2025-05-12 17:52:40 -04:00
* ctx manager for SP
* updates
* update
* further simplifying
* simplifying
* simplifying
* reorg
* batch api HF adapter for ring-flash-attn; cleanup and improvements
* update
* adding all batch ring-flash-attn methods via single adapter
* fix
* fixes for batch API funcs, simplify
* fix
* grpo sp support
* progress
* stronger subclassing of TRL GRPO trainer; custom distributed sampler
* subclassing constructor
* progress
* finalizing SP + GRPO trainer
* minimize diffs to GRPO trainer
* remove (most of) the custom GRPO trainer logic
* debug
* debug
* update
* update
* update
* progress
* cleanup
* cleanup
* minor changes
* update
* update
* update
* small changes
* updates
* cleanup; torch.compile ring_flash_attn functions to prevent numerical instability; lint
* spacing
* cleanup; log in pydantic model config only on main process
* remove comment
* fix sp sampler, update to latest upstream code, doc
* add docs
* update quartodoc autodoc contents
* fix, simplifications
* fixes + simplifications
* review comments
* lint
* removing main process only logs in favor of #2608
* fixes, additional smoke test
* updates
* more tests
* update
* fix grad accum bug (sort of)
* lint, tests
* todo
NanoCode012 67c4ea9c7c fix: disable auto lora kernel if dropout nonzero (#2655) [skip ci] 2025-05-12 16:23:53 -04:00
* fix: disable auto lora kernel if dropout nonzero
* Add comment from PR feedback
Co-authored-by: Wing Lian <wing@axolotl.ai>
Wing Lian 526ddb886d guard on deleting secrets from env (#2653) [skip ci] 2025-05-12 14:18:42 -04:00
Wing Lian f34eef546a update doc and use P2P=LOC for brittle grpo test (#2649) 2025-05-12 14:17:25 -04:00
* update doc and skip brittle grpo test
* fix the path to run the multigpu tests
* increase timeout, use LOC instead of NVL
* typo
* use hf cache from s3 backed cloudfront
* mark grpo as flaky test due to vllm start