NanoCode012 | 2e2f42918d | fix: remove duplicate lora plus setting | 2025-05-22 18:24:41 +07:00
NanoCode012 | e8eb3bfdf3 | fix: catch unhandled custom optimizer | 2025-05-22 18:14:44 +07:00
NanoCode012 | cd31394e70 | feat: move epoch setting to base | 2025-05-22 18:11:17 +07:00
NanoCode012 | 66d4319d80 | fix: optimizer cls not being popped | 2025-05-22 18:07:07 +07:00
NanoCode012 | c6e730df64 | chore: remove unused importlib.util import | 2025-05-22 18:00:00 +07:00
NanoCode012 | e55d64f709 | feat: moved torch compile to base and refactor collator setting | 2025-05-22 17:56:58 +07:00
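For context on `e55d64f709` above: moving torch.compile "to base" means the SFT and RL paths share one place that turns the compile config into `TrainingArguments` kwargs. A minimal sketch of that pattern, assuming hypothetical `cfg` field names (the `torch_compile*` kwargs are real `transformers.TrainingArguments` fields):

```python
# Sketch of centralizing torch.compile wiring in a shared base-args helper.
# `cfg` and its field names are hypothetical stand-ins for the parsed config;
# torch_compile, torch_compile_backend and torch_compile_mode are real
# transformers.TrainingArguments fields.
from types import SimpleNamespace

from transformers import TrainingArguments

cfg = SimpleNamespace(
    torch_compile=True, torch_compile_backend="inductor", torch_compile_mode=None
)

def torch_compile_kwargs(cfg) -> dict:
    """Build the compile-related kwargs once, for SFT and RL trainers alike."""
    kwargs = {}
    if getattr(cfg, "torch_compile", False):
        kwargs["torch_compile"] = True
        if getattr(cfg, "torch_compile_backend", None):
            kwargs["torch_compile_backend"] = cfg.torch_compile_backend
        if getattr(cfg, "torch_compile_mode", None):
            kwargs["torch_compile_mode"] = cfg.torch_compile_mode
    return kwargs

args = TrainingArguments(output_dir="./out", **torch_compile_kwargs(cfg))
```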
NanoCode012 | 0fc6499461 | fix: return notimplemented for ppo | 2025-05-22 17:48:08 +07:00
NanoCode012 | 24b61c1b67 | chore: lint | 2025-05-22 17:44:30 +07:00
NanoCode012 | b87850e11b | feat: call hook_pre_create_trainer for rl | 2025-05-22 17:42:46 +07:00
NanoCode012 | 49888eccb9 | fix: address pr feedback | 2025-05-16 14:36:38 +07:00
NanoCode012 | 00bfdb6b2b | fix: update handling of trainer_cls in RL | 2025-05-16 14:23:28 +07:00
NanoCode012 | 0b40f2aaf6 | fix: leftover bug from rebase | 2025-05-16 14:13:50 +07:00
NanoCode012 | 5c40896d19 | fix: pop optimizer cls in rl too | 2025-05-16 13:39:41 +07:00
NanoCode012 | 336c5f9db9 | fix: move pop optimizer_cls_and_kwargs | 2025-05-16 13:38:11 +07:00
NanoCode012 | ad229ffa91 | fix: add missing rex from rebase | 2025-05-16 13:36:11 +07:00
NanoCode012 | 7898f44e9b | fix: optimizer_cls_and_kwargs to be passed into trainer_kwargs | 2025-05-15 20:42:12 +07:00
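For context on `7898f44e9b` and the surrounding pop/move fixes: recent `transformers` releases (4.47+) let `Trainer` accept an uninstantiated optimizer via `optimizer_cls_and_kwargs`, so a builder can carry it inside a `trainer_kwargs` dict and pop it only for trainers that can't accept it. A hedged sketch of that hand-off (model construction elided):

```python
# Sketch of passing a custom optimizer through trainer_kwargs, per the
# commits above. optimizer_cls_and_kwargs is a real Trainer parameter
# (transformers >= 4.47); everything around it is illustrative.
import torch
from transformers import Trainer, TrainingArguments

trainer_kwargs = {
    # (optimizer class, init kwargs): Trainer instantiates the optimizer
    # itself, so the builder never has to construct it.
    "optimizer_cls_and_kwargs": (torch.optim.AdamW, {"lr": 2e-5}),
}

# Trainers that don't take the tuple would pop it out first:
#   trainer_kwargs.pop("optimizer_cls_and_kwargs", None)
trainer = Trainer(
    model=model,  # assumed to be built elsewhere
    args=TrainingArguments(output_dir="./out"),
    **trainer_kwargs,
)
```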
NanoCode012 | 7fd05c19f7 | chore: simplify dynamo check | 2025-05-15 20:28:13 +07:00
NanoCode012 | 64a57ebb62 | fix: remove redundant configs from rebase mistake | 2025-05-15 16:07:37 +07:00
NanoCode012 | 555190868a | fix: import path for trainer builder and submodules | 2025-05-15 15:49:37 +07:00
NanoCode012 | a1832953c4 | fix: update quarto autodoc | 2025-05-15 15:39:47 +07:00
NanoCode012 | 930472b7c7 | chore: add missing config to doc | 2025-05-14 16:56:34 +07:00
NanoCode012 | 8a336a2c33 | fix: remove deprecated clause | 2025-05-14 16:55:34 +07:00
NanoCode012 | 316b450a87 | feat: split training builder into sub modules | 2025-05-14 16:53:50 +07:00
NanoCode012 | c281c6e519 | fix(test): use RLType directly to skip needing to validate | 2025-05-14 16:17:34 +07:00
NanoCode012 | 06fae0d34e | chore: add back return typing from rebase | 2025-05-14 10:39:46 +07:00
NanoCode012 | 67b1df21aa | feat: add handling for seed and SP/ring-attn config | 2025-05-14 09:49:46 +07:00
NanoCode012 | 9af4bffd5d | fix(test): set sequence_parallel_degree default in base cfg | 2025-05-14 09:36:20 +07:00
NanoCode012 | 7c91cbddd3 | fix: duplicate optim setting | 2025-05-14 09:36:20 +07:00
NanoCode012 | 427e612d5a | feat: allow custom optim for rl methods | 2025-05-14 09:36:20 +07:00
NanoCode012 | b8025b34b9 | fix: lint | 2025-05-14 09:33:49 +07:00
NanoCode012 | 51c2adf3b1 | fix: remove redundant override | 2025-05-14 09:33:49 +07:00
Wing Lian | cbcb7b081b | use transformers default for logging steps, not None | 2025-05-14 09:33:49 +07:00
Wing Lian | 675561e745 | improve handling of warmup/logging steps | 2025-05-14 09:33:49 +07:00
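On the two Wing Lian commits above: the safer pattern is to omit a kwarg entirely when the user left it unset, so `TrainingArguments` keeps its own default (e.g. `logging_steps=500`) rather than receiving an explicit `None`. A small sketch, with hypothetical config field names:

```python
# Sketch of "use transformers default, not None": only forward keys the user
# actually set. The `cfg` fields are assumptions; logging_steps, warmup_steps
# and warmup_ratio are real TrainingArguments fields.
def steps_kwargs(cfg) -> dict:
    kwargs = {}
    if getattr(cfg, "logging_steps", None) is not None:
        kwargs["logging_steps"] = cfg.logging_steps  # else the default (500) applies
    if getattr(cfg, "warmup_steps", None) is not None:
        kwargs["warmup_steps"] = cfg.warmup_steps
    elif getattr(cfg, "warmup_ratio", None) is not None:
        kwargs["warmup_ratio"] = cfg.warmup_ratio  # explicit steps take precedence
    return kwargs
```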
NanoCode012 | a6ce7d7522 | fix: comments | 2025-05-14 09:33:49 +07:00
NanoCode012 | 1ea6ce73ed | feat: update CI on trainer_builder | 2025-05-14 09:33:49 +07:00
NanoCode012 | 8aa722a140 | fix: ignore max_length for grpo | 2025-05-14 09:33:49 +07:00
NanoCode012 | edaec9fe98 | fix: add missing weight_decay handling | 2025-05-14 09:33:28 +07:00
NanoCode012 | 8b6db0c72d | fix: update default max_steps | 2025-05-14 09:33:28 +07:00
NanoCode012 | 43f5373c79 | fix: remove unnecessary datacollator kwarg insert and pop | 2025-05-14 09:33:28 +07:00
NanoCode012 | 698268bc63 | fix: max_steps incorrectly set | 2025-05-14 09:33:28 +07:00
NanoCode012 | 9028eb2758 | fix: adding missing Any | 2025-05-14 09:33:28 +07:00
NanoCode012 | 077a54d2b1 | fix: deprecate old types | 2025-05-14 09:33:28 +07:00
NanoCode012 | 053e5fd7d1 | chore: consolidate eval_strat, loraplus, lr sched, max_length | 2025-05-14 09:33:28 +07:00
NanoCode012 | fd271b2547 | fix: consolidate handling of fp16, bf16, tf32 kwarg | 2025-05-14 09:33:28 +07:00
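On `fd271b2547`: consolidating the mixed-precision flags means one helper decides the `fp16`/`bf16`/`tf32` kwargs for every trainer instead of each path setting them ad hoc. A sketch under assumed config field names:

```python
# Sketch of consolidated mixed-precision kwarg handling. fp16/bf16/tf32 are
# real TrainingArguments flags; the `cfg` fields are assumptions. fp16 and
# bf16 are mutually exclusive, and tf32 only matters on Ampere+ NVIDIA GPUs.
def precision_kwargs(cfg) -> dict:
    kwargs = {}
    if getattr(cfg, "bf16", False):
        kwargs["bf16"] = True
    elif getattr(cfg, "fp16", False):
        kwargs["fp16"] = True
    if getattr(cfg, "tf32", None) is not None:
        kwargs["tf32"] = cfg.tf32  # toggles TF32 matmul/cudnn kernels
    return kwargs
```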
NanoCode012 | c268a0157a | feat: add report_to to set run name | 2025-05-14 09:33:28 +07:00
NanoCode012 | 6317945b67 | fix: refactor sft and rl trainer to set same base args | 2025-05-14 09:32:46 +07:00
NanoCode012 | 86ba574698 | feat: add num_proc and load from cache for rl mapping | 2025-05-14 09:32:04 +07:00
Wing Lian | 7fa1089cea | Atropos support (#2666) [skip ci] | 2025-05-13 08:30:58 -04:00
  * allow peft+liger+grpo and custom vllm serve for atropos support
  * set trainer class for RL
Dan Saunders | 80304c26a7 | SP GRPO support + batch SP fixes (#2643) | 2025-05-12 17:52:40 -04:00
  * ctx manager for SP
  * updates
  * update
  * further simplifying
  * simplifying
  * simplifying
  * reorg
  * batch api HF adapter for ring-flash-attn; cleanup and improvements
  * update
  * adding all batch ring-flash-attn methods via single adapter
  * fix
  * fixes for batch API funcs, simplify
  * fix
  * grpo sp support
  * progress
  * stronger subclassing of TRL GRPO trainer; custom distributed sampler
  * subclassing constructor
  * progress
  * finalizing SP + GRPO trainer
  * minimize diffs to GRPO trainer
  * remove (most of) the custom GRPO trainer logic
  * debug
  * debug
  * update
  * update
  * update
  * progress
  * cleanup
  * cleanup
  * minor changes
  * update
  * update
  * update
  * small changes
  * updates
  * cleanup; torch.compile ring_flash_attn functions to prevent numerical instability; lint
  * spacing
  * cleanup; log in pydantic model config only on main process
  * remove comment
  * fix sp sampler, update to latest upstream code, doc
  * add docs
  * update quartodoc autodoc contents
  * fix, simplifications
  * fixes + simplifications
  * review comments
  * lint
  * removing main process only logs in favor of #2608
  * fixes, additional smoke test
  * updates
  * more tests
  * update
  * fix grad accum bug (sort of)
  * lint, tests
  * todo
NanoCode012 | 67c4ea9c7c | fix: disable auto lora kernel if dropout nonzero (#2655) [skip ci] | 2025-05-12 16:23:53 -04:00
  * fix: disable auto lora kernel if dropout nonzero
  * Add comment from PR feedback
  Co-authored-by: Wing Lian <wing@axolotl.ai>