Wing Lian
b5d4c7ff54
allow 1% deviation for codecov (#3138) [skip ci]
2025-09-07 11:01:03 -04:00
Wing Lian
9dde9e1b71
misc fixes 202507 (#2937) [skip ci]
* misc fixes 202507
* manually handle attn class for llama4
2025-07-17 09:47:45 -04:00
Wing Lian
f34eef546a
update doc and use P2P=LOC for brittle grpo test (#2649)
* update doc and skip brittle grpo test
* fix the path to run the multigpu tests
* increase timeout, use LOC instead of NVL
* typo
* use hf cache from s3 backed cloudfront
* mark grpo as flaky test due to vllm start
2025-05-12 14:17:25 -04:00
Dan Saunders
ae1c7ace63
Sequence parallel training context manager (#2553)
* ctx manager for SP
* updates
* update
* further simplifying
* accommodate both training context managers
* simplifying
* simplifying
* nit
* reorg
* tweak codecov yaml
* add gather post hook, simplify, fixes
* pytest
* pytest fix
2025-04-25 10:33:54 -04:00
Dan Saunders
66f41ec6f1
disable codecov pr annotations (#2556)
2025-04-24 08:51:51 -04:00
Dan Saunders
f776f889a1
adding codecov reporting (#2372) [skip ci]
* adding codecov reporting
* update codecov-action to v5
* fix
---------
Co-authored-by: Dan Saunders <dan@axolotl.ai>
2025-04-16 15:02:17 -07:00