Wing Lian
434a484fe9
update doc snippets + reject gemma4-hybrid with non-FA2 backend
2026-04-23 22:27:01 +00:00
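The backend rejection described in the commit above can be sketched as a simple config check. This is an illustrative stand-in, not axolotl's actual validation code; the function name and backend strings are assumptions.

```python
# Hypothetical sketch: reject an unsupported attention backend for a model
# family. Names below are illustrative, not axolotl's real config surface.

def validate_backend(model_type: str, attn_backend: str) -> str:
    """Raise if the model family requires flash-attention-2 but got another backend."""
    if model_type == "gemma4-hybrid" and attn_backend != "flash_attention_2":
        raise ValueError("gemma4-hybrid requires the flash_attention_2 backend")
    return attn_backend
```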
salman
294c7fe7a6
Distributed/ND-Parallel (#2977)
2025-07-31 15:25:02 -04:00
Dan Saunders
6aa41740df
SP dataloader patching + removing custom sampler / dataloader logic (#2686)
...
* utilize accelerate prepare_data_loader with patching
* lint
* cleanup, fix
* update to support DPO quirk
* small change
* coderabbit commits, cleanup, remove dead code
* quarto fix
* patch fix
* review comments
* moving monkeypatch up one level
* fix
2025-05-21 11:20:20 -04:00
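The "utilize accelerate prepare_data_loader with patching" and "moving monkeypatch up one level" entries above describe wrapping a library function rather than replacing the dataloader logic outright. A minimal sketch of that monkeypatching pattern, using a stand-in module (the real target would be Accelerate's `prepare_data_loader`):

```python
# Sketch of the monkeypatch pattern: wrap a library's dataloader-preparation
# function so extra logic runs around it. FakeAccelerateModule is a stand-in.

import functools

class FakeAccelerateModule:
    """Stand-in for the module whose function gets patched."""
    @staticmethod
    def prepare_data_loader(dataloader):
        return ("prepared", dataloader)

def patch_prepare_data_loader(module):
    original = module.prepare_data_loader

    @functools.wraps(original)
    def patched(dataloader):
        # pre-hook (e.g. swap in an SP-aware sampler) would go here
        result = original(dataloader)
        # post-hook would go here
        return result

    module.prepare_data_loader = patched
    return original  # keep a handle so the patch can be reverted

original = patch_prepare_data_loader(FakeAccelerateModule)
```

Applying the patch "one level up" (on the module, not on each instance) keeps a single point of reversal.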
Dan Saunders
80304c26a7
SP GRPO support + batch SP fixes (#2643)

...
* ctx manager for SP
* updates
* update
* further simplifying
* simplifying
* simplifying
* reorg
* batch api HF adapter for ring-flash-attn; cleanup and improvements
* update
* adding all batch ring-flash-attn methods via single adapter
* fix
* fixes for batch API funcs, simplify
* fix
* grpo sp support
* progress
* stronger subclassing of TRL GRPO trainer; custom distributed sampler
* subclassing constructor
* progress
* finalizing SP + GRPO trainer
* minimize diffs to GRPO trainer
* remove (most of) the custom GRPO trainer logic
* debug
* debug
* update
* update
* update
* progress
* cleanup
* cleanup
* minor changes
* update
* update
* update
* small changes
* updates
* cleanup; torch.compile ring_flash_attn functions to prevent numerical instability; lint
* spacing
* cleanup; log in pydantic model config only on main process
* remove comment
* fix sp sampler, update to latest upstream code, doc
* add docs
* update quartodoc autodoc contents
* fix, simplifications
* fixes + simplifications
* review comments
* lint
* removing main process only logs in favor of #2608
* fixes, additional smoke test
* updates
* more tests
* update
* fix grad accum bug (sort of)
* lint, tests
* todo
2025-05-12 17:52:40 -04:00
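The "custom distributed sampler" mentioned above pairs ranks into sequence-parallel groups that must see the same samples. A hedged sketch of that sharding idea (the function and its signature are illustrative, not the trainer's actual sampler):

```python
# Sketch: shard sample indices across SP groups. All ranks in the same
# sequence-parallel group receive the same shard, since they jointly
# process each sequence; different groups get disjoint shards.

def sp_shard_indices(indices, world_size, sp_degree, rank):
    num_groups = world_size // sp_degree
    group = rank // sp_degree  # ranks in one group share a shard
    return indices[group::num_groups]
```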
Dan Saunders
b8c633aa97
batch api HF adapter for ring-flash-attn; cleanup and improvements (#2520)
...
* batch api HF adapter for ring-flash-attn; cleanup and improvements
* update
* adding all batch ring-flash-attn methods via single adapter
* removing pad_to_sequence_len=False for now
* fix
* updating docs to include batch SP
* review comments
* fixes for batch API funcs, simplify
* fixes
* fix
* updates
* add batch_zigzag smoke test
2025-04-16 13:50:48 -04:00
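"Adding all batch ring-flash-attn methods via single adapter" suggests one dispatch point in front of several backend variants. A minimal sketch of that single-adapter shape, with hypothetical stand-in functions in place of the real ring-flash-attn variants:

```python
# Sketch: expose several backend variants through one adapter that
# selects by name. _zigzag/_varlen are stand-ins, not real library calls.

def _zigzag(x):
    return ("zigzag", x)

def _varlen(x):
    return ("varlen", x)

_BATCH_FUNCS = {"zigzag": _zigzag, "varlen": _varlen}

def batch_attn_adapter(variant, x):
    try:
        fn = _BATCH_FUNCS[variant]
    except KeyError:
        raise ValueError(f"unknown batch attention variant: {variant}") from None
    return fn(x)
```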
Dan Saunders
5410195e0b
Sequence parallelism quick follow-ups; remove ModelCallback (#2450)
...
* guard return if ring attn already registered
* add docs link, bits in multi-gpu docs, remove save model callback (subsumed by HF trainers)
* configurable heads_k_stride from ring-flash-attn hf adapter
2025-03-31 09:13:42 -04:00
Dan Saunders
23f0c51d88
Sequence parallelism (#2412)
...
* adding easy_context as integration for now
* progress on ring attn impl
* progress on ring attn impl
* cleanup
* remove errant file
* fix req
* removing unused code
* updates
* pytest
* update
* updates
* fixes
* precommit fixes
* working multi-group SP
* fixing sample packing
* remove debug logs and simplify
* eval dataloader and sampler changes
* removing some obvious comments
* update config.qmd and rename option
* scoping down problematic import
* another import scoping change
* pernicious Fire CLI bugfix
* isolate cli tests
* actually isolate CLI tests
* gracefully handle no ring-flash-attn
* fix
* fix
* move ring flash attn to extras with flash-attn (#2414)
* removing flash-attn from requirements.txt (in setup.py extras already)
* rename file, delete another
* using field validator instead of model validator
* test fix
* sampler / dataloader refactor
* non-seq2seq collator fix
* removing print statement
* bugfix
* add SP doc, review comments
* small changes
* review comments, docstrings
* refactors, SP mixin
* small updates
* fix tests
* precommit
* precommit
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Dan Saunders <dan@axolotl.ai>
2025-03-21 12:43:55 -04:00
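The zigzag sharding that the batch SP work above builds on splits each sequence into 2×world_size chunks and gives rank i chunks i and (2×world_size − 1 − i), so every rank holds a mix of early and late tokens and causal-attention work stays balanced. A hedged sketch of that partitioning (not the library's actual implementation):

```python
# Sketch of zigzag sequence partitioning for ring attention. Each rank
# gets one chunk from the front and its mirror from the back, balancing
# causal-attention cost across ranks.

def zigzag_split(seq, world_size, rank):
    n = 2 * world_size
    assert len(seq) % n == 0, "sequence length must divide 2 * world_size"
    size = len(seq) // n
    chunks = [seq[i * size:(i + 1) * size] for i in range(n)]
    return chunks[rank] + chunks[n - 1 - rank]
```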