Sequence parallelism (#2412)

* adding easy_context as integration for now

* progress on ring attn impl

* progress on ring attn impl

* cleanup

* remove errant file

* fix req

* removing unused code

* updates

* pytest

* update

* updates

* fixes

* precommit fixes

* working multi-group SP

* fixing sample packing

* remove debug logs and simplify

* eval dataloader and sampler changes

* removing some obvious comments

* update config.qmd and rename option

* scoping down problematic import

* another import scoping change

* pernicious Fire CLI bugfix

* isolate cli tests

* actually isolate CLI tests

* gracefully handle no ring-flash-attn

* fix

* fix

* move ring flash attn to extras with flash-attn (#2414)

* removing flash-attn from requirements.txt (in setup.py extras already)

* rename file, delete another

* using field validator instead of model validator

* test fix

* sampler / dataloader refactor

* non-seq2se1 collator fix

* removing print statement

* bugfix

* add SP doc, review comments

* small changes

* review comments, docstrings

* refactors, SP mixin

* small updates

* fix tests

* precommit

* precommit

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Dan Saunders <dan@axolotl.ai>
This commit is contained in:
Dan Saunders
2025-03-21 12:43:55 -04:00
committed by GitHub
parent 113e9cd193
commit 23f0c51d88
31 changed files with 1532 additions and 648 deletions

View File

@@ -125,6 +125,9 @@ def normalize_config(cfg):
with open(ds_config_path, encoding="utf-8") as f:
cfg.deepspeed = json.load(f)
if cfg.sequence_parallel_degree is None:
cfg.sequence_parallel_degree = 1
if cfg.saves_per_epoch:
save_steps = 1.0 / (cfg.saves_per_epoch * cfg.num_epochs)
if save_steps < 1.0: # prevent saves on every step