NanoCode012
006f226270
Feat: add Olmo3 (BC with Olmo and Olmo2) (#3275)
...
* feat: update cce to include olmo family
* chore: update docs following feedback
* feat: add olmo3 config
* fix: clarify 3 methods
* chore: add olmo to readme
2025-11-24 10:21:31 +07:00
NanoCode012
153edcfe79
fix(doc): add act checkpointing migration to fsdp2 docs (#3193) [skip ci]
2025-10-10 10:57:50 +07:00
Wing Lian
7ed40f1d70
automatically set env vars for single gpu deepspeed zero3 (#3118) [skip ci]
...
* automatically set env vars for single gpu deepspeed zero3
* use setdefault
2025-08-29 13:36:47 -04:00
salman
e5734e5cf0
adding torchtitan link (#2945) [skip ci]
2025-07-19 13:54:14 -04:00
salman
d6e4a611e5
FSDP1 -> FSDP2 (#2760)
...
* FSDP2 args migration implementation
This commit implements the migration to FSDP2 arguments including:
- FSDP2 support with LoRA training
- DPO integration with FSDP2
- Model loading fixes and refactoring
- CPU offloading and PEFT handling
- Test updates and CI improvements
- Bug fixes for dtype errors and various edge cases
2025-07-12 15:18:01 +01:00
Wing Lian
76aeb16156
tiled_mlp supports single gpu (#2891)
...
* tiled_mlp supports single gpu
* use checkpoint offloading for arctic training
* patch torch checkpoint too
* support for single gpu zero3
* add linkback to where it was copied from
2025-07-09 12:48:22 -04:00
Dan Saunders
6aa41740df
SP dataloader patching + removing custom sampler / dataloader logic (#2686)
...
* utilize accelerate prepare_data_loader with patching
* lint
* cleanup, fix
* update to support DPO quirk
* small change
* coderabbit commits, cleanup, remove dead code
* quarto fix
* patch fix
* review comments
* moving monkeypatch up one level
* fix
2025-05-21 11:20:20 -04:00
NanoCode012
756a0559c1
feat(doc): explain deepspeed configs (#2514) [skip ci]
...
* feat(doc): explain deepspeed configs
* fix: add fetch configs
2025-04-11 09:52:43 -04:00
Dan Saunders
5410195e0b
Sequence parallelism quick follow-ups; remove ModelCallback (#2450)
...
* guard return if ring attn already registered
* add docs link, bits in multi-gpu docs, remove save model callback (subsumed by HF trainers)
* configurable heads_k_stride from ring-flash-attn hf adapter
2025-03-31 09:13:42 -04:00
NanoCode012
2efe1b4c09
Feat(doc): Reorganize documentation, fix broken syntax, update notes (#2348)
...
* feat(doc): organize docs, add to menu bar, fix broken formatting
* feat: add link to custom integrations
* feat: update readme for integrations to include citations and repo link
* chore: update lm_eval info
* chore: use fullname
* Update docs/cli.qmd per suggestion
Co-authored-by: Dan Saunders <danjsaund@gmail.com>
* feat: add sweep doc
* feat: add kd doc
* fix: remove toc
* fix: update deprecation
* feat: add more info about chat_template issues
* fix: heading level
* fix: shell->bash code block
* fix: ray link
* fix(doc): heading level, header links, formatting
* feat: add grpo docs
* feat: add style changes
* fix: wrong cli arg for lm-eval
* fix: remove old run method
* feat: load custom integration doc dynamically
* fix: remove old cli way
* fix: toc
* fix: minor formatting
---------
Co-authored-by: Dan Saunders <danjsaund@gmail.com>
2025-02-25 16:09:37 +07:00
Dan Saunders
6f294c3d8d
refactor README; hardcode links to quarto docs; add additional quarto doc pages (#2295)
...
* refactor README; hardcode links to quarto docs; add additional quarto doc pages
* updates
* review comments
* update
---------
Co-authored-by: Dan Saunders <dan@axolotl.ai>
2025-01-30 12:49:21 -05:00