NanoCode012
|
372f664c63
|
feat: cleanup old flex mask patch, suppress Matmul bnb warn, and misc (#3330) [skip-ci]
* feat: add pos id to flex attention for packing part 1
* feat: update to include sliding window mask patch
* fix: suppress "MatMul8bitLt: inputs will be cast from" warnings
* fix: remove redundant flex attention patch
* chore: update olmo docs
* feat: add validator patch for cross entropy
|
2025-12-25 17:56:20 +07:00 |
|
NanoCode012
|
2b66ee189c
|
Feat: add ministral3 (#3297)
* feat: add ministral and mistral3
* chore: lint
* feat: update cce for ministral
* fix: add vram usage
* feat: update for release
* fix: save_pretrained issue in v5
* fix: add instructions to use v5 branch
* fix: add to multipack
* fix: improve instructions
* fix: add model to readme
|
2025-12-04 08:32:08 -05:00 |
|
NanoCode012
|
006f226270
|
Feat: add Olmo3 (BC with Olmo and Olmo2) (#3275)
* feat: update cce to include olmo family
* chore: update docs following feedback
* feat: add olmo3 config
* fix: clarify 3 methods
* chore: add olmo to readme
|
2025-11-24 10:21:31 +07:00 |
|