axolotl

Files

NanoCode012 09959fac70 Feat: add Magistral Small 2509 and native mistral3 tokenizer support (#3165)

* feat: update mistral common
* feat: add mistral3processor
* fix: loading
* fix: cast pixel_values to fp32
* fix: image tensor conversion
* feat: add FA2 support for pixtral based models
* fix: update mistral small 3.1 to use native tokenizer
* fix: install tips
* fix: improve info on sample dataset files
* chore: move mistral configs into subfolders
* fix: remove unneeded patch
* fix: indent
* feat: add integration tests
* chore: move
* feat: add magistral 2509 docs and example
* fix: convert tensor to bool
* feat: expand tests
* chore: move tests

2025-09-18 15:42:20 +07:00

| Name | Last commit | Date |
| --- | --- | --- |
| bigstral | Feat: add Magistral Small 2509 and native mistral3 tokenizer support (#3165) | 2025-09-18 15:42:20 +07:00 |
| dpo | Feat: add Magistral Small 2509 and native mistral3 tokenizer support (#3165) | 2025-09-18 15:42:20 +07:00 |
| mistral-small | Feat: add Magistral Small 2509 and native mistral3 tokenizer support (#3165) | 2025-09-18 15:42:20 +07:00 |
| mixtral | Feat: add Magistral Small 2509 and native mistral3 tokenizer support (#3165) | 2025-09-18 15:42:20 +07:00 |
| mps | Feat: add Magistral Small 2509 and native mistral3 tokenizer support (#3165) | 2025-09-18 15:42:20 +07:00 |
| orpo | Feat: add Magistral Small 2509 and native mistral3 tokenizer support (#3165) | 2025-09-18 15:42:20 +07:00 |
| config.yml | use warmup_ratio as a better default than warmup steps since it's data dependent (#2897) [skip ci] | 2025-07-30 06:44:06 -04:00 |
| lora.yml | use warmup_ratio as a better default than warmup steps since it's data dependent (#2897) [skip ci] | 2025-07-30 06:44:06 -04:00 |
| mistral-qlora-fsdp.yml | use warmup_ratio as a better default than warmup steps since it's data dependent (#2897) [skip ci] | 2025-07-30 06:44:06 -04:00 |
| qlora.yml | use warmup_ratio as a better default than warmup steps since it's data dependent (#2897) [skip ci] | 2025-07-30 06:44:06 -04:00 |
| README.md | Mixtral fixes 20240124 (#1192) [skip ci] | 2024-01-24 14:59:57 -05:00 |

README.md

Mistral 7B is a language model with 7.3 billion parameters that performs well across a variety of benchmarks.

Fine-tune:

accelerate launch -m axolotl.cli.train examples/mistral/config.yml

If you hit a CUDA out-of-memory (OOM) error, rerun the same command with DeepSpeed enabled, using the `zero2.json` config:

accelerate launch -m axolotl.cli.train examples/mistral/config.yml --deepspeed deepspeed_configs/zero2.json
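
The two commands above differ only in the optional `--deepspeed` argument, so a small wrapper can keep them in one place. The following is a minimal sketch, not part of axolotl: the function name `build_train_cmd` is hypothetical, and it only assembles and prints the command line so you can inspect it before running it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with axolotl): prints the launch command
# used in this README, adding --deepspeed only when a config path is given.
build_train_cmd() {
  config="$1"          # axolotl YAML config, e.g. examples/mistral/config.yml
  ds_config="${2:-}"   # optional DeepSpeed JSON config; empty means "no DeepSpeed"

  cmd="accelerate launch -m axolotl.cli.train $config"
  if [ -n "$ds_config" ]; then
    cmd="$cmd --deepspeed $ds_config"
  fi
  printf '%s\n' "$cmd"
}

# Print both variants without executing them.
build_train_cmd examples/mistral/config.yml
build_train_cmd examples/mistral/config.yml deepspeed_configs/zero2.json
```

Once the printed command looks right, paste it into the terminal to start training. The paths are the ones used in this README and are assumed to be relative to the repository root.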