axolotl

Files

bigstral
dpo
mistral-small
mixtral
mps
orpo
config.yml
lora.yml
mistral-qlora-fsdp.yml
qlora.yml
README.md

README.md

Mistral 7B is a language model with 7.3 billion parameters that performs strongly across a variety of benchmarks.

Fine-tune:

accelerate launch -m axolotl.cli.train examples/mistral/config.yml

If you run into CUDA out-of-memory (OOM) errors, use DeepSpeed with the zero2.json config:

accelerate launch -m axolotl.cli.train examples/mistral/config.yml --deepspeed deepspeed_configs/zero2.json
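
The two commands above can also be driven from a small script. The following is a minimal sketch, not part of axolotl: `build_command` and `train_with_fallback` are hypothetical helper names, and treating any non-zero exit code as a possible OOM is an assumption made for brevity (the sketch does not inspect logs). The command-line arguments themselves are copied from this README.

```python
import subprocess
import sys

# Paths taken from the commands in this README.
CONFIG = "examples/mistral/config.yml"
ZERO2 = "deepspeed_configs/zero2.json"


def build_command(config, deepspeed=None):
    """Return the `accelerate launch` invocation as an argument list."""
    cmd = ["accelerate", "launch", "-m", "axolotl.cli.train", config]
    if deepspeed is not None:
        # Same flag as the OOM fallback command shown in this README.
        cmd += ["--deepspeed", deepspeed]
    return cmd


def train_with_fallback(config=CONFIG, deepspeed=ZERO2, run=subprocess.run):
    """Try a plain run first; if it fails, retry once with DeepSpeed ZeRO-2.

    Assumption: any non-zero exit code is treated as a possible CUDA OOM.
    `run` is injectable so the logic can be exercised without launching training.
    """
    result = run(build_command(config))
    if result.returncode == 0:
        return 0
    print("First run failed; retrying with DeepSpeed ZeRO-2.", file=sys.stderr)
    return run(build_command(config, deepspeed)).returncode
```

Calling `train_with_fallback()` from the repository root would launch the first command and, only if it exits with an error, the second. Because a failed run can have causes other than memory, checking the training log before relying on the retry is advisable.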