axolotl


NanoCode012 dfba881e99, 2025-07-22 16:52:15 +07:00
Feat: add gemma3n support (#2852)

* feat: add gemma3n cce
* feat: add sample config
* feat: add gemma3n multimodal mode
* feat: add audio example
* feat: support audio and return pixel values in collator
* feat: support unmask only assistant region (gemma3n for now)
* feat(doc): add notes for audio loading
* feat: add audio support for gemma3n
* feat: update examples
* feat: add gemma3n to the docs
* fix: add link at top
* feat(doc): clarify additional requirements
* fix: mllama missing aspect ratio
* fix: mllama need attention fixes for fa2
* Partially Revert "fix: mllama need attention fixes for fa2"
  This reverts commit a0bfdd1777.
* fix: disable FA2 for mllama in vision mode
* feat: update configs to use proper attention
* fix: support other vision features
* feat(doc): clarify requirements for gemma3n

File | Last commit | Date
bigstral-ds-zero3.yaml | make pad_to_sequence_len default to the same value as sample_packing (#2941) [skip ci] | 2025-07-21 11:40:56 -04:00
config.yml | make pad_to_sequence_len default to the same value as sample_packing (#2941) [skip ci] | 2025-07-21 11:40:56 -04:00
lora-mps.yml | make pad_to_sequence_len default to the same value as sample_packing (#2941) [skip ci] | 2025-07-21 11:40:56 -04:00
lora.yml | make pad_to_sequence_len default to the same value as sample_packing (#2941) [skip ci] | 2025-07-21 11:40:56 -04:00
mistral-dpo-qlora.yml | make pad_to_sequence_len default to the same value as sample_packing (#2941) [skip ci] | 2025-07-21 11:40:56 -04:00
mistral-qlora-fsdp.yml | checkpoint model on first step callback (#2906) | 2025-07-15 15:00:48 -04:00
mistral-qlora-orpo.yml | make pad_to_sequence_len default to the same value as sample_packing (#2941) [skip ci] | 2025-07-21 11:40:56 -04:00
mistral-small-3.1-24B-lora.yml | Feat: add gemma3n support (#2852) | 2025-07-22 16:52:15 +07:00
mixtral_22.yml | make pad_to_sequence_len default to the same value as sample_packing (#2941) [skip ci] | 2025-07-21 11:40:56 -04:00
mixtral-8x22b-qlora-fsdp.yml | checkpoint model on first step callback (#2906) | 2025-07-15 15:00:48 -04:00
mixtral-qlora-fsdp.yml | checkpoint model on first step callback (#2906) | 2025-07-15 15:00:48 -04:00
mixtral.yml | make pad_to_sequence_len default to the same value as sample_packing (#2941) [skip ci] | 2025-07-21 11:40:56 -04:00
qlora.yml | make pad_to_sequence_len default to the same value as sample_packing (#2941) [skip ci] | 2025-07-21 11:40:56 -04:00
README.md | Mixtral fixes 20240124 (#1192) [skip ci] | 2024-01-24 14:59:57 -05:00

README.md

Mistral 7B is a language model with a total of 7.3 billion parameters that performs well across a variety of benchmarks.

Fine-tune (run from the root of the axolotl repository):

accelerate launch -m axolotl.cli.train examples/mistral/config.yml
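The train command takes the config path as its argument, so the other YAML files in this directory can presumably be launched the same way. The helper below is a minimal sketch, not part of axolotl: `run_mistral_example` is a hypothetical name, and it assumes you run it from the axolotl repository root with `accelerate` on your `PATH`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an axolotl command): launch a config from
# examples/mistral by file name. Assumes the current directory is the
# axolotl repository root and that `accelerate` is installed.
run_mistral_example() {
  local name="${1:-config.yml}"
  local config="examples/mistral/${name}"

  # Fail early with a clear message instead of letting the trainer error out.
  if [ ! -f "$config" ]; then
    echo "no such example config: $config" >&2
    return 1
  fi

  # Drop the config name; any remaining arguments (for example a
  # --deepspeed flag) are passed through to the trainer unchanged.
  if [ "$#" -gt 0 ]; then
    shift
  fi

  accelerate launch -m axolotl.cli.train "$config" "$@"
}
```

For example, `run_mistral_example qlora.yml` would launch the QLoRA config, and calling it with no argument launches `config.yml`, matching the command above.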

If you run into CUDA out-of-memory (OOM) errors, launch with DeepSpeed using the ZeRO stage 2 config, deepspeed_configs/zero2.json:

accelerate launch -m axolotl.cli.train examples/mistral/config.yml --deepspeed deepspeed_configs/zero2.json
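If you would rather not rerun by hand, this fallback can be scripted. The wrapper below is a minimal sketch, not something axolotl ships: `train_with_oom_fallback`, `log_has_cuda_oom`, and the `train.log` file name are hypothetical. It assumes an out-of-memory failure shows up as PyTorch's usual "CUDA out of memory" message in the combined output, and that you run it from the repository root.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of axolotl): try a plain run first and
# retry with the DeepSpeed ZeRO-2 config only if the first run hit a CUDA OOM.

# Succeeds if the log contains PyTorch's CUDA out-of-memory message.
log_has_cuda_oom() {
  grep -q "CUDA out of memory" "$1"
}

train_with_oom_fallback() {
  local config="${1:-examples/mistral/config.yml}"
  local log="${2:-train.log}"
  local status

  # Keep the output on screen and in the log; PIPESTATUS[0] holds the
  # trainer's exit code rather than tee's.
  accelerate launch -m axolotl.cli.train "$config" 2>&1 | tee "$log"
  status=${PIPESTATUS[0]}
  if [ "$status" -eq 0 ]; then
    return 0
  fi

  # Any failure that is not an OOM is reported unchanged, without a retry.
  if ! log_has_cuda_oom "$log"; then
    return "$status"
  fi

  echo "CUDA OOM detected; retrying with DeepSpeed ZeRO-2" >&2
  accelerate launch -m axolotl.cli.train "$config" \
    --deepspeed deepspeed_configs/zero2.json
}
```

Retrying only when the OOM message appears keeps unrelated errors, such as a bad config path, from being masked by a second launch.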