* build examples readmes with quarto
* chore: formatting
* feat: dynamic build docs
* feat: add more model guides
* chore: format
* fix: collapse sidebar completely to have space for model guides
* fix: security protection for generated qmd
* fix: adjust collapse level, add new models, update links
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* fix: force train split for json,csv,txt for test_datasets
* feat(doc): add info on mixing datasets for VLM
* feat(doc): max memory
* fix(doc): clarify lr groups
* fix: add info on vision not being dropped
* feat: add qwen3-vl to multimodal docs
* fix: add moe blocks to arch list
* feat(doc): improve mistral docs
* chore: add helpful link [skip-e2e]
* fix: add vram usage for mistral small
* Update link in docs/faq.qmd
Co-authored-by: salman <salman.mohammadi@outlook.com>
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: salman <salman.mohammadi@outlook.com>
* feat: add hunyuan cce support
* feat: update cce docs
* feat: add multipack support for granite and hunyuan
* feat: add hunyuan docs and example config
* feat: update readme instructions to include CCE installation
* fix: chat template log appearing despite tokenizer already having template
* feat: add vram usage
* fix: remove duplicate cce install
* fix: use latest commit of PR in case rebased/pushed
* Revert "fix: use latest commit of PR in case rebased/pushed"
This reverts commit 8b60aa00de.
* feat: update doc as upstream merged
* make pad_to_sequence_len default to the same value as sample_packing
* remove duplicate validation
* fix test
* update description meta
Co-authored-by: NanoCode012 <nano@axolotl.ai>
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* checkpoint model on first step callback
* remove debug
* add test cases; update existing tests not to save on first step
* move test out of solo
* delete
* default to False
* typo
* fix: do not add training and training_detail block by default
* fix: magistral docs
* fix: address pad adding new fields and use built-in from_openai
* feat: try enabling multiprocessing
* fix: check for keys before deleting attn_mask
* feat: add mistral pad test
* feat: add tool calling test
* feat: add devstral tokenizer tests
* fix: comma format
* chore: remove unused support_preprocessing as tokenizer is picklable now
* chore: update magistral doc
* feat: add devstral readme and example
* chore: refactor error handling
* feat: add fsdp config for magistral
* fix: add mllama self attention handling for lora kernels
* fix: no eval if val_set_size 0 despite having test_datasets
* fix: add note for cce for vlm in newer model