feat: cleanup old flex mask patch, suppress Matmul bnb warn, and misc (#3330) [skip-ci]

* feat: add pos id to flex attention for packing part 1

* feat: update to include sliding window mask patch
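
  For context, a minimal sketch of how position ids can drive a packing-aware causal mask (optionally with a sliding window) under PyTorch's `flex_attention` API. The helper name and the reset-detection logic are illustrative assumptions, not the patch itself:

  ```python
  import torch
  from torch.nn.attention.flex_attention import create_block_mask

  def make_packed_causal_mask(position_ids, sliding_window=None):
      # In a packed batch, each sample restarts its positions at 0, so a
      # cumulative count of zeros yields a per-token document id.
      doc_ids = (position_ids == 0).cumsum(dim=-1)  # shape [B, S]
      B, S = position_ids.shape

      def mask_mod(b, h, q_idx, kv_idx):
          # Attend only within the same packed sample, causally.
          keep = (doc_ids[b, q_idx] == doc_ids[b, kv_idx]) & (q_idx >= kv_idx)
          if sliding_window is not None:
              # Restrict attention to the most recent `sliding_window` tokens.
              keep = keep & (q_idx - kv_idx < sliding_window)
          return keep

      return create_block_mask(mask_mod, B, None, S, S, device=position_ids.device)
  ```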

* fix: suppress "MatMul8bitLt: inputs will be cast from ..." warnings
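
  A hedged sketch of one way to silence that bitsandbytes warning via the standard `warnings` filter; the message regex is an assumption about the exact warning text and is matched against the message prefix:

  ```python
  import warnings

  # Ignore the bitsandbytes MatMul8bitLt cast warning by message prefix.
  warnings.filterwarnings(
      "ignore",
      message=r"MatMul8bitLt: inputs will be cast from .*",
  )
  ```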

* fix: remove redundant flex attention patch

* chore: update olmo docs

* feat: add validator patch for cross entropy
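
  Illustrative only: Axolotl's config schema is Pydantic-based, so a validator along these lines could gate incompatible cross-entropy options. The field names and the specific rule below are assumptions, not the actual patch:

  ```python
  from pydantic import BaseModel, model_validator

  class LossSettings(BaseModel):
      # Hypothetical fields for illustration; not Axolotl's real schema.
      flex_attention: bool = False
      cut_cross_entropy: bool = False

      @model_validator(mode="after")
      def validate_cross_entropy(self):
          if self.cut_cross_entropy and self.flex_attention:
              raise ValueError(
                  "cut_cross_entropy cannot be combined with flex_attention"
              )
          return self
  ```
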
Author: NanoCode012
Date: 2025-12-25 17:56:20 +07:00
Committed by: GitHub
parent 97f1b1758d
commit 372f664c63
6 changed files with 41 additions and 167 deletions

@@ -16,7 +16,7 @@ This guide shows how to fine-tune it with Axolotl with multi-turn conversations
 axolotl train examples/olmo3/olmo3-7b-qlora.yaml
 ```
-Let us know how it goes. Happy finetuning! 🚀
+This uses about 11.3 GiB VRAM. Let us know how it goes. Happy finetuning! 🚀
 ### TIPS

@@ -42,10 +42,10 @@ wandb_watch:
 wandb_name:
 wandb_log_model:
-gradient_accumulation_steps: 4
+gradient_accumulation_steps: 2
 micro_batch_size: 2
 num_epochs: 1
-optimizer: adamw_bnb_8bit
+optimizer: adamw_8bit
 lr_scheduler: cosine
 learning_rate: 0.0002