diff --git a/.nojekyll b/.nojekyll index f3c74bd25..7cd04cfc1 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -605b84d9 \ No newline at end of file +77cdfda6 \ No newline at end of file diff --git a/docs/api/cli.main.html b/docs/api/cli.main.html index edd9603a4..d49172e74 100644 --- a/docs/api/cli.main.html +++ b/docs/api/cli.main.html @@ -790,7 +790,9 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
cli.main.agent_docs(topic, list_topics)Show agent-optimized documentation.
+Prints reference docs designed for AI coding agents. +These docs are bundled with the package — no network access needed.
++Examples: +axolotl agent-docs # overview (start here) +axolotl agent-docs grpo # GRPO reference +axolotl agent-docs sft # SFT reference +axolotl agent-docs --list # list all topics
+cli.main.cli()Axolotl CLI - Train and fine-tune large language models
cli.main.config_schema(output_format, field)Dump the full config JSON schema.
+Useful for AI agents and tooling to discover all available config options, +their types, defaults, and descriptions.
++Examples: +axolotl config-schema # full JSON schema +axolotl config-schema --format yaml # YAML format +axolotl config-schema --field adapter # single field
+cli.main.evaluate(ctx, config, launcher, **kwargs)Evaluate a model.
cli.main.fetch(directory, dest)Fetch example configs or other resources.
Available directories: - examples: Example configuration files -- deepspeed_configs: DeepSpeed configuration files
+- deepspeed_configs: DeepSpeed configuration files +- docs: Full documentation (Quarto markdown files)| directory | str | -One of examples, deepspeed_configs. |
+One of examples, deepspeed_configs, docs. |
required |
| Optimized LoRA implementation for output projection. | ||||
| LoRA_QK | +Optimized LoRA QK implementation for models where v_proj is None. | +|||
| LoRA_QKV | Optimized LoRA QKV implementation with quantization support. | Applies LoRA to output projection layer. | ||
| apply_lora_qk | +Applies LoRA to compute Query and Key projections for models where v_proj is None. | +|||
| apply_lora_qkv | Applies LoRA to compute Query, Key, Value projections. | |||
| get_embedding_lora_parameters | Extract LoRA parameters from a PEFT Embedding module. | |||
| get_lora_parameters | Gets LoRA parameters from a projection module. | |||
| matmul_lora | Efficient fused matmul + LoRA computation. |
kernels.lora.apply_lora_embedding(self, x)Applies LoRA to embedding layer.
kernels.lora.apply_lora_mlp_geglu(self, X, inplace=True)Applies LoRA to MLP layer with GEGLU activation.
Supports bias, dropout, and DoRA.
kernels.lora.apply_lora_mlp_swiglu(self, X, inplace=True)Applies LoRA to MLP layer with SwiGLU activation.
Supports bias, dropout, and DoRA.
kernels.lora.apply_lora_o(self, X)Applies LoRA to output projection layer.
Supports bias, dropout, and DoRA.
kernels.lora.apply_lora_qk(self, X, inplace=True)Applies LoRA to compute Query and Key projections for models where v_proj is None.
+When v_proj is None (e.g. Gemma4 attention_k_eq_v), key states are reused as +value states. Returns (Q, K, K) — the caller’s patched forward will use K as V. +Because K is returned twice, autograd accumulates gradients from both the key and +value paths into dK before calling LoRA_QK.backward.
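+The two-path gradient flow described above can be sketched numerically. This is a toy illustration (plain NumPy, not Axolotl's kernel code): the loss consumes K through a stand-in "key path" and a stand-in "value path", so the gradient that would reach `LoRA_QK.backward` is the sum of both path gradients. We verify the accumulated analytic gradient against central finite differences:

```python
import numpy as np

# Toy sketch: when a forward returns (Q, K, K), the loss depends on K through
# two paths, so autograd delivers the SUM of both path gradients as dK.
rng = np.random.default_rng(0)
K = rng.standard_normal((3, 4))
w_key = rng.standard_normal((3, 4))  # stand-in for the key-path computation
w_val = rng.standard_normal((3, 4))  # stand-in for the value-path computation

def loss(K):
    # Two independent paths consume the same tensor K.
    return np.sum(K * w_key) + np.sum(K * w_val)

# Gradients from both paths accumulate into a single dK.
analytic_dK = w_key + w_val

# Check against central finite differences.
eps = 1e-6
numeric_dK = np.zeros_like(K)
for idx in np.ndindex(K.shape):
    Kp, Km = K.copy(), K.copy()
    Kp[idx] += eps
    Km[idx] -= eps
    numeric_dK[idx] = (loss(Kp) - loss(Km)) / (2 * eps)

assert np.allclose(analytic_dK, numeric_dK, atol=1e-4)
```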
+Supports bias, dropout, and DoRA.
+kernels.lora.apply_lora_qkv(self, X, inplace=True)Applies LoRA to compute Query, Key, Value projections.
Supports bias, dropout, and DoRA. Dropout is applied outside the autograd Function so PyTorch handles its backward automatically. A single shared @@ -958,12 +988,12 @@ dropout mask is used across Q, K, V projections for memory efficiency.
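The shared-mask idea can be sketched as follows (illustrative NumPy with hypothetical shapes, not the actual fused kernel): one inverted-dropout mask is sampled for X and the dropped input is reused by all three projections, so only a single mask needs to be kept for backward instead of three.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 8))                    # (tokens, hidden)
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))

p = 0.1
mask = (rng.random(X.shape) >= p) / (1.0 - p)      # one inverted-dropout mask
X_drop = X * mask                                  # computed once...

Q = X_drop @ Wq                                    # ...reused by all three
K = X_drop @ Wk                                    # projections, so a single
V = X_drop @ Wv                                    # mask is stored for backward

assert Q.shape == K.shape == V.shape == (2, 8)
```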
kernels.lora.get_embedding_lora_parameters(embed)Extract LoRA parameters from a PEFT Embedding module.
kernels.lora.get_lora_parameters(proj)Gets LoRA parameters from a projection module.
kernels.lora.matmul_lora(
- X,
- W,
- b,
- W_quant,
- A,
- B,
- s,
- out=None,
- X_drop=None,
- lora_bias=None,
-)kernels.lora.matmul_lora(
+ X,
+ W,
+ b,
+ W_quant,
+ A,
+ B,
+ s,
+ out=None,
+ X_drop=None,
+ lora_bias=None,
+)Efficient fused matmul + LoRA computation.
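+The computation this helper performs can be sketched in NumPy (hypothetical shapes; quantized weights, dropout input, and the preallocated out buffer from the real signature are omitted): the frozen base projection plus the scaled low-rank update.

```python
import numpy as np

def matmul_lora_sketch(X, W, b, A, B, s):
    """Sketch of base matmul + LoRA: out = X @ W.T + b + s * (X @ A.T) @ B.T."""
    out = X @ W.T                  # frozen base projection
    if b is not None:
        out = out + b
    out += s * ((X @ A.T) @ B.T)   # low-rank update, scaled by s
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 16))   # (tokens, in_features)
W = rng.standard_normal((32, 16))  # (out_features, in_features)
A = rng.standard_normal((8, 16))   # (rank, in_features)
B = rng.standard_normal((32, 8))   # (out_features, rank)

out = matmul_lora_sketch(X, W, None, A, B, s=0.5)
assert out.shape == (4, 32)
```

Computing `(X @ A.T) @ B.T` in that order keeps the intermediate at rank size rather than materializing the full-rank product `B @ A`.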
pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@63b15e6"
pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@fec1a88"

Because there is no SparseMoeBlock class to patch, Gemma 4 uses a different integration path: we register "scattermoe" as a custom implementation in the transformers ExpertsInterface, and set experts_implementation: scattermoe in the config. The @use_experts_implementation decorator on Gemma4TextExperts then dispatches to our ScatterMoE kernel automatically. The router is untouched — it runs as-is.
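For reference, a minimal config fragment showing the setting named above (the base_model value is a placeholder, not a real model id):

```yaml
base_model: placeholder/gemma-4-model  # placeholder; use your actual model id
experts_implementation: scattermoe     # dispatches Gemma4TextExperts to the ScatterMoE kernel
```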
Important limitations:
-- Flash Attention 2 is not supported — Gemma 4 uses global_head_dim: 512 for full attention layers, which exceeds FA2’s maximum head dimension of 256. Use sdp_attention: true instead.
-- Multimodal model: Gemma 4 includes vision and audio encoders. For text-only SFT, use lora_target_linear_modules with a regex to restrict LoRA to the text backbone (e.g. language_model\.model\.layers\.\d+\.self_attn\.(q|k|v|o)_proj).
%%capture
# This step can take ~5-10 minutes to install dependencies
!pip install --no-build-isolation "axolotl[flash-attn]>=0.9.1"
-!pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@63b15e6"

Axolotl ships with built-in documentation optimized for AI coding agents (Claude Code, Cursor, Copilot, etc.). These docs are bundled with the pip package — no repo clone needed.
+# Show overview and available training methods
+axolotl agent-docs
+
+# Topic-specific references
+axolotl agent-docs sft # supervised fine-tuning
+axolotl agent-docs grpo # GRPO online RL
+axolotl agent-docs preference_tuning # DPO, KTO, ORPO, SimPO
+axolotl agent-docs reward_modelling # outcome and process reward models
+axolotl agent-docs pretraining # continual pretraining
+axolotl agent-docs --list # list all topics
+
+# Dump config schema for programmatic use
+axolotl config-schema
+axolotl config-schema --field adapter

If you’re working with the source repo, agent docs are also available at docs/agents/ and the project overview is in AGENTS.md.
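As an illustration of programmatic use, here is a small sketch that maps top-level fields of a schema dump to their declared types. The helper name is ours, and it assumes the dump follows the standard JSON Schema layout with a top-level "properties" object; the tiny inline schema stands in for the real output of `axolotl config-schema`:

```python
import json

def list_fields(schema: dict) -> dict:
    """Map each top-level property name to its declared type (assumes a
    standard JSON Schema layout with a 'properties' object)."""
    return {
        name: spec.get("type", "unknown")
        for name, spec in schema.get("properties", {}).items()
    }

# Tiny inline stand-in; in practice feed it the real dump, e.g.:
#   axolotl config-schema > schema.json
#   fields = list_fields(json.load(open("schema.json")))
demo = {"properties": {"adapter": {"type": "string"},
                       "learning_rate": {"type": "number"}}}
print(list_fields(demo))  # {'adapter': 'string', 'learning_rate': 'number'}
```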
If you use Axolotl in your research or projects, please cite it as follows:
-@software{axolotl,
- title = {Axolotl: Open Source LLM Post-Training},
- author = {{Axolotl maintainers and contributors}},
- url = {https://github.com/axolotl-ai-cloud/axolotl},
- license = {Apache-2.0},
- year = {2023}
-}@software{axolotl,
+ title = {Axolotl: Open Source LLM Post-Training},
+ author = {{Axolotl maintainers and contributors}},
+ url = {https://github.com/axolotl-ai-cloud/axolotl},
+ license = {Apache-2.0},
+ year = {2023}
+}