From a3cdeab27e87306bc8262dc6001db5872d95c6cb Mon Sep 17 00:00:00 2001
From: Quarto GHA Workflow Runner
Date: Thu, 19 Feb 2026 23:34:25 +0000
Subject: [PATCH] Built site for gh-pages

---
 .nojekyll                     |    2 +-
 docs/cli.html                 |   16 +-
 docs/custom_integrations.html | 1007 ++++++++++++++++++---------------
 search.json                   |   15 +-
 sitemap.xml                   |  472 +++++++--------
 5 files changed, 804 insertions(+), 708 deletions(-)

diff --git a/.nojekyll b/.nojekyll
index 6b0c7b8ae..9c1fda326 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-eac6727e
\ No newline at end of file
+8763ebce
\ No newline at end of file

diff --git a/docs/cli.html b/docs/cli.html
index 6e0983219..87f9fb3fa 100644
--- a/docs/cli.html
+++ b/docs/cli.html
@@ -944,13 +944,15 @@ the CLI commands, their usage, and common examples.

# Basic evaluation
 axolotl lm-eval config.yml

Configuration options:

-
# List of tasks to evaluate
-lm_eval_tasks:
-  - arc_challenge
-  - hellaswag
-lm_eval_batch_size: # Batch size for evaluation
-output_dir: # Directory to save evaluation results
-

See LM Eval Harness for more details.

+
lm_eval_model: # model to evaluate (local or hf path)
+
+# List of tasks to evaluate
+lm_eval_tasks:
+  - arc_challenge
+  - hellaswag
+lm_eval_batch_size: # Batch size for evaluation
+output_dir: # Directory to save evaluation results
+

See LM Eval Harness integration docs for full configuration details.

delinearize-llama4

diff --git a/docs/custom_integrations.html b/docs/custom_integrations.html
index 78f1e15df..2cad39191 100644
--- a/docs/custom_integrations.html
+++ b/docs/custom_integrations.html
@@ -778,14 +778,21 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
  • Usage
  • Citation
  • -
  • Knowledge Distillation (KD)
  • +
  • Kernels Integration
  • +
  • Knowledge Distillation (KD)
  • +
  • LLMCompressor
  • Language Model Evaluation Harness (LM Eval)
  • Liger Kernels
  • Spectrum
  • SwanLab Integration for Axolotl
@@ -839,7 +849,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
  • Example: DPO Training with Completion Logging
  • Example: Disable Completion Logging
  • Supported RLHF Trainers
  • -
  • How It Works
  • +
  • How It Works
  • Viewing Completion Tables
  • Memory Management
  • Performance Impact
@@ -1200,25 +1210,63 @@ The quick brown fox jumps over the loud dog

    Please see reference here

    -
    -

    Knowledge Distillation (KD)

    +
    +

    Kernels Integration

    +

    MoE (Mixture of Experts) kernels speed up training for MoE layers and reduce VRAM costs. In transformers v5, batched_mm and grouped_mm were integrated as built-in options via the experts_implementation config kwarg:

    +
    class ExpertsInterface(GeneralInterface):
    +    _global_mapping = {
    +        "batched_mm": batched_mm_experts_forward,
    +        "grouped_mm": grouped_mm_experts_forward,
    +    }
    +
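
    For illustration, here is a minimal sketch of opting into one of these built-ins when loading a model. It assumes transformers v5 exposes experts_implementation as a config kwarg as described above; the exact plumbing may vary by release, and the Mixtral repo id is just an example:

    from transformers import AutoConfig, AutoModelForCausalLM

    # Pick the built-in grouped_mm experts forward for a SparseMoeBlock model
    config = AutoConfig.from_pretrained("mistralai/Mixtral-8x7B-v0.1")
    config.experts_implementation = "grouped_mm"  # or "batched_mm"
    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mixtral-8x7B-v0.1", config=config
    )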

    In our custom integration, we add support for ScatterMoE, which is even faster and more memory-efficient than grouped_mm.

    Usage

    -
    plugins:
    -  - "axolotl.integrations.kd.KDPlugin"
    -
    -kd_trainer: True
    -kd_ce_alpha: 0.1
    -kd_alpha: 0.9
    -kd_temperature: 1.0
    -
    -torch_compile: True  # torch>=2.6.0, recommended to reduce vram
    -
    -datasets:
    -  - path: ...
    -    type: "axolotl.integrations.kd.chat_template"
    -    field_messages: "messages_combined"
    -    logprobs_field: "llm_text_generation_vllm_logprobs"  # for kd only, field of logprobs
    +

    Add the following to your axolotl YAML config:

    +
    plugins:
    +  - axolotl.integrations.kernels.KernelsPlugin
    +
    +use_kernels: true
    +use_scattermoe: true
    +

    Important: Setting experts_implementation is incompatible with use_scattermoe.

    +
    +
    +

    How It Works

    +

    The KernelsPlugin runs before model loading and:

    +
      +
    1. Registers the ScatterMoE kernel from the axolotl-ai-co/scattermoe Hub repo.
    +
    2. Patches the model’s SparseMoeBlock forward method with the optimized ScatterMoE implementation.
    +
    +

    This works for any MoE model in transformers that uses a SparseMoeBlock class (Mixtral, Qwen2-MoE, OLMoE, etc.).
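
    A minimal sketch of this flow, assuming the Hugging Face kernels package; scattermoe.forward is a hypothetical entry point here, and the plugin’s actual internals may differ:

    from kernels import get_kernel
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-v0.1")
    scattermoe = get_kernel("axolotl-ai-co/scattermoe")  # fetch the kernel from the Hub

    for module in model.modules():
        # swap the optimized forward into every SparseMoeBlock-style module
        if type(module).__name__.endswith("SparseMoeBlock"):
            module.forward = scattermoe.forward.__get__(module)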

    +
    +
    +

    Limitations

    +

    ScatterMoE uses softmax -> top-k routing, so results may differ from the baseline for some model architectures (GPT-OSS, GLM_MOE_DSA).
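
    To see why the routing order matters, compare the two gate computations on toy router logits (an illustrative torch snippet, not library code):

    import torch

    logits = torch.tensor([2.0, 1.5, 0.5, -1.0])  # router logits for 4 experts
    k = 2

    # softmax -> topk (ScatterMoE-style): normalize over all experts first
    weights_a, experts_a = torch.softmax(logits, dim=-1).topk(k)

    # topk -> softmax (some baselines): normalize over the selected experts only
    top_logits, experts_b = logits.topk(k)
    weights_b = torch.softmax(top_logits, dim=-1)

    print(weights_a)  # ~[0.53, 0.32]: gates do not sum to 1
    print(weights_b)  # ~[0.62, 0.38]: gates sum to 1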

    +
    +
    +

    Note on MegaBlocks

    +

    We tested MegaBlocks but were unable to ensure numerical accuracy, so we did not integrate it. It was also incompatible with many newer model architectures in transformers.

    +

    Please see reference here

    +
    +
    +
    +

    Knowledge Distillation (KD)

    +
    +

    Usage

    +
    plugins:
    +  - "axolotl.integrations.kd.KDPlugin"
    +
    +kd_trainer: True
    +kd_ce_alpha: 0.1
    +kd_alpha: 0.9
    +kd_temperature: 1.0
    +
    +torch_compile: True  # torch>=2.6.0, recommended to reduce vram
    +
    +datasets:
    +  - path: ...
    +    type: "axolotl.integrations.kd.chat_template"
    +    field_messages: "messages_combined"
    +    logprobs_field: "llm_text_generation_vllm_logprobs"  # for kd only, field of logprobs

    An example dataset can be found at axolotl-ai-co/evolkit-logprobs-pipeline-75k-v2-sample
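
    The kd_ce_alpha, kd_alpha, and kd_temperature knobs map onto the standard distillation loss, which mixes cross-entropy against the hard labels with a temperature-scaled KL term against the teacher. A generic sketch of that loss, not Axolotl’s exact implementation:

    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels,
                ce_alpha=0.1, kd_alpha=0.9, temperature=1.0):
        # hard-label cross-entropy term, weighted by kd_ce_alpha
        ce = F.cross_entropy(student_logits, labels)
        # soft-label KL term at temperature T, weighted by kd_alpha
        kl = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.log_softmax(teacher_logits / temperature, dim=-1),
            log_target=True,
            reduction="batchmean",
        ) * (temperature ** 2)
        return ce_alpha * ce + kd_alpha * kl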

    Please see reference here

    @@ -1233,34 +1281,34 @@ The quick brown fox jumps over the loud dog

    Requirements

    • Axolotl with llmcompressor extras:

      -
      pip install "axolotl[llmcompressor]"
    • +
      pip install "axolotl[llmcompressor]"
    • Requires llmcompressor >= 0.5.1

    This will install all necessary dependencies to fine-tune sparsified models using the integration.


    -
    -

    Usage

    +
    +

    Usage

    To enable sparse fine-tuning with this integration, include the plugin in your Axolotl config:

    -
    plugins:
    -  - axolotl.integrations.llm_compressor.LLMCompressorPlugin
    -
    -llmcompressor:
    -  recipe:
    -    finetuning_stage:
    -      finetuning_modifiers:
    -        ConstantPruningModifier:
    -          targets: [
    -            're:.*q_proj.weight',
    -            're:.*k_proj.weight',
    -            're:.*v_proj.weight',
    -            're:.*o_proj.weight',
    -            're:.*gate_proj.weight',
    -            're:.*up_proj.weight',
    -            're:.*down_proj.weight',
    -          ]
    -          start: 0
    -  save_compressed: true
    +
    plugins:
    +  - axolotl.integrations.llm_compressor.LLMCompressorPlugin
    +
    +llmcompressor:
    +  recipe:
    +    finetuning_stage:
    +      finetuning_modifiers:
    +        ConstantPruningModifier:
    +          targets: [
    +            're:.*q_proj.weight',
    +            're:.*k_proj.weight',
    +            're:.*v_proj.weight',
    +            're:.*o_proj.weight',
    +            're:.*gate_proj.weight',
    +            're:.*up_proj.weight',
    +            're:.*down_proj.weight',
    +          ]
    +          start: 0
    +  save_compressed: true

    This plugin does not apply pruning or sparsification itself — it is intended for fine-tuning models that have already been sparsified.

    Pre-sparsified checkpoints can be:
    - Generated using LLMCompressor
@@ -1287,22 +1335,22 @@ The quick brown fox jumps over the loud dog

    After fine-tuning your sparse model, you can leverage vLLM for efficient inference. You can also use LLMCompressor to apply additional quantization to your fine-tuned sparse model before inference for even greater performance benefits:

    -
    from vllm import LLM, SamplingParams
    -
    -prompts = [
    -    "Hello, my name is",
    -    "The president of the United States is",
    -    "The capital of France is",
    -    "The future of AI is",
    -]
    -sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
    -llm = LLM("path/to/your/sparse/model")
    -outputs = llm.generate(prompts, sampling_params)
    -
    -for output in outputs:
    -    prompt = output.prompt
    -    generated_text = output.outputs[0].text
    -    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
    +
    from vllm import LLM, SamplingParams
    +
    +prompts = [
    +    "Hello, my name is",
    +    "The president of the United States is",
    +    "The capital of France is",
    +    "The future of AI is",
    +]
    +sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
    +llm = LLM("path/to/your/sparse/model")
    +outputs = llm.generate(prompts, sampling_params)
    +
    +for output in outputs:
    +    prompt = output.prompt
    +    generated_text = output.outputs[0].text
    +    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

    For more details on vLLM’s capabilities and advanced configuration options, see the official vLLM documentation.

    @@ -1316,31 +1364,66 @@ sparse model before inference for even greater performance benefits.:

    Language Model Evaluation Harness (LM Eval)

    Run evaluation on a model using the popular lm-evaluation-harness library.

    See https://github.com/EleutherAI/lm-evaluation-harness

    -
    -

    Usage

    -
    plugins:
    -  - axolotl.integrations.lm_eval.LMEvalPlugin
    -
    -lm_eval_tasks:
    -  - gsm8k
    -  - hellaswag
    -  - arc_easy
    -
    -lm_eval_batch_size: # Batch size for evaluation
    -output_dir: # Directory to save evaluation results
    +
    +

    Usage

    +

    There are two ways to use the LM Eval integration:

    +
    +
    +

    1. Post-Training Evaluation

    +

    When training with the plugin enabled, evaluation runs automatically after training completes:

    +
    plugins:
    +  - axolotl.integrations.lm_eval.LMEvalPlugin
    +
    +lm_eval_tasks:
    +  - gsm8k
    +  - hellaswag
    +  - arc_easy
    +
    +lm_eval_batch_size: # Batch size for evaluation
    +
    +output_dir: # Directory to save evaluation results
    +

    Run training as usual:

    +
    axolotl train config.yml
    +
    +
    +

    2. Standalone CLI Evaluation

    +

    Evaluate any model directly without training:

    +
    lm_eval_model: meta-llama/Llama-2-7b-hf
    +
    +plugins:
    +  - axolotl.integrations.lm_eval.LMEvalPlugin
    +
    +lm_eval_tasks:
    +  - gsm8k
    +  - hellaswag
    +  - arc_easy
    +
    +lm_eval_batch_size: 8
    +output_dir: ./outputs
    +

    Run evaluation:

    +
    axolotl lm-eval config.yml
    +
    +
    +

    Model Selection Priority

    +

    The model to evaluate is selected in the following priority order:

    +
      +
    1. lm_eval_model - Explicit model path or HuggingFace repo (highest priority)
    +
    2. hub_model_id - Trained model pushed to HuggingFace Hub
    +
    3. output_dir - Local checkpoint directory containing trained model weights
    +
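
    In code terms, the selection is a simple fallback chain; a minimal sketch with a hypothetical helper name, not the plugin’s actual function:

    def resolve_eval_model(cfg: dict) -> str:
        # falls through lm_eval_model -> hub_model_id -> output_dir,
        # mirroring the priority order above
        return cfg.get("lm_eval_model") or cfg.get("hub_model_id") or cfg["output_dir"]

    resolve_eval_model({"output_dir": "./outputs"})  # -> "./outputs"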

    Citation

    -
    @misc{eval-harness,
    -  author       = {Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and Le Noac'h, Alain and Li, Haonan and McDonell, Kyle and Muennighoff, Niklas and Ociepa, Chris and Phang, Jason and Reynolds, Laria and Schoelkopf, Hailey and Skowron, Aviya and Sutawika, Lintang and Tang, Eric and Thite, Anish and Wang, Ben and Wang, Kevin and Zou, Andy},
    -  title        = {A framework for few-shot language model evaluation},
    -  month        = 07,
    -  year         = 2024,
    -  publisher    = {Zenodo},
    -  version      = {v0.4.3},
    -  doi          = {10.5281/zenodo.12608602},
    -  url          = {https://zenodo.org/records/12608602}
    -}
    +
    @misc{eval-harness,
    +  author       = {Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and Le Noac'h, Alain and Li, Haonan and McDonell, Kyle and Muennighoff, Niklas and Ociepa, Chris and Phang, Jason and Reynolds, Laria and Schoelkopf, Hailey and Skowron, Aviya and Sutawika, Lintang and Tang, Eric and Thite, Anish and Wang, Ben and Wang, Kevin and Zou, Andy},
    +  title        = {A framework for few-shot language model evaluation},
    +  month        = 07,
    +  year         = 2024,
    +  publisher    = {Zenodo},
    +  version      = {v0.4.3},
    +  doi          = {10.5281/zenodo.12608602},
    +  url          = {https://zenodo.org/records/12608602}
    +}

    Please see reference here

    @@ -1353,17 +1436,17 @@ sparse model before inference for even greater performance benefits.:

  • Compatibility with both FSDP and DeepSpeed
  • See https://github.com/linkedin/Liger-Kernel

    -
    -

    Usage

    -
    plugins:
    -  - axolotl.integrations.liger.LigerPlugin
    -liger_rope: true
    -liger_rms_norm: true
    -liger_glu_activation: true
    -liger_layer_norm: true
    -liger_fused_linear_cross_entropy: true
    -
    -liger_use_token_scaling: true
    +
    +

    Usage

    +
    plugins:
    +  - axolotl.integrations.liger.LigerPlugin
    +liger_rope: true
    +liger_rms_norm: true
    +liger_glu_activation: true
    +liger_layer_norm: true
    +liger_fused_linear_cross_entropy: true
    +
    +liger_use_token_scaling: true

    Supported Models

    @@ -1389,16 +1472,16 @@ sparse model before inference for even greater performance benefits.:

    Citation

    -
    @article{hsu2024ligerkernelefficienttriton,
    -      title={Liger Kernel: Efficient Triton Kernels for LLM Training},
    -      author={Pin-Lun Hsu and Yun Dai and Vignesh Kothapalli and Qingquan Song and Shao Tang and Siyu Zhu and Steven Shimizu and Shivam Sahni and Haowen Ning and Yanning Chen},
    -      year={2024},
    -      eprint={2410.10989},
    -      archivePrefix={arXiv},
    -      primaryClass={cs.LG},
    -      url={https://arxiv.org/abs/2410.10989},
    -      journal={arXiv preprint arXiv:2410.10989},
    -}
    +
    @article{hsu2024ligerkernelefficienttriton,
    +      title={Liger Kernel: Efficient Triton Kernels for LLM Training},
    +      author={Pin-Lun Hsu and Yun Dai and Vignesh Kothapalli and Qingquan Song and Shao Tang and Siyu Zhu and Steven Shimizu and Shivam Sahni and Haowen Ning and Yanning Chen},
    +      year={2024},
    +      eprint={2410.10989},
    +      archivePrefix={arXiv},
    +      primaryClass={cs.LG},
    +      url={https://arxiv.org/abs/2410.10989},
    +      journal={arXiv preprint arXiv:2410.10989},
    +}

    Please see reference here

    @@ -1412,25 +1495,25 @@ sparse model before inference for even greater performance benefits.:

    Spectrum is a tool for scanning and evaluating the Signal-to-Noise Ratio (SNR) of layers in large language models. By identifying the top n% of layers with the highest SNR, you can optimize training efficiency.
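
    As a rough illustration of the idea, the sketch below ranks 2-D weight matrices by a crude singular-value SNR proxy and unfreezes only the top fraction; Spectrum’s actual scoring is more principled, so treat this as a mental model only:

    import torch

    def layer_snr(weight: torch.Tensor) -> float:
        # crude SNR proxy over singular values (illustrative, not Spectrum's metric)
        s = torch.linalg.svdvals(weight.float())
        return (s.mean() / s.std()).item()

    def unfreeze_top_fraction(model: torch.nn.Module, top_fraction: float = 0.5) -> None:
        scored = [(name, layer_snr(p)) for name, p in model.named_parameters() if p.ndim == 2]
        scored.sort(key=lambda item: item[1], reverse=True)
        keep = {name for name, _ in scored[: int(len(scored) * top_fraction)]}
        for name, p in model.named_parameters():
            p.requires_grad = name in keep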

    -
    -

    Usage

    -
    plugins:
    -  - axolotl.integrations.spectrum.SpectrumPlugin
    -
    -spectrum_top_fraction: 0.5
    -spectrum_model_name: meta-llama/Meta-Llama-3.1-8B
    +
    +

    Usage

    +
    plugins:
    +  - axolotl.integrations.spectrum.SpectrumPlugin
    +
    +spectrum_top_fraction: 0.5
    +spectrum_model_name: meta-llama/Meta-Llama-3.1-8B

    Citation

    -
    @misc{hartford2024spectrumtargetedtrainingsignal,
    -      title={Spectrum: Targeted Training on Signal to Noise Ratio},
    -      author={Eric Hartford and Lucas Atkins and Fernando Fernandes Neto and David Golchinfar},
    -      year={2024},
    -      eprint={2406.06623},
    -      archivePrefix={arXiv},
    -      primaryClass={cs.LG},
    -      url={https://arxiv.org/abs/2406.06623},
    -}
    +
    @misc{hartford2024spectrumtargetedtrainingsignal,
    +      title={Spectrum: Targeted Training on Signal to Noise Ratio},
    +      author={Eric Hartford and Lucas Atkins and Fernando Fernandes Neto and David Golchinfar},
    +      year={2024},
    +      eprint={2406.06623},
    +      archivePrefix={arXiv},
    +      primaryClass={cs.LG},
    +      url={https://arxiv.org/abs/2406.06623},
    +}

    Please see reference here

    @@ -1454,7 +1537,7 @@ By identifying the top n% of layers with the highest SNR, you can optimize train

    Installation

    -
    pip install swanlab
    +
    pip install swanlab

    Quick Start

    @@ -1466,23 +1549,23 @@ By identifying the top n% of layers with the highest SNR, you can optimize train

    2. Configure Axolotl Config File

    Add SwanLab configuration to your Axolotl YAML config:

    -
    plugins:
    -  - axolotl.integrations.swanlab.SwanLabPlugin
    -
    -use_swanlab: true
    -swanlab_project: my-llm-project
    -swanlab_experiment_name: qwen-finetune-v1
    -swanlab_mode: cloud  # Options: cloud, local, offline, disabled
    -swanlab_workspace: my-team  # Optional: organization name
    -swanlab_api_key: YOUR_API_KEY  # Optional: can also use env var SWANLAB_API_KEY
    +
    plugins:
    +  - axolotl.integrations.swanlab.SwanLabPlugin
    +
    +use_swanlab: true
    +swanlab_project: my-llm-project
    +swanlab_experiment_name: qwen-finetune-v1
    +swanlab_mode: cloud  # Options: cloud, local, offline, disabled
    +swanlab_workspace: my-team  # Optional: organization name
    +swanlab_api_key: YOUR_API_KEY  # Optional: can also use env var SWANLAB_API_KEY

    3. Run Training

    -
    export SWANLAB_API_KEY=your-api-key-here
    -
    -swanlab login
    -
    -accelerate launch -m axolotl.cli.train your-config.yaml
    +
    export SWANLAB_API_KEY=your-api-key-here
    +
    +swanlab login
    +
    +accelerate launch -m axolotl.cli.train your-config.yaml

    Configuration Options

    @@ -1624,46 +1707,46 @@ By identifying the top n% of layers with the highest SNR, you can optimize train

    Example 1: Basic Cloud Sync

    -
    plugins:
    -  - axolotl.integrations.swanlab.SwanLabPlugin
    -
    -use_swanlab: true
    -swanlab_project: llama-finetune
    -swanlab_experiment_name: llama-3-8b-instruct-v1
    -swanlab_mode: cloud
    +
    plugins:
    +  - axolotl.integrations.swanlab.SwanLabPlugin
    +
    +use_swanlab: true
    +swanlab_project: llama-finetune
    +swanlab_experiment_name: llama-3-8b-instruct-v1
    +swanlab_mode: cloud

    Example 2: Offline/Local Mode

    -
    plugins:
    -  - axolotl.integrations.swanlab.SwanLabPlugin
    -
    -use_swanlab: true
    -swanlab_project: local-experiments
    -swanlab_experiment_name: test-run-1
    -swanlab_mode: local  # or 'offline'
    +
    plugins:
    +  - axolotl.integrations.swanlab.SwanLabPlugin
    +
    +use_swanlab: true
    +swanlab_project: local-experiments
    +swanlab_experiment_name: test-run-1
    +swanlab_mode: local  # or 'offline'

    Example 3: Team Workspace

    -
    plugins:
    -  - axolotl.integrations.swanlab.SwanLabPlugin
    -
    -use_swanlab: true
    -swanlab_project: research-project
    -swanlab_experiment_name: experiment-42
    -swanlab_workspace: my-research-team
    -swanlab_mode: cloud
    +
    plugins:
    +  - axolotl.integrations.swanlab.SwanLabPlugin
    +
    +use_swanlab: true
    +swanlab_project: research-project
    +swanlab_experiment_name: experiment-42
    +swanlab_workspace: my-research-team
    +swanlab_mode: cloud

    Example 4: Private Deployment

    -
    plugins:
    -  - axolotl.integrations.swanlab.SwanLabPlugin
    -
    -use_swanlab: true
    -swanlab_project: internal-project
    -swanlab_experiment_name: secure-training
    -swanlab_mode: cloud
    -swanlab_web_host: https://swanlab.yourcompany.com
    -swanlab_api_host: https://api.swanlab.yourcompany.com
    +
    plugins:
    +  - axolotl.integrations.swanlab.SwanLabPlugin
    +
    +use_swanlab: true
    +swanlab_project: internal-project
    +swanlab_experiment_name: secure-training
    +swanlab_mode: cloud
    +swanlab_web_host: https://swanlab.yourcompany.com
    +swanlab_api_host: https://api.swanlab.yourcompany.com

    Team Notifications with Lark (Feishu)

    @@ -1684,30 +1767,30 @@ By identifying the top n% of layers with the highest SNR, you can optimize train

    Example 5: Basic Lark Notifications

    Send training notifications to a Lark group chat:

    -
    plugins:
    -  - axolotl.integrations.swanlab.SwanLabPlugin
    -
    -use_swanlab: true
    -swanlab_project: production-training
    -swanlab_experiment_name: llama-3-finetune-v2
    -swanlab_mode: cloud
    -
    -swanlab_lark_webhook_url: https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxxxx
    +
    plugins:
    +  - axolotl.integrations.swanlab.SwanLabPlugin
    +
    +use_swanlab: true
    +swanlab_project: production-training
    +swanlab_experiment_name: llama-3-finetune-v2
    +swanlab_mode: cloud
    +
    +swanlab_lark_webhook_url: https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxxxx

    Note: This configuration will work, but you’ll see a security warning recommending HMAC secret configuration.
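
    For reference, Lark’s custom-bot signing scheme (which the HMAC secret enables) keys an HMAC-SHA256 on "{timestamp}\n{secret}" over an empty message and base64-encodes the digest; a standalone sketch of that documented scheme, independent of the plugin:

    import base64
    import hashlib
    import hmac
    import time

    def lark_sign(secret: str) -> tuple[str, str]:
        # returns (timestamp, signature) for a Lark/Feishu webhook payload
        ts = str(int(time.time()))
        key = f"{ts}\n{secret}".encode("utf-8")
        digest = hmac.new(key, digestmod=hashlib.sha256).digest()
        return ts, base64.b64encode(digest).decode("utf-8")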