Compare commits

17 Commits

| Author | SHA1 | Date |
|---|---|---|
| | c3e1882de5 | |
| | 889b27ecf1 | |
| | 0fa752e58b | |
| | 08e517ea48 | |
| | 07fd22f39b | |
| | 06eaf6c448 | |
| | 050210e637 | |
| | 05cedbfb1e | |
| | c10eb811fa | |
| | 0eef385b1a | |
| | ecbe8b2b61 | |
| | 130ef7c51a | |
| | d1de6f5f3d | |
| | 48b7ae1677 | |
| | 506e3a3907 | |
| | 09145de8fa | |
| | e0a2523a3b | |
@@ -12,5 +12,6 @@ reviews:
   auto_review:
     enabled: true
     drafts: false
+  auto_incremental_review: true
 chat:
   auto_reply: true
.github/CONTRIBUTING.md (vendored, 7 changes)

@@ -57,6 +57,13 @@ We welcome ideas for improvements and new features. To suggest an enhancement, o
 5. Push your branch to your fork on GitHub.
 6. Open a new pull request against the `main` branch of the axolotl repository. Include a clear and concise description of your changes, referencing any related issues.
 
+#### Skipping CI Checks
+
+You can skip certain CI checks by including specific keywords in your commit messages:
+
+- `[skip ci]` or `skip ci` - Skips all CI checks for that commit
+- `[skip-e2e]` or `skip-e2e` - Skips only end-to-end tests while running other CI checks. You may also include this in the title of your PR to disable end-to-end tests for the entire PR.
+
 ## Style Guidelines
 
 ### Code Style
.github/workflows/tests.yml (vendored, 42 changes)

@@ -188,13 +188,44 @@ jobs:
         run: |
           find "$(pip cache dir)/http-v2" -type f -mtime +14 -exec rm {} \;
 
+  gate-skip-e2e:
+    needs: [pre-commit, pytest, pytest-sdist]
+    runs-on: ubuntu-latest
+    outputs:
+      skip: ${{ steps.compute.outputs.skip }}
+    steps:
+      - uses: actions/github-script@v7
+        id: compute
+        with:
+          script: |
+            const token = /\[skip-e2e\]/i;
+            let msg = '';
+            if (context.eventName === 'push') {
+              msg = context.payload.head_commit?.message || '';
+            } else if (context.eventName === 'pull_request') {
+              const { owner, repo } = context.repo;
+              const prNumber = context.payload.pull_request.number;
+              const commits = await github.paginate(
+                github.rest.pulls.listCommits,
+                { owner, repo, pull_number: prNumber, per_page: 100 }
+              );
+              msg = commits.at(-1)?.commit?.message || '';
+            }
+            const title = context.payload.pull_request?.title || '';
+            const body = context.payload.pull_request?.body || '';
+            const skip = token.test(msg) || token.test(title) || token.test(body);
+            core.setOutput('skip', String(skip));
+
   docker-e2e-tests-1st:
     # Run this job first as a gate for running the remainder of the test matrix
-    if: ${{ ! contains(github.event.commits[0].message, '[skip e2e]') && github.repository_owner == 'axolotl-ai-cloud' && !github.event.pull_request.draft }}
+    if: >
+      github.repository_owner == 'axolotl-ai-cloud' &&
+      (github.event_name != 'pull_request' || !github.event.pull_request.draft) &&
+      needs.gate-skip-e2e.outputs.skip != 'true'
    # this job needs to be run on self-hosted GPU runners...
     runs-on: [self-hosted, modal]
     timeout-minutes: 120
-    needs: [pre-commit, pytest, pytest-sdist]
+    needs: [pre-commit, pytest, pytest-sdist, gate-skip-e2e]
 
     strategy:
       fail-fast: false
@@ -240,13 +271,16 @@ jobs:
           modal run cicd.e2e_tests
 
   docker-e2e-tests:
-    if: ${{ github.repository_owner == 'axolotl-ai-cloud' && !github.event.pull_request.draft }}
+    if: >
+      github.repository_owner == 'axolotl-ai-cloud' &&
+      (github.event_name != 'pull_request' || !github.event.pull_request.draft) &&
+      needs.gate-skip-e2e.outputs.skip != 'true'
     # this job needs to be run on self-hosted GPU runners...
     runs-on: [self-hosted, modal]
     timeout-minutes: 120
     # Only run the remainder of the matrix if the first e2e check passed;
     # this is to save on wasted compute costs for known failures that get caught in the first run
-    needs: [pre-commit, pytest, docker-e2e-tests-1st]
+    needs: [pre-commit, pytest, gate-skip-e2e, docker-e2e-tests-1st]
 
     strategy:
       fail-fast: false
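For a sense of what the gate's `github-script` step computes, here is a minimal Python sketch of the same token matching; the function name and test strings are illustrative, not part of the diff:

```python
import re

# Case-insensitive "[skip-e2e]" marker, mirroring the regex in the workflow above.
SKIP_TOKEN = re.compile(r"\[skip-e2e\]", re.IGNORECASE)

def should_skip_e2e(commit_message: str, pr_title: str = "", pr_body: str = "") -> bool:
    """Skip e2e tests when any inspected field carries the marker."""
    return any(SKIP_TOKEN.search(text) for text in (commit_message, pr_title, pr_body))

assert should_skip_e2e("docs: fix typo [SKIP-E2E]")
assert not should_skip_e2e("feat: add gate-skip-e2e job")
```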
TODO.md (deleted, 10 lines)

@@ -1,10 +0,0 @@
-# todo list
-
-- [] Validation of parameters for combinations that won't work
-
-
-
-## things that are known not to work
-
-- FSDP offload and gradient_checkpointing - https://github.com/pytorch/pytorch/issues/82203
-- adamw_bnb_8bit doesn't play well with FSDP offload
@@ -37,7 +37,7 @@ WORKDIR /workspace
 
 RUN python3 -m pip install --upgrade pip && pip3 install -U packaging==23.2 setuptools==75.8.0 wheel && \
     python3 -m pip install --no-cache-dir -U torch==${PYTORCH_VERSION}+cu${CUDA} torchvision --extra-index-url https://download.pytorch.org/whl/cu$CUDA && \
-    python3 -m pip install --no-cache-dir "causal_conv1d @ git+https://github.com/Dao-AILab/causal-conv1d.git@main" && \
+    CAUSAL_CONV1D_FORCE_CXX11_ABI=TRUE CAUSAL_CONV1D_FORCE_BUILD=TRUE python3 -m pip install --no-cache-dir causal_conv1d==1.5.2 && \
    python3 -m pip install --no-cache-dir "mamba_ssm @ git+https://github.com/state-spaces/mamba.git@main" && \
    python3 -m pip cache purge
 
@@ -13,10 +13,13 @@ format:
 - [Pixtral](#sec-pixtral)
 - [Llava-1.5](#sec-llava-15)
 - [Mistral-Small-3.1](#sec-mistral-small-31)
+- [Voxtral](#sec-voxtral)
 - [Gemma-3](#sec-gemma-3)
 - [Gemma-3n](#sec-gemma-3n)
 - [Qwen2-VL](#sec-qwen2-vl)
 - [Qwen2.5-VL](#sec-qwen25-vl)
+- [SmolVLM2](#sec-smolvlm2)
+- [LFM2-VL](#sec-lfm2-vl)
 
 ## Usage
 
@@ -31,7 +34,7 @@ skip_prepare_dataset: true
 remove_unused_columns: false # leave columns in place as they are needed to handle image embeddings during training
 sample_packing: false # not yet supported with multimodal
 
-chat_template: # see in next section
+chat_template: # see in next section if specified
 
 # example dataset
 datasets:
@@ -97,6 +100,16 @@ base_model: mistralai/Mistral-Small-3.1-24B-Instruct-2503
 chat_template: mistral_v7_tekken
 ```
 
+### Voxtral {#sec-voxtral}
+
+::: {.callout-tip}
+Please make sure to install audio lib via `pip3 install librosa==0.11.0 'mistral_common[audio]==1.8.3'`
+:::
+
+```yaml
+base_model: mistralai/Voxtral-Mini-3B-2507
+```
+
 ### Gemma-3 {#sec-gemma-3}
 
 ::: {.callout-tip}
@@ -143,6 +156,26 @@ base_model: Qwen/Qwen2.5-VL-7B-Instruct
 chat_template: qwen2_vl # same as qwen2-vl
 ```
 
+### SmolVLM2 {#sec-smolvlm2}
+
+::: {.callout-tip}
+Please make sure to install `num2words` via `pip3 install num2words==0.5.14`
+:::
+
+```yaml
+base_model: HuggingFaceTB/SmolVLM2-500M-Video-Instruct
+```
+
+### LFM2-VL {#sec-lfm2-vl}
+
+::: {.callout-warning}
+Please uninstall `causal-conv1d` via `pip3 uninstall -y causal-conv1d`
+:::
+
+```yaml
+base_model: LiquidAI/LFM2-VL-450M
+```
+
 ## Dataset Format
 
 For multi-modal datasets, we adopt an extended `chat_template` format similar to OpenAI's Message format.
@@ -181,6 +214,20 @@ You may need to install `librosa` via `pip3 install librosa==0.11.0`.
 
 :::
 
+### Video
+
+::: {.callout-warning}
+
+This is not well tested at the moment. We welcome contributors!
+
+:::
+
+For video loading, you can use the following keys within `content` alongside `"type": "video"`:
+
+- `"path": "/path/to/video.mp4"`
+- `"url": "https://example.com/video.mp4"`
+- `"video": np.ndarray | list[PIL.Image.Image] | torch.Tensor` (or list of the aforementioned)
+
 ### Example
 
 Here is an example of a multi-modal dataset:
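The example record itself falls outside the hunk. As a rough sketch of the extended Messages format the docs describe, with a video entry per the keys listed above (all field values here are placeholders, not from the diff):

```python
# Hypothetical record in the extended chat_template Messages format;
# the path and text values are illustrative only.
example = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What happens in this clip?"},
                {"type": "video", "path": "/path/to/video.mp4"},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": "A cat jumps onto a table."}],
        },
    ]
}
```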
examples/LiquidAI/README.md (new file, 58 lines)

@@ -0,0 +1,58 @@
+# Finetune Liquid Foundation Models 2 (LFM2) with Axolotl
+
+[Liquid Foundation Models 2 (LFM2)](https://huggingface.co/collections/LiquidAI/lfm2-686d721927015b2ad73eaa38) are a family of small, open-weight models from [Liquid AI](https://www.liquid.ai/) focused on quality, speed, and memory efficiency. Liquid AI released text-only [LFM2](https://huggingface.co/collections/LiquidAI/lfm2-686d721927015b2ad73eaa38) and text+vision [LFM2-VL](https://huggingface.co/collections/LiquidAI/lfm2-vl-68963bbc84a610f7638d5ffa) models.
+
+LFM2 features a new hybrid Liquid architecture with multiplicative gates, short-range convolutions, and grouped query attention, enabling fast training and inference.
+
+This guide shows how to fine-tune both the LFM2 and LFM2-VL models with Axolotl.
+
+## Getting Started
+
+1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html).
+
+    Here is an example of how to install from pip:
+    ```bash
+    # Ensure you have a compatible version of Pytorch installed
+    pip3 install packaging setuptools wheel ninja
+    pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
+    ```
+
+2. Run one of the finetuning examples below.
+
+    **LFM2**
+    ```bash
+    # FFT SFT (1x48GB @ 25GiB)
+    axolotl train examples/LiquidAI/lfm2-350m-fft.yaml
+    ```
+
+    **LFM2-VL**
+    ```bash
+    # LoRA SFT (1x48GB @ 2.7GiB)
+    axolotl train examples/LiquidAI/lfm2-vl-lora.yaml
+    ```
+
+### TIPS
+
+- **Installation Error**: If you encounter `ImportError: ... undefined symbol ...` or `ModuleNotFoundError: No module named 'causal_conv1d_cuda'`, the `causal-conv1d` package may have been installed incorrectly. Try uninstalling it:
+    ```bash
+    pip uninstall -y causal-conv1d
+    ```
+
+- **Dataset Loading**: Read more on how to load your own dataset in our [documentation](https://docs.axolotl.ai/docs/dataset_loading.html).
+- **Dataset Formats**:
+    - For LFM2 models, the dataset format follows the OpenAI Messages format as seen [here](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#chat_template).
+    - For LFM2-VL models, Axolotl follows the multi-content Messages format. See our [Multimodal docs](https://docs.axolotl.ai/docs/multimodal.html#dataset-format) for details.
+
+## Optimization Guides
+
+- [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html)
+- [LoRA Optimizations](https://docs.axolotl.ai/docs/lora_optims.html)
+- [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html)
+
+## Related Resources
+
+- [LFM2 Blog](https://www.liquid.ai/blog/liquid-foundation-models-v2-our-second-series-of-generative-ai-models)
+- [LFM2-VL Blog](https://www.liquid.ai/blog/lfm2-vl-efficient-vision-language-models)
+- [Axolotl Docs](https://docs.axolotl.ai)
+- [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl)
+- [Axolotl Discord](https://discord.gg/7m9sfhzaf3)
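A small Python sketch of the check behind the installation tip above; it assumes package presence is detectable via importlib, and the helper name is illustrative:

```python
import importlib.util

def causal_conv1d_installed() -> bool:
    """Detect the package that is known to conflict with LFM2 model loading."""
    return importlib.util.find_spec("causal_conv1d") is not None

if causal_conv1d_installed():
    print("causal-conv1d detected; consider `pip uninstall -y causal-conv1d` before training LFM2.")
```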
@@ -2,7 +2,6 @@ base_model: LiquidAI/LFM2-350M
 
 chunked_cross_entropy: true
 
-chat_template: tokenizer_default
 eot_tokens:
   - "<|im_end|>"
 datasets:
examples/LiquidAI/lfm2-vl-lora.yaml (new file, 58 lines)

@@ -0,0 +1,58 @@
+base_model: LiquidAI/LFM2-VL-450M
+trust_remote_code: true
+model_type: AutoModelForImageTextToText
+processor_type: AutoProcessor
+
+# these 3 lines are needed for now to handle vision chat templates w images
+skip_prepare_dataset: true
+remove_unused_columns: false
+sample_packing: false
+
+datasets:
+  - path: HuggingFaceH4/llava-instruct-mix-vsft
+    type: chat_template
+    split: train[:1%]
+
+dataset_prepared_path: last_run_prepared
+val_set_size: 0.0
+output_dir: ./outputs/out
+
+adapter: lora
+lora_model_dir:
+
+sequence_len: 8192
+pad_to_sequence_len: false
+
+lora_r: 32
+lora_alpha: 16
+lora_dropout: 0.05
+lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
+
+wandb_project:
+wandb_entity:
+wandb_watch:
+wandb_name:
+wandb_log_model:
+
+gradient_accumulation_steps: 4
+micro_batch_size: 1
+num_epochs: 1
+optimizer: adamw_bnb_8bit
+lr_scheduler: cosine
+learning_rate: 0.0002
+
+bf16: true
+fp16:
+tf32: true
+
+gradient_checkpointing: true
+logging_steps: 1
+flash_attention: true
+eager_attention:
+
+warmup_ratio: 0.1
+evals_per_epoch: 1
+saves_per_epoch: 1
+weight_decay: 0.0
+
+# save_first_step: true # uncomment this to validate checkpoint saving works with your config
(File diff suppressed because it is too large)
@@ -33,13 +33,64 @@ Note: Memory usage taken from `device_mem_reserved(gib)` from logs.
 
 ### Training 120B
 
-On 8xH100s
+On 8xH100s, make sure you have ~3TB of free disk space. With each checkpoint clocking in at ~720GB, along with the base
+model, and final model output, you may need at least 3TB of free disk space to keep at least 2 checkpoints.
 
 ```bash
 # FFT SFT with offloading (8x80GB @ ~49GiB/GPU)
 axolotl train examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml
 ```
 
+To simplify fine-tuning across 2 nodes × 8x H100 (80GB) GPUs, we've partnered with [Baseten](https://baseten.co) to showcase multi-node
+training of the 120B model using Baseten Truss. You can read more about this recipe on
+[Baseten's blog](https://www.baseten.co/blog/how-to-fine-tune-gpt-oss-120b-with-baseten-and-axolotl/). The recipe can
+be found on their
+[GitHub](https://github.com/basetenlabs/ml-cookbook/tree/main/examples/oss-gpt-120b-axolotl/training).
+
+ERRATA: Transformers saves the model Architecture prefixed with `FSDP` which needs to be manually renamed in `config.json`.
+See https://github.com/huggingface/transformers/pull/40207 for the status of this issue.
+
+```bash
+sed -i 's/FSDPGptOssForCausalLM/GptOssForCausalLM/g' ./outputs/gpt-oss-out/config.json
+```
+
+When using SHARDED_STATE_DICT with FSDP, the final checkpoint should automatically merge the sharded weights to your
+configured `output_dir`. However, if that step fails due to a disk space error, you can take an additional step to
+merge the sharded weights. This step will automatically determine the last checkpoint directory and merge the sharded
+weights to `{output_dir}/merged`.
+
+```bash
+axolotl merge-sharded-fsdp-weights examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml
+mv ./outputs/gpt-oss-out/merged/* ./outputs/gpt-oss-out/
+```
+
+### Inferencing your fine-tuned model
+
+#### vLLM
+
+GPT-OSS support in vLLM does not exist in a stable release yet. See https://x.com/MaziyarPanahi/status/1955741905515323425
+for more information about using a special vllm-openai docker image for inferencing with vLLM.
+
+Optionally, vLLM can be installed from nightly:
+
+```bash
+pip install --no-build-isolation --pre -U vllm --extra-index-url https://wheels.vllm.ai/nightly
+```
+
+and the vLLM server can be started with the following command (modify `--tensor-parallel-size 8` to match your environment):
+
+```bash
+vllm serve ./outputs/gpt-oss-out/ --served-model-name axolotl/gpt-oss-20b --host 0.0.0.0 --port 8888 --tensor-parallel-size 8
+```
+
+#### SGLang
+
+SGLang has 0-day support in main, see https://github.com/sgl-project/sglang/issues/8833 for infomation on installing
+SGLang from source. Once you've installed SGLang, run the following command to launch a SGLang server:
+
+```bash
+python3 -m sglang.launch_server --model ./outputs/gpt-oss-out/ --served-model-name axolotl/gpt-oss-120b --host 0.0.0.0 --port 8888 --tp 8
+```
+
 ### Tool use
 
 GPT-OSS has a comprehensive tool understanding. Axolotl supports tool calling datasets for Supervised Fine-tuning.
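A quick back-of-envelope on the ~3TB guidance above; only the ~720GB-per-checkpoint figure comes from the README, while the base-model and final-output sizes are rough assumptions:

```python
# Rough disk budget for gpt-oss-120B FFT, in GiB.
checkpoint = 720          # per-checkpoint size quoted in the README above
kept = 2                  # save_total_limit: 2 in the config hunk below
base_model = 240          # assumption: ~120B parameters in bf16
final_output = 240        # assumption: merged output roughly mirrors the base size
total = checkpoint * kept + base_model + final_output
print(f"~{total} GiB ≈ {total / 1024:.2f} TiB")  # ~1920 GiB ≈ 1.88 TiB, plus headroom → ~3 TB
```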
@@ -20,6 +20,7 @@ datasets:
 dataset_prepared_path: last_run_prepared
 val_set_size: 0
 output_dir: ./outputs/gpt-oss-out/
+save_total_limit: 2 # the 120B model can use up to 720GB of disk space per checkpoint, so let's only keep the last 2
 
 sequence_len: 4096
 sample_packing: true
@@ -43,7 +44,7 @@ bf16: true
 tf32: true
 
 flash_attention: true
-attn_implementation: kernels-community/vllm-flash-attn3
+attn_implementation: kernels-community/vllm-flash-attn3 # this is not needed if using flash_attn >= 2.8.3
 
 gradient_checkpointing: true
 activation_offloading: true
@@ -40,7 +40,7 @@ bf16: true
 tf32: true
 
 flash_attention: true
-attn_implementation: kernels-community/vllm-flash-attn3
+attn_implementation: kernels-community/vllm-flash-attn3 # this is not needed if using flash_attn >= 2.8.3
 
 gradient_checkpointing: true
 activation_offloading: true
@@ -15,7 +15,7 @@ datasets:
     field_thinking: thinking
     template_thinking_key: thinking
 
-dataset_prepared_path: last_run_prepared
+dataset_prepared_path: ./outputs/last_run_prepared
 val_set_size: 0
 output_dir: ./outputs/gpt-oss-out/
 
@@ -41,7 +41,7 @@ bf16: true
 tf32: true
 
 flash_attention: true
-attn_implementation: kernels-community/vllm-flash-attn3
+attn_implementation: kernels-community/vllm-flash-attn3 # this is not needed if using flash_attn >= 2.8.3
 
 gradient_checkpointing: true
 activation_offloading: true
@@ -15,7 +15,7 @@ datasets:
     field_thinking: thinking
     template_thinking_key: thinking
 
-dataset_prepared_path: last_run_prepared
+dataset_prepared_path: ./outputs/last_run_prepared
 val_set_size: 0
 output_dir: ./outputs/gpt-oss-out/
 
@@ -40,7 +40,7 @@ bf16: true
 tf32: true
 
 flash_attention: true
-attn_implementation: kernels-community/vllm-flash-attn3
+attn_implementation: kernels-community/vllm-flash-attn3 # this is not needed if using flash_attn >= 2.8.3
 
 gradient_checkpointing: true
 activation_offloading: true
@@ -53,7 +53,7 @@ bf16: true
 tf32: true
 
 flash_attention: true
-attn_implementation: kernels-community/vllm-flash-attn3
+attn_implementation: kernels-community/vllm-flash-attn3 # this is not needed if using flash_attn >= 2.8.3
 
 gradient_checkpointing: true
 activation_offloading: true
@@ -1,7 +0,0 @@
-# Liquid Foundation Models 2
-
-LFM2 support in transformers exists in the main branch, but is not yet included in the transformers release.
-
-```bash
-pip install --upgrade --no-deps --force-reinstall git+https://github.com/huggingface/transformers.git
-```
examples/smolvlm2/README.md (new file, 49 lines)

@@ -0,0 +1,49 @@
+# Finetune SmolVLM2 with Axolotl
+
+[SmolVLM2](https://huggingface.co/collections/HuggingFaceTB/smolvlm2-smallest-video-lm-ever-67ab6b5e84bf8aaa60cb17c7) are a family of lightweight, open-source multimodal models from HuggingFace designed to analyze and understand video, image, and text content.
+
+These models are built for efficiency, making them well-suited for on-device applications where computational resources are limited. Models are available in multiple sizes, including 2.2B, 500M, and 256M.
+
+This guide shows how to fine-tune SmolVLM2 models with Axolotl.
+
+## Getting Started
+
+1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html).
+
+    Here is an example of how to install from pip:
+    ```bash
+    # Ensure you have a compatible version of Pytorch installed
+    pip3 install packaging setuptools wheel ninja
+    pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
+    ```
+
+2. Install an extra dependency:
+
+    ```bash
+    pip3 install num2words==0.5.14
+    ```
+
+3. Run the finetuning example:
+
+    ```bash
+    # LoRA SFT (1x48GB @ 6.8GiB)
+    axolotl train examples/smolvlm2/smolvlm2-2B-lora.yaml
+    ```
+
+## TIPS
+
+- **Dataset Format**: For video finetuning, your dataset must be compatible with the multi-content Messages format. For more details, see our documentation on [Multimodal Formats](https://docs.axolotl.ai/docs/multimodal.html#dataset-format).
+- **Dataset Loading**: Read more on how to prepare and load your own datasets in our [documentation](https://docs.axolotl.ai/docs/dataset_loading.html).
+
+## Optimization Guides
+
+- [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html)
+- [LoRA Optimizations](https://docs.axolotl.ai/docs/lora_optims.html)
+- [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html)
+
+## Related Resources
+
+- [SmolVLM2 Blog](https://huggingface.co/blog/smolvlm2)
+- [Axolotl Docs](https://docs.axolotl.ai)
+- [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl)
+- [Axolotl Discord](https://discord.gg/7m9sfhzaf3)
examples/smolvlm2/smolvlm2-2B-lora.yaml (new file, 56 lines)

@@ -0,0 +1,56 @@
+base_model: HuggingFaceTB/SmolVLM2-2.2B-Instruct
+trust_remote_code: true
+processor_type: AutoProcessor
+
+# these 3 lines are needed for now to handle vision chat templates w images
+skip_prepare_dataset: true
+remove_unused_columns: false
+sample_packing: false
+
+datasets:
+  - path: HuggingFaceH4/llava-instruct-mix-vsft
+    type: chat_template
+    split: train[:1%]
+dataset_prepared_path: last_run_prepared
+val_set_size: 0.0
+output_dir: ./outputs/out
+
+adapter: lora
+lora_model_dir:
+
+sequence_len: 8192
+pad_to_sequence_len: false
+
+lora_r: 32
+lora_alpha: 16
+lora_dropout: 0.05
+lora_target_modules: 'model.text_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
+
+wandb_project:
+wandb_entity:
+wandb_watch:
+wandb_name:
+wandb_log_model:
+
+gradient_accumulation_steps: 4
+micro_batch_size: 1
+num_epochs: 1
+optimizer: adamw_bnb_8bit
+lr_scheduler: cosine
+learning_rate: 0.0002
+
+bf16: true
+fp16:
+tf32: true
+
+gradient_checkpointing: true
+logging_steps: 1
+flash_attention: true
+eager_attention:
+
+warmup_ratio: 0.1
+evals_per_epoch: 1
+saves_per_epoch: 1
+weight_decay: 0.0
+
+# save_first_step: true # uncomment this to validate checkpoint saving works with your config
@@ -1,7 +1,7 @@
 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
 
 # START section of dependencies that don't install on Darwin/MacOS
-bitsandbytes==0.46.1
+bitsandbytes==0.47.0
 # triton 3.4.0 is not compatible with CCE
 triton>=3.0.0,<3.4.0
 mamba-ssm==1.2.0.post1
@@ -14,7 +14,7 @@ packaging==23.2
 
 huggingface_hub>=0.33.0
 peft==0.17.0
-transformers==4.55.0
+transformers==4.55.2
 tokenizers>=0.21.1
 accelerate==1.10.0
 datasets==4.0.0
@@ -72,3 +72,8 @@ axolotl-contribs-lgpl==0.0.6
 axolotl-contribs-mit==0.0.5
 
 mistral-common==1.8.3
+
+# TUI dependencies
+textual==1.0.0
+rich==14.1.0
+tree_sitter_ruby==0.23.1
setup.py (4 changes)

@@ -118,9 +118,9 @@ def get_package_version():
 
 
 extras_require = {
-    "flash-attn": ["flash-attn==2.8.2"],
+    "flash-attn": ["flash-attn==2.8.3"],
     "ring-flash-attn": [
-        "flash-attn==2.8.2",
+        "flash-attn==2.8.3",
         "ring-flash-attn>=0.1.7",
         "yunchang==0.6.0",
     ],
@@ -40,6 +40,12 @@ class VllmServeCliArgs:
         default=None,
         metadata={"help": "Number of tensor parallel workers to use."},
     )
+    data_parallel_size: Optional[int] = field(
+        default=None,
+        metadata={
+            "help": "Number of data parallel workers to use for vLLM serving. This controls how many model replicas are used for parallel inference."
+        },
+    )
     host: Optional[str] = field(
         default=None,  # nosec B104
         metadata={"help": "Host address to run the server on."},
@@ -82,7 +82,7 @@ class ModalCloud(Cloud):
         return res
 
     def get_image(self):
-        docker_tag = "main-py3.11-cu124-2.6.0"
+        docker_tag = "main-py3.11-cu126-2.7.1"
         if self.config.docker_tag:
             docker_tag = self.config.docker_tag
         docker_image = f"axolotlai/axolotl:{docker_tag}"
@@ -200,7 +200,7 @@ class ModalCloud(Cloud):
         if family in ["a10", "a10g"]:
             return modal.gpu.A10G(count=count)
         if family == "h100":
-            return modal.gpu.H100(count=count)
+            return f"H100:{count}"
         if family == "t4":
             return modal.gpu.T4(count=count)
         if family == "l4":
@@ -64,7 +64,7 @@ def do_inference(
             importlib.import_module("axolotl.prompters"), prompter
         )
     elif cfg.chat_template:
-        chat_template_str = get_chat_template(cfg.chat_template)
+        chat_template_str = get_chat_template(cfg.chat_template, tokenizer=tokenizer)
     elif cfg.datasets[0].type == "chat_template":
         chat_template_str = get_chat_template_from_config(
             cfg=cfg, ds_cfg=cfg.datasets[0], tokenizer=tokenizer
@@ -344,6 +344,26 @@ def delinearize_llama4(model: str, output: str):
 cli.add_command(lm_eval)
 
 
+@cli.command()
+def tui():
+    """
+    Launch the Axolotl Terminal User Interface (TUI).
+
+    Provides an interactive interface for configuration management,
+    training monitoring, dataset handling, and model operations.
+    """
+    try:
+        from axolotl.tui.app import run
+
+        run()
+    except ImportError:
+        click.echo(
+            "TUI dependencies not installed. Install with: pip install textual rich"
+        )
+    except Exception as e:
+        click.echo(f"Error launching TUI: {e}")
+
+
 def main():
     cli()
 
@@ -10,6 +10,7 @@ import fire
 import torch
 import torch.distributed.checkpoint as dist_cp
 import torch.distributed.checkpoint.format_utils as dist_cp_format_utils
+from accelerate import PartialState
 from accelerate.utils import (
     SAFE_WEIGHTS_INDEX_NAME,
     SAFE_WEIGHTS_NAME,
@@ -23,6 +24,7 @@ from torch.distributed.checkpoint.format_utils import _EmptyStateDictLoadPlanner
 
 from axolotl.cli.config import load_cfg
 from axolotl.utils.logging import get_logger
+from axolotl.utils.train import determine_last_checkpoint
 
 LOG = get_logger(__name__)
 
@@ -143,7 +145,6 @@ def merge_fsdp_weights(
         ValueError: If torch version < 2.3.0, or if `checkpoint_dir` does not exist.
     """
     checkpoint_dir_ = Path(checkpoint_dir)
-    from accelerate.state import PartialState
 
     if not is_torch_version(">=", "2.3.0"):
         raise ValueError("`merge_fsdp_weights` requires PyTorch >= 2.3.0`")
@@ -180,7 +181,6 @@ def merge_fsdp_weights(
     if remove_checkpoint_dir:
         LOG.info(f"Removing old checkpoint directory {checkpoint_dir_}")
         shutil.rmtree(checkpoint_dir_)
-    state.wait_for_everyone()
 
 
 def do_cli(config: Union[Path, str] = Path("examples/"), **kwargs):
@@ -195,11 +195,32 @@ def do_cli(config: Union[Path, str] = Path("examples/"), **kwargs):
     parsed_cfg = load_cfg(config, **kwargs)
 
     fsdp_dir = Path(parsed_cfg.output_dir) / "pytorch_model_fsdp_0"
+    if not fsdp_dir.exists():
+        checkpoint_dir = determine_last_checkpoint(parsed_cfg, update=False)
+        if checkpoint_dir:
+            fsdp_dir = Path(checkpoint_dir) / "pytorch_model_fsdp_0"
+        if not fsdp_dir.exists():
+            raise ValueError(
+                f"Could not find FSDP checkpoint `pytorch_model_fsdp_0` in {checkpoint_dir}"
+            )
+
+    output_path = str(Path(parsed_cfg.output_dir) / "merged")
     merge_fsdp_weights(
         checkpoint_dir=str(fsdp_dir),
-        output_path=str(Path(parsed_cfg.output_dir) / "merged"),
+        output_path=output_path,
         safe_serialization=True,
     )
+    state = PartialState()
+    state.wait_for_everyone()
+    LOG.info(
+        f"FSDP SHARDED_STATE_DICT weights successfully merged to: {output_path}",
+        main_process_only=True,
+    )
+    LOG.info(
+        "Merged weights are only the safetensors and doesn't include the model configuration "
+        f"or tokenizer which may be found in {parsed_cfg.output_dir}.",
+        main_process_only=True,
+    )
 
 
 if __name__ == "__main__":
@@ -97,7 +97,8 @@ def do_cli(
     """
     # pylint: disable=duplicate-code
     os.environ["AXOLOTL_IS_PREPROCESS"] = "1"
-    parsed_cfg = load_cfg(config, **kwargs)
+    is_preprocess = kwargs.pop("is_preprocess", True)
+    parsed_cfg = load_cfg(config, is_preprocess=is_preprocess, **kwargs)
     parsed_cfg.is_preprocess = True
     parser = transformers.HfArgumentParser(PreprocessCliArgs)
     parsed_cli_args, _ = parser.parse_args_into_dataclasses(
@@ -3,11 +3,12 @@
 import random
 from copy import deepcopy
 from itertools import product
+from typing import Any
 
 
 def generate_sweep_configs(
     base_config: dict[str, list], sweeps_config: dict[str, list]
-) -> list[dict[str, list]]:
+) -> list[dict[str, Any]]:
     """
     Recursively generates all possible configurations by applying sweeps to the base config.
 
@@ -4,6 +4,7 @@ import os
 import subprocess  # nosec
 import sys
 import tempfile
+from pathlib import Path
 from typing import Any, Iterator, Literal
 
 import yaml
@@ -67,14 +68,12 @@ def build_command(base_cmd: list[str], options: dict[str, Any]) -> list[str]:
 
 def generate_config_files(config: str, sweep: str | None) -> Iterator[tuple[str, bool]]:
     """
-    Generate list of configuration files to process.
+    Generate list of configuration files to process. Yields a tuple of the configuration file name and a boolean indicating
+    whether this is a group of configurations (i.e., a sweep).
 
     Args:
         config: Base configuration file
         sweep: Sweep configuration file
-
-    Yields:
-        Tuple of configuration file name and whether this is a group of configurations
     """
 
     if not sweep:
@@ -90,7 +89,12 @@ def generate_config_files(config: str, sweep: str | None) -> Iterator[tuple[str,
     # Generate all possible configurations
     permutations = generate_sweep_configs(base_config, sweep_config)
     is_group = len(permutations) > 1
-    for permutation in permutations:
+    base_output_dir = base_config.get("output_dir", "./model-out")
+    for idx, permutation in enumerate(permutations, start=1):
+        permutation_dir = Path(permutation.get("output_dir", base_output_dir))
+        permutation_id = f"sweep{idx:04d}"
+        permutation["output_dir"] = str(permutation_dir / permutation_id)
+
         # pylint: disable=consider-using-with
         temp_file = tempfile.NamedTemporaryFile(
             mode="w",
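To make the new per-permutation output directories concrete, here is a standalone Python sketch of the naming scheme from the hunk above, with an illustrative base config:

```python
from pathlib import Path

# Each sweep permutation now gets its own zero-padded subdirectory under the
# configured output_dir, matching the f"sweep{idx:04d}" scheme above.
base_output_dir = "./model-out"  # illustrative value
permutations = [{"learning_rate": 1e-4}, {"learning_rate": 2e-4}]

for idx, permutation in enumerate(permutations, start=1):
    permutation["output_dir"] = str(Path(base_output_dir) / f"sweep{idx:04d}")

print([p["output_dir"] for p in permutations])
# ['model-out/sweep0001', 'model-out/sweep0002']
```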
@@ -5,7 +5,6 @@
 
 from .base import AxolotlTrainer
 from .dpo.trainer import AxolotlDPOTrainer
-from .grpo.trainer import AxolotlGRPOSequenceParallelTrainer, AxolotlGRPOTrainer
 from .mamba import AxolotlMambaTrainer
 from .trl import (
     AxolotlCPOTrainer,
@@ -1,26 +1,13 @@
 """Shared constants for axolotl.loaders module"""
 
-from transformers import (
-    Gemma3ForConditionalGeneration,
-    Gemma3nForConditionalGeneration,
-    Llama4ForConditionalGeneration,
-    LlavaForConditionalGeneration,
-    Mistral3ForConditionalGeneration,
-    MllamaForConditionalGeneration,
-    Qwen2_5_VLForConditionalGeneration,
-    Qwen2VLForConditionalGeneration,
-)
+from transformers import AutoModelForImageTextToText
+from transformers.models.auto.modeling_auto import (
+    MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES,
+)
 
-MULTIMODAL_AUTO_MODEL_MAPPING = {
-    "mllama": MllamaForConditionalGeneration,
-    "llama4": Llama4ForConditionalGeneration,
-    "llava": LlavaForConditionalGeneration,
-    "qwen2_vl": Qwen2VLForConditionalGeneration,
-    "qwen2_5_vl": Qwen2_5_VLForConditionalGeneration,
-    "mistral3": Mistral3ForConditionalGeneration,
-    "gemma3": Gemma3ForConditionalGeneration,
-    "gemma3n": Gemma3nForConditionalGeneration,
-}
+MULTIMODAL_AUTO_MODEL_MAPPING = dict(MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES)
+MULTIMODAL_AUTO_MODEL_MAPPING["lfm2-vl"] = AutoModelForImageTextToText
 
 try:
     from transformers import VoxtralForConditionalGeneration
@@ -25,6 +25,7 @@ from peft import (
 from torch.distributed import DeviceMesh
 from transformers import (
     AutoModelForCausalLM,
+    AutoModelForImageTextToText,
     AutoModelForVision2Seq,
     AwqConfig,
     BitsAndBytesConfig,
@@ -212,6 +213,7 @@ class ModelLoader:
         self.model_kwargs["use_kernels"] = self.cfg.use_kernels
         self._set_quantization_config()
         self._set_attention_config()
+        self._check_model_requirements()
 
     def _apply_post_model_load_setup(self):
         """Configure the model after it has been loaded."""
@@ -432,6 +434,8 @@ class ModelLoader:
         self.auto_model_loader = MULTIMODAL_AUTO_MODEL_MAPPING.get(
             self.model_config.model_type, AutoModelForVision2Seq
         )
+        if isinstance(self.auto_model_loader, str):
+            self.auto_model_loader = AutoModelForImageTextToText
 
     def _set_device_map_config(self):
         """Setup `device_map` according to config"""
@@ -628,6 +632,16 @@ class ModelLoader:
         if self.cfg.low_cpu_mem_usage:
             self.model_kwargs["low_cpu_mem_usage"] = True
 
+    def _check_model_requirements(self):
+        if self.cfg.model_config_type in ["lfm2-vl", "lfm2"]:
+            from transformers.utils.import_utils import is_causal_conv1d_available
+
+            if is_causal_conv1d_available():
+                raise ImportError(
+                    "The 'causal-conv1d' package is installed but causes compatibility issues with LFM2 models. "
+                    "Please uninstall it by running: `pip uninstall -y causal-conv1d`"
+                )
+
     def _configure_zero3_memory_efficient_loading(
         self,
     ) -> HfTrainerDeepSpeedConfig | None:
@@ -285,12 +285,10 @@ class PatchManager:
             and self.cfg.adapter == "qlora"
         ):
             from axolotl.monkeypatch.fsdp2_qlora import (
-                apply_bnb_torch_function_patch,
                 apply_init_sharded_param_patch,
                 apply_init_unsharded_param_patch,
             )
 
-            apply_bnb_torch_function_patch()
             apply_init_sharded_param_patch()
             apply_init_unsharded_param_patch()
 
@@ -187,7 +187,7 @@ def _process_lora_module_for_fsdp(module, fsdp2_kwargs):
 
     # Linear4Bit will keep it's bias term in fp32. If the weight dtype is in bf16 we are not able to
     # wrap this. Therefore we must ensure the bias has the same dtype as the weight
-    if module.base_layer.bias is not None:
+    if hasattr(module.base_layer, "bias") and module.base_layer.bias is not None:
         if module.base_layer.weight.dtype != module.base_layer.bias.dtype:
             log_bias_dtype_mismatch = True
             module.base_layer.bias.data = module.base_layer.bias.data.to(
@@ -9,73 +9,12 @@ Params4bit parameters.
 import importlib
 import inspect
 
-import torch
-from torch.nn import Parameter
-
 from axolotl.monkeypatch.utils import detab_code
 from axolotl.utils.logging import get_logger
 
 LOG = get_logger(__name__)
 
 
-def patched_torch_function(cls, func, types, args=(), kwargs=None):
-    """
-    Patched version of Params4bit.__torch_function__ for preserving Params4bit
-    class identity and attributes.
-    """
-    if kwargs is None:
-        kwargs = {}
-
-    if func in [torch.chunk, torch.split]:
-        tensor = args[0]
-        result = Parameter.__torch_function__(func, types, args, kwargs)
-
-        if isinstance(result, tuple):
-            return tuple(
-                cls(
-                    data=chunk,
-                    requires_grad=tensor.requires_grad,
-                    quant_state=tensor.quant_state,
-                    blocksize=tensor.blocksize,
-                    compress_statistics=tensor.compress_statistics,
-                    quant_type=tensor.quant_type,
-                    quant_storage=tensor.quant_storage,
-                    module=tensor.module,
-                    bnb_quantized=tensor.bnb_quantized,
-                )
-                for chunk in result
-            )
-
-        return cls(
-            data=result,
-            requires_grad=tensor.requires_grad,
-            quant_state=tensor.quant_state,
-            blocksize=tensor.blocksize,
-            compress_statistics=tensor.compress_statistics,
-            quant_type=tensor.quant_type,
-            quant_storage=tensor.quant_storage,
-            module=tensor.module,
-            bnb_quantized=tensor.bnb_quantized,
-        )
-
-    return Parameter.__torch_function__(func, types, args, kwargs)
-
-
-# pylint: disable=protected-access
-def apply_bnb_torch_function_patch():
-    """
-    Patch Params4bit.__torch_function__ using Axolotl-style approach.
-
-    Returns:
-        True if patching succeeded, False otherwise.
-    """
-    from bitsandbytes.nn.modules import Params4bit
-
-    Params4bit.__torch_function__ = classmethod(patched_torch_function)
-
-    LOG.info("Successfully patched Params4bit.__torch_function__")
-
-
 # pylint: disable=protected-access
 def apply_init_sharded_param_patch():
     """Apply patch to FSDPParam._init_sharded_param to support Params4bit."""
@@ -20,12 +20,15 @@ from ring_flash_attn import ring_flash_attn_func
 from ring_flash_attn.adapters.hf_adapter import check_params
 from transformers.modeling_flash_attention_utils import is_flash_attn_greater_or_equal
 
-try:
+try:  # pylint: disable=duplicate-code
     from transformers.modeling_flash_attention_utils import _flash_supports_window
 except ImportError:
-    from transformers.modeling_flash_attention_utils import (
-        _flash_supports_window_size as _flash_supports_window,
-    )
+    try:
+        from transformers.modeling_flash_attention_utils import (
+            _flash_supports_window_size as _flash_supports_window,
+        )
+    except ImportError:
+        _flash_supports_window = True
 
 from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS
 
@@ -15,12 +15,15 @@ import torch
 import torch.distributed as dist
 from torch.distributed import DeviceMesh
 
-try:
+try:  # pylint: disable=duplicate-code
     from transformers.modeling_flash_attention_utils import _flash_supports_window
 except ImportError:
-    from transformers.modeling_flash_attention_utils import (
-        _flash_supports_window_size as _flash_supports_window,
-    )
+    try:
+        from transformers.modeling_flash_attention_utils import (
+            _flash_supports_window_size as _flash_supports_window,
+        )
+    except ImportError:
+        _flash_supports_window = True
 
 from axolotl.monkeypatch.utils import get_cu_seqlens_from_pos_ids
 from axolotl.utils.logging import get_logger
@@ -6,7 +6,7 @@ from typing import Optional
|
|||||||
from PIL import Image, ImageOps
|
from PIL import Image, ImageOps
|
||||||
from PIL.Image import Resampling
|
from PIL.Image import Resampling
|
||||||
from torch import Tensor, zeros_like
|
from torch import Tensor, zeros_like
|
||||||
from transformers import ProcessorMixin, VoxtralProcessor
|
from transformers import ProcessorMixin, SmolVLMProcessor, VoxtralProcessor
|
||||||
from transformers.image_utils import load_image
|
from transformers.image_utils import load_image
|
||||||
|
|
||||||
from axolotl.utils.dict import remove_none_values
|
from axolotl.utils.dict import remove_none_values
|
||||||
@@ -138,7 +138,7 @@ class ProcessingStrategy:
|
|||||||
image_key = key
|
image_key = key
|
||||||
break
|
break
|
||||||
|
|
||||||
# if the image key exists, add the image to the first message
|
# if the image key exists, add the image to the first user message
|
||||||
if image_key is not None and processed_example[image_key] is not None:
|
if image_key is not None and processed_example[image_key] is not None:
|
||||||
# TODO: check if it's normal to be single image only for common datasets
|
# TODO: check if it's normal to be single image only for common datasets
|
||||||
# From observation, it's usually a list of single image but some datasets may have several columns for images
|
# From observation, it's usually a list of single image but some datasets may have several columns for images
|
||||||
@@ -179,26 +179,34 @@ class ProcessingStrategy:

         # Look for any image type in the first message
         # some dataset have an {type: "image"} in the first message
+        msg_ind_to_add = None
         ind_to_add = None
+        first_user_idx = None

-        for i, content in enumerate(
-            processed_example["messages"][0]["content"]
-        ):
-            # Usually datasets created with image columns, don't have it in the messages itself
-            if content["type"] == "image" and all(
-                k not in content for k in ["image", "url", "path", "base64"]
-            ):
-                ind_to_add = i
-                break
+        for msg_idx, msg_content in enumerate(processed_example["messages"]):
+            if first_user_idx is None and msg_content["role"] == "user":
+                first_user_idx = msg_idx
+            for i, content in enumerate(
+                processed_example["messages"][msg_idx]["content"]
+            ):
+                # Usually datasets created with image columns, don't have it in the messages itself
+                if content["type"] == "image" and all(
+                    k not in content for k in ["image", "url", "path", "base64"]
+                ):
+                    msg_ind_to_add = msg_idx
+                    ind_to_add = i
+                    break

         # If an image type is found, add the image to that index
-        if ind_to_add is not None:
-            processed_example["messages"][0]["content"][ind_to_add][
-                "image"
-            ] = image_value
+        if ind_to_add is not None and msg_ind_to_add is not None:
+            processed_example["messages"][msg_ind_to_add]["content"][
+                ind_to_add
+            ]["image"] = image_value
         else:
-            # if no image type is found, add it to end of the first message
-            processed_example["messages"][0]["content"].append(
+            # if no image type is found, add it to end of the first user message
+            if first_user_idx is None:
+                first_user_idx = 0
+            processed_example["messages"][first_user_idx]["content"].append(
                 {
                     "type": "image",
                     "image": image_value,
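To make the behavior change above concrete, here is a small worked example with hypothetical data. The old code always attached the image to `messages[0]`; the rewritten loop scans every message, remembers the first user turn, and fills the first bare `{"type": "image"}` placeholder wherever it lives:

```python
# Hypothetical example input; only the shapes matter here.
processed_example = {
    "messages": [
        {"role": "system", "content": [{"type": "text", "text": "You are helpful."}]},
        {
            "role": "user",
            "content": [{"type": "image"}, {"type": "text", "text": "Describe this."}],
        },
    ]
}

# After the rewritten block runs with some image_value:
#   first_user_idx == 1, msg_ind_to_add == 1, ind_to_add == 0
#   processed_example["messages"][1]["content"][0]["image"] = image_value
# The old code would instead have touched messages[0] (the system turn),
# and on the fallback path appended the image to that system message.
```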
@@ -395,6 +403,24 @@ class VoxtralProcessingStrategy(ProcessingStrategy):
         return labels


+class SmolVLM2ProcessingStrategy(ProcessingStrategy):
+    """Processing Strategy class for SmolVLM2"""
+
+    def __init__(
+        self,
+        processor: ProcessorMixin,
+        chat_template: Optional[str] = None,
+        image_size: int | tuple[int, int] | None = None,
+        image_resize_algorithm: Resampling | None = None,
+    ):
+        super().__init__(processor, chat_template, image_size, image_resize_algorithm)
+        self.image_token = "<image>"  # nosec
+
+        self.image_token_id = processor.tokenizer.additional_special_tokens_ids[
+            processor.tokenizer.additional_special_tokens.index(self.image_token)
+        ]
+
+
 def get_processing_strategy(
     processor: ProcessorMixin,
     chat_template,
@@ -402,32 +428,43 @@ def get_processing_strategy(
     image_size: int | tuple[int, int] | None = None,
     image_resize_algorithm: Resampling | None = None,
 ):
+    processing_kwargs = {
+        "processor": processor,
+        "chat_template": chat_template,
+        "image_size": image_size,
+        "image_resize_algorithm": image_resize_algorithm,
+    }
+
+    if chat_template_type in [None, "tokenizer_default"] and hasattr(
+        processor.tokenizer, "chat_template"
+    ):
+        processing_kwargs["chat_template"] = processor.tokenizer.chat_template
+
     if chat_template_type == "qwen2_vl":
         return Qwen2VLProcessingStrategy(
-            processor, chat_template, image_size, image_resize_algorithm
+            **processing_kwargs,
         )
     if chat_template_type == "gemma3":
         return Gemma3ProcessingStrategy(
-            processor, chat_template, image_size, image_resize_algorithm
+            **processing_kwargs,
         )
     if chat_template_type == "gemma3n":
         return Gemma3nProcessingStrategy(
-            processor, chat_template, image_size, image_resize_algorithm
-        )
-    if chat_template_type in [
-        "llama3_2_vision",
-        "llama4",
-        "llava",
-        "mistral_v7_tekken",
-        "pixtral",
-    ]:
-        return ProcessingStrategy(
-            processor, chat_template, image_size, image_resize_algorithm
+            **processing_kwargs,
         )

     if isinstance(processor, VoxtralProcessor):
         return VoxtralProcessingStrategy(
-            processor, chat_template, image_size, image_resize_algorithm
+            **processing_kwargs,
        )

-    raise ValueError(f"Unsupported chat template type: {chat_template_type}")
+    if isinstance(processor, SmolVLMProcessor):
+        return SmolVLM2ProcessingStrategy(
+            **processing_kwargs,
+        )
+
+    # llama3_2_vision, llama4, llava
+    # mistral_v7_tekken, pixtral, lfm2vl
+    return ProcessingStrategy(
+        **processing_kwargs,
+    )
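A quick sketch of how the refactored factory dispatches after this change. The model name is hypothetical, and the `chat_template_type` parameter is assumed from the body's comparisons (it sits in the one line elided between the two hunks):

```python
from transformers import AutoProcessor

# Hypothetical checkpoint, for illustration only.
processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM2-2.2B-Instruct")

strategy = get_processing_strategy(
    processor=processor,
    chat_template=None,       # [None, "tokenizer_default"] now falls back to
    chat_template_type=None,  # processor.tokenizer.chat_template via processing_kwargs
    image_size=(384, 384),
    image_resize_algorithm=None,
)
# No chat_template_type branch matches, the processor is a SmolVLMProcessor,
# so this returns SmolVLM2ProcessingStrategy(**processing_kwargs).
```

Note that the unmatched case no longer raises: the catch-all `ProcessingStrategy` now serves the template types listed in the trailing comments.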
@@ -129,13 +129,21 @@ class ChatTemplatePrompter(Prompter):
                 images=images,
                 return_tensors="pt",
             )
+            if hasattr(batch, "to_dict"):
+                batch = batch.to_dict()
+            else:
+                batch = dict(batch)
+
             # workaround since processor works in batches instead of single examples
+            out = {}
             for k, val in batch.items():
-                if k in ["pixel_values"]:
-                    batch[k] = val.tolist()
+                if hasattr(val, "tolist"):
+                    out[k] = (
+                        val.tolist() if k == "pixel_values" else val.squeeze(0).tolist()
+                    )
                 else:
-                    batch[k] = val.squeeze().tolist()
-            return batch
+                    out[k] = val
+            return out

         return self.tokenizer.apply_chat_template(
             conversation,
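The `squeeze(0)` choice is easy to motivate: the processor is invoked with a batch of one, so every tensor carries a leading batch dimension that the per-example output must drop, while `pixel_values` keeps its full shape (hence the `k == "pixel_values"` special case). A minimal sketch:

```python
import torch

input_ids = torch.tensor([[1, 2, 3, 4]])  # shape [1, seq] from the batched processor call
assert input_ids.squeeze(0).tolist() == [1, 2, 3, 4]  # per-example list the strategy expects

# Values without a .tolist attribute (non-tensors) are now passed through
# unchanged instead of raising on .squeeze().
```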
@@ -433,10 +441,13 @@ class ChatTemplateStrategy(PromptTokenizingStrategy):
             tokenized_prompt["attention_mask"] = [1] * len(input_ids)
         else:
             input_ids = tokenized_res["input_ids"]
-            tokenized_prompt = tokenized_res
+            tokenized_prompt = dict(tokenized_res)

         if not self.train_on_inputs:
-            user_prompt_len = len(prompt_ids)
+            if isinstance(prompt_ids, dict):
+                user_prompt_len = len(prompt_ids["input_ids"])
+            else:
+                user_prompt_len = len(prompt_ids)
             labels = [-100] * user_prompt_len + input_ids[user_prompt_len:]
         else:
             labels = input_ids
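For readers new to the masking convention here: `-100` is the ignore index of the cross-entropy loss in transformers, so prefixing the prompt span with it trains only on the completion. A worked example with made-up token ids:

```python
prompt_ids = {"input_ids": [101, 7592, 102]}         # tokenized prompt (the new dict branch)
input_ids = [101, 7592, 102, 2023, 2003, 1996, 102]  # prompt + assistant reply

user_prompt_len = len(prompt_ids["input_ids"])       # 3
labels = [-100] * user_prompt_len + input_ids[user_prompt_len:]
assert labels == [-100, -100, -100, 2023, 2003, 1996, 102]
```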
@@ -72,9 +72,10 @@ def load(tokenizer, cfg, ds_cfg: Optional[Dict[str, Any]] = None):
         builder_kwargs["message_field_training"] = message_field_training

     chat_template = ds_cfg.get("chat_template", cfg.get("chat_template", "chatml"))
-    format_message = (
-        lambda x: x  # noqa E731 # pylint: disable=unnecessary-lambda-assignment
-    )
+
+    def format_message(x):
+        return x
+
     if chat_template == "chatml":
         from axolotl.core.chat.format.chatml import format_message  # noqa F811
     if chat_template.startswith("llama3"):
@@ -4,11 +4,14 @@ from __future__ import annotations

 import importlib
 import inspect
+import json
 import os
+import shutil
 import signal
 import sys
 import typing
 import weakref
+from collections import OrderedDict
 from contextlib import ExitStack
 from pathlib import Path
 from typing import Any, Dict
@@ -38,6 +41,7 @@ from axolotl.utils.distributed import cleanup_distributed
 from axolotl.utils.freeze import freeze_layers_except
 from axolotl.utils.logging import get_logger
 from axolotl.utils.schemas.enums import RLType
+from axolotl.utils.train import determine_last_checkpoint
 from axolotl.utils.trainer import setup_trainer

 try:
@@ -46,7 +50,7 @@ except ImportError:
     BetterTransformer = None

 if typing.TYPE_CHECKING:
-    from axolotl.core.trainer_builder import HFCausalTrainerBuilder, HFRLTrainerBuilder
+    from axolotl.core.builders import HFCausalTrainerBuilder, HFRLTrainerBuilder

 LOG = get_logger(__name__)

@@ -124,32 +128,6 @@ def setup_reference_model(
     return model_ref


-def determine_resume_checkpoint(cfg: DictDefault) -> str | None:
-    """
-    Determine the checkpoint to resume from based on configuration.
-
-    Args:
-        cfg: Dictionary mapping `axolotl` config keys to values.
-
-    Returns:
-        Path to the checkpoint to resume from, or `None` if not resuming.
-    """
-    if cfg.resume_from_checkpoint is None and cfg.auto_resume_from_checkpoints:
-        possible_checkpoints = [
-            str(cp) for cp in Path(cfg.output_dir).glob("checkpoint-*")
-        ]
-        if len(possible_checkpoints) > 0:
-            sorted_paths = sorted(
-                possible_checkpoints,
-                key=lambda path: int(path.split("-")[-1]),
-            )
-            cfg.resume_from_checkpoint = sorted_paths[-1]
-            LOG.info(
-                f"Using Auto-resume functionality to start with checkpoint at {cfg.resume_from_checkpoint}"
-            )
-    return cfg.resume_from_checkpoint
-
-
 def setup_signal_handler(
     cfg: DictDefault, model: PreTrainedModel, safe_serialization: bool
 ):
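The removed helper is consolidated into `axolotl.utils.train.determine_last_checkpoint`, whose body is not part of this diff. For orientation, a hypothetical stand-in that mirrors the removed logic (highest-numbered `checkpoint-*` directory wins):

```python
from pathlib import Path


def pick_latest_checkpoint(output_dir: str) -> str | None:
    """Rough sketch of the removed behavior; not the actual helper."""
    candidates = [str(p) for p in Path(output_dir).glob("checkpoint-*")]
    if not candidates:
        return None
    # checkpoint-500 must sort after checkpoint-100 numerically, not lexically
    return max(candidates, key=lambda p: int(p.split("-")[-1]))
```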
@@ -275,19 +253,60 @@ def save_trained_model(
         # final model weights have already been saved by `ReLoRACallback.on_train_end`
         return

-    if trainer.is_fsdp_enabled or cfg.fsdp_config:
+    if (  # pylint: disable=too-many-nested-blocks
+        trainer.is_fsdp_enabled or cfg.fsdp_config
+    ):
         if cfg.fsdp_config or cfg.fsdp:
             if cfg.fsdp_config.final_state_dict_type:
                 state_dict_type = cfg.fsdp_config.final_state_dict_type
             else:
                 state_dict_type = cfg.fsdp_config.state_dict_type
             trainer.accelerator.state.fsdp_plugin.set_state_dict_type(state_dict_type)
-            trainer.save_model(cfg.output_dir)
+            trainer.save_model(cfg.output_dir)  # only handles FULL_STATE_DICT
             if state_dict_type == "SHARDED_STATE_DICT":
                 LOG.info(
                     "The final model was saved with a sharded state dict. Please ensure you merge "
                     "the sharded weights with `merge-sharded-fsdp-weights`."
                 )
+                checkpoint_dir = determine_last_checkpoint(cfg, update=False)
+                if (
+                    not (Path(cfg.output_dir) / "model.safetensors.index.json").exists()
+                    and checkpoint_dir
+                ):
+                    # import here to prevent circular import
+                    from axolotl.cli.merge_sharded_fsdp_weights import merge_fsdp_weights
+
+                    fsdp_dir = Path(checkpoint_dir) / "pytorch_model_fsdp_0"
+                    merged_path = str(Path(cfg.output_dir) / "merged")
+                    merge_fsdp_weights(
+                        checkpoint_dir=str(fsdp_dir),
+                        output_path=merged_path,
+                        safe_serialization=True,
+                    )
+                    trainer.accelerator.wait_for_everyone()
+                    if trainer.accelerator.is_main_process:
+                        # move all files in merged_path to cfg.output_dir
+                        for merged_file in Path(merged_path).iterdir():
+                            if (Path(cfg.output_dir) / merged_file.name).exists():
+                                (Path(cfg.output_dir) / merged_file.name).unlink()
+                            shutil.move(str(merged_file), cfg.output_dir)
+                        shutil.rmtree(merged_path)  # remove what should be an empty dir
+                # TODO(wing):see https://github.com/huggingface/transformers/pull/40207
+                # cleanup the FSDP prefix in the model config.json
+                if trainer.accelerator.is_main_process:
+                    with open(
+                        Path(cfg.output_dir) / "config.json", "r", encoding="utf-8"
+                    ) as config_file_io:
+                        # read the model config as an OrderedDict
+                        config = json.load(config_file_io, object_pairs_hook=OrderedDict)
+                    config["architectures"] = [
+                        name.lstrip("FSDP") for name in config["architectures"]
+                    ]
+                    # write the updated model config back
+                    with open(
+                        os.path.join(cfg.output_dir, "config.json"), "w", encoding="utf-8"
+                    ) as config_file_io:
+                        json.dump(config, config_file_io, indent=2)
     elif cfg.deepspeed and is_deepspeed_zero3_enabled():
         # Copied over from: https://github.com/huggingface/accelerate/blob/5ae611118057232f441055f7ef9ba0b0f2b8d533/docs/source/usage_guides/deepspeed.md#saving-and-loading
         trainer.accelerator.wait_for_everyone()
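One caveat worth flagging on the `config.json` cleanup above: `str.lstrip("FSDP")` strips any leading run of the characters F, S, D, and P rather than the literal prefix, so an architecture name that itself starts with one of those letters would be over-stripped. `removeprefix` (Python 3.9+) is the exact-prefix variant:

```python
assert "FSDPLlamaForCausalLM".lstrip("FSDP") == "LlamaForCausalLM"   # happens to be fine
assert "FSDPFalconForCausalLM".lstrip("FSDP") == "alconForCausalLM"  # over-strips the F
assert "FSDPFalconForCausalLM".removeprefix("FSDP") == "FalconForCausalLM"
```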
@@ -564,7 +583,7 @@ def train(
     setup_model_card(cfg)

     # Execute the training
-    resume_from_checkpoint = determine_resume_checkpoint(cfg)
+    resume_from_checkpoint = determine_last_checkpoint(cfg)
     execute_training(cfg, trainer, resume_from_checkpoint)

     # clear cache
216 src/axolotl/tui/README.md Normal file
@@ -0,0 +1,216 @@
# Axolotl TUI (Terminal User Interface)

A comprehensive Terminal User Interface for Axolotl, providing an interactive way to manage configurations, training jobs, datasets, models, and system monitoring.

## Features

### 🏠 Main Dashboard
- **Welcome Screen**: Central hub with quick access to all features
- **Keyboard Navigation**: Efficient navigation with keyboard shortcuts
- **Screen Management**: Easy switching between different functional areas

### 📝 Configuration Management
- **YAML Editor**: Syntax-highlighted editor for Axolotl configurations
- **Real-time Validation**: Instant config validation with detailed error reporting
- **File Browser**: Navigate and select configuration files
- **Template Loading**: Load example configurations
- **Remote Config Support**: Load configurations from URLs

**Key Shortcuts:**
- `Ctrl+N`: New configuration
- `Ctrl+S`: Save configuration
- `Ctrl+V`: Validate configuration
- `Ctrl+E`: Toggle edit mode

### 🚀 Training Management
- **Job Launcher**: Start training with different launchers (accelerate, torchrun)
- **Real-time Monitoring**: Live training progress and metrics
- **Loss Visualization**: Sparkline charts for loss curves
- **Job Control**: Start, stop, resume, and manage multiple training jobs
- **Log Streaming**: Real-time log viewing and filtering

**Key Shortcuts:**
- `Ctrl+T`: New training job
- `Ctrl+R`: Resume training
- `Ctrl+X`: Stop training
- `R`: Refresh status

### 📊 Dataset Management
- **Dataset Browser**: Explore local and remote datasets
- **Preview & Statistics**: View dataset samples and metadata
- **Preprocessing**: Run dataset preprocessing with progress tracking
- **HuggingFace Integration**: Download and manage HF datasets
- **Format Detection**: Automatic dataset format recognition

**Key Shortcuts:**
- `Ctrl+P`: Preprocess dataset
- `Ctrl+V`: Preview dataset
- `Ctrl+I`: Dataset information
- `R`: Refresh dataset list

### 🤖 Model Management
- **Model Discovery**: Automatically find trained models
- **LoRA Operations**: Merge LoRA adapters with base models
- **Quantization**: Quantize models for deployment
- **Evaluation**: Run model evaluation benchmarks
- **Storage Info**: View model sizes and storage details

**Key Shortcuts:**
- `Ctrl+M`: Merge LoRA
- `Ctrl+Q`: Quantize model
- `Ctrl+E`: Evaluate model
- `R`: Refresh model list

### 💬 Inference & Testing
- **Interactive Chat**: Chat interface for model testing
- **Parameter Tuning**: Adjust inference parameters (temperature, top-p, max tokens)
- **Model Loading**: Load and switch between different models
- **Chat History**: Save and load conversation history
- **Gradio Integration**: Launch Gradio web interface

**Key Shortcuts:**
- `Ctrl+Enter`: Send message
- `Ctrl+C`: Clear chat
- `Ctrl+L`: Load model
- `Ctrl+S`: Save chat

### 📈 System Monitoring
- **Resource Monitoring**: Real-time CPU, GPU, and memory usage
- **Process Management**: View and manage running processes
- **Performance Graphs**: Historical usage charts with sparklines
- **GPU Information**: Detailed GPU status and memory usage
- **Temperature Monitoring**: System temperature tracking

**Key Shortcuts:**
- `R`: Refresh metrics
- `Ctrl+K`: Kill selected process

## Installation

### Dependencies
```bash
pip install textual==1.0.0 rich==14.1.0
```

### Launch TUI
```bash
# From command line
python -m axolotl.cli.main tui

# From Python code
from axolotl.tui.app import run
run()
```

## Architecture

### Screen Structure
```
AxolotlTUI (Main App)
├── WelcomeScreen (Dashboard)
├── ConfigScreen (Configuration Management)
├── TrainingScreen (Training Management)
├── DatasetScreen (Dataset Management)
├── ModelScreen (Model Management)
├── InferenceScreen (Inference & Testing)
└── MonitorScreen (System Monitoring)
```

### Key Components
- **BaseScreen**: Common functionality for all screens
- **Screen Navigation**: Stack-based screen management
- **Event Handling**: Reactive UI updates
- **Background Tasks**: Non-blocking operations
- **State Management**: Shared application state

### Integration Points
- **CLI Commands**: Seamless integration with existing axolotl CLI
- **Configuration System**: Uses axolotl's native config loading
- **Training Pipeline**: Integrates with axolotl training functions
- **Model Loading**: Compatible with axolotl model management

## Usage Examples

### 1. Creating a New Configuration
1. Launch TUI: `python -m axolotl.cli.main tui`
2. Select "Configuration Management" or press `C`
3. Press `Ctrl+N` for new configuration
4. Edit the template configuration
5. Press `Ctrl+V` to validate
6. Press `Ctrl+S` to save

### 2. Starting a Training Job
1. Navigate to "Training Management" or press `T`
2. Press `Ctrl+T` for new training job
3. Select configuration file and launcher
4. Monitor progress in real-time
5. View loss curves and logs

### 3. Interactive Model Testing
1. Go to "Inference & Testing" or press `I`
2. Load a trained model with `Ctrl+L`
3. Adjust inference parameters as needed
4. Start chatting with the model
5. Save conversation with `Ctrl+S`

## Navigation

### Global Shortcuts
- `Ctrl+Q`: Quit application
- `Escape`: Go back/close current screen
- `Tab`: Navigate between UI elements
- `Enter`: Select/activate element
- `Space`: Toggle switches/checkboxes

### Screen Shortcuts
Each screen has specific shortcuts displayed in the footer. Common patterns:
- `Ctrl+[Letter]`: Primary actions
- `R`: Refresh/reload
- `F1-F12`: Function keys for advanced features

## Customization

### Themes
The TUI uses Textual's theming system and can be customized by modifying the CSS in each screen class.

### Adding New Screens
1. Create a new screen class inheriting from `BaseScreen`
2. Implement the `compose()` method for UI layout
3. Add event handlers for user interactions
4. Register the screen in the main app navigation

### Extending Functionality
- Add new widgets to existing screens
- Implement custom data visualization
- Integrate with external tools and APIs
- Add new keyboard shortcuts

## Troubleshooting

### Common Issues
1. **Import Errors**: Ensure textual and rich are installed
2. **Permission Errors**: Check file system permissions for config directories
3. **GPU Monitoring**: Install pynvml for GPU monitoring features
4. **Config Validation**: Ensure axolotl dependencies are properly installed

### Debug Mode
Launch with debug logging:
```bash
TEXTUAL_LOG=DEBUG python -m axolotl.cli.main tui
```

### Performance
- Use `Ctrl+\` to open Textual's debug console
- Monitor memory usage with the system monitor
- Disable auto-refresh for better performance on slower systems

## Contributing

The TUI is designed to be extensible. Contributions are welcome for:
- New screen implementations
- Enhanced visualizations
- Better keyboard navigation
- Additional integrations
- Performance improvements

See the main Axolotl repository for contribution guidelines.
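Tying the README's "Adding New Screens" steps together, a minimal sketch of a hypothetical new screen (names invented for illustration; only the `BaseScreen` API from `screens/base.py` below is assumed):

```python
"""Hypothetical LogsScreen following the README's four steps."""

from textual.app import ComposeResult
from textual.widgets import Log

from axolotl.tui.screens.base import BaseScreen


class LogsScreen(BaseScreen):
    """Step 1: inherit from BaseScreen."""

    def __init__(self):
        super().__init__(title="Logs", subtitle="Tail recent training logs")

    def compose(self) -> ComposeResult:
        # Step 2: reuse the base layout, which yields a `#content` container.
        yield from super().compose()

    def on_mount(self) -> None:
        # Step 3: populate widgets / add handlers once mounted.
        self.query_one("#content").mount(Log(id="logs"))


# Step 4: register it in the app's navigation, e.g.
#   self.app.push_screen(LogsScreen())
```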
1 src/axolotl/tui/__init__.py Normal file
@@ -0,0 +1 @@
"""Axolotl Terminal User Interface (TUI)."""
180 src/axolotl/tui/app.py Normal file
@@ -0,0 +1,180 @@
"""Main TUI application for Axolotl."""

from textual import on
from textual.app import App, ComposeResult
from textual.binding import Binding
from textual.containers import Container
from textual.screen import Screen
from textual.widgets import Button, Footer, Header, Static

from axolotl.tui.screens.config import ConfigScreen
from axolotl.tui.screens.datasets import DatasetScreen
from axolotl.tui.screens.inference import InferenceScreen
from axolotl.tui.screens.models import ModelScreen
from axolotl.tui.screens.monitor import MonitorScreen
from axolotl.tui.screens.training import TrainingScreen


class WelcomeScreen(Screen):
    """Welcome screen with main menu."""

    BINDINGS = [
        Binding("q", "quit", "Quit"),
        Binding("c", "config", "Configuration"),
        Binding("t", "training", "Training"),
        Binding("d", "datasets", "Datasets"),
        Binding("m", "models", "Models"),
        Binding("i", "inference", "Inference"),
        Binding("s", "monitor", "System Monitor"),
    ]

    def compose(self) -> ComposeResult:
        """Compose the welcome screen."""
        yield Header()
        yield Container(
            Static("🦾 Axolotl TUI", classes="title"),
            Static(
                "A Terminal User Interface for fine-tuning LLMs", classes="subtitle"
            ),
            Container(
                Button("Configuration Management [C]", id="config", variant="primary"),
                Button("Training Management [T]", id="training", variant="primary"),
                Button("Dataset Management [D]", id="datasets", variant="primary"),
                Button("Model Management [M]", id="models", variant="primary"),
                Button("Inference & Testing [I]", id="inference", variant="primary"),
                Button("System Monitor [S]", id="monitor", variant="primary"),
                classes="menu-container",
            ),
            classes="welcome-container",
        )
        yield Footer()

    def action_quit(self) -> None:
        """Quit the application."""
        self.app.exit()

    def action_config(self) -> None:
        """Navigate to config screen."""
        self.app.push_screen(ConfigScreen())

    def action_training(self) -> None:
        """Navigate to training screen."""
        self.app.push_screen(TrainingScreen())

    def action_datasets(self) -> None:
        """Navigate to datasets screen."""
        self.app.push_screen(DatasetScreen())

    def action_models(self) -> None:
        """Navigate to models screen."""
        self.app.push_screen(ModelScreen())

    def action_inference(self) -> None:
        """Navigate to inference screen."""
        self.app.push_screen(InferenceScreen())

    def action_monitor(self) -> None:
        """Navigate to monitor screen."""
        self.app.push_screen(MonitorScreen())

    @on(Button.Pressed, "#config")
    def on_config_pressed(self) -> None:
        """Handle config button press."""
        self.action_config()

    @on(Button.Pressed, "#training")
    def on_training_pressed(self) -> None:
        """Handle training button press."""
        self.action_training()

    @on(Button.Pressed, "#datasets")
    def on_datasets_pressed(self) -> None:
        """Handle datasets button press."""
        self.action_datasets()

    @on(Button.Pressed, "#models")
    def on_models_pressed(self) -> None:
        """Handle models button press."""
        self.action_models()

    @on(Button.Pressed, "#inference")
    def on_inference_pressed(self) -> None:
        """Handle inference button press."""
        self.action_inference()

    @on(Button.Pressed, "#monitor")
    def on_monitor_pressed(self) -> None:
        """Handle monitor button press."""
        self.action_monitor()


class AxolotlTUI(App):
    """Main Axolotl TUI Application."""

    CSS = """
    .title {
        text-align: center;
        text-style: bold;
        padding: 1;
        color: $primary;
    }

    .subtitle {
        text-align: center;
        padding: 1;
        color: $text-muted;
    }

    .welcome-container {
        align: center middle;
        height: 100%;
        width: 100%;
    }

    .menu-container {
        layout: vertical;
        align: center middle;
        padding: 2;
        width: auto;
        height: auto;
    }

    .menu-container Button {
        width: 35;
        margin: 1;
    }

    WelcomeScreen {
        align: center middle;
    }
    """

    BINDINGS = [
        Binding("ctrl+q", "quit", "Quit", priority=True),
        Binding("escape", "back", "Back", priority=True),
    ]

    def on_mount(self) -> None:
        """Called when the app is mounted."""
        self.title = "Axolotl TUI"
        self.sub_title = "Fine-tuning LLMs made easy"
        self.push_screen(WelcomeScreen())

    def action_quit(self) -> None:
        """Quit the application."""
        self.exit()

    def action_back(self) -> None:
        """Go back to previous screen."""
        if len(self.screen_stack) > 1:
            self.pop_screen()


def run():
    """Run the Axolotl TUI application."""
    app = AxolotlTUI()
    app.run()


if __name__ == "__main__":
    run()
1 src/axolotl/tui/dialogs/__init__.py Normal file
@@ -0,0 +1 @@
"""TUI dialogs for Axolotl."""
112 src/axolotl/tui/dialogs/training.py Normal file
@@ -0,0 +1,112 @@
"""Training dialogs for Axolotl TUI."""

from pathlib import Path

from textual import on
from textual.app import ComposeResult
from textual.containers import Container
from textual.screen import ModalScreen
from textual.widgets import Button, Input, Label, Select, Static


class NewTrainingDialog(ModalScreen):
    """Dialog for starting a new training job."""

    CSS = """
    NewTrainingDialog {
        align: center middle;
    }

    .dialog-container {
        background: $surface;
        border: thick $primary;
        padding: 2;
        width: 60;
        height: auto;
    }

    .dialog-title {
        text-align: center;
        text-style: bold;
        padding: 1;
        color: $primary;
    }

    .form-field {
        margin: 1 0;
    }

    .form-label {
        margin: 0 0 1 0;
        color: $text-muted;
    }

    .button-container {
        layout: horizontal;
        align: center middle;
        margin: 2 0 0 0;
    }

    .button-container Button {
        margin: 0 1;
    }
    """

    def compose(self) -> ComposeResult:
        """Compose the dialog."""
        yield Container(
            Static("Start New Training Job", classes="dialog-title"),
            Container(
                Label("Configuration File:", classes="form-label"),
                Input(
                    placeholder="Path to config YAML file",
                    id="config-path",
                    value="/workspace/configs/",
                ),
                classes="form-field",
            ),
            Container(
                Label("Launcher:", classes="form-label"),
                Select(
                    [
                        ("accelerate", "Accelerate (Recommended)"),
                        ("torchrun", "TorchRun"),
                        ("deepspeed", "DeepSpeed"),
                    ],
                    id="launcher",
                    value="accelerate",
                ),
                classes="form-field",
            ),
            Container(
                Button("Start Training", variant="primary", id="start"),
                Button("Cancel", variant="default", id="cancel"),
                classes="button-container",
            ),
            classes="dialog-container",
        )

    @on(Button.Pressed, "#start")
    def handle_start(self) -> None:
        """Handle start button press."""
        config_input = self.query_one("#config-path", Input)
        launcher_select = self.query_one("#launcher", Select)

        config_path = config_input.value.strip()
        if not config_path:
            return

        if not Path(config_path).exists():
            return

        result = {
            "config_path": config_path,
            "launcher": launcher_select.value,
        }

        self.dismiss(result)

    @on(Button.Pressed, "#cancel")
    def handle_cancel(self) -> None:
        """Handle cancel button press."""
        self.dismiss(None)
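The dialog hands its result back through `self.dismiss(result)`; in Textual the caller receives that value by passing a callback to `push_screen`. A minimal sketch of the calling side (hypothetical handler; the training screen itself is outside this diff):

```python
def open_new_training_dialog(screen) -> None:
    """Hypothetical caller that consumes NewTrainingDialog's result."""

    def on_result(result: dict | None) -> None:
        if result is None:  # user pressed Cancel
            return
        config_path = result["config_path"]
        launcher = result["launcher"]
        # ... hand off to the training screen's job launcher ...

    screen.app.push_screen(NewTrainingDialog(), on_result)
```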
1 src/axolotl/tui/screens/__init__.py Normal file
@@ -0,0 +1 @@
"""TUI screens for Axolotl."""
50 src/axolotl/tui/screens/base.py Normal file
@@ -0,0 +1,50 @@
"""Base screen class for Axolotl TUI screens."""

from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Container
from textual.screen import Screen
from textual.widgets import Footer, Header, Static


class BaseScreen(Screen):
    """Base class for all Axolotl TUI screens."""

    BINDINGS = [
        Binding("escape", "back", "Back"),
        Binding("q", "quit", "Quit"),
    ]

    def __init__(self, title: str = "Axolotl", subtitle: str = ""):
        """Initialize the base screen.

        Args:
            title: The screen title
            subtitle: Optional subtitle for the screen
        """
        super().__init__()
        self.screen_title = title
        self.screen_subtitle = subtitle

    def compose(self) -> ComposeResult:
        """Compose the base screen layout."""
        yield Header()
        yield Container(
            Static(f"🦾 {self.screen_title}", classes="screen-title"),
            (
                Static(self.screen_subtitle, classes="screen-subtitle")
                if self.screen_subtitle
                else Static("")
            ),
            Container(id="content"),
            id="main-container",
        )
        yield Footer()

    def action_back(self) -> None:
        """Go back to previous screen."""
        self.app.pop_screen()

    def action_quit(self) -> None:
        """Quit the application."""
        self.app.exit()
376 src/axolotl/tui/screens/config.py Normal file
@@ -0,0 +1,376 @@
"""Configuration management screen for Axolotl TUI."""

import os
from pathlib import Path
from typing import Optional

import yaml
from textual import on, work
from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Container
from textual.reactive import reactive
from textual.widgets import (
    Button,
    DirectoryTree,
    Footer,
    Header,
    Label,
    Log,
    Static,
    TextArea,
)

from axolotl.tui.screens.base import BaseScreen


class ConfigScreen(BaseScreen):
    """Configuration management screen."""

    BINDINGS = [
        Binding("ctrl+n", "new_config", "New Config"),
        Binding("ctrl+o", "open_config", "Open Config"),
        Binding("ctrl+s", "save_config", "Save Config"),
        Binding("ctrl+v", "validate_config", "Validate"),
        Binding("ctrl+e", "edit_mode", "Toggle Edit Mode"),
    ]

    CSS = """
    .config-container {
        layout: horizontal;
        height: 100%;
    }

    .file-browser {
        width: 30%;
        border: solid $primary;
        padding: 1;
        margin: 1;
    }

    .config-editor {
        width: 70%;
        border: solid $secondary;
        padding: 1;
        margin: 1;
    }

    .config-form {
        height: 80%;
    }

    .config-actions {
        layout: horizontal;
        height: 3;
        align: center middle;
        padding: 1;
    }

    .config-actions Button {
        margin: 0 1;
    }

    TextArea {
        height: 100%;
    }

    .validation-log {
        height: 20%;
        border: solid $warning;
        padding: 1;
    }

    .screen-title {
        text-align: center;
        text-style: bold;
        padding: 1;
        color: $primary;
    }

    .screen-subtitle {
        text-align: center;
        padding: 0 0 1 0;
        color: $text-muted;
    }
    """

    def __init__(self):
        """Initialize the config screen."""
        super().__init__(
            title="Configuration Management",
            subtitle="Create, edit, and validate Axolotl configurations",
        )
        self.current_config_path: Optional[Path] = None
        self.edit_mode = reactive(False)
        self.config_data = {}

    def compose(self) -> ComposeResult:
        """Compose the config screen layout."""
        yield Header()
        yield Container(
            Static("🦾 Configuration Management", classes="screen-title"),
            Static(
                "Create, edit, and validate Axolotl configurations",
                classes="screen-subtitle",
            ),
            Container(
                Container(
                    Label("Config Files"),
                    DirectoryTree(
                        (
                            Path("/workspace/configs")
                            if Path("/workspace/configs").exists()
                            else Path.cwd()
                        ),
                        id="config-tree",
                    ),
                    classes="file-browser",
                ),
                Container(
                    Container(
                        TextArea(
                            "",
                            language="yaml",
                            theme="monokai",
                            id="config-editor",
                            read_only=True,
                        ),
                        classes="config-form",
                    ),
                    Container(
                        Button("New", id="new-config", variant="primary"),
                        Button("Open", id="open-config", variant="primary"),
                        Button("Save", id="save-config", variant="success"),
                        Button("Validate", id="validate-config", variant="warning"),
                        Button("Edit Mode", id="toggle-edit", variant="default"),
                        Button("Load Example", id="load-example", variant="default"),
                        classes="config-actions",
                    ),
                    Container(
                        Log(id="validation-log"),
                        classes="validation-log",
                    ),
                    classes="config-editor",
                ),
                classes="config-container",
            ),
            id="content",
        )
        yield Footer()

    def on_mount(self) -> None:
        """Called when the screen is mounted."""
        tree = self.query_one("#config-tree", DirectoryTree)
        tree.show_root = False
        tree.guide_depth = 3

        log = self.query_one("#validation-log", Log)
        log.write_line("Ready to load configuration files...")

    @on(DirectoryTree.FileSelected)
    def handle_file_selected(self, event: DirectoryTree.FileSelected) -> None:
        """Handle file selection from the directory tree."""
        if event.path.suffix in [".yaml", ".yml"]:
            self.load_config_file(event.path)

    def load_config_file(self, path: Path) -> None:
        """Load a configuration file."""
        self.current_config_path = path
        try:
            with open(path, "r") as f:
                content = f.read()
            self.config_data = yaml.safe_load(content)

            editor = self.query_one("#config-editor", TextArea)
            editor.load_text(content)

            log = self.query_one("#validation-log", Log)
            log.clear()
            log.write_line(f"✅ Loaded: {path.name}")

        except Exception as e:
            log = self.query_one("#validation-log", Log)
            log.write_line(f"❌ Error loading {path.name}: {str(e)}")

    @on(Button.Pressed, "#new-config")
    def handle_new_config(self) -> None:
        """Create a new configuration."""
        template = """# Axolotl Configuration
base_model:
model_type:
tokenizer_type:

# Dataset Configuration
datasets:
  - path:
    type:

# Training Configuration
output_dir: ./outputs
num_epochs: 3
micro_batch_size: 1
gradient_accumulation_steps: 4
learning_rate: 0.00002
warmup_steps: 100
eval_steps: 100
save_steps: 500

# LoRA Configuration (optional)
adapter: lora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:

# Training optimizations
gradient_checkpointing: true
flash_attention: true
bf16: auto
tf32: true

# Logging
logging_steps: 10
wandb_project:
wandb_entity:
"""
        editor = self.query_one("#config-editor", TextArea)
        editor.load_text(template)
        editor.read_only = False
        self.edit_mode = True
        self.update_edit_button()

        log = self.query_one("#validation-log", Log)
        log.clear()
        log.write_line("📝 New configuration created. Edit and save when ready.")

    @on(Button.Pressed, "#save-config")
    def handle_save_config(self) -> None:
        """Save the current configuration."""
        editor = self.query_one("#config-editor", TextArea)
        content = editor.text

        if not content.strip():
            log = self.query_one("#validation-log", Log)
            log.write_line("⚠️ Cannot save empty configuration")
            return

        if not self.current_config_path:
            default_path = Path("/workspace/configs/new_config.yaml")
            default_path.parent.mkdir(parents=True, exist_ok=True)
            self.current_config_path = default_path

        try:
            with open(self.current_config_path, "w") as f:
                f.write(content)

            log = self.query_one("#validation-log", Log)
            log.write_line(f"💾 Saved: {self.current_config_path.name}")
        except Exception as e:
            log = self.query_one("#validation-log", Log)
            log.write_line(f"❌ Error saving: {str(e)}")

    @on(Button.Pressed, "#validate-config")
    @work(thread=True)
    async def handle_validate_config(self) -> None:
        """Validate the current configuration."""
        editor = self.query_one("#config-editor", TextArea)
        content = editor.text

        if not content.strip():
            log = self.query_one("#validation-log", Log)
            log.write_line("⚠️ No configuration to validate")
            return

        log = self.query_one("#validation-log", Log)
        log.clear()
        log.write_line("🔍 Validating configuration...")

        try:
            import tempfile

            with tempfile.NamedTemporaryFile(
                mode="w", suffix=".yaml", delete=False
            ) as f:
                f.write(content)
                temp_path = f.name

            from argparse import Namespace

            from axolotl.cli.config import check_user_config

            args = Namespace(
                config=temp_path,
                debug=False,
                debug_text_only=False,
                debug_num_examples=5,
                accelerate_config=None,
                multi_gpu=False,
            )

            check_user_config(args)

            log.write_line("✅ Configuration is valid!")

            os.unlink(temp_path)

        except Exception as e:
            log.write_line(f"❌ Validation failed: {str(e)}")
            if "temp_path" in locals():
                os.unlink(temp_path)

    @on(Button.Pressed, "#toggle-edit")
    def handle_toggle_edit(self) -> None:
        """Toggle edit mode for the configuration."""
        editor = self.query_one("#config-editor", TextArea)
        self.edit_mode = not self.edit_mode
        editor.read_only = not self.edit_mode
        self.update_edit_button()

        log = self.query_one("#validation-log", Log)
        if self.edit_mode:
            log.write_line("✏️ Edit mode enabled")
        else:
            log.write_line("👁️ View mode enabled")

    @on(Button.Pressed, "#load-example")
    async def handle_load_example(self) -> None:
        """Load an example configuration."""
        examples_dir = Path("/workspace/axolotl/examples")
        if not examples_dir.exists():
            log = self.query_one("#validation-log", Log)
            log.write_line("⚠️ Examples directory not found")
            return

        yaml_files = list(examples_dir.glob("**/*.yml")) + list(
            examples_dir.glob("**/*.yaml")
        )
        if yaml_files:
            self.load_config_file(yaml_files[0])
            log = self.query_one("#validation-log", Log)
            log.write_line(f"📚 Loaded example: {yaml_files[0].name}")

    def update_edit_button(self) -> None:
        """Update the edit button appearance."""
        button = self.query_one("#toggle-edit", Button)
        if self.edit_mode:
            button.variant = "warning"
            button.label = "Edit Mode: ON"
        else:
            button.variant = "default"
            button.label = "Edit Mode: OFF"

    def action_new_config(self) -> None:
        """Create a new configuration."""
        self.handle_new_config()

    def action_save_config(self) -> None:
        """Save the current configuration."""
        self.handle_save_config()

    def action_validate_config(self) -> None:
        """Validate the current configuration."""
        self.handle_validate_config()

    def action_edit_mode(self) -> None:
        """Toggle edit mode."""
        self.handle_toggle_edit()
440
src/axolotl/tui/screens/datasets.py
Normal file
440
src/axolotl/tui/screens/datasets.py
Normal file
@@ -0,0 +1,440 @@
|
|||||||
|
"""Dataset management screen for Axolotl TUI."""
|
||||||
|
|
||||||
|
import json
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Dict, Optional
|
||||||
|
|
||||||
|
from textual import on, work
|
||||||
|
from textual.app import ComposeResult
|
||||||
|
from textual.binding import Binding
|
||||||
|
from textual.containers import Container
|
||||||
|
from textual.widgets import (
|
||||||
|
Button,
|
||||||
|
DataTable,
|
||||||
|
Footer,
|
||||||
|
Header,
|
||||||
|
Label,
|
||||||
|
Log,
|
||||||
|
ProgressBar,
|
||||||
|
Static,
|
||||||
|
TextArea,
|
||||||
|
)
|
||||||
|
|
||||||
|
from axolotl.tui.screens.base import BaseScreen
|
||||||
|
|
||||||
|
|
||||||
|
class DatasetScreen(BaseScreen):
|
||||||
|
"""Dataset management screen."""
|
||||||
|
|
||||||
|
BINDINGS = [
|
||||||
|
Binding("ctrl+p", "preprocess", "Preprocess"),
|
||||||
|
Binding("ctrl+v", "preview", "Preview"),
|
||||||
|
Binding("ctrl+i", "info", "Info"),
|
||||||
|
Binding("r", "refresh", "Refresh"),
|
||||||
|
]
|
||||||
|
|
||||||
|
CSS = """
|
||||||
|
.dataset-container {
|
||||||
|
layout: horizontal;
|
||||||
|
height: 100%;
|
||||||
|
}
|
||||||
|
|
||||||
|
.dataset-list {
|
||||||
|
width: 40%;
|
||||||
|
border: solid $primary;
|
||||||
|
padding: 1;
|
||||||
|
margin: 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
.dataset-details {
|
||||||
|
width: 60%;
|
||||||
|
border: solid $secondary;
|
||||||
|
padding: 1;
|
||||||
|
margin: 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
.dataset-actions {
|
||||||
|
layout: horizontal;
|
||||||
|
height: 4;
|
||||||
|
align: center middle;
|
||||||
|
padding: 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
.dataset-actions Button {
|
||||||
|
margin: 0 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
DataTable {
|
||||||
|
height: 100%;
|
||||||
|
}
|
||||||
|
|
||||||
|
.preview-container {
|
||||||
|
height: 100%;
|
||||||
|
border: solid $primary;
|
||||||
|
padding: 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
TextArea {
|
||||||
|
height: 100%;
|
||||||
|
}
|
||||||
|
|
||||||
|
.stats-container {
|
||||||
|
layout: vertical;
|
||||||
|
padding: 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
.stat-row {
|
||||||
|
layout: horizontal;
|
||||||
|
padding: 0 0 1 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
.stat-label {
|
||||||
|
width: 50%;
|
||||||
|
color: $text-muted;
|
||||||
|
}
|
||||||
|
|
||||||
|
.stat-value {
|
||||||
|
width: 50%;
|
||||||
|
text-align: right;
|
||||||
|
text-style: bold;
|
||||||
|
}
|
||||||
|
|
||||||
|
.screen-title {
|
||||||
|
text-align: center;
|
||||||
|
text-style: bold;
|
||||||
|
padding: 1;
|
||||||
|
color: $primary;
|
||||||
|
}
|
||||||
|
|
||||||
|
.screen-subtitle {
|
||||||
|
text-align: center;
|
||||||
|
padding: 0 0 1 0;
|
||||||
|
color: $text-muted;
|
||||||
|
}
|
||||||
|
|
||||||
|
.progress-container {
|
||||||
|
padding: 1;
|
||||||
|
border: solid $warning;
|
||||||
|
margin: 1;
|
||||||
|
}
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
"""Initialize the dataset screen."""
|
||||||
|
super().__init__(
|
||||||
|
title="Dataset Management",
|
||||||
|
subtitle="Browse, preview, and preprocess datasets",
|
||||||
|
)
|
||||||
|
self.datasets: Dict[str, Dict] = {}
|
||||||
|
self.selected_dataset: Optional[str] = None
|
||||||
|
self.preprocessing_active = False
|
||||||
|
|
||||||
|
def compose(self) -> ComposeResult:
|
||||||
|
"""Compose the dataset screen layout."""
|
||||||
|
yield Header()
|
||||||
|
yield Container(
|
||||||
|
Static("🦾 Dataset Management", classes="screen-title"),
|
||||||
|
Static(
|
||||||
|
"Browse, preview, and preprocess datasets", classes="screen-subtitle"
|
||||||
|
),
|
||||||
|
Container(
|
||||||
|
Container(
|
||||||
|
Label("Available Datasets"),
|
||||||
|
DataTable(id="dataset-table"),
|
||||||
|
Container(
|
||||||
|
Button("Load Dataset", id="load-dataset", variant="primary"),
|
||||||
|
Button("Preprocess", id="preprocess", variant="success"),
|
||||||
|
Button("Download", id="download", variant="default"),
|
||||||
|
Button("Refresh", id="refresh", variant="default"),
|
||||||
|
classes="dataset-actions",
|
||||||
|
),
|
||||||
|
classes="dataset-list",
|
||||||
|
),
|
||||||
|
Container(
|
||||||
|
TextArea("", id="dataset-preview", read_only=True),
|
||||||
|
Container(
|
||||||
|
Static("Dataset Name:", classes="stat-label"),
|
||||||
|
Static("-", id="stat-name", classes="stat-value"),
|
||||||
|
Static("Type:", classes="stat-label"),
|
||||||
|
Static("-", id="stat-type", classes="stat-value"),
|
||||||
|
Static("Size:", classes="stat-label"),
|
||||||
|
Static("-", id="stat-size", classes="stat-value"),
|
||||||
|
Static("Samples:", classes="stat-label"),
|
||||||
|
Static("-", id="stat-samples", classes="stat-value"),
|
||||||
|
Static("Features:", classes="stat-label"),
|
||||||
|
Static("-", id="stat-features", classes="stat-value"),
|
||||||
|
Static("Format:", classes="stat-label"),
|
||||||
|
Static("-", id="stat-format", classes="stat-value"),
|
||||||
|
Static("Preprocessed:", classes="stat-label"),
|
||||||
|
Static("-", id="stat-preprocessed", classes="stat-value"),
|
||||||
|
),
|
||||||
|
Log(id="processing-log"),
|
||||||
|
ProgressBar(total=100, id="preprocessing-progress"),
|
||||||
|
classes="dataset-details",
|
||||||
|
),
|
||||||
|
classes="dataset-container",
|
||||||
|
),
|
||||||
|
id="content",
|
||||||
|
)
|
||||||
|
yield Footer()
|
||||||
|
|
||||||
|
def on_mount(self) -> None:
|
||||||
|
"""Called when the screen is mounted."""
|
||||||
|
self.setup_dataset_table()
|
||||||
|
self.load_datasets()
|
||||||
|
|
||||||
|
log = self.query_one("#processing-log", Log)
|
||||||
|
log.write_line("Dataset manager ready.")
|
||||||
|
|
||||||
|
def setup_dataset_table(self) -> None:
|
||||||
|
"""Setup the dataset table."""
|
||||||
|
table = self.query_one("#dataset-table", DataTable)
|
||||||
|
table.add_columns("Name", "Type", "Size", "Status")
|
||||||
|
table.cursor_type = "row"
|
||||||
|
table.zebra_stripes = True
|
||||||
|
|
||||||
|
@work(thread=True)
|
||||||
|
async def load_datasets(self) -> None:
|
||||||
|
"""Load available datasets."""
|
||||||
|
# Check for local datasets
|
||||||
|
datasets_dir = Path("/workspace/datasets")
|
||||||
|
if datasets_dir.exists():
|
||||||
|
for dataset_path in datasets_dir.glob("*"):
|
||||||
|
if dataset_path.is_dir():
|
||||||
|
self.datasets[dataset_path.name] = {
|
||||||
|
"name": dataset_path.name,
|
||||||
|
"path": str(dataset_path),
|
||||||
|
"type": "local",
|
||||||
|
"size": self.get_dir_size(dataset_path),
|
||||||
|
"status": "available",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Check for HuggingFace datasets in configs
|
||||||
|
configs_dir = Path("/workspace/configs")
|
||||||
|
if configs_dir.exists():
|
||||||
|
for config_file in configs_dir.glob("*.yaml"):
|
||||||
|
try:
|
||||||
|
import yaml
|
||||||
|
|
||||||
|
with open(config_file) as f:
|
||||||
|
config = yaml.safe_load(f)
|
||||||
|
if "datasets" in config:
|
||||||
|
for ds in config.get("datasets", []):
|
||||||
|
if "path" in ds:
|
||||||
|
ds_name = ds["path"].split("/")[-1]
|
||||||
|
self.datasets[ds_name] = {
|
||||||
|
"name": ds_name,
|
||||||
|
"path": ds["path"],
|
||||||
|
"type": ds.get("type", "huggingface"),
|
||||||
|
"size": "Unknown",
|
||||||
|
"status": "remote",
|
||||||
|
}
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
self.refresh_dataset_table()
|
||||||
|
|
||||||
|
def get_dir_size(self, path: Path) -> str:
|
||||||
|
"""Get human-readable directory size."""
|
||||||
|
total_size = sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
|
||||||
|
        for unit in ["B", "KB", "MB", "GB"]:
            if total_size < 1024.0:
                return f"{total_size:.2f} {unit}"
            total_size /= 1024.0
        return f"{total_size:.2f} TB"

    def refresh_dataset_table(self) -> None:
        """Refresh the dataset table."""
        table = self.query_one("#dataset-table", DataTable)
        table.clear()

        for name, info in self.datasets.items():
            table.add_row(
                name[:30],
                info["type"],
                info["size"],
                info["status"],
            )

    @on(DataTable.RowSelected)
    def handle_dataset_selected(self, event: DataTable.RowSelected) -> None:
        """Handle dataset selection from table."""
        if event.cursor_row >= 0:
            dataset_names = list(self.datasets.keys())
            if event.cursor_row < len(dataset_names):
                self.selected_dataset = dataset_names[event.cursor_row]
                self.load_dataset_preview()
                self.update_dataset_stats()

    @work(thread=True)
    async def load_dataset_preview(self) -> None:
        """Load preview of selected dataset."""
        if not self.selected_dataset:
            return

        dataset_info = self.datasets[self.selected_dataset]
        preview_text = ""

        try:
            if dataset_info["type"] == "local" and Path(dataset_info["path"]).exists():
                # Load first few samples from local dataset
                sample_files = list(Path(dataset_info["path"]).glob("*.json"))[:3]
                samples = []
                for sample_file in sample_files:
                    with open(sample_file) as f:
                        samples.append(json.load(f))

                preview_text = json.dumps(samples, indent=2)
            else:
                # Show dataset info for remote datasets
                preview_text = json.dumps(dataset_info, indent=2)

        except Exception as e:
            preview_text = f"Error loading preview: {str(e)}"

        preview = self.query_one("#dataset-preview", TextArea)
        preview.load_text(preview_text)

    def update_dataset_stats(self) -> None:
        """Update dataset statistics display."""
        if not self.selected_dataset:
            return

        info = self.datasets[self.selected_dataset]

        self.query_one("#stat-name", Static).update(info["name"])
        self.query_one("#stat-type", Static).update(info["type"])
        self.query_one("#stat-size", Static).update(info["size"])
        self.query_one("#stat-samples", Static).update("N/A")
        self.query_one("#stat-features", Static).update("N/A")
        self.query_one("#stat-format", Static).update("JSON")
        self.query_one("#stat-preprocessed", Static).update("No")

    @on(Button.Pressed, "#preprocess")
    @work(thread=True)
    async def handle_preprocess(self) -> None:
        """Preprocess selected dataset."""
        if not self.selected_dataset or self.preprocessing_active:
            return

        self.preprocessing_active = True
        dataset_info = self.datasets[self.selected_dataset]

        log = self.query_one("#processing-log", Log)
        log.clear()
        log.write_line(f"🔄 Starting preprocessing for {self.selected_dataset}...")

        progress = self.query_one("#preprocessing-progress", ProgressBar)
        progress.update(progress=0)

        try:
            import subprocess
            import tempfile

            # Create a temporary config for preprocessing
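            # delete=False keeps the file on disk after the context manager
            # closes it, so the preprocessing subprocess can read it by name;
            # it is removed with os.unlink() once the run finishes.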
            with tempfile.NamedTemporaryFile(
                mode="w", suffix=".yaml", delete=False
            ) as f:
                config = {
                    "datasets": [
                        {
                            "path": dataset_info["path"],
                            "type": dataset_info.get("type", "alpaca"),
                        }
                    ],
                    "output_dir": f"/tmp/preprocessed_{self.selected_dataset}",
                }
                import yaml

                yaml.dump(config, f)
                temp_config = f.name

            # Run preprocessing
            cmd = ["python", "-m", "axolotl.cli.preprocess", temp_config]
            process = subprocess.Popen(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
            )

            # Monitor progress
            for line in process.stdout:
                log.write_line(line.strip())
                # Update progress bar based on output
                if "Processing" in line:
                    progress.advance(10)

            process.wait()

            if process.returncode == 0:
                log.write_line("✅ Preprocessing completed successfully!")
                dataset_info["status"] = "preprocessed"
                progress.update(progress=100)
            else:
                log.write_line(
                    f"❌ Preprocessing failed with code {process.returncode}"
                )

            import os

            os.unlink(temp_config)

        except Exception as e:
            log.write_line(f"❌ Error during preprocessing: {str(e)}")
        finally:
            self.preprocessing_active = False
            self.refresh_dataset_table()

    @on(Button.Pressed, "#load-dataset")
    async def handle_load_dataset(self) -> None:
        """Load a new dataset."""
        log = self.query_one("#processing-log", Log)
        log.write_line("📦 Load dataset functionality coming soon...")

    @on(Button.Pressed, "#download")
    @work(thread=True)
    async def handle_download(self) -> None:
        """Download a remote dataset."""
        if not self.selected_dataset:
            return

        dataset_info = self.datasets[self.selected_dataset]
        if dataset_info["type"] != "huggingface":
            return

        log = self.query_one("#processing-log", Log)
        log.clear()
        log.write_line(f"📥 Downloading {self.selected_dataset} from HuggingFace...")

        try:
            from datasets import load_dataset

            dataset = load_dataset(dataset_info["path"])
            save_path = Path(f"/workspace/datasets/{self.selected_dataset}")
            save_path.mkdir(parents=True, exist_ok=True)

            dataset.save_to_disk(str(save_path))

            log.write_line(f"✅ Downloaded to {save_path}")
            dataset_info["type"] = "local"
            dataset_info["status"] = "available"
            dataset_info["path"] = str(save_path)
            self.refresh_dataset_table()

        except Exception as e:
            log.write_line(f"❌ Download failed: {str(e)}")

    @on(Button.Pressed, "#refresh")
    def handle_refresh(self) -> None:
        """Refresh dataset list."""
        self.load_datasets()

    def action_preprocess(self) -> None:
        """Preprocess selected dataset."""
        self.handle_preprocess()

    def action_refresh(self) -> None:
        """Refresh dataset list."""
        self.handle_refresh()

445
src/axolotl/tui/screens/inference.py
Normal file
@@ -0,0 +1,445 @@
"""Inference and testing screen for Axolotl TUI."""

from pathlib import Path
from typing import Dict, List, Optional

from textual import events, on, work
from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Container
from textual.widgets import (
    Button,
    Input,
    Label,
    Log,
    Select,
    Static,
    TextArea,
)

from axolotl.tui.screens.base import BaseScreen


class InferenceScreen(BaseScreen):
    """Inference and testing screen."""

    BINDINGS = [
        Binding("ctrl+enter", "send_message", "Send"),
        Binding("ctrl+c", "clear_chat", "Clear"),
        Binding("ctrl+l", "load_model", "Load Model"),
        Binding("ctrl+s", "save_chat", "Save Chat"),
    ]

    CSS = """
    .inference-container {
        layout: horizontal;
        height: 100%;
    }

    .model-selector {
        width: 30%;
        border: solid $primary;
        padding: 1;
        margin: 1;
    }

    .chat-interface {
        width: 70%;
        border: solid $secondary;
        padding: 1;
        margin: 1;
    }

    .chat-history {
        height: 70%;
        border: solid $primary;
        padding: 1;
        margin: 0 0 1 0;
    }

    .input-area {
        height: 20%;
        border: solid $warning;
        padding: 1;
        margin: 0 0 1 0;
    }

    .chat-controls {
        layout: horizontal;
        height: 4;
        align: center middle;
        padding: 1;
    }

    .chat-controls Button {
        margin: 0 1;
    }

    .model-info {
        padding: 1;
        border: solid $surface;
        margin: 1 0;
    }

    .screen-title {
        text-align: center;
        text-style: bold;
        padding: 1;
        color: $primary;
    }

    .screen-subtitle {
        text-align: center;
        padding: 0 0 1 0;
        color: $text-muted;
    }

    TextArea {
        height: 100%;
    }

    Log {
        height: 100%;
    }
    """

    def __init__(self):
        """Initialize the inference screen."""
        super().__init__(
            title="Inference & Testing", subtitle="Interactive chat and model testing"
        )
        self.loaded_model: Optional[str] = None
        self.chat_history: List[Dict[str, str]] = []
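        # Each turn is stored as {"role": "user" | "assistant", "content": str}.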

    def compose(self) -> ComposeResult:
        """Compose the inference screen layout."""
        yield Container(
            Static("🦾 Inference & Testing", classes="screen-title"),
            Static("Interactive chat and model testing", classes="screen-subtitle"),
            Container(
                Container(
                    Label("Model Selection"),
                    Select(
                        [("No model loaded", "none")],
                        id="model-select",
                        value="none",
                    ),
                    Container(
                        Button("Load Model", id="load-model", variant="primary"),
                        Button("Unload", id="unload-model", variant="default"),
                        Button("Gradio UI", id="gradio-ui", variant="success"),
                    ),
                    Container(
                        Static("No model loaded", id="model-status"),
                        classes="model-info",
                    ),
                    Label("Inference Parameters"),
                    Container(
                        Label("Temperature:"),
                        Input(value="0.7", id="temperature"),
                        Label("Max Tokens:"),
                        Input(value="256", id="max-tokens"),
                        Label("Top P:"),
                        Input(value="0.9", id="top-p"),
                    ),
                    classes="model-selector",
                ),
                Container(
                    Container(
                        Log(id="chat-history"),
                        classes="chat-history",
                    ),
                    Container(
                        TextArea(
                            id="message-input",
                        ),
                        classes="input-area",
                    ),
                    Container(
                        Button("Send [Ctrl+Enter]", id="send", variant="primary"),
                        Button("Clear Chat", id="clear", variant="warning"),
                        Button("Save Chat", id="save-chat", variant="default"),
                        Button("Load Examples", id="load-examples", variant="default"),
                        classes="chat-controls",
                    ),
                    classes="chat-interface",
                ),
                classes="inference-container",
            ),
            id="content",
        )

    def on_mount(self) -> None:
        """Called when the screen is mounted."""
        self.load_available_models()

        chat = self.query_one("#chat-history", Log)
        chat.write_line("💬 Welcome to Axolotl Inference!")
        chat.write_line("Load a model to start chatting.")

    @work(thread=True)
    async def load_available_models(self) -> None:
        """Load list of available models."""
        models = [("No model loaded", "none")]

        chat = self.query_one("#chat-history", Log)
        chat.write_line("🔍 Scanning for available models...")

        # Check for trained models
        outputs_dir = Path("./outputs")
        chat.write_line(f"Checking outputs directory: {outputs_dir.absolute()}")
        if outputs_dir.exists():
            found_models = 0
            for model_dir in outputs_dir.glob("*"):
                if model_dir.is_dir():
                    # Look for various model file types
                    model_files = (
                        list(model_dir.glob("pytorch_model.bin"))
                        + list(model_dir.glob("model.safetensors"))
                        + list(model_dir.glob("*.bin"))
                        + list(model_dir.glob("*.safetensors"))
                    )
                    if model_files:
                        models.append((model_dir.name, str(model_dir)))
                        found_models += 1
            chat.write_line(f"Found {found_models} trained models in outputs/")
        else:
            chat.write_line("outputs/ directory not found")

        # Add some example/demo models for testing
        models.extend(
            [
                ("Demo: GPT-2 Small", "gpt2"),
                ("Demo: TinyLlama", "TinyLlama/TinyLlama-1.1B-Chat-v1.0"),
                ("Demo: Phi-2", "microsoft/phi-2"),
            ]
        )

        select = self.query_one("#model-select", Select)
        select.set_options(models)
        chat.write_line(f"✅ Loaded {len(models)} models in dropdown")

    @on(Button.Pressed, "#load-model")
    @work(thread=True)
    async def handle_load_model(self) -> None:
        """Load selected model for inference."""
        select = self.query_one("#model-select", Select)
        if select.value == "none":
            return

        chat = self.query_one("#chat-history", Log)
        chat.write_line(f"🔄 Loading model: {select.value}")

        status = self.query_one("#model-status", Static)
        status.update("Loading...")

        try:
            # Simulate model loading (in real implementation, would load the actual model)
            import time

            time.sleep(2)  # Simulate loading time

            self.loaded_model = select.value
            status.update(f"✅ Loaded: {Path(select.value).name}")
            chat.write_line("✅ Model loaded successfully!")
            chat.write_line("You can now start chatting.")

        except Exception as e:
            status.update("❌ Failed to load")
            chat.write_line(f"❌ Failed to load model: {str(e)}")

    @on(Button.Pressed, "#send")
    async def handle_send_message(self) -> None:
        """Send message to model."""
        if not self.loaded_model:
            chat = self.query_one("#chat-history", Log)
            chat.write_line("⚠️ Please load a model first")
            return

        message_input = self.query_one("#message-input", TextArea)
        message = message_input.text.strip()

        if not message:
            return

        # Add user message to chat
        chat = self.query_one("#chat-history", Log)
        chat.write_line(f"👤 User: {message}")

        # Clear input
        message_input.clear()

        # Add to history
        self.chat_history.append({"role": "user", "content": message})

        # Generate response (placeholder)
        self.generate_response(message)

    @on(TextArea.Changed, "#message-input")
    def on_message_input_changed(self, event: TextArea.Changed) -> None:
        """Handle changes to the message input."""
        # This could be used for features like typing indicators
        pass

    def on_key(self, event: events.Key) -> None:
        """Handle key events globally."""
        # Check if we're focused on the message input and Ctrl+Enter is pressed
        focused = self.focused
        if focused and focused.id == "message-input" and event.key == "ctrl+enter":
            event.prevent_default()
            # handle_send_message is a coroutine, so schedule it as a worker
            # instead of calling it synchronously (which would silently drop
            # the coroutine).
            self.run_worker(self.handle_send_message())

    @work(thread=True)
    async def generate_response(self, message: str) -> None:
        """Generate model response."""
        chat = self.query_one("#chat-history", Log)
        chat.write_line("🤖 Assistant: Thinking...")

        try:
            # Parse the inference parameters to validate the input fields;
            # the simulated responses below do not use the values yet.
            float(self.query_one("#temperature", Input).value)
            int(self.query_one("#max-tokens", Input).value)
            float(self.query_one("#top-p", Input).value)

            if not self.loaded_model or self.loaded_model == "none":
                response = "I don't have a model loaded yet. Please load a model first using the 'Load Model' button."
            elif self.loaded_model.startswith("gpt2"):
                # Simple response for GPT-2
                responses = [
                    f"Thanks for your message: '{message}'. I'm a GPT-2 model running in demo mode.",
                    "I understand you're testing the interface. GPT-2 models are great for experimentation!",
                    "This is a simulated GPT-2 response. In a real setup, I'd generate text based on your input.",
                    f"GPT-2 here! You said: '{message}'. I'd normally continue this conversation creatively.",
                ]
                import random

                response = random.choice(responses)
            elif "llama" in self.loaded_model.lower():
                # Response for Llama models
                response = f"🦙 LLaMA model here! You asked: '{message}'. I'm designed for helpful, harmless, and honest conversations. How can I assist you today?"
            elif "phi" in self.loaded_model.lower():
                # Response for Phi models
                response = f"Phi model responding! Your message: '{message}'. I'm optimized for reasoning and code tasks. What would you like to explore?"
            else:
                # Generic response for other models
                response = f"Model '{self.loaded_model}' responding to: '{message}'. I'm ready to help with your questions!"

            # Simulate inference time
            import time

            time.sleep(0.5)

            # Clear the "thinking" message and show response
            chat.write_line(f"🤖 Assistant: {response}")

            # Add to history
            self.chat_history.append({"role": "assistant", "content": response})

        except Exception as e:
            chat.write_line(f"❌ Error generating response: {str(e)}")

    @on(Button.Pressed, "#clear")
    def handle_clear_chat(self) -> None:
        """Clear chat history."""
        chat = self.query_one("#chat-history", Log)
        chat.clear()
        self.chat_history = []
        chat.write_line("💬 Chat cleared. Start a new conversation!")

    @on(Button.Pressed, "#save-chat")
    def handle_save_chat(self) -> None:
        """Save chat history to file."""
        if not self.chat_history:
            chat = self.query_one("#chat-history", Log)
            chat.write_line("⚠️ No chat history to save")
            return

        try:
            import json
            from datetime import datetime

            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            filename = f"chat_history_{timestamp}.json"

            with open(filename, "w") as f:
                json.dump(self.chat_history, f, indent=2)

            chat = self.query_one("#chat-history", Log)
            chat.write_line(f"💾 Chat saved to {filename}")

        except Exception as e:
            chat = self.query_one("#chat-history", Log)
            chat.write_line(f"❌ Error saving chat: {str(e)}")

    @on(Button.Pressed, "#load-examples")
    def handle_load_examples(self) -> None:
        """Load example prompts."""
        examples = [
            "Explain the concept of machine learning in simple terms.",
            "Write a Python function to calculate fibonacci numbers.",
            "What are the benefits of fine-tuning language models?",
            "Describe the difference between supervised and unsupervised learning.",
        ]

        chat = self.query_one("#chat-history", Log)
        chat.write_line("📚 Example prompts:")
        for i, example in enumerate(examples, 1):
            chat.write_line(f"{i}. {example}")
        chat.write_line("Copy and paste any example to try it out!")

    @on(Button.Pressed, "#gradio-ui")
    @work(thread=True)
    async def handle_gradio_ui(self) -> None:
        """Launch Gradio web interface."""
        chat = self.query_one("#chat-history", Log)
        chat.write_line("🌐 Launching Gradio web interface...")

        try:
            import subprocess

            if self.loaded_model:
                cmd = [
                    "python",
                    "-m",
                    "axolotl.cli.inference",
                    self.loaded_model,
                    "--gradio",
                ]
            else:
                chat.write_line("⚠️ No model loaded. Loading default interface...")
                cmd = ["python", "-m", "axolotl.cli.inference", "--gradio"]

            subprocess.Popen(cmd)
            chat.write_line("✅ Gradio interface launched! Check your browser.")

        except Exception as e:
            chat.write_line(f"❌ Error launching Gradio: {str(e)}")

    @on(Button.Pressed, "#unload-model")
    def handle_unload_model(self) -> None:
        """Unload current model."""
        self.loaded_model = None
        status = self.query_one("#model-status", Static)
        status.update("No model loaded")

        select = self.query_one("#model-select", Select)
        select.value = "none"

        chat = self.query_one("#chat-history", Log)
        chat.write_line("🔄 Model unloaded")

    def action_send_message(self) -> None:
        """Send message action."""
        # Schedule the coroutine; a bare call would discard it unawaited.
        self.run_worker(self.handle_send_message())

    def action_clear_chat(self) -> None:
        """Clear chat action."""
        self.handle_clear_chat()

    def action_load_model(self) -> None:
        """Load model action."""
        self.handle_load_model()

    def action_save_chat(self) -> None:
        """Save chat action."""
        self.handle_save_chat()

373
src/axolotl/tui/screens/models.py
Normal file
@@ -0,0 +1,373 @@
"""Model management screen for Axolotl TUI."""

from pathlib import Path
from typing import Dict, Optional

from textual import on, work
from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Container, ScrollableContainer
from textual.widgets import (
    Button,
    DataTable,
    Footer,
    Header,
    Label,
    Log,
    ProgressBar,
    Static,
    TabbedContent,
    TabPane,
)

from axolotl.tui.screens.base import BaseScreen


class ModelScreen(BaseScreen):
    """Model management screen."""

    BINDINGS = [
        Binding("ctrl+m", "merge_lora", "Merge LoRA"),
        Binding("ctrl+q", "quantize", "Quantize"),
        Binding("ctrl+e", "evaluate", "Evaluate"),
        Binding("r", "refresh", "Refresh"),
    ]

    CSS = """
    .model-container {
        layout: horizontal;
        height: 100%;
    }

    .model-list {
        width: 50%;
        border: solid $primary;
        padding: 1;
        margin: 1;
    }

    .model-operations {
        width: 50%;
        border: solid $secondary;
        padding: 1;
        margin: 1;
    }

    .model-actions {
        layout: horizontal;
        height: 4;
        align: center middle;
        padding: 1;
    }

    .model-actions Button {
        margin: 0 1;
    }

    DataTable {
        height: 80%;
    }

    .screen-title {
        text-align: center;
        text-style: bold;
        padding: 1;
        color: $primary;
    }

    .screen-subtitle {
        text-align: center;
        padding: 0 0 1 0;
        color: $text-muted;
    }
    """

    def __init__(self):
        """Initialize the model screen."""
        super().__init__(
            title="Model Management",
            subtitle="Manage trained models, merge LoRA adapters, and quantize models",
        )
        self.models: Dict[str, Dict] = {}
        self.selected_model: Optional[str] = None

    def compose(self) -> ComposeResult:
        """Compose the model screen layout."""
        yield Header()
        with Container(id="content"):
            yield Static("🦾 Model Management", classes="screen-title")
            yield Static(
                "Manage trained models, merge LoRA adapters, and quantize models",
                classes="screen-subtitle",
            )
            with Container(classes="model-container"):
                with Container(classes="model-list"):
                    yield Label("Available Models")
                    yield DataTable(id="model-table")
                    with Container(classes="model-actions"):
                        yield Button("Merge LoRA", id="merge-lora", variant="primary")
                        yield Button("Quantize", id="quantize", variant="success")
                        yield Button("Evaluate", id="evaluate", variant="warning")
                        yield Button("Refresh", id="refresh", variant="default")
                with Container(classes="model-operations"):
                    with TabbedContent():
                        with TabPane("Operations"):
                            with Container():
                                yield Log(id="operations-log")
                            with Container():
                                yield Label("Operation Progress:")
                                yield ProgressBar(
                                    total=100,
                                    id="operation-progress",
                                )
                        with TabPane("Model Info"):
                            with ScrollableContainer():
                                yield Static(
                                    "Model information will appear here",
                                    id="model-info",
                                )
        yield Footer()

    def on_mount(self) -> None:
        """Called when the screen is mounted."""
        self.setup_model_table()
        self.load_models()

        log = self.query_one("#operations-log", Log)
        log.write_line("Model manager ready.")

    def setup_model_table(self) -> None:
        """Set up the model table."""
        table = self.query_one("#model-table", DataTable)
        table.add_columns("Name", "Type", "Size", "Status")
        table.cursor_type = "row"
        table.zebra_stripes = True

    @work(thread=True)
    async def load_models(self) -> None:
        """Load available models."""
        # Check outputs directory for trained models
        outputs_dir = Path("./outputs")
        if outputs_dir.exists():
            for model_dir in outputs_dir.glob("*"):
                if model_dir.is_dir():
                    self.models[model_dir.name] = {
                        "name": model_dir.name,
                        "path": str(model_dir),
                        "type": "checkpoint",
                        "size": self.get_dir_size(model_dir),
                        "status": "available",
                    }

        self.refresh_model_table()

    def get_dir_size(self, path: Path) -> str:
        """Get human-readable directory size."""
        try:
            total_size = sum(f.stat().st_size for f in path.rglob("*") if f.is_file())

            for unit in ["B", "KB", "MB", "GB"]:
                if total_size < 1024.0:
                    return f"{total_size:.2f} {unit}"
                total_size /= 1024.0
            return f"{total_size:.2f} TB"
        except Exception:
            return "Unknown"

    def refresh_model_table(self) -> None:
        """Refresh the model table."""
        table = self.query_one("#model-table", DataTable)
        table.clear()

        for name, info in self.models.items():
            table.add_row(
                name[:30],
                info["type"],
                info["size"],
                info["status"],
            )

    @on(DataTable.RowSelected)
    def handle_model_selected(self, event: DataTable.RowSelected) -> None:
        """Handle model selection from table."""
        if event.cursor_row >= 0:
            model_names = list(self.models.keys())
            if event.cursor_row < len(model_names):
                self.selected_model = model_names[event.cursor_row]
                self.update_model_info()

    def update_model_info(self) -> None:
        """Update model information display."""
        if not self.selected_model:
            return

        info = self.models[self.selected_model]
        info_text = f"""
Model Name: {info['name']}
Path: {info['path']}
Type: {info['type']}
Size: {info['size']}
Status: {info['status']}
"""

        self.query_one("#model-info", Static).update(info_text)

    @on(Button.Pressed, "#merge-lora")
    @work(thread=True)
    async def handle_merge_lora(self) -> None:
        """Merge LoRA adapters with base model."""
        if not self.selected_model:
            log = self.query_one("#operations-log", Log)
            log.write_line("⚠️ No model selected")
            return

        model_info = self.models[self.selected_model]
        log = self.query_one("#operations-log", Log)
        log.clear()
        log.write_line(f"🔄 Merging LoRA adapters for {self.selected_model}...")

        progress = self.query_one("#operation-progress", ProgressBar)
        progress.update(progress=0)

        try:
            import subprocess

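            # Run the merge in a separate process so the TUI stays responsive;
            # its combined stdout/stderr is streamed into the operations log.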
cmd = ["python", "-m", "axolotl.cli.merge_lora", model_info["path"]]
|
||||||
|
|
||||||
|
process = subprocess.Popen(
|
||||||
|
cmd,
|
||||||
|
stdout=subprocess.PIPE,
|
||||||
|
stderr=subprocess.STDOUT,
|
||||||
|
text=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
for line in process.stdout:
|
||||||
|
log.write_line(line.strip())
|
||||||
|
progress.advance(10)
|
||||||
|
|
||||||
|
process.wait()
|
||||||
|
|
||||||
|
if process.returncode == 0:
|
||||||
|
log.write_line("✅ LoRA merge completed successfully!")
|
||||||
|
progress.update(progress=100)
|
||||||
|
else:
|
||||||
|
log.write_line(f"❌ LoRA merge failed with code {process.returncode}")
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
log.write_line(f"❌ Error during LoRA merge: {str(e)}")
|
||||||
|
|
||||||
|
@on(Button.Pressed, "#quantize")
|
||||||
|
@work(thread=True)
|
||||||
|
async def handle_quantize(self) -> None:
|
||||||
|
"""Quantize selected model."""
|
||||||
|
if not self.selected_model:
|
||||||
|
log = self.query_one("#operations-log", Log)
|
||||||
|
log.write_line("⚠️ No model selected")
|
||||||
|
return
|
||||||
|
|
||||||
|
model_info = self.models[self.selected_model]
|
||||||
|
log = self.query_one("#operations-log", Log)
|
||||||
|
log.clear()
|
||||||
|
log.write_line(f"🔄 Quantizing {self.selected_model}...")
|
||||||
|
|
||||||
|
progress = self.query_one("#operation-progress", ProgressBar)
|
||||||
|
progress.update(progress=0)
|
||||||
|
|
||||||
|
try:
|
||||||
|
import subprocess
|
||||||
|
|
||||||
|
cmd = [
|
||||||
|
"python",
|
||||||
|
"-m",
|
||||||
|
"axolotl.cli.quantize",
|
||||||
|
model_info["path"],
|
||||||
|
"--output-dir",
|
||||||
|
f"{model_info['path']}_quantized",
|
||||||
|
]
|
||||||
|
|
||||||
|
process = subprocess.Popen(
|
||||||
|
cmd,
|
||||||
|
stdout=subprocess.PIPE,
|
||||||
|
stderr=subprocess.STDOUT,
|
||||||
|
text=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
for line in process.stdout:
|
||||||
|
log.write_line(line.strip())
|
||||||
|
progress.advance(5)
|
||||||
|
|
||||||
|
process.wait()
|
||||||
|
|
||||||
|
if process.returncode == 0:
|
||||||
|
log.write_line("✅ Quantization completed successfully!")
|
||||||
|
progress.update(progress=100)
|
||||||
|
else:
|
||||||
|
log.write_line(f"❌ Quantization failed with code {process.returncode}")
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
log.write_line(f"❌ Error during quantization: {str(e)}")
|
||||||
|
|
||||||
|
@on(Button.Pressed, "#evaluate")
|
||||||
|
@work(thread=True)
|
||||||
|
async def handle_evaluate(self) -> None:
|
||||||
|
"""Evaluate selected model."""
|
||||||
|
if not self.selected_model:
|
||||||
|
log = self.query_one("#operations-log", Log)
|
||||||
|
log.write_line("⚠️ No model selected")
|
||||||
|
return
|
||||||
|
|
||||||
|
model_info = self.models[self.selected_model]
|
||||||
|
log = self.query_one("#operations-log", Log)
|
||||||
|
log.clear()
|
||||||
|
log.write_line(f"🔄 Evaluating {self.selected_model}...")
|
||||||
|
|
||||||
|
progress = self.query_one("#operation-progress", ProgressBar)
|
||||||
|
progress.update(progress=0)
|
||||||
|
|
||||||
|
try:
|
||||||
|
import subprocess
|
||||||
|
|
||||||
|
cmd = ["python", "-m", "axolotl.cli.evaluate", model_info["path"]]
|
||||||
|
|
||||||
|
process = subprocess.Popen(
|
||||||
|
cmd,
|
||||||
|
stdout=subprocess.PIPE,
|
||||||
|
stderr=subprocess.STDOUT,
|
||||||
|
text=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
for line in process.stdout:
|
||||||
|
log.write_line(line.strip())
|
||||||
|
progress.advance(10)
|
||||||
|
|
||||||
|
process.wait()
|
||||||
|
|
||||||
|
if process.returncode == 0:
|
||||||
|
log.write_line("✅ Evaluation completed successfully!")
|
||||||
|
progress.update(progress=100)
|
||||||
|
else:
|
||||||
|
log.write_line(f"❌ Evaluation failed with code {process.returncode}")
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
log.write_line(f"❌ Error during evaluation: {str(e)}")
|
||||||
|
|
||||||
|
@on(Button.Pressed, "#refresh")
|
||||||
|
def handle_refresh(self) -> None:
|
||||||
|
"""Refresh model list."""
|
||||||
|
self.load_models()
|
||||||
|
|
||||||
|
def action_merge_lora(self) -> None:
|
||||||
|
"""Merge LoRA adapters."""
|
||||||
|
self.handle_merge_lora()
|
||||||
|
|
||||||
|
def action_quantize(self) -> None:
|
||||||
|
"""Quantize model."""
|
||||||
|
self.handle_quantize()
|
||||||
|
|
||||||
|
def action_evaluate(self) -> None:
|
||||||
|
"""Evaluate model."""
|
||||||
|
self.handle_evaluate()
|
||||||
|
|
||||||
|
def action_refresh(self) -> None:
|
||||||
|
"""Refresh model list."""
|
||||||
|
self.handle_refresh()
|
||||||
414
src/axolotl/tui/screens/monitor.py
Normal file
@@ -0,0 +1,414 @@
"""System monitoring screen for Axolotl TUI."""

import psutil
from textual import on, work
from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Container
from textual.widgets import (
    Button,
    DataTable,
    Footer,
    Header,
    Label,
    Log,
    ProgressBar,
    Sparkline,
    Static,
)

from axolotl.tui.screens.base import BaseScreen


class MonitorScreen(BaseScreen):
    """System monitoring screen."""

    BINDINGS = [
        Binding("r", "refresh", "Refresh"),
        Binding("ctrl+k", "kill_process", "Kill Process"),
    ]

    CSS = """
    .monitor-container {
        layout: vertical;
        height: 100%;
    }

    .metrics-grid {
        layout: horizontal;
        height: 20%;
        padding: 1;
    }

    .metric-card {
        width: 25%;
        border: solid $surface;
        padding: 1;
        margin: 0 1;
    }

    .metric-label {
        text-style: bold;
        color: $text-muted;
        text-align: center;
    }

    .metric-value {
        text-style: bold;
        text-align: center;
        padding: 1;
    }

    .charts-container {
        height: 40%;
        layout: horizontal;
        padding: 1;
    }

    .chart-panel {
        width: 50%;
        border: solid $primary;
        padding: 1;
        margin: 0 1;
    }

    .processes-container {
        height: 40%;
        border: solid $warning;
        padding: 1;
        margin: 1;
    }

    DataTable {
        height: 90%;
    }

    .process-controls {
        layout: horizontal;
        height: 4;
        align: center middle;
        padding: 1;
    }

    .process-controls Button {
        margin: 0 1;
    }

    .screen-title {
        text-align: center;
        text-style: bold;
        padding: 1;
        color: $primary;
    }

    .screen-subtitle {
        text-align: center;
        padding: 0 0 1 0;
        color: $text-muted;
    }

    Sparkline {
        height: 8;
    }

    ProgressBar {
        margin: 1 0;
    }
    """

    def __init__(self):
        """Initialize the monitor screen."""
        super().__init__(
            title="System Monitor",
            subtitle="Monitor system resources and running processes",
        )
        self.cpu_history = []
        self.memory_history = []
        self.gpu_history = []

    def compose(self) -> ComposeResult:
        """Compose the monitor screen layout."""
        yield Header()
        yield Container(
            Static("🦾 System Monitor", classes="screen-title"),
            Static(
                "Monitor system resources and running processes",
                classes="screen-subtitle",
            ),
            Container(
                Container(
                    Container(
                        Static("CPU Usage", classes="metric-label"),
                        Static("0%", id="cpu-usage", classes="metric-value"),
                        ProgressBar(total=100, id="cpu-progress"),
                        classes="metric-card",
                    ),
                    Container(
                        Static("Memory", classes="metric-label"),
                        Static("0%", id="memory-usage", classes="metric-value"),
                        ProgressBar(total=100, id="memory-progress"),
                        classes="metric-card",
                    ),
                    Container(
                        Static("GPU Usage", classes="metric-label"),
                        Static("0%", id="gpu-usage", classes="metric-value"),
                        ProgressBar(total=100, id="gpu-progress"),
                        classes="metric-card",
                    ),
                    Container(
                        Static("Temperature", classes="metric-label"),
                        Static("0°C", id="temperature", classes="metric-value"),
                        classes="metric-card",
                    ),
                    classes="metrics-grid",
                ),
                Container(
                    Container(
                        Label("CPU History"),
                        Sparkline([], id="cpu-sparkline"),
                        classes="chart-panel",
                    ),
                    Container(
                        Label("Memory History"),
                        Sparkline([], id="memory-sparkline"),
                        classes="chart-panel",
                    ),
                    classes="charts-container",
                ),
                Container(
                    DataTable(id="process-table"),
                    Log(id="gpu-info"),
                    Log(id="system-logs"),
                    classes="processes-container",
                ),
                classes="monitor-container",
            ),
            id="content",
        )
        yield Footer()

    def on_mount(self) -> None:
        """Called when the screen is mounted."""
        self.setup_process_table()
        self.start_monitoring()

        # Initial system info
        self.update_system_info()
        self.update_gpu_info()

    def setup_process_table(self) -> None:
        """Set up the process table."""
        table = self.query_one("#process-table", DataTable)
        table.add_columns("PID", "Name", "CPU%", "Memory%", "Status")
        table.cursor_type = "row"
        table.zebra_stripes = True

    def start_monitoring(self) -> None:
        """Start the monitoring timer."""
        self.set_interval(2.0, self.update_system_metrics)

    @work(thread=True)
    async def update_system_metrics(self) -> None:
        """Update system metrics."""
        try:
            # CPU usage
            cpu_percent = psutil.cpu_percent(interval=None)
            self.cpu_history.append(cpu_percent)
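            # Keep a rolling window of the most recent 50 samples so the
            # sparklines stay bounded.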
            if len(self.cpu_history) > 50:
                self.cpu_history.pop(0)

            # Memory usage
            memory = psutil.virtual_memory()
            memory_percent = memory.percent
            self.memory_history.append(memory_percent)
            if len(self.memory_history) > 50:
                self.memory_history.pop(0)

            # GPU usage (if available)
            gpu_percent = self.get_gpu_usage()
            self.gpu_history.append(gpu_percent)
            if len(self.gpu_history) > 50:
                self.gpu_history.pop(0)

            # Temperature
            temperature = self.get_temperature()

            # Update UI
            self.update_metrics_display(
                cpu_percent, memory_percent, gpu_percent, temperature
            )
            self.update_sparklines()
            self.update_process_table()

        except Exception as e:
            log = self.query_one("#system-logs", Log)
            log.write_line(f"Error updating metrics: {str(e)}")

    def get_gpu_usage(self) -> float:
        """Get GPU usage percentage."""
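        # pynvml (nvidia-ml-py) is optional; without an NVIDIA GPU or driver
        # the import/init fails and usage is reported as 0.0.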
        try:
            import pynvml

            pynvml.nvmlInit()
            handle = pynvml.nvmlDeviceGetHandleByIndex(0)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            return util.gpu
        except Exception:
            return 0.0

    def get_temperature(self) -> str:
        """Get system temperature."""
        try:
            temps = psutil.sensors_temperatures()
            if temps:
                for name, entries in temps.items():
                    if entries:
                        return f"{entries[0].current:.1f}°C"
            return "N/A"
        except Exception:
            return "N/A"

    def update_metrics_display(
        self, cpu: float, memory: float, gpu: float, temp: str
    ) -> None:
        """Update metrics display."""
        self.query_one("#cpu-usage", Static).update(f"{cpu:.1f}%")
        self.query_one("#memory-usage", Static).update(f"{memory:.1f}%")
        self.query_one("#gpu-usage", Static).update(f"{gpu:.1f}%")
        self.query_one("#temperature", Static).update(temp)

        self.query_one("#cpu-progress", ProgressBar).update(progress=cpu)
        self.query_one("#memory-progress", ProgressBar).update(progress=memory)
        self.query_one("#gpu-progress", ProgressBar).update(progress=gpu)

    def update_sparklines(self) -> None:
        """Update sparkline charts."""
        if self.cpu_history:
            cpu_sparkline = self.query_one("#cpu-sparkline", Sparkline)
            cpu_sparkline.data = self.cpu_history

        if self.memory_history:
            memory_sparkline = self.query_one("#memory-sparkline", Sparkline)
            memory_sparkline.data = self.memory_history

    def update_process_table(self) -> None:
        """Update the process table."""
        table = self.query_one("#process-table", DataTable)
        table.clear()

        try:
            # Get top processes by CPU usage
            processes = []
            for proc in psutil.process_iter(
                ["pid", "name", "cpu_percent", "memory_percent", "status"]
            ):
                try:
                    pinfo = proc.info
                    if pinfo["cpu_percent"] > 0.1:  # Only show processes using CPU
                        processes.append(pinfo)
                except (psutil.NoSuchProcess, psutil.AccessDenied):
                    pass

            # Sort by CPU usage
            processes.sort(key=lambda x: x["cpu_percent"], reverse=True)

            # Add top 20 processes
            for proc in processes[:20]:
                table.add_row(
                    str(proc["pid"]),
                    proc["name"][:20],
                    f"{proc['cpu_percent']:.1f}%",
                    f"{proc['memory_percent']:.1f}%",
                    proc["status"],
                )

        except Exception as e:
            log = self.query_one("#system-logs", Log)
            log.write_line(f"Error updating process table: {str(e)}")

    def update_system_info(self) -> None:
        """Update system information."""
        try:
            # System info
            cpu_count = psutil.cpu_count()
            memory = psutil.virtual_memory()

            log = self.query_one("#system-logs", Log)
            log.write_line(f"System started. CPU cores: {cpu_count}")
            log.write_line(f"Total memory: {memory.total / (1024**3):.1f} GB")
            log.write_line(f"Available memory: {memory.available / (1024**3):.1f} GB")

        except Exception as e:
            log = self.query_one("#system-logs", Log)
            log.write_line(f"Error getting system info: {str(e)}")

    def update_gpu_info(self) -> None:
        """Update GPU information."""
        try:
            import pynvml

            pynvml.nvmlInit()

            device_count = pynvml.nvmlDeviceGetCount()
            log = self.query_one("#gpu-info", Log)
            log.clear()
            log.write_line(f"Found {device_count} GPU(s)")

            for i in range(device_count):
                handle = pynvml.nvmlDeviceGetHandleByIndex(i)
                name = pynvml.nvmlDeviceGetName(handle).decode()
                memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)

                log.write_line(f"\nGPU {i}: {name}")
                log.write_line(
                    f"Memory: {memory_info.used / (1024**3):.1f} / {memory_info.total / (1024**3):.1f} GB"
                )
                log.write_line(f"Free: {memory_info.free / (1024**3):.1f} GB")

        except Exception as e:
            log = self.query_one("#gpu-info", Log)
            log.clear()
            log.write_line(f"GPU info unavailable: {str(e)}")

    @on(Button.Pressed, "#kill-process")
    def handle_kill_process(self) -> None:
        """Kill selected process."""
        table = self.query_one("#process-table", DataTable)
        if table.cursor_row >= 0:
            try:
                row = table.get_row_at(table.cursor_row)
                pid = int(row[0])

                process = psutil.Process(pid)
                process.terminate()

                log = self.query_one("#system-logs", Log)
                log.write_line(f"Terminated process {pid}")

            except Exception as e:
                log = self.query_one("#system-logs", Log)
                log.write_line(f"Error killing process: {str(e)}")

    @on(Button.Pressed, "#refresh")
    def handle_refresh(self) -> None:
        """Refresh all metrics."""
        self.update_system_info()
        self.update_gpu_info()

        log = self.query_one("#system-logs", Log)
        log.write_line("Metrics refreshed")

    @on(Button.Pressed, "#auto-refresh")
    def handle_auto_refresh(self) -> None:
        """Toggle auto refresh."""
        log = self.query_one("#system-logs", Log)
        log.write_line("Auto refresh is always enabled (every 2 seconds)")

    def action_refresh(self) -> None:
        """Refresh action."""
        self.handle_refresh()

    def action_kill_process(self) -> None:
        """Kill process action."""
        self.handle_kill_process()

545
src/axolotl/tui/screens/training.py
Normal file
@@ -0,0 +1,545 @@
"""Training management screen for Axolotl TUI."""

import subprocess
import threading
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional

from textual import on, work
from textual.app import ComposeResult
from textual.binding import Binding
from textual.containers import Container
from textual.widgets import (
    Button,
    DataTable,
    Footer,
    Header,
    Label,
    Log,
    Sparkline,
    Static,
)

from axolotl.tui.screens.base import BaseScreen


@dataclass
class TrainingJob:
    """Represents a training job."""

    id: str
    config_path: str
    status: str  # pending, running, completed, failed
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    process: Optional[subprocess.Popen] = None
    log_file: Optional[str] = None
    current_epoch: int = 0
    total_epochs: int = 0
    current_loss: float = 0.0
    losses: Optional[List[float]] = None

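    # A mutable default ([]) would be shared across every TrainingJob, so the
    # field defaults to None and gets a fresh list per instance here.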
    def __post_init__(self):
        if self.losses is None:
            self.losses = []


class TrainingScreen(BaseScreen):
    """Training management screen."""

    BINDINGS = [
        Binding("ctrl+t", "new_training", "New Training"),
        Binding("ctrl+r", "resume_training", "Resume"),
        Binding("ctrl+x", "stop_training", "Stop"),
        Binding("ctrl+l", "view_logs", "View Logs"),
        Binding("r", "refresh", "Refresh"),
    ]

    CSS = """
    .training-container {
        layout: vertical;
        height: 100%;
    }

    .job-list-container {
        height: 40%;
        border: solid $primary;
        padding: 1;
        margin: 1;
    }

    .job-details-container {
        height: 60%;
        padding: 1;
    }

    .control-panel {
        layout: horizontal;
        height: 4;
        align: center middle;
        padding: 1;
        border: solid $secondary;
        margin: 1;
    }

    .control-panel Button {
        margin: 0 1;
    }

    .metrics-panel {
        layout: horizontal;
        height: 10;
        border: solid $primary;
        padding: 1;
        margin: 1;
    }

    .metric-card {
        width: 25%;
        border: tall $surface;
        padding: 1;
        margin: 0 1;
    }

    .metric-label {
        text-style: bold;
        color: $text-muted;
    }

    .metric-value {
        text-style: bold;
        text-align: center;
        padding: 1;
    }

    .log-viewer {
        border: solid $warning;
        padding: 1;
        margin: 1;
    }

    #training-logs {
        height: 100%;
    }

    DataTable {
        height: 100%;
    }

    .screen-title {
        text-align: center;
        text-style: bold;
        padding: 1;
        color: $primary;
    }

    .screen-subtitle {
        text-align: center;
        padding: 0 0 1 0;
        color: $text-muted;
    }

    .sparkline-container {
        height: 5;
        border: solid $success;
        padding: 1;
        margin: 1;
    }
    """

    def __init__(self):
        """Initialize the training screen."""
        super().__init__(
            title="Training Management",
            subtitle="Launch, monitor, and manage training jobs",
        )
        self.jobs: Dict[str, TrainingJob] = {}
        self.selected_job_id: Optional[str] = None
        self.update_timer = None

    def compose(self) -> ComposeResult:
        """Compose the training screen layout."""
        yield Header()
        yield Container(
            Static("🦾 Training Management", classes="screen-title"),
            Static(
                "Launch, monitor, and manage training jobs", classes="screen-subtitle"
            ),
            Container(
                Container(
                    Label("Active Training Jobs"),
                    DataTable(id="job-table"),
                    classes="job-list-container",
                ),
                Container(
                    Button("New Training", id="new-training", variant="primary"),
                    Button("Resume", id="resume-training", variant="success"),
                    Button("Stop", id="stop-training", variant="error"),
                    Button("View Logs", id="view-logs", variant="default"),
                    Button("Clear Completed", id="clear-completed", variant="warning"),
                    Button("Refresh", id="refresh", variant="default"),
                    classes="control-panel",
                ),
                Container(
                    Container(
                        Static("Current Epoch", classes="metric-label"),
                        Static("0 / 0", id="epoch-metric", classes="metric-value"),
                        classes="metric-card",
                    ),
                    Container(
                        Static("Loss", classes="metric-label"),
                        Static("0.000", id="loss-metric", classes="metric-value"),
                        classes="metric-card",
                    ),
                    Container(
                        Static("Status", classes="metric-label"),
                        Static("Idle", id="status-metric", classes="metric-value"),
                        classes="metric-card",
                    ),
                    Container(
                        Static("Duration", classes="metric-label"),
                        Static(
                            "00:00:00", id="duration-metric", classes="metric-value"
                        ),
                        classes="metric-card",
                    ),
                    classes="metrics-panel",
                ),
                Container(
                    Label("Loss History"),
                    Sparkline(
                        [],
                        id="loss-sparkline",
                        summary_function=min,
                    ),
                    classes="sparkline-container",
                ),
                Container(
                    Log(id="training-logs"),
                    classes="log-viewer",
                ),
                classes="job-details-container",
            ),
            classes="training-container",
            id="content",
        )
        yield Footer()

    def on_mount(self) -> None:
        """Called when the screen is mounted."""
        self.setup_job_table()
        self.start_update_timer()

        log = self.query_one("#training-logs", Log)
        log.write_line(
            "Training manager ready. Select a configuration to start training."
        )

    def setup_job_table(self) -> None:
        """Set up the job table."""
        table = self.query_one("#job-table", DataTable)
        table.add_columns("ID", "Config", "Status", "Epoch", "Loss", "Duration")
        table.cursor_type = "row"
        table.zebra_stripes = True

    def start_update_timer(self) -> None:
        """Start the periodic update timer."""
        self.set_interval(2.0, self.update_job_status)

    @work(thread=True)
    async def update_job_status(self) -> None:
        """Update job status periodically."""
        for job_id, job in self.jobs.items():
            if job.status == "running" and job.process:
poll = job.process.poll()
|
||||||
|
if poll is not None:
|
||||||
|
if poll == 0:
|
||||||
|
job.status = "completed"
|
||||||
|
else:
|
||||||
|
job.status = "failed"
|
||||||
|
job.end_time = datetime.now()
|
||||||
|
|
||||||
|
self.refresh_job_table()
|
||||||
|
self.update_selected_job_metrics()
|
||||||
|
|
||||||
|
def refresh_job_table(self) -> None:
|
||||||
|
"""Refresh the job table."""
|
||||||
|
table = self.query_one("#job-table", DataTable)
|
||||||
|
table.clear()
|
||||||
|
|
||||||
|
for job_id, job in self.jobs.items():
|
||||||
|
duration = self.calculate_duration(job)
|
||||||
|
table.add_row(
|
||||||
|
job_id[:8],
|
||||||
|
Path(job.config_path).name,
|
||||||
|
job.status,
|
||||||
|
f"{job.current_epoch}/{job.total_epochs}",
|
||||||
|
f"{job.current_loss:.4f}" if job.current_loss else "N/A",
|
||||||
|
duration,
|
||||||
|
)
|
||||||
|
|
||||||
|
def calculate_duration(self, job: TrainingJob) -> str:
|
||||||
|
"""Calculate job duration."""
|
||||||
|
if not job.start_time:
|
||||||
|
return "00:00:00"
|
||||||
|
|
||||||
|
end_time = job.end_time or datetime.now()
|
||||||
|
duration = end_time - job.start_time
|
||||||
|
hours = int(duration.total_seconds() // 3600)
|
||||||
|
minutes = int((duration.total_seconds() % 3600) // 60)
|
||||||
|
seconds = int(duration.total_seconds() % 60)
|
||||||
|
return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
|
||||||
|
|
||||||
|
def update_selected_job_metrics(self) -> None:
|
||||||
|
"""Update metrics for selected job."""
|
||||||
|
if not self.selected_job_id or self.selected_job_id not in self.jobs:
|
||||||
|
return
|
||||||
|
|
||||||
|
job = self.jobs[self.selected_job_id]
|
||||||
|
|
||||||
|
self.query_one("#epoch-metric", Static).update(
|
||||||
|
f"{job.current_epoch} / {job.total_epochs}"
|
||||||
|
)
|
||||||
|
self.query_one("#loss-metric", Static).update(
|
||||||
|
f"{job.current_loss:.4f}" if job.current_loss else "N/A"
|
||||||
|
)
|
||||||
|
self.query_one("#status-metric", Static).update(job.status.upper())
|
||||||
|
self.query_one("#duration-metric", Static).update(self.calculate_duration(job))
|
||||||
|
|
||||||
|
if job.losses:
|
||||||
|
sparkline = self.query_one("#loss-sparkline", Sparkline)
|
||||||
|
sparkline.data = job.losses[-50:] # Show last 50 loss values
|
||||||
|
|
||||||
|
@on(DataTable.RowSelected)
|
||||||
|
def handle_row_selected(self, event: DataTable.RowSelected) -> None:
|
||||||
|
"""Handle job selection from table."""
|
||||||
|
if event.cursor_row >= 0:
|
||||||
|
job_ids = list(self.jobs.keys())
|
||||||
|
if event.cursor_row < len(job_ids):
|
||||||
|
self.selected_job_id = job_ids[event.cursor_row]
|
||||||
|
self.update_selected_job_metrics()
|
||||||
|
self.load_job_logs()
|
||||||
|
|
||||||
|
def load_job_logs(self) -> None:
|
||||||
|
"""Load logs for selected job."""
|
||||||
|
if not self.selected_job_id or self.selected_job_id not in self.jobs:
|
||||||
|
return
|
||||||
|
|
||||||
|
job = self.jobs[self.selected_job_id]
|
||||||
|
if job.log_file and Path(job.log_file).exists():
|
||||||
|
try:
|
||||||
|
with open(job.log_file, "r") as f:
|
||||||
|
content = f.read()
|
||||||
|
log = self.query_one("#training-logs", Log)
|
||||||
|
log.clear()
|
||||||
|
for line in content.split("\n")[-100:]: # Show last 100 lines
|
||||||
|
if line.strip():
|
||||||
|
log.write_line(line)
|
||||||
|
except Exception as e:
|
||||||
|
log = self.query_one("#training-logs", Log)
|
||||||
|
log.write_line(f"Error loading logs: {str(e)}")
|
||||||
|
|
||||||
|
@on(Button.Pressed, "#new-training")
|
||||||
|
async def handle_new_training(self) -> None:
|
||||||
|
"""Start a new training job."""
|
||||||
|
from axolotl.tui.dialogs.training import NewTrainingDialog
|
||||||
|
|
||||||
|
dialog = NewTrainingDialog()
|
||||||
|
result = await self.app.push_screen_wait(dialog)
|
||||||
|
|
||||||
|
if result and "config_path" in result:
|
||||||
|
await self.start_training_job(
|
||||||
|
result["config_path"], result.get("launcher", "accelerate")
|
||||||
|
)
|
||||||
|
|
||||||
|
@work(thread=True)
|
||||||
|
async def start_training_job(
|
||||||
|
self, config_path: str, launcher: str = "accelerate"
|
||||||
|
) -> None:
|
||||||
|
"""Start a training job."""
|
||||||
|
import uuid
|
||||||
|
from datetime import datetime
|
||||||
|
|
||||||
|
job_id = str(uuid.uuid4())
|
||||||
|
log_file = f"/tmp/axolotl_training_{job_id}.log"
|
||||||
|
|
||||||
|
job = TrainingJob(
|
||||||
|
id=job_id,
|
||||||
|
config_path=config_path,
|
||||||
|
status="pending",
|
||||||
|
start_time=datetime.now(),
|
||||||
|
log_file=log_file,
|
||||||
|
total_epochs=3, # Default, should parse from config
|
||||||
|
)
|
||||||
|
|
||||||
|
self.jobs[job_id] = job
|
||||||
|
self.selected_job_id = job_id
|
||||||
|
|
||||||
|
log = self.query_one("#training-logs", Log)
|
||||||
|
log.clear()
|
||||||
|
log.write_line(f"🚀 Starting training job {job_id[:8]}...")
|
||||||
|
log.write_line(f"Config: {config_path}")
|
||||||
|
log.write_line(f"Launcher: {launcher}")
|
||||||
|
|
||||||
|
try:
|
||||||
|
if launcher == "accelerate":
|
||||||
|
cmd = ["accelerate", "launch", "-m", "axolotl.cli.train", config_path]
|
||||||
|
else:
|
||||||
|
cmd = [
|
||||||
|
"torchrun",
|
||||||
|
"--nproc_per_node=1",
|
||||||
|
"-m",
|
||||||
|
"axolotl.cli.train",
|
||||||
|
config_path,
|
||||||
|
]
|
||||||
|
|
||||||
|
with open(log_file, "w") as f:
|
||||||
|
process = subprocess.Popen(
|
||||||
|
cmd,
|
||||||
|
stdout=f,
|
||||||
|
stderr=subprocess.STDOUT,
|
||||||
|
text=True,
|
||||||
|
bufsize=1,
|
||||||
|
)
|
||||||
|
|
||||||
|
job.process = process
|
||||||
|
job.status = "running"
|
||||||
|
|
||||||
|
log.write_line("✅ Training started successfully!")
|
||||||
|
self.refresh_job_table()
|
||||||
|
|
||||||
|
self.monitor_training_output(job_id)
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
job.status = "failed"
|
||||||
|
job.end_time = datetime.now()
|
||||||
|
log.write_line(f"❌ Failed to start training: {str(e)}")
|
||||||
|
self.refresh_job_table()
|
||||||
|
|
||||||
|
def monitor_training_output(self, job_id: str) -> None:
|
||||||
|
"""Monitor training output and extract metrics."""
|
||||||
|
if job_id not in self.jobs:
|
||||||
|
return
|
||||||
|
|
||||||
|
job = self.jobs[job_id]
|
||||||
|
if not job.log_file:
|
||||||
|
return
|
||||||
|
|
||||||
|
def tail_log():
|
||||||
|
import re
|
||||||
|
import time
|
||||||
|
|
||||||
|
with open(job.log_file, "r") as f:
|
||||||
|
f.seek(0, 2) # Go to end of file
|
||||||
|
while job.status == "running":
|
||||||
|
line = f.readline()
|
||||||
|
if line:
|
||||||
|
# Parse training metrics from log
|
||||||
|
epoch_match = re.search(r"Epoch (\d+)/(\d+)", line)
|
||||||
|
if epoch_match:
|
||||||
|
job.current_epoch = int(epoch_match.group(1))
|
||||||
|
job.total_epochs = int(epoch_match.group(2))
|
||||||
|
|
||||||
|
loss_match = re.search(
|
||||||
|
r"loss['\"]?\s*:\s*([\d.]+)", line, re.IGNORECASE
|
||||||
|
)
|
||||||
|
if loss_match:
|
||||||
|
job.current_loss = float(loss_match.group(1))
|
||||||
|
job.losses.append(job.current_loss)
|
||||||
|
|
||||||
|
# Update log viewer
|
||||||
|
self.call_from_thread(self.append_training_log, line.strip())
|
||||||
|
else:
|
||||||
|
time.sleep(0.5)
|
||||||
|
|
||||||
|
thread = threading.Thread(target=tail_log, daemon=True)
|
||||||
|
thread.start()
|
||||||
|
|
||||||
|
def append_training_log(self, line: str) -> None:
|
||||||
|
"""Append line to training log."""
|
||||||
|
log = self.query_one("#training-logs", Log)
|
||||||
|
log.write_line(line)
|
||||||
|
|
||||||
|
@on(Button.Pressed, "#stop-training")
|
||||||
|
def handle_stop_training(self) -> None:
|
||||||
|
"""Stop selected training job."""
|
||||||
|
if not self.selected_job_id or self.selected_job_id not in self.jobs:
|
||||||
|
log = self.query_one("#training-logs", Log)
|
||||||
|
log.write_line("⚠️ No job selected")
|
||||||
|
return
|
||||||
|
|
||||||
|
job = self.jobs[self.selected_job_id]
|
||||||
|
if job.status == "running" and job.process:
|
||||||
|
job.process.terminate()
|
||||||
|
job.status = "stopped"
|
||||||
|
job.end_time = datetime.now()
|
||||||
|
|
||||||
|
log = self.query_one("#training-logs", Log)
|
||||||
|
log.write_line(f"🛑 Training job {job.id[:8]} stopped")
|
||||||
|
self.refresh_job_table()
|
||||||
|
|
||||||
|
@on(Button.Pressed, "#resume-training")
|
||||||
|
async def handle_resume_training(self) -> None:
|
||||||
|
"""Resume a stopped training job."""
|
||||||
|
if not self.selected_job_id or self.selected_job_id not in self.jobs:
|
||||||
|
log = self.query_one("#training-logs", Log)
|
||||||
|
log.write_line("⚠️ No job selected")
|
||||||
|
return
|
||||||
|
|
||||||
|
job = self.jobs[self.selected_job_id]
|
||||||
|
if job.status in ["stopped", "failed"]:
|
||||||
|
await self.start_training_job(job.config_path)
|
||||||
|
|
||||||
|
@on(Button.Pressed, "#clear-completed")
|
||||||
|
def handle_clear_completed(self) -> None:
|
||||||
|
"""Clear completed jobs from the list."""
|
||||||
|
completed_jobs = [
|
||||||
|
job_id
|
||||||
|
for job_id, job in self.jobs.items()
|
||||||
|
if job.status in ["completed", "failed", "stopped"]
|
||||||
|
]
|
||||||
|
|
||||||
|
for job_id in completed_jobs:
|
||||||
|
del self.jobs[job_id]
|
||||||
|
|
||||||
|
self.refresh_job_table()
|
||||||
|
log = self.query_one("#training-logs", Log)
|
||||||
|
log.write_line(f"🧹 Cleared {len(completed_jobs)} completed jobs")
|
||||||
|
|
||||||
|
@on(Button.Pressed, "#refresh")
|
||||||
|
def handle_refresh(self) -> None:
|
||||||
|
"""Refresh the job list and metrics."""
|
||||||
|
self.refresh_job_table()
|
||||||
|
self.update_selected_job_metrics()
|
||||||
|
if self.selected_job_id:
|
||||||
|
self.load_job_logs()
|
||||||
|
|
||||||
|
@on(Button.Pressed, "#view-logs")
|
||||||
|
def handle_view_logs(self) -> None:
|
||||||
|
"""View full logs for selected job."""
|
||||||
|
if not self.selected_job_id or self.selected_job_id not in self.jobs:
|
||||||
|
return
|
||||||
|
|
||||||
|
job = self.jobs[self.selected_job_id]
|
||||||
|
if job.log_file and Path(job.log_file).exists():
|
||||||
|
import subprocess
|
||||||
|
|
||||||
|
subprocess.run(["less", job.log_file])
|
||||||
|
|
||||||
|
def action_new_training(self) -> None:
|
||||||
|
"""Start a new training job."""
|
||||||
|
self.handle_new_training()
|
||||||
|
|
||||||
|
def action_stop_training(self) -> None:
|
||||||
|
"""Stop selected training job."""
|
||||||
|
self.handle_stop_training()
|
||||||
|
|
||||||
|
def action_resume_training(self) -> None:
|
||||||
|
"""Resume selected training job."""
|
||||||
|
self.handle_resume_training()
|
||||||
|
|
||||||
|
def action_refresh(self) -> None:
|
||||||
|
"""Refresh the display."""
|
||||||
|
self.handle_refresh()
|
||||||
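As an aside on the metric scraping above, here is a minimal, self-contained sketch (the log line is invented for illustration) of what the two regexes in tail_log extract from trainer output:

import re

sample = "Epoch 2/3 | {'loss': 0.4321, 'learning_rate': 1e-05}"  # hypothetical line

epoch_match = re.search(r"Epoch (\d+)/(\d+)", sample)
loss_match = re.search(r"loss['\"]?\s*:\s*([\d.]+)", sample, re.IGNORECASE)

assert epoch_match and epoch_match.groups() == ("2", "3")
assert loss_match and float(loss_match.group(1)) == 0.4321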
@@ -5,7 +5,6 @@ Collators for multi-modal chat messages and packing
 from dataclasses import dataclass
 from typing import Any, Optional, Union

-import torch
 from torch import Tensor
 from transformers import PreTrainedTokenizerBase
 from transformers.data.data_collator import DataCollatorMixin
@@ -42,62 +41,19 @@ class MultiModalChatDataCollator(DataCollatorMixin):
         examples = self.processing_strategy(examples)

         # Initialize batch
-        batch: dict[str, Any] = {}
-
-        # Process each example
-        for example in examples:
-            # Apply chat template to process the example
-            # This method requires transformers>=4.49.0
-            result = self.processing_strategy.processor.apply_chat_template(
-                example["messages"],
-                add_generation_prompt=False,
-                tokenize=True,
-                return_tensors="pt",
-                padding=True,
-                return_dict=True,
-                chat_template=self.processing_strategy.chat_template,
-            )
-
-            # TODO: Check if need handling for len(input_ids) > sequence_len
-
-            # Add the processed tensors to our batch
-            for key in result.keys():
-                if key not in batch:
-                    batch[key] = []
-
-                batch[key].append(result[key].squeeze(0))
-
-        # Pad sequences to the same length
-        input_ids = torch.nn.utils.rnn.pad_sequence(
-            batch["input_ids"],
-            batch_first=True,
-            padding_value=self.tokenizer.pad_token_id,
-        )
-
-        attention_mask = torch.nn.utils.rnn.pad_sequence(
-            batch["attention_mask"], batch_first=True, padding_value=0
-        )
-
-        # Create the final batch
-        final_batch = {
-            "input_ids": input_ids,
-            "attention_mask": attention_mask,
-        }
-
-        for key, val in batch.items():
-            if key in ["input_ids", "attention_mask"]:
-                continue
-
-            if key in ["token_type_ids", "cross_attention_mask"]:
-                final_batch[key] = torch.nn.utils.rnn.pad_sequence(
-                    val, batch_first=True, padding_value=0
-                )
-            else:
-                final_batch[key] = torch.stack(val)
+        messages = [ex["messages"] for ex in examples]
+
+        batch = self.processing_strategy.processor.apply_chat_template(
+            messages,
+            add_generation_prompt=False,
+            tokenize=True,
+            return_tensors="pt",
+            padding=True,
+            return_dict=True,
+            chat_template=self.processing_strategy.chat_template,
+        )

         # Process the labels
-        final_batch["labels"] = self.processing_strategy.process_labels(
-            final_batch["input_ids"]
-        )
-
-        return final_batch
+        batch["labels"] = self.processing_strategy.process_labels(batch["input_ids"])
+
+        return batch
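The rewrite above delegates tokenization, padding, and batching to the processor in a single call instead of hand-rolled pad_sequence logic. A rough usage sketch (processor and examples are stand-ins; assumes a transformers>=4.49.0 processor with a chat template):

messages = [ex["messages"] for ex in examples]
batch = processor.apply_chat_template(
    messages,
    add_generation_prompt=False,
    tokenize=True,
    padding=True,         # pad to the longest sequence in the batch
    return_tensors="pt",
    return_dict=True,     # dict-like output: input_ids, attention_mask, ...
)
batch["input_ids"].shape  # (batch_size, padded_seq_len)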
@@ -28,7 +28,7 @@ from axolotl.utils.data.shared import (
 )
 from axolotl.utils.data.utils import (
     deduplicate_and_log_datasets,
-    drop_long_seq_in_dataset,
+    handle_long_seq_in_dataset,
     retry_on_request_exceptions,
 )
 from axolotl.utils.data.wrappers import get_dataset_wrapper
@@ -339,9 +339,9 @@ def _load_raw_datasets(

     if not cfg.skip_prepare_dataset:
         if split == "test" and cfg.eval_sequence_len:
-            dataset = drop_long_seq_in_dataset(dataset, cfg.eval_sequence_len, cfg)
+            dataset = handle_long_seq_in_dataset(dataset, cfg.eval_sequence_len, cfg)
         else:
-            dataset = drop_long_seq_in_dataset(dataset, cfg.sequence_len, cfg)
+            dataset = handle_long_seq_in_dataset(dataset, cfg.sequence_len, cfg)
         if cfg.sample_packing:
             dataset, _ = process_datasets_for_packing(cfg, dataset, None)
@@ -148,7 +148,36 @@ def deduplicate_and_log_datasets(
     return dataset, other_dataset


-def drop_long_seq_in_dataset(
+def truncate_long_seq(sample, sequence_len=2048, min_sequence_len=2):
+    """
+    Truncate samples whose sequence length is too long (> sequence_len)
+    or drop those too short (< min_sequence_len).
+    """
+    min_sequence_len = min_sequence_len or 2
+
+    input_ids = sample["input_ids"]
+    results = []
+
+    # Batched (input_ids is a list of lists)
+    for i, seq in enumerate(input_ids):
+        length = len(seq)
+        if length < min_sequence_len:
+            results.append(False)
+        elif length > sequence_len:
+            sample["input_ids"][i] = seq[:sequence_len]
+            if "attention_mask" in sample:
+                sample["attention_mask"][i] = sample["attention_mask"][i][:sequence_len]
+            if "labels" in sample:
+                sample["labels"][i] = sample["labels"][i][:sequence_len]
+            if "position_ids" in sample:
+                sample["position_ids"][i] = sample["position_ids"][i][:sequence_len]
+            results.append(True)
+        else:
+            results.append(True)
+    return results
+
+
+def handle_long_seq_in_dataset(
     dataset: Dataset, sequence_len: int, cfg: DictDefault
 ) -> Dataset:
     """Remove sequences longer than configured maximum from dataset.
@@ -192,8 +221,21 @@ def drop_long_seq_in_dataset(
     if filter_map_kwargs:
         drop_long_kwargs["desc"] = f"Dropping Long Sequences (>{sequence_len})"

+    excess_length_strategy = (cfg.excess_length_strategy or "drop").lower()
+    if excess_length_strategy == "truncate":
+        process_fn = functools.partial(
+            truncate_long_seq,
+            sequence_len=sequence_len,
+            min_sequence_len=cfg.min_sample_len,
+        )
+        drop_long_kwargs["desc"] = (
+            f"Truncating/Filtering Sequences (target_len={sequence_len})"
+        )
+    else:
+        process_fn = drop_long
+
     dataset = dataset.filter(
-        drop_long,
+        process_fn,
         batched=True,
         **filter_map_kwargs,
         **drop_long_kwargs,
@@ -201,6 +243,11 @@ def drop_long_seq_in_dataset(
     if prior_len:
         dropped = prior_len - len(dataset)
         if dropped:
-            LOG.warning(f"Dropped {dropped} long samples from dataset")
+            action = (
+                "truncated/filtered"
+                if excess_length_strategy == "truncate"
+                else "dropped"
+            )
+            LOG.warning(f"{action.title()} {dropped} samples from dataset")

     return dataset
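To make the new truncation semantics concrete, a toy run (data invented here): truncate_long_seq mutates the batch it is handed and returns the per-row keep mask that the batched Dataset.filter call consumes.

sample = {
    "input_ids": [[1, 2, 3, 4, 5, 6], [7], [8, 9]],
    "attention_mask": [[1, 1, 1, 1, 1, 1], [1], [1, 1]],
}
keep = truncate_long_seq(sample, sequence_len=4, min_sequence_len=2)
assert keep == [True, False, True]             # the 1-token row is dropped
assert sample["input_ids"][0] == [1, 2, 3, 4]  # the over-long row is sliced in place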
@@ -414,6 +414,12 @@ class AxolotlInputConfig(
             "description": "The maximum length of an input to train with, this should typically be less than 2048 as most models have a token/context limit of 2048"
         },
     )
+    excess_length_strategy: Literal["drop", "truncate"] | None = Field(
+        default=None,
+        json_schema_extra={
+            "description": "What to do when a tokenized row exceeds sequence_len. 'drop' removes the row; 'truncate' slices tensors to sequence_len. Defaults to 'drop' for backward compatibility."
+        },
+    )
     eval_sequence_len: int | None = Field(
         default=None,
         json_schema_extra={
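A sketch of how the new knob sits next to the existing length settings, written here as a DictDefault for brevity (values illustrative; an equivalent YAML config works the same way):

cfg = DictDefault(
    sequence_len=2048,
    min_sample_len=2,                   # rows shorter than this are still dropped
    excess_length_strategy="truncate",  # leaving it unset keeps the old "drop" behavior
)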
@@ -3,6 +3,7 @@
 # pylint: disable=too-many-boolean-expressions

 import json
+import sys
 import tempfile
 from pathlib import Path

@@ -369,10 +370,10 @@ class TrainingValidationMixin:
             "see speed improvements. Please consider setting `torch_compile: "
             "true` in your config."
         )
+    fsdp_config = data.get("fsdp_config") or {}
     if data.get("fp8") and (
-        data.get("fsdp_config", {}).get("activation_checkpointing", False) is True
-        or data.get("fsdp_config", {}).get("fsdp_activation_checkpointing", False)
-        is True
+        fsdp_config.get("activation_checkpointing", False) is True
+        or fsdp_config.get("fsdp_activation_checkpointing", False) is True
     ):
         LOG.warning(
             "FP8 + FSDP2 + activation checkpointing may be slower than BF16 "
@@ -817,13 +818,13 @@ class OptimizationValidationMixin:
     @model_validator(mode="before")
     @classmethod
     def check_fsdp_version_in_fsdp_config(cls, data):
-        if data.get("fsdp_config"):
-            if data.get("fsdp_config", {}).get("fsdp_version"):
+        fsdp_config = data.get("fsdp_config") or {}
+        if fsdp_config and fsdp_config.get("fsdp_version"):
             LOG.warning(
                 "Configuring `fsdp_version` in `fsdp_config` is deprecated. "
                 "Please configure `fsdp_version` as a top-level field."
             )
-            data["fsdp_version"] = data.get("fsdp_config").pop("fsdp_version")
+            data["fsdp_version"] = fsdp_config.pop("fsdp_version")
         return data

     @model_validator(mode="before")
@@ -1151,10 +1152,8 @@ class ModelCompatibilityValidationMixin:
     @classmethod
     def check_gpt_oss_fsdp_loading(cls, data):
         if data.get("model_quantization_config", "") == "Mxfp4Config":
-            if (
-                data.get("fsdp_config", {}).get("cpu_ram_efficient_loading", False)
-                is True
-            ):
+            fsdp_config = data.get("fsdp_config") or {}
+            if fsdp_config.get("cpu_ram_efficient_loading", False) is True:
                 raise ValueError(
                     "FSDP cpu_ram_efficient_loading is not supported for Mxfp4Config model quantization."
                 )
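The recurring data.get("fsdp_config") or {} pattern is the point of these hunks: with an explicit fsdp_config: null in the YAML, dict.get with a default still returns None because the key exists, so the old chained .get() calls could blow up. Normalizing first avoids that:

data = {"fsdp_config": None}
data.get("fsdp_config", {})    # -> None; a chained .get() raises AttributeError
data.get("fsdp_config") or {}  # -> {}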
@@ -1251,10 +1250,26 @@ class ComplexValidationMixin:

     try:
         import transformers.modeling_flash_attention_utils
+        from transformers.utils import is_flash_attn_greater_or_equal

         # pylint: disable=protected-access
-        transformers.modeling_flash_attention_utils._flash_supports_window_size = (
-            transformers.modeling_flash_attention_utils._flash_supports_window
+        transformers.modeling_flash_attention_utils._flash_supports_window = (
+            True
+        )
+        setattr(
+            sys.modules["transformers.modeling_flash_attention_utils"],
+            "_flash_supports_window",
+            True,
+        )
+        setattr(
+            sys.modules["transformers.modeling_flash_attention_utils"],
+            "_flash_supports_window_size",
+            True,
+        )
+        setattr(
+            sys.modules["transformers.modeling_flash_attention_utils"],
+            "is_flash_attn_greater_or_equal",
+            is_flash_attn_greater_or_equal,
         )
         import ring_flash_attn  # noqa: F401 # pylint:disable=unused-import
     except ImportError as exception:
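The setattr calls on sys.modules["transformers.modeling_flash_attention_utils"] are equivalent to plain attribute assignment on the imported module object; the intent, presumably, is to re-expose the private helpers (_flash_supports_window, _flash_supports_window_size, is_flash_attn_greater_or_equal) that ring_flash_attn imports from transformers but that newer transformers releases renamed or removed.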
45 src/axolotl/utils/train.py Normal file
@@ -0,0 +1,45 @@
"""Training utils for checkpoints"""

from pathlib import Path

from axolotl.utils.dict import DictDefault
from axolotl.utils.logging import get_logger

LOG = get_logger(__name__)


def determine_last_checkpoint(cfg: DictDefault, update: bool = True) -> str | None:
    """
    Determine the checkpoint to resume from based on configuration.

    Args:
        cfg: Dictionary mapping `axolotl` config keys to values.
        update: Whether to update the config with the determined checkpoint.

    Returns:
        Path to the checkpoint to resume from, or `None` if not resuming.
    """
    last_checkpoint = None
    checkpoints = sorted(
        (
            p
            for p in Path(cfg.output_dir).glob("checkpoint-*")
            if p.name.split("-")[-1].isdigit()
        ),
        key=lambda p: int(p.name.split("-")[-1]),
    )
    if checkpoints:
        last_checkpoint = str(checkpoints[-1])
    if not update:
        return last_checkpoint

    if (
        cfg.resume_from_checkpoint is None
        and cfg.auto_resume_from_checkpoints
        and last_checkpoint is not None
    ):
        cfg.resume_from_checkpoint = last_checkpoint
        LOG.info(
            f"Using Auto-resume functionality to start with checkpoint at {cfg.resume_from_checkpoint}"
        )
    return cfg.resume_from_checkpoint
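Note that the sort key converts the checkpoint-N suffix to an int, so checkpoint-9 orders before checkpoint-10; a plain lexicographic sort over the glob results would pick the wrong directory. The new tests/utils/test_train.py at the bottom of this diff exercises exactly this case.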
@@ -1,126 +1,28 @@
-"""Integration tests for FSDP Params4bit patches."""
+"""Integration tests for FSDP2 Params4bit patches."""

-from unittest.mock import Mock, patch
-
-import bitsandbytes as bnb
 import pytest
-import torch
 from torch.distributed.fsdp._fully_shard._fsdp_param import FSDPParam

-from axolotl.monkeypatch.fsdp2_qlora import (
-    apply_bnb_torch_function_patch,
-    patched_torch_function,
-)
-
-
-@pytest.fixture
-def mock_params4bit():
-    """Create a mock Params4bit instance with test attributes."""
-    mock_instance = Mock()
-    mock_instance.requires_grad = True
-    mock_instance.quant_state = "test_state"
-    mock_instance.blocksize = 128
-    mock_instance.compress_statistics = True
-    mock_instance.quant_type = "fp4"
-    mock_instance.quant_storage = "test_storage"
-    mock_instance.module = "test_module"
-    mock_instance.bnb_quantized = True
-    return mock_instance
-
-
-class TestBnbTorchFunctionPatch:
-    """Test the Params4bit.__torch_function__ patch."""
-
-    def test_apply_patch(self):
-        """Test that the patch can be applied."""
-        with patch("bitsandbytes.nn.modules.Params4bit") as mock_cls:
-            apply_bnb_torch_function_patch()
-            assert hasattr(mock_cls, "__torch_function__")
-            assert isinstance(mock_cls.__torch_function__, classmethod)
-
-    # pylint: disable=redefined-outer-name
-    def test_torch_chunk_preserves_attributes(self, mock_params4bit):
-        """Test that torch.chunk preserves Params4bit attributes."""
-        mock_cls = Mock()
-        chunks = (torch.tensor([1, 2]), torch.tensor([3, 4]))
-
-        with patch("torch.nn.Parameter.__torch_function__", return_value=chunks):
-            result = patched_torch_function(
-                mock_cls,
-                torch.chunk,
-                (type(mock_params4bit),),
-                args=(mock_params4bit, 2),
-            )
-
-        assert isinstance(result, tuple)
-        assert len(result) == 2
-
-        # Check that Params4bit constructor was called with preserved attributes
-        assert mock_cls.call_count == 2
-        for call in mock_cls.call_args_list:
-            kwargs = call[1]
-            assert kwargs["requires_grad"] == mock_params4bit.requires_grad
-            assert kwargs["quant_state"] == mock_params4bit.quant_state
-            assert kwargs["blocksize"] == mock_params4bit.blocksize
-
-    # pylint: disable=redefined-outer-name
-    def test_other_functions_fallback(self, mock_params4bit):
-        """Test that non-chunk/split functions use Parameter fallback."""
-        mock_cls = Mock()
-        fallback_result = torch.tensor([5, 6, 7])
-
-        with patch(
-            "torch.nn.Parameter.__torch_function__", return_value=fallback_result
-        ) as mock_fallback:
-            result = patched_torch_function(
-                mock_cls, torch.add, (type(mock_params4bit),), args=(mock_params4bit, 1)
-            )
-
-        # Should call Parameter.__torch_function__ and return its result
-        mock_fallback.assert_called_once()
-        assert result is fallback_result
-        mock_cls.assert_not_called()
-
-
 class TestFSDPPatchIntegration:
     """Test FSDP patch integration."""

     @pytest.mark.integration
-    def test_all_patches_together(self):
+    def test_fsdp2_init_patches(self):
         """Test that all patches can be applied together."""
         from axolotl.monkeypatch.fsdp2_qlora import (
             apply_init_sharded_param_patch,
             apply_init_unsharded_param_patch,
         )

-        # Store original methods before patching
-        original_torch_function = getattr(
-            bnb.nn.modules.Params4bit, "__torch_function__", None
-        )
-
         # pylint: disable=protected-access
         original_init_sharded = FSDPParam._init_sharded_param
         original_init_unsharded = FSDPParam.init_unsharded_param

         # Apply patches
-        apply_bnb_torch_function_patch()
         apply_init_sharded_param_patch()
         apply_init_unsharded_param_patch()

-        # Verify patches were applied
-        current_torch_function = getattr(
-            bnb.nn.modules.Params4bit, "__torch_function__", None
-        )
-        if original_torch_function is not None:
-            assert (
-                current_torch_function != original_torch_function
-            ), "Params4bit.__torch_function__ was not patched"
-        else:
-            assert (
-                current_torch_function is not None
-            ), "Params4bit.__torch_function__ was not added"
-
-        # Check that FSDP methods were patched
         assert (
             # pylint: disable=protected-access
             FSDPParam._init_sharded_param
@@ -147,7 +147,11 @@ def require_hopper(test_case):


 def check_tensorboard(
-    temp_run_dir: str, tag: str, lt_val: float, assertion_err: str
+    temp_run_dir: str,
+    tag: str,
+    lt_val: float,
+    assertion_err: str,
+    rtol: float = 0.02,
 ) -> None:
     """
     helper function to parse and check tensorboard logs
@@ -157,6 +161,7 @@ def check_tensorboard(
     reader = SummaryReader(event_file)
     df = reader.scalars  # pylint: disable=invalid-name
     df = df[(df.tag == tag)]  # pylint: disable=invalid-name
+    lt_val = (1 + rtol) * lt_val
     if "%s" in assertion_err:
         assert df.value.values[-1] < lt_val, assertion_err % df.value.values[-1]
     else:
@@ -8,7 +8,7 @@ from transformers import AutoTokenizer
 from axolotl.datasets import TokenizedPromptDataset
 from axolotl.prompt_strategies.completion import load
 from axolotl.utils.collators import V2BatchSamplerDataCollatorForSeq2Seq
-from axolotl.utils.data.utils import drop_long_seq_in_dataset
+from axolotl.utils.data.utils import handle_long_seq_in_dataset
 from axolotl.utils.dict import DictDefault
 from axolotl.utils.samplers import MultipackBatchSampler, get_dataset_lengths

@@ -70,7 +70,7 @@ class TestBatchedSamplerPacking:
         )
         train_dataset = concatenate_datasets([dataset_wrapper])

-        train_dataset = drop_long_seq_in_dataset(train_dataset, cfg.sequence_len, cfg)
+        train_dataset = handle_long_seq_in_dataset(train_dataset, cfg.sequence_len, cfg)

         lengths = get_dataset_lengths(train_dataset)
         batch_sampler = MultipackBatchSampler(
24 tests/utils/test_train.py Normal file
@@ -0,0 +1,24 @@
"""test for train checkpoint utils"""

import os

from axolotl.utils.dict import DictDefault
from axolotl.utils.train import determine_last_checkpoint


def test_determine_last_checkpoint(temp_dir):
    cfg = DictDefault(
        output_dir=temp_dir,
    )
    for cpt_idx in [1, 9, 10, 20]:
        os.makedirs(
            os.path.join(cfg.output_dir, f"checkpoint-{cpt_idx}"), exist_ok=True
        )

    last_checkpoint = determine_last_checkpoint(cfg, update=False)
    assert last_checkpoint == os.path.join(cfg.output_dir, "checkpoint-20")

    cfg.resume_from_checkpoint = None
    cfg.auto_resume_from_checkpoints = True
    determine_last_checkpoint(cfg, update=True)
    assert cfg.resume_from_checkpoint == os.path.join(cfg.output_dir, "checkpoint-20")