* EBFT wip
* fixes
* more fixes
* add missing strided module
* ebft fixes for multi-turn
* make ebft work with async
* add example for ebft w qwen3.5
* fix for split thinking and update yaml for lora over linear attention only
* enforce_eager for vllm arg in schema
* fix sync weights
* fix multi-gpu
* handle updated sig for mm
* ddp fixes
* improve multi-gpu handling, don't calculate logits, adaptive completion length
* chore: lint
* chore: lint
* support completion_mean
* Address code review feedback
* clamp min IS ratio
* Address PR code review
* more fixes identified
* address code review
* Fix property from rebase conflict
* roundup_power2_divisions not needed with newer pytorch versions
* remove typo
* update qwen3.5 moe 35b-a3b yaml for 5090
* more bug fixes
* fix tests to match updated trainer
* don't use fa2 for hooks test
* reset plugins on the instance
* retry download
* fix references to renamed axolotl_cfg property on trainer
* Fix ref to trainer cfg
* feat: LoRA kernel support for bias, dropout, dora, embeddings
* chore: lint
* chore: lint
* address PR feedback, add regression tests, add fsdp2 tests for lora kernels
* update tests for new sigs
* update tests now that bias and dropout are supported
* fix token state json and mistral tokenizer issue
* centralize constants
* forgot to commit constants file
* Fix weakref in pickling relora state dict
* make curl a bit quieter so it doesn't log 2K lines
* fix path traversal for olmoe test
* more test fixes that weren't flagged previously
* chore: lint
* skip tests that fail b/c of OutOfResources
* scattermoe as slow tests
* update fbgemm-genai for torch 2.10
Transformers 5.x routes attention through sdpa_attention.py and no longer
calls the _prepare_4d_causal_attention_mask* or _expand_mask functions that
these patches targeted. This makes the following patches dead code:
- llama_patch_multipack.py (patched _prepare_4d_causal_attention_mask*)
- llama_expand_mask.py (patched _expand_mask, never called)
- Related utility functions in monkeypatch/utils.py
Closes axolotl-ai-cloud/axolotl#3331
* optimize moe + lora
* more scattermoe optims
* selective dequant
* add correctness unit tests and benchmarks for scattermoe + lora
* handle base+lora split kernel for older moe models
* chore: lint
* fix casting for H200 and B200
* register pressure estimation and pruning for h200/b200
* use soft limit for pruning
* qkv patch for qwen3.5moe
* support text_model for qwen3.5 moe
* nesting of qwen3
* use updated cce with zero3 support
* Fix decomposed backward for QKV and O projections
Eliminates B @ A materialization in the LoRA attention backward pass, replacing full [out, in] matmuls with two small [T, R] matmuls.
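The idea behind the decomposed backward can be illustrated with a minimal pure-Python sketch. It is not the kernel from this commit: the shapes, the values, and the `matmul` helper are assumptions chosen for illustration, and only the input-gradient path is shown. It checks that routing the gradient through the rank-R bottleneck gives the same result as first forming the full `B @ A` matrix.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

# Illustrative shapes (assumptions): T=2 tokens, out=3, in=4, rank R=1.
# The LoRA delta weight is B @ A with B: [out, R] and A: [R, in].
B = [[1], [2], [3]]                 # [out, R]
A = [[1, 0, 2, 1]]                  # [R, in]
grad_out = [[1, 0, 1], [0, 2, 1]]   # dL/dY, shape [T, out]

# Naive path: materialize the full [out, in] matrix, then multiply.
full = matmul(B, A)                      # [out, in]
grad_in_naive = matmul(grad_out, full)   # [T, in]

# Decomposed path: never form B @ A; pass through a [T, R] intermediate.
tmp = matmul(grad_out, B)                # [T, R]
grad_in_decomposed = matmul(tmp, A)      # [T, in]

print(grad_in_naive == grad_in_decomposed)
```

Because R is much smaller than both `out` and `in` in practice, the intermediate stays small and the [out, in] product is never allocated.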
* use custom triton kernels for entropy from logits and selective softmax
* PR comments fixes
* fix out of bounds, include tests, include benchmarks
* chore: lint
* async grpo support
* implement data producer
* use fast async
* handle call to create data producer
* fix liger kernel setup
* fix replay buffer
* chore: lint
* make gpus go brrr
* chore: lint
* inplace div_, unwrap model for logits in bf16
* fuse selective softmax and empty cuda cache on each scoring step
* remove waiting for synch time and fix race
* make fp8 work and allow lora kernels w rl
* grpo with lora vllm sync and fixes for sharded distributed
* update docs
* more patches so it works against trl main
* address PR feedback from CodeRabbit
* fix: replace shell=True subprocess with argument list in modal CLI
Using shell=True with a formatted string containing docker_image
(a user-controlled value) is a command injection risk (Bandit B602).
Replace with an argument list, which passes args directly to the
process without shell interpretation, removing the nosec annotation.
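A minimal sketch of why the argument-list form is safer, assuming a hypothetical hostile `docker_image` value. A child Python process stands in for the real `docker` invocation so the example runs anywhere; it is not the modal CLI code itself.

```python
import subprocess
import sys

# Hypothetical user-controlled value containing shell metacharacters.
docker_image = "repo/image:tag; echo injected"

# Stand-in for the real command: a child Python process that prints its
# first argument. With an argument list there is no shell, so the value
# reaches the child as one literal argument.
argv = [sys.executable, "-c", "import sys; print(sys.argv[1])", docker_image]

result = subprocess.run(argv, capture_output=True, text=True, check=True)
received = result.stdout.strip()

# The child sees the whole string, including "; echo injected", as data.
print(received == docker_image)
```

With `shell=True` and string formatting, the same value would be parsed by the shell and the text after `;` would run as a second command.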
* fix: add nosec annotation to suppress bandit B603/B607 warnings
Removing shell=True (B602) surfaces B603 (subprocess without shell)
and B607 (partial executable path for 'docker'). Use bare # nosec
to suppress both, consistent with other nosec usages in the codebase.
* consolidate behaviour of routing in scattermoe kernels
* collect telemetry on best chosen autotuned kernel
* properly collect data
* Fix property name and get smem too
* handle issues raised by coderabbit
* add tests for parity before refactoring
* docs: fix codestyle placeholders in CONTRIBUTING.md
Replace unresolved {codestyle} and {URLofCodestyle} template
variables with Ruff, the project's actual linter/formatter
as configured in .pre-commit-config.yaml.
* fix: replace bare except clauses with specific exception types
- quantization.py: use except ImportError for optional torchao imports
(consistent with line 48 which already uses ImportError correctly)
- cli/config.py: use except (RuntimeError, AssertionError) for CUDA
device property query
Prevents masking unrelated errors like KeyboardInterrupt or SystemExit.
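The difference can be sketched as follows. The helper names (`optional_import`, `query_or_default`) and the stand-in failure functions are hypothetical and are not the code in quantization.py or cli/config.py; they only show that a narrow `except` handles the expected failure while letting `KeyboardInterrupt` propagate.

```python
import importlib


def optional_import(name):
    """Return the module if it can be imported, otherwise None."""
    try:
        return importlib.import_module(name)
    except ImportError:  # also covers ModuleNotFoundError
        return None


def query_or_default(query, default):
    """Run a device query, falling back only on the expected error types."""
    try:
        return query()
    except (RuntimeError, AssertionError):
        return default


def failing_query():
    # Stand-in for a CUDA property query on a machine without a GPU.
    raise RuntimeError("no CUDA device")


def interrupted_query():
    # Stand-in for the user pressing Ctrl-C during the query.
    raise KeyboardInterrupt


missing = optional_import("a_module_that_should_not_exist_12345")
present = optional_import("json")
fallback = query_or_default(failing_query, "cpu")
```

A bare `except:` in `query_or_default` would also swallow the `KeyboardInterrupt` raised by `interrupted_query`, which is the masking the commit avoids.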
* test: add unit tests for convert.py JSON/JSONL utilities
Cover FileReader, FileWriter, StdoutWriter, JsonParser,
JsonlSerializer, and JsonToJsonlConverter with 8 test cases
including roundtrip and edge case (empty list) scenarios.
Previously this module had zero test coverage.
* fix: address CodeRabbit review feedback
- quantization.py: catch (ImportError, RuntimeError) for optional
torchao imports; CUDA wheel/GPU mismatches raise RuntimeError,
not ImportError
- convert.py: remove unused output_file_path parameter from
JsonToJsonlConverter.convert() — FileWriter already holds the
output path from construction
- tests/test_convert.py: update call site to match new signature
* install flash-linear-attention
* handle prequant weights for fsdp2 and ensure loss is not zero
* fix type for cu_seqlen, uninstall causal_conv1d
* chore: lint
* uv pip uninstall doesn't need confirmation
* upgrade transformers==5.3.0 trl==0.29.0 kernels
* use latest deepspeed fixes
* use correct image for cleanup
* fix test outputs for tokenizer fixes upstream
* fix import
* keep trl at 0.28.0
* handle updated API
* use latest trl since 0.28.0 doesn't work with latest transformers
* use trl experimental for pad to length
* monkeypatch trl with ORPOTrainer so liger doesn't croak
* upgrade accelerate
* more fixes
* move patch for orpotrainer
* load the imports later
* remove use_logits_to_keep
* fix loss_type arg as a list
* fetch hf cache from s3
* just manually download the missing model for now
* lint for pre-commit update
* a few more missing models on disk
* fix: loss_type internally now list
* fix: remove deprecated code and raise deprecate
* fix: remove unneeded blocklist
* fix: remove reliance on transformers api to find package available
* chore: refactor shim for fewer side effects
* fix: silence trl experimental warning
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* mxfp4 axo
* import lint
* test for qat mxfp4
* config for mxfp4
* add qat
* pass base config
* MXFakeQuantizeConfig
* lint
* tune config so it fits in 32GB VRAM
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* Fix fsdp2 sharding. Fix validation of ao version for lr groups
* remove validation since axolotl requires ao>0.13.0 already
* Move fully_shard of entire module for lora_embedding_A/B out of loop
* chore: lint
---------
Co-authored-by: bekk02 <ID+bekk02@users.noreply.github.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
* chore: rename without period
* feat: add glm45 air
* feat: add doc on expert quantization
* feat: update base readme with new changes
* chore: cleanup
* chore: cleanup
* chore: cleanup
* fix: disable quantize_moe_expert on merge per comment
* chore: add kernel info to optimizations doc