* optimize moe + lora
* more scattermoe optims
* selective dequant
* add correctness unit tests and benchmarks for scattermoe + lora
* handle base+lora split kernel for older moe models
* chore: lint
* fix casting for H200 and B200
* register pressure estimation and pruning for h200/b200
* use soft limit for pruning
* qkv patch for qwen3.5moe
* support text_model for qwen3.5 moe
* nesting of qwen3
* use udpated cce with zero3 support
* Fix decomposed backward for QKV and O projections
eliminates B @ A materialization in LoRA attention backward, replacing full [out, in] matmuls with two small [T, R] matmuls.
The non-root user approach had multiple issues with RunPod
compatibility, sudo PATH handling, and tmux in exec sessions.
Restoring root as the default user for now.
* feat(doc): add links to new features on README
* fix merge error
* remove blurb about older FSDP2 integration
* update blog link
* chore: update cce commit
* feat: update model support into readme
* Update README.md
Co-authored-by: salman <salman.mohammadi@outlook.com>
* chore: lint num spaces
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: salman <salman.mohammadi@outlook.com>
* slurm example and make preprocess play nicely
* start slurm if it init file exists
* remove incorrect comment
* feat: add slurm docs
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* add kernels for gpt oss models
* add support for gpt-oss
* typo incorrect package
* fix: layout for configs and added wandb/epochs
* add gptoss example w offload and set moe leaf for z3
* add support for Mxfp4Config from yaml
* update yaml to use official model
* fix lora and don't allow triton to go above 3.3.1
* fix lr and tweak vram use
* fix range for triton since pinned wasn't compatible with toch 2.6.0
* update cce with gpt oss patches
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* add uv tooling for e2e gpu tests
* fixes from PR feedback
* simplify check
* fix env var
* make sure to use uv for other install
* use raw_dockerfile_image
* Fix import
* fix args to experimental dockerfile image call
* use updated modal versions
* bump hf deps
* upgrade liger-kernel too
* install cce from fork for transformers fix
* fix reference to vocab size in gemma3 patch
* use padding_idx instead of pad_token_id
* remove fixed gemma3 patch
* use updated cce fork
* fix local mllama cce patches w docstring
* add test for multipack with trainer setup and fix trainer for trainer refactor upstream
* bump modal version
* guard for iterable datasetS
* mllama model arch layout changed in latest transformers
* fix batch sampler with drop_last
* fix: address upstream vlm changes for lora
* fix: update references to old lora target path
* fix: remove mllama fa2 patch due to upstream fix
* fix: lora kernel patch path for multimodal models
* fix: removed mllama from quarto
* run test for came optim on 2.6.0+
* fix fsdp2 patch and remove deprecated patch
* make sure to set sequence_parallel_degree for grpo
* Add SP test for GRPO
* add sp to grpo config for trainer
* use reward_funcs as kwarg to grpo trainer
* fix the comprehension for reward funcs
* reward funcs already passed in as args
* init sp_group right before training
* fix check for adding models to SP context
* make sure to pass args to super
* upgrade deepspeed
* use updated trl and add reasoning flags for vllm
* patch the worker
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* feat: add config for optional parameters in a chat message
* chore: cleanup
* chore: fix nits and add light docs
* docs: update docs/dataset-formats/conversation.qmd
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* feat: configurable message mappings, jinja template analyzer
* chore: handle bradley terry
* docs: update docs
* refactor: change order of mappings, improve message transform
* refactor: make chat awware of property mappings
* chore: remove .python-version
* chore: revert change
* chore: add dataset validation to tests where appropriate
* chore: add dataset validation to tests where appropriate
* chore: clean up handling of ds_cfg
* chore: recursively serialize config
* make sure to use the return value from validate_config
* DefaultDict pickle/unpickle fix
* fix super call for override
* refactor: message fields
* chore: empty commit
* tests: validate config before using
* chore: add config validation to all e2e tests
* chore: add unneeded logging
* chore: add missed config validation
* chore: pass field_messages to prompter
* test: fix borked test
* chore: remove uninteded file
* chore: add deprecation warning and update chat_datasets script
* chore: lint
* refactor: message fields
* feat: update axolotlinputconfig and test_models
- add configdict import in axolotl/utils/config/models/input/v0_4_1/__init__.py
- remove unnecessary line breaks in sftdataset, dpodataset, ktodataset, stepwisesuperviseddataset classes
- update model_dump method in axolotlinputconfig to exclude none values
- correct typo in test_models.py comment
* feat: simplify dpodataset and ktodataset classes in config models
removed several optional fields from dpodataset and ktodataset classes in axolotl/utils/config/models/input/v0_4_1. this simplifies the configuration subsets for these datasets.
* feat: improve readability and structure in dataset configuration models
this commit enhances the readability and structure of the dataset configuration models in the `axolotl/utils/config/models/input/v0_4_1` module. it removes unused `configdict` import and adds line breaks to separate class definitions for better clarity. additionally, a minor documentation fix is included to ensure a newline at the end of the `stepwise_supervised.qmd` file.
* feat: change log level from info to debug in chattemplatestrategy
* feat(prompt_strategies): refactor chattemplateprompter and chattemplatestrategy
- Make `chat_template` a required parameter in `ChatTemplatePrompter` constructor
- Add default value for `message_property_mappings` in `ChatTemplatePrompter` constructor
- Add `messages_array_name` property to `ChatTemplatePrompter`
- Change `processor` type to Optional in `ChatTemplatePrompter`
- Add TypeError check for `processor` in `ChatTemplatePrompter.build_prompt`
- Remove `_messages` property from `ChatTemplateStrategy`
- Make `prompter` a required parameter and add type hint in `ChatTemplateStrategy` constructor
- Remove `messages` getter and setter from `ChatTemplateStrategy`
- Use `prompter.messages_array_name` in `ChatTemplateStrategy.get_conversation_thread`
- Remove condition to set `messages` field in `load` function
* feat(tests/utils): ignore type check in load_model call in test_models.py
* feat: improve type handling and test structure in chat templates
- Add return type hint for `get_chat_template` function in `chat_templates.py`
- Remove unnecessary assignment of `strategy.messages` in several test cases
- Add `messages_array_name` parameter to various test configurations in `test_chat_templates.py` and `test_chat_templates_advanced.py`
- Remove redundant `strategy.messages` assignment in `test_chat_templates_advanced.py`
* feat(axolotl): enhance chat strategy with datasetconfig support
This commit introduces support for DatasetConfig in the ChatTemplateStrategy. It also refines the strategy loader to handle different types of ds_cfg inputs and improves the clarity of the code by formatting and reordering. The key changes include:
- Importing Union from typing and BaseModel from pydantic.
- Adding DatasetConfig as an optional type for ds_cfg in StrategyLoader.
- Adjusting the handling of ds_cfg in StrategyLoader to account for BaseModel instances.
- Refactoring the prompter_params and strategy_params for better readability.
- Changing the reference from prompt[self.messages] to prompt[self.prompter.messages_array_name] in the is_prompt_batched method.
* feat: update message handling in btchattemplatestrategy
* Replace `self.messages` with direct string references to "chosen_messages" and "rejected_messages"
* Append system, user, and assistant content directly to "chosen_messages" and "rejected_messages"
* Add a new attribute "messages_array_name" to the `load` function parameters
* Remove the conditional attribute assignment for "field_messages" in the `load` function
* feat: add config validation in test_kd.py
- Import `validate_config` from `axolotl.utils.config`
- Validate the configuration in `test_llama_kd` and another function in `TestKnowledgeDistillation` class
* feat: enhance config validation and capabilities handling
* Import `EnvCapabilities` and `GPUCapabilities` from `axolotl.utils.config.models.internals`
* Update `validate_config` function to create `KTODataset` and `SFTDataset` instances using `dict(ds_cfg)`
* Replace `capabilities` and `env_capabilities` with instances of `GPUCapabilities` and `EnvCapabilities` respectively in `AxolotlConfigWCapabilities` model dump
* feat: update config validation in axolotl utils
- Remove import of `EnvCapabilities` and `GPUCapabilities` from `axolotl.utils.config.models.internals`
- Update `validate_config` function to use `capabilities` and `env_capabilities` directly instead of creating new instances of `GPUCapabilities` and `EnvCapabilities`
* feat: refactor strategyloader in chat_template.py
- Extracted the creation of strategy parameters into a separate function, `_get_strategy_params(cfg, dataset_config)`
- Created a new function, `_get_strategy_cls()`, to obtain the strategy class
- Replaced `ChatTemplateStrategy` with `strategy_cls` for strategy instantiation
* trigger CI
* chore: revert dataset config changes for kto/dpo
* subject: refactor: rename 'messages_array_name' to 'field_messages'
Body:
- Renamed 'messages_array_name' to 'field_messages' in 'ChatTemplatePrompter' class and its usages in 'chat_template.py'
- Updated 'load' function in 'bradley_terry/chat_template.py' to reflect the change
- Adjusted 'get_chat_template_msg_variables' and 'get_message_vars' methods in 'jinja_template_analyzer.py' to use the new variable name
- Modified 'StrategyLoader' in 'chat_template.py' to use 'field_messages'
- Updated tests in 'test_chat_templates.py' and 'test_chat_templates_advanced.py' to use 'field_messages' instead of 'messages_array_name'
* feat: refactor prompt strategies and update config models
* Remove redundant 'return None' in `axolotl/prompt_strategies/__init__.py`
* Simplify message handling in `axolotl/prompt_strategies/bradley_terry/chat_template.py` by using a single 'messages' list instead of separate 'chosen_messages' and 'rejected_messages' lists
* Update default 'message_property_mappings' in `axolotl/prompt_strategies/bradley_terry/chat_template.py`
* Add 'field_messages' field to `axolotl/utils/config/models/input/v0_4_1/__init__.py` configuration model
* chore: remove unused input
* chore: remove redundant type ignore
* fix: remove old configs and update examples
* fix: type check
* fix: remove loading old config in ChatMessage
* fix: update faq with potential new undefinederror
* fix: add debug if property mapped is not found
* chore: improve explanation for unmapped properties
* fix: update docs with new config
* chore: add note for deprecation config and del old config from dict
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* native support for modal cloud from CLI
* do lm_eval in cloud too
* Fix the sub call to lm-eval
* lm_eval option to not post eval, and append not extend
* cache bust when using branch, grab sha of latest image tag, update lm-eval dep
* allow minimal yaml for lm eval
* include modal in requirements
* update link in README to include utm
* pr feedback
* use chat template
* revision support
* apply chat template as arg
* add wandb name support, allow explicit a100-40gb
* cloud is optional
* handle accidental setting of tasks with a single task str
* document the modal cloud yaml for clarity [skip ci]
* cli docs
* support spawn vs remote for lm-eval
* Add support for additional docker commands in modal image build
* cloud config shouldn't be a dir
* Update README.md
Co-authored-by: Charles Frye <cfrye59@gmail.com>
* fix annotation args
---------
Co-authored-by: Charles Frye <cfrye59@gmail.com>
* transformers 4.47.1
* drop monkeypatches
* can't remove patches yet
* make flash attention forward ignore the loss kwargs
* patch the flash attention in the modeling arch too
* remove fsdp and deepspeed patches
* cleanup PR
* bump accelerate and torchao, also logically reorder/group requirements
* meant to include torchao
* use official patch release
* fix build w pyproject to respect insalled torch version
* include in manifest
* disable duplicate code check for now
* move parser so it can be found
* add checks for correct pytorch version so this doesn't slip by again
* prepare plugins needs to happen so registration can occur to build the plugin args
use yaml.dump
include dataset and more assertions
* attempt to manually register plugins rather than use fn
* fix fixture
* remove fixture
* move cli test to patched dir
* fix cce validation