* feat: LoRA kernel support for bias, dropout, dora, embeddings
* chore: lint
* chore: lint
* address PR feedback, add regression tests, add fsdp2 tests for lora kernels
* update tests for new sigs
* update tests now that bias and dropout are supported
* Add test cases to verify that the problem exists in the underlying
* Update the handle_long_sequences function to correctly use Map instead of filter for the truncation strategy. Also remove the minimal length filtering from the truncate_long_samples function, and run it separately and before.
* fix: refactor and add test truncate for non-input id fields
* fix: refactor long seq handling fn
* fix: refactor duplicate fn and simplify route
* add additional tests and make them work on mac
* handle logging exception on empty datasets
---------
Co-authored-by: 2ndset bot <bot@2ndset.ai>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
* Prepare for transformers v5 upgrade
* fix hf cli
* update for hf hub changes
* fix tokenizer apply_chat_template args
* remap include_tokens_per_second
* fix tps
* handle migration for warmup
* use latest hf hub
* Fix scan -> ls
* fix import
* fix for renaming of mistral common tokenizer -> backend
* update for fixed tokenziation for llama
* Skip phi35 tests for now
* remove mistral patch fixed upstream in huggingface/transformers#41439
* use namespacing for patch
* don't rely on sdist for e2e tests for now
* run modal ci without waiting too
* Fix dep for ci
* fix imports
* Fix fp8 check
* fsdp2 fixes
* fix version handling
* update fsdp version tests for new v5 behavior
* Fail multigpu tests after 3 failures
* skip known v5 broken tests for now and cleanup
* bump deps
* unmark skipped test
* re-enable test_fsdp_qlora_prequant_packed test
* increase multigpu ci timeout
* skip broken gemma3 test
* reduce timout back to original 120min now that the hanging test is skipped
* fix for un-necessary collator for pretraining with bsz=1
* fix: safe_serialization deprecated in transformers v5 rc01 (#3318)
* torch_dtype deprecated
* load model in float32 for consistency with tests
* revert some test fixtures back
* use hf cache ls instead of scan
* don't strip fsdp_version
more fdsp_Version fixes for v5
fix version in fsdp_config
fix aliasing
fix fsdp_version check
check fsdp_version is 2 in both places
* Transformers v5 rc2 (#3347)
* bump dep
* use latest fbgemm, grab model config as part of fixture, un-skip test
* import AutoConfig
* don't need more problematic autoconfig when specifying config.json manually
* add fixtures for argilla ultrafeedback datasets
* download phi4-reasoning
* fix arg
* update tests for phi fast tokenizer changes
* use explicit model types for gemma3
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* fix: AutoModelForVision2Seq -> AutoModelForImageTextToText
* chore: remove duplicate
* fix: attempt fix gemma3 text mode
* chore: lint
* ga release of v5
* need property setter for name_or_path for mistral tokenizer
* vllm not compatible with transformers v5
* setter for chat_template w mistral too
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: salman <salman.mohammadi@outlook.com>
* improve fsdp shard merging
* improve logging
* update information on merging and inferencing GPT-OSS
* cleanup readme
* automate cleanup of FSDP prefix
* import GRPO only if necessary
* only modify config.json on rank0
* merge final checkpoint at end of training
* prevent circular import
* Fix saving for sharded state dict
* devx, move merged to output dir
* move import back to top
* Fix stuck merge
* fix conditionals from pr feedback and add test
* make pad_to_sequence_len default to the same value as sample_packing
* remove duplicate validation
* fix test
* update description meta
Co-authored-by: NanoCode012 <nano@axolotl.ai>
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* fix: update chat_template
* fix: handle gemma3 showing a lot of no content for turn 0
* fix: remove unknown config from examples
* fix: test
* fix: temporary disable gemma2 test
* fix: stop overwriting config.text_config unnecessarily
* fix: handling of set cache to the text_config section
* feat: add liger gemma support and bump liger to 0.5.5
* fix: add double use_cache setting
* fix: add support for final_logit_softcap in CCE for gemma2/3
* fix: set use_cache before model load
* feat: add missing layernorm override
* fix: handle gemma3 rmsnorm
* fix: use wrapper to pass dim as hidden_size
* fix: change dim to positional
* fix: patch with wrong mlp
* chore: refactor use_cache handling
* fix import issues
* fix tests.e2e.utils import
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* hf offline decorator for tests to workaround rate limits
* fail quicker so we can see logs
* try new cache name
* limit files downloaded
* phi mini predownload
* offline decorator for phi tokenizer
* handle meta llama 8b offline too
* make sure to return fixtures if they are wrapped too
* more fixes
* more things offline
* more offline things
* fix the env var
* fix the model name
* handle gemma also
* force reload of modules to recheck offline status
* prefetch mistral too
* use reset_sessions so hub picks up offline mode
* more fixes
* rename so it doesn't seem like a context manager
* fix backoff
* switch out tinyshakespeare dataset since it runs a py script to fetch data and doesn't work offline
* include additional dataset
* more fixes
* more fixes
* replace tiny shakespeaere dataset
* skip some tests for now
* use more robust check using snapshot download to determine if a dataset name is on the hub
* typo for skip reason
* use local_files_only
* more fixtures
* remove local only
* use tiny shakespeare as pretrain dataset and streaming can't be offline even if precached
* make sure fixtures aren't offline
improve the offline reset
try bumping version of datasets
reorder reloading and setting
prime a new cache
run the tests now with fresh cache
try with a static cache
* now run all the ci again with hopefully a correct cache
* skip wonky tests for now
* skip wonky tests for now
* handle offline mode for model card creation
* feat: add config for optional parameters in a chat message
* chore: cleanup
* chore: fix nits and add light docs
* docs: update docs/dataset-formats/conversation.qmd
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* feat: configurable message mappings, jinja template analyzer
* chore: handle bradley terry
* docs: update docs
* refactor: change order of mappings, improve message transform
* refactor: make chat awware of property mappings
* chore: remove .python-version
* chore: revert change
* chore: add dataset validation to tests where appropriate
* chore: add dataset validation to tests where appropriate
* chore: clean up handling of ds_cfg
* chore: recursively serialize config
* make sure to use the return value from validate_config
* DefaultDict pickle/unpickle fix
* fix super call for override
* refactor: message fields
* chore: empty commit
* tests: validate config before using
* chore: add config validation to all e2e tests
* chore: add unneeded logging
* chore: add missed config validation
* chore: pass field_messages to prompter
* test: fix borked test
* chore: remove uninteded file
* chore: add deprecation warning and update chat_datasets script
* chore: lint
* refactor: message fields
* feat: update axolotlinputconfig and test_models
- add configdict import in axolotl/utils/config/models/input/v0_4_1/__init__.py
- remove unnecessary line breaks in sftdataset, dpodataset, ktodataset, stepwisesuperviseddataset classes
- update model_dump method in axolotlinputconfig to exclude none values
- correct typo in test_models.py comment
* feat: simplify dpodataset and ktodataset classes in config models
removed several optional fields from dpodataset and ktodataset classes in axolotl/utils/config/models/input/v0_4_1. this simplifies the configuration subsets for these datasets.
* feat: improve readability and structure in dataset configuration models
this commit enhances the readability and structure of the dataset configuration models in the `axolotl/utils/config/models/input/v0_4_1` module. it removes unused `configdict` import and adds line breaks to separate class definitions for better clarity. additionally, a minor documentation fix is included to ensure a newline at the end of the `stepwise_supervised.qmd` file.
* feat: change log level from info to debug in chattemplatestrategy
* feat(prompt_strategies): refactor chattemplateprompter and chattemplatestrategy
- Make `chat_template` a required parameter in `ChatTemplatePrompter` constructor
- Add default value for `message_property_mappings` in `ChatTemplatePrompter` constructor
- Add `messages_array_name` property to `ChatTemplatePrompter`
- Change `processor` type to Optional in `ChatTemplatePrompter`
- Add TypeError check for `processor` in `ChatTemplatePrompter.build_prompt`
- Remove `_messages` property from `ChatTemplateStrategy`
- Make `prompter` a required parameter and add type hint in `ChatTemplateStrategy` constructor
- Remove `messages` getter and setter from `ChatTemplateStrategy`
- Use `prompter.messages_array_name` in `ChatTemplateStrategy.get_conversation_thread`
- Remove condition to set `messages` field in `load` function
* feat(tests/utils): ignore type check in load_model call in test_models.py
* feat: improve type handling and test structure in chat templates
- Add return type hint for `get_chat_template` function in `chat_templates.py`
- Remove unnecessary assignment of `strategy.messages` in several test cases
- Add `messages_array_name` parameter to various test configurations in `test_chat_templates.py` and `test_chat_templates_advanced.py`
- Remove redundant `strategy.messages` assignment in `test_chat_templates_advanced.py`
* feat(axolotl): enhance chat strategy with datasetconfig support
This commit introduces support for DatasetConfig in the ChatTemplateStrategy. It also refines the strategy loader to handle different types of ds_cfg inputs and improves the clarity of the code by formatting and reordering. The key changes include:
- Importing Union from typing and BaseModel from pydantic.
- Adding DatasetConfig as an optional type for ds_cfg in StrategyLoader.
- Adjusting the handling of ds_cfg in StrategyLoader to account for BaseModel instances.
- Refactoring the prompter_params and strategy_params for better readability.
- Changing the reference from prompt[self.messages] to prompt[self.prompter.messages_array_name] in the is_prompt_batched method.
* feat: update message handling in btchattemplatestrategy
* Replace `self.messages` with direct string references to "chosen_messages" and "rejected_messages"
* Append system, user, and assistant content directly to "chosen_messages" and "rejected_messages"
* Add a new attribute "messages_array_name" to the `load` function parameters
* Remove the conditional attribute assignment for "field_messages" in the `load` function
* feat: add config validation in test_kd.py
- Import `validate_config` from `axolotl.utils.config`
- Validate the configuration in `test_llama_kd` and another function in `TestKnowledgeDistillation` class
* feat: enhance config validation and capabilities handling
* Import `EnvCapabilities` and `GPUCapabilities` from `axolotl.utils.config.models.internals`
* Update `validate_config` function to create `KTODataset` and `SFTDataset` instances using `dict(ds_cfg)`
* Replace `capabilities` and `env_capabilities` with instances of `GPUCapabilities` and `EnvCapabilities` respectively in `AxolotlConfigWCapabilities` model dump
* feat: update config validation in axolotl utils
- Remove import of `EnvCapabilities` and `GPUCapabilities` from `axolotl.utils.config.models.internals`
- Update `validate_config` function to use `capabilities` and `env_capabilities` directly instead of creating new instances of `GPUCapabilities` and `EnvCapabilities`
* feat: refactor strategyloader in chat_template.py
- Extracted the creation of strategy parameters into a separate function, `_get_strategy_params(cfg, dataset_config)`
- Created a new function, `_get_strategy_cls()`, to obtain the strategy class
- Replaced `ChatTemplateStrategy` with `strategy_cls` for strategy instantiation
* trigger CI
* chore: revert dataset config changes for kto/dpo
* subject: refactor: rename 'messages_array_name' to 'field_messages'
Body:
- Renamed 'messages_array_name' to 'field_messages' in 'ChatTemplatePrompter' class and its usages in 'chat_template.py'
- Updated 'load' function in 'bradley_terry/chat_template.py' to reflect the change
- Adjusted 'get_chat_template_msg_variables' and 'get_message_vars' methods in 'jinja_template_analyzer.py' to use the new variable name
- Modified 'StrategyLoader' in 'chat_template.py' to use 'field_messages'
- Updated tests in 'test_chat_templates.py' and 'test_chat_templates_advanced.py' to use 'field_messages' instead of 'messages_array_name'
* feat: refactor prompt strategies and update config models
* Remove redundant 'return None' in `axolotl/prompt_strategies/__init__.py`
* Simplify message handling in `axolotl/prompt_strategies/bradley_terry/chat_template.py` by using a single 'messages' list instead of separate 'chosen_messages' and 'rejected_messages' lists
* Update default 'message_property_mappings' in `axolotl/prompt_strategies/bradley_terry/chat_template.py`
* Add 'field_messages' field to `axolotl/utils/config/models/input/v0_4_1/__init__.py` configuration model
* chore: remove unused input
* chore: remove redundant type ignore
* fix: remove old configs and update examples
* fix: type check
* fix: remove loading old config in ChatMessage
* fix: update faq with potential new undefinederror
* fix: add debug if property mapped is not found
* chore: improve explanation for unmapped properties
* fix: update docs with new config
* chore: add note for deprecation config and del old config from dict
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* Add s2_attn to hijack flash code
* Refactor code to account for s2_attn
* Add test for models utils
* Add ``s2_attention`` option to llama configs
* Add ``s2_attention`` option to README config
* Format code to appease linter
* chore: lint
* Remove xpos and llama-landmark [bad merge]
* add e2e smoke tests for shifted sparse attention
* remove stray patch from merge
* update yml with link to paper for s2_attention/longlora
* fix assertion check for full fine tune
* increase sequence len for tests and PR feedback updates
* reduce context len to 16k for tests
* reduce context len to 16k for tests
* reduce batch size for larger context len and udpate test to check message
* fix test for message
---------
Co-authored-by: joecummings <jrcummings@devvm050.nha0.facebook.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>