* offload activations to disk instead of CPU RAM
* add prefetch
* Disco :dance:
* include offload_disk in e2e test for AC
* document and make sure to cleanup
* fix annotation to match docs
* fix docs build
* address PR feedback
* lean mistral ft tests, remove e2e torch 2.4.1 test
* make sure to pass save_only_model for RL
* more tests to make ci leaner, add cleanup to modal ci
* fix module for import in e2e tests
* use mp spawn to prevent deadlocks with packing
* make sure cleanup shell script is executable when cloned out
* fsdp embeddings should be float32 per comment
* patch peft to not upcast everything
* add tabs back to code check
* fix import
* add configurable option and fix check
* add check for dtypes
* move embeddings test to patch dir
* fix test
* fix comment and logic
* add e2e smoke test for using activation/gradient checkpointing with offload
* disable duplicate code check for the test
* fix relative import
* seq len too small to test this dataset with packing
* Fix checkpoint patching for tests
* make sure to validate the config before normalizing so defaults get set
* validation not needed for particular test
* remove duplicate validations
* set qlora correctly
* batch api HF adapter for ring-flash-attn; cleanup and improvements
* update
* adding all batch ring-flash-attn methods via single adapter
* removing pad_to_sequence_len=False for now
* fix
* updating docs to include batch SP
* review comments
* fixes for batch API funcs, simplify
* fixes
* fix
* updates
* add batch_zigzag smoke test
* [ci] make e2e tests a bit faster by reducing test split size
* use 10% split of alpaca dataset to speed up dataset loading/tokenization
* reduce gas 4->2 for most e2e tests
* increase val set size for packing
* guard return if ring attn already registered
* add docs link, bits in multi-gpu docs, remove save model callback (subsumed by HF trainers)
* configurable heads_k_stride from ring-flash-attn hf adapter
* feat: add config for optional parameters in a chat message
* chore: cleanup
* chore: fix nits and add light docs
* docs: update docs/dataset-formats/conversation.qmd
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* feat: configurable message mappings, jinja template analyzer
* chore: handle bradley terry
* docs: update docs
* refactor: change order of mappings, improve message transform
* refactor: make chat aware of property mappings
* chore: remove .python-version
* chore: revert change
* chore: add dataset validation to tests where appropriate
* chore: clean up handling of ds_cfg
* chore: recursively serialize config
* make sure to use the return value from validate_config
* DefaultDict pickle/unpickle fix
* fix super call for override
* refactor: message fields
* chore: empty commit
* tests: validate config before using
* chore: add config validation to all e2e tests
* chore: add unneeded logging
* chore: add missed config validation
* chore: pass field_messages to prompter
* test: fix borked test
* chore: remove unintended file
* chore: add deprecation warning and update chat_datasets script
* chore: lint
* refactor: message fields
* feat: update AxolotlInputConfig and test_models
- add ConfigDict import in axolotl/utils/config/models/input/v0_4_1/__init__.py
- remove unnecessary line breaks in SFTDataset, DPODataset, KTODataset, StepwiseSupervisedDataset classes
- update model_dump method in AxolotlInputConfig to exclude None values
- correct typo in test_models.py comment
* feat: simplify DPODataset and KTODataset classes in config models
Removed several optional fields from the DPODataset and KTODataset classes in axolotl/utils/config/models/input/v0_4_1. This simplifies the configuration subsets for these datasets.
* feat: improve readability and structure in dataset configuration models
This commit improves the readability and structure of the dataset configuration models in the `axolotl/utils/config/models/input/v0_4_1` module. It removes the unused `ConfigDict` import and adds line breaks between class definitions. It also adds a trailing newline to the `stepwise_supervised.qmd` file.
* feat: change log level from info to debug in ChatTemplateStrategy
* feat(prompt_strategies): refactor ChatTemplatePrompter and ChatTemplateStrategy
- Make `chat_template` a required parameter in `ChatTemplatePrompter` constructor
- Add default value for `message_property_mappings` in `ChatTemplatePrompter` constructor
- Add `messages_array_name` property to `ChatTemplatePrompter`
- Change `processor` type to Optional in `ChatTemplatePrompter`
- Add TypeError check for `processor` in `ChatTemplatePrompter.build_prompt`
- Remove `_messages` property from `ChatTemplateStrategy`
- Make `prompter` a required parameter and add type hint in `ChatTemplateStrategy` constructor
- Remove `messages` getter and setter from `ChatTemplateStrategy`
- Use `prompter.messages_array_name` in `ChatTemplateStrategy.get_conversation_thread`
- Remove condition to set `messages` field in `load` function
* feat(tests/utils): ignore type check in load_model call in test_models.py
* feat: improve type handling and test structure in chat templates
- Add return type hint for `get_chat_template` function in `chat_templates.py`
- Remove unnecessary assignment of `strategy.messages` in several test cases
- Add `messages_array_name` parameter to various test configurations in `test_chat_templates.py` and `test_chat_templates_advanced.py`
- Remove redundant `strategy.messages` assignment in `test_chat_templates_advanced.py`
* feat(axolotl): enhance chat strategy with DatasetConfig support
This commit introduces support for DatasetConfig in the ChatTemplateStrategy. It also refines the strategy loader to handle different types of ds_cfg inputs and improves the clarity of the code by formatting and reordering. The key changes include:
- Importing Union from typing and BaseModel from pydantic.
- Adding DatasetConfig as an optional type for ds_cfg in StrategyLoader.
- Adjusting the handling of ds_cfg in StrategyLoader to account for BaseModel instances.
- Refactoring the prompter_params and strategy_params for better readability.
- Changing the reference from prompt[self.messages] to prompt[self.prompter.messages_array_name] in the is_prompt_batched method.
* feat: update message handling in BTChatTemplateStrategy
- Replace `self.messages` with direct string references to "chosen_messages" and "rejected_messages"
- Append system, user, and assistant content directly to "chosen_messages" and "rejected_messages"
- Add a new attribute "messages_array_name" to the `load` function parameters
- Remove the conditional attribute assignment for "field_messages" in the `load` function
* feat: add config validation in test_kd.py
- Import `validate_config` from `axolotl.utils.config`
- Validate the configuration in `test_llama_kd` and another function in `TestKnowledgeDistillation` class
* feat: enhance config validation and capabilities handling
- Import `EnvCapabilities` and `GPUCapabilities` from `axolotl.utils.config.models.internals`
- Update `validate_config` function to create `KTODataset` and `SFTDataset` instances using `dict(ds_cfg)`
- Replace `capabilities` and `env_capabilities` with instances of `GPUCapabilities` and `EnvCapabilities` respectively in `AxolotlConfigWCapabilities` model dump
* feat: update config validation in axolotl utils
- Remove import of `EnvCapabilities` and `GPUCapabilities` from `axolotl.utils.config.models.internals`
- Update `validate_config` function to use `capabilities` and `env_capabilities` directly instead of creating new instances of `GPUCapabilities` and `EnvCapabilities`
* feat: refactor strategyloader in chat_template.py
- Extracted the creation of strategy parameters into a separate function, `_get_strategy_params(cfg, dataset_config)`
- Created a new function, `_get_strategy_cls()`, to obtain the strategy class
- Replaced `ChatTemplateStrategy` with `strategy_cls` for strategy instantiation
* trigger CI
* chore: revert dataset config changes for kto/dpo
* refactor: rename 'messages_array_name' to 'field_messages'
- Renamed 'messages_array_name' to 'field_messages' in 'ChatTemplatePrompter' class and its usages in 'chat_template.py'
- Updated 'load' function in 'bradley_terry/chat_template.py' to reflect the change
- Adjusted 'get_chat_template_msg_variables' and 'get_message_vars' methods in 'jinja_template_analyzer.py' to use the new variable name
- Modified 'StrategyLoader' in 'chat_template.py' to use 'field_messages'
- Updated tests in 'test_chat_templates.py' and 'test_chat_templates_advanced.py' to use 'field_messages' instead of 'messages_array_name'
* feat: refactor prompt strategies and update config models
- Remove redundant 'return None' in `axolotl/prompt_strategies/__init__.py`
- Simplify message handling in `axolotl/prompt_strategies/bradley_terry/chat_template.py` by using a single 'messages' list instead of separate 'chosen_messages' and 'rejected_messages' lists
- Update default 'message_property_mappings' in `axolotl/prompt_strategies/bradley_terry/chat_template.py`
- Add 'field_messages' field to `axolotl/utils/config/models/input/v0_4_1/__init__.py` configuration model
* chore: remove unused input
* chore: remove redundant type ignore
* fix: remove old configs and update examples
* fix: type check
* fix: remove loading old config in ChatMessage
* fix: update faq with potential new UndefinedError
* fix: add debug if property mapped is not found
* chore: improve explanation for unmapped properties
* fix: update docs with new config
* chore: add note for deprecation config and del old config from dict
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* add helper to verify the correct model output file exists
* more checks using helper
* chore: lint
* fix import and relora model check
* workaround for trl trainer saves
* remove stray print
* need to update deepspeed version in extras too
* fix patch import
* fix monkeypatch reloading in tests and deepspeed patch
* remove duplicated functionality fixture
* reset LlamaForCausalLM too in fixtures for cce patch
* reset llama attn too
* disable xformers patch for cce
* skip problematic test on low usage functionality
* bump transformers and trl
* fix: update trainer.log signature
* fix trl trainer.log interfaces
* broken 🦥 with latest transformers
* skip parent, call grandparent - yeah, super janky
* update HF HUB env var and fix reward trainer log since it doesn't directly override log
* also bump accelerate
* patches for llama ga
* detab the code to check
* fix whitespace for patch check
* play nicely with CI tests since we patch every time
* fix pop default in case it doesn't exist
* more tweaks to make patches nicer in CI
* fix detab for when there are possibly multiple patches
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
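
Sketch of a helper for verifying the model output file, as described in the first commit of this group. The function name and parameters are hypothetical; the file names are the usual Hugging Face ones (`model.safetensors`, `pytorch_model.bin`, `adapter_model.*`), and the real helper may check more cases.

```python
from pathlib import Path


def model_output_exists(
    output_dir: str, adapter: bool = False, safetensors: bool = True
) -> bool:
    """Hypothetical helper: check that the expected weights file was written."""
    if adapter:
        stem = "adapter_model"
    else:
        # Full-model saves use a different stem for the .bin format.
        stem = "model" if safetensors else "pytorch_model"
    suffix = ".safetensors" if safetensors else ".bin"
    return (Path(output_dir) / f"{stem}{suffix}").exists()
```

A test would then call something like `assert model_output_exists(temp_dir, adapter=True)` after training instead of hard-coding the path in each test.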
* reduce test concurrency to avoid HF rate limiting, test suite parity
* make val_set_size smaller to speed up e2e tests
* more retries for pytest fixture downloads
* val_set_size was too small
* move retry_on_request_exceptions to data utils and add retry strategy
* pre-download ultrafeedback as a test fixture
* refactor download retry into its own fn
* don't import from data utils
* use retry mechanism now for fixtures
* prepare plugins needs to happen so registration can occur to build the plugin args
use yaml.dump
include dataset and more assertions
* attempt to manually register plugins rather than use fn
* fix fixture
* remove fixture
* move cli test to patched dir
* fix cce validation
* add mhenrichsen/alpaca_2k_test with revision dataset download fixture for flaky tests
* log slowest tests
* pin pynvml==11.5.3
* fix load local hub path
* optimize for speed w smaller models and val_set_size
* replace pynvml
* make the resume from checkpoint e2e faster
* make tests smaller
* see if unsloth installs cleanly in ci
* check unsloth install on regular tests, not sdist
* fix ampere check exception for ci
* use cached_property instead
* add an e2e test for unsloth qlora
* reduce seq len and mbsz to prevent oom in ci
* add checks for fp16 and sdp_attention
* pin unsloth to a specific release
* add unsloth to docker image too
* fix flash attn xentropy patch
* fix loss, add check for loss when using fa_xentropy
* fix special tokens for test
* typo
* test fa xentropy with and without gradient accum
* pr feedback changes
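
Sketch of a download retry decorator with a backoff strategy, in the spirit of the `retry_on_request_exceptions` commits above. The name `retry_on_exceptions`, its parameters, and the choice of built-in `ConnectionError`/`TimeoutError` are assumptions to keep the example free of third-party imports; the real helper may differ.

```python
import functools
import time


def retry_on_exceptions(
    max_retries=3,
    delay=1.0,
    backoff=2.0,
    exceptions=(ConnectionError, TimeoutError),
):
    """Hypothetical decorator: retry a call, multiplying the wait by `backoff` each time."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    # Re-raise once the final attempt has failed.
                    if attempt == max_retries - 1:
                        raise
                    time.sleep(wait)
                    wait *= backoff

        return wrapper

    return decorator
```

Wrapping a fixture's dataset download in such a decorator lets a transient rate-limit error be retried rather than failing the whole test session.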
* feat: support new arg num_items_in_batch
* use kwargs to manage extra unknown kwargs for now
* upgrade against upstream transformers main
* make sure trl is on latest too
* fix for upgraded trl
* fix: handle trl and transformer signature change
* feat: update trl to handle transformer signature
* RewardDataCollatorWithPadding no longer has max_length
* handle updated signature for tokenizer vs processor class
* invert logic for tokenizer vs processor class
* processing_class, not processor class
* also handle processing class in dpo
* handle model name w model card creation
* upgrade transformers and add a loss check test
* fix install of tbparse requirements
* make sure to add tbparse to req
* feat: revert kwarg to positional kwarg to be explicit
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
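
Sketch of the `num_items_in_batch` handling from the commits above: the override names the new argument explicitly and forwards it, rather than swallowing it in `**kwargs`. `BaseTrainer` is a stand-in written for this example, not the upstream trainer class, and its loss arithmetic is made up for illustration.

```python
class BaseTrainer:
    """Stand-in for an upstream trainer whose compute_loss gained a new argument."""

    def compute_loss(
        self, model, inputs, return_outputs=False, num_items_in_batch=None
    ):
        total = sum(inputs["losses"])
        # Normalize by the item count across the batch when it is provided.
        denom = (
            num_items_in_batch
            if num_items_in_batch is not None
            else len(inputs["losses"])
        )
        loss = total / denom
        return (loss, {"total": total}) if return_outputs else loss


class CustomTrainer(BaseTrainer):
    """Hypothetical subclass that accepts and forwards the new argument explicitly."""

    def compute_loss(
        self, model, inputs, return_outputs=False, num_items_in_batch=None
    ):
        return super().compute_loss(
            model,
            inputs,
            return_outputs=return_outputs,
            num_items_in_batch=num_items_in_batch,
        )
```

Naming the parameter keeps the override's signature aligned with the parent, so a future upstream signature change fails loudly instead of being silently absorbed.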
* swaps to use newer sample packing for mistral
* fix multipack patch test
* patch the common fa utils
* update for refactor of flash attn unpad
* remove un-needed drop attn mask for mistral
* bump transformers to main to pick up latest mistral fix for 12b and refactor of fa2
* update test
* bump flash attention 2.5.8 -> 2.6.1
* use triton implementation of cross entropy from flash attn
* add smoke test for flash attn cross entropy patch
* fix args to xentropy.apply
* handle tuple from triton loss fn
* ensure the patch tests run independently
* use the wrapper already built into flash attn for cross entropy
* mark pytest as forked for patches
* use pytest xdist instead of forked, since cuda doesn't like forking
* limit to 1 process and use dist loadfile for pytest
* change up pytest for fixture to reload transformers w monkeypatch
* add missing evals_per_epoch setting
* more pydantic fixes
* more fixes
* move test from normalization to validation
* increase eval size for sample packing tests