* only configure logging on cli to play nicely with colab
* allow reloading the config on the fly from a dict
* make sure to use dict for yaml
* reuse existing function for load
* make cli args optional
* mps fix and respect max_steps
* bump transformers and trl
* fix: update trainer.log signature
* fix trl trainer.log interfaces
* broken 🦥 with latest transformers
* skip parent, call grandparent - yeah, super janky
* update HF HUB env var and fix reward trainer log since it doesn't directly override log
* also bump accelerate
* patches for llama ga
* detab the code to check
* fix whitespace for patch check
* play nicely with CI tests since we patch everytime
* fix pop default in case it doesn't exist
* more tweaks to make patches nicer in CI
* fix detab for when there are possibly multiple patches
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* prepare plugins needs to happen so registration can occur to build the plugin args
use yaml.dump
include dataset and more assertions
* attempt to manually register plugins rather than use fn
* fix fixture
* remove fixture
* move cli test to patched dir
* fix cce validation
* feat: add cut_cross_entropy
* fix: add to input
* fix: remove from setup.py
* feat: refactor into an integration
* chore: ignore lint
* feat: add test for cce
* fix: set max_steps for liger test
* chore: Update base model following suggestion
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* chore: update special_tokens following suggestion
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* chore: remove with_temp_dir following comments
* fix: plugins aren't loaded
* chore: update quotes in error message
* chore: lint
* chore: lint
* feat: enable FA on test
* chore: refactor get_pytorch_version
* fix: lock cce commit version
* fix: remove subclassing UT
* fix: downcast even if not using FA and config check
* feat: add test to check different attentions
* feat: add install to CI
* chore: refactor to use parametrize for attention
* fix: pytest not detecting test
* feat: handle torch lower than 2.4
* fix args/kwargs to match docs
* use release version cut-cross-entropy==24.11.4
* fix quotes
* fix: use named params for clarity for modal builder
* fix: handle install from pip
* fix: test check only top level module install
* fix: re-add import check
* uninstall existing version if no transformers submodule in cce
* more dataset fixtures into the cache
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
* Add example YAML file for training Mistral using DPO
* added deduplication code
* Add exact deduplication feature and update examples
* Improve deduplication for train/eval overlap
Changed the deduplication function to use a more memory-efficient hashing method. Applied Git suggestions to improve clarity and maintainability.\n\nThe deduplication now handles cases where train and eval datasets have overlapping elements.
* Improve deduplication for train/eval overlap
Changed the deduplication function to use a more memory-efficient hashing method. Applied Git suggestions to improve clarity and maintainability.\n\nThe deduplication now handles cases where train and eval datasets have overlapping elements.
* Apply suggestions from code review
To handle the original case where we do not do deduplication
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Improve false collision detection to ensure dataset integrity
- Added test cases to simulate and verify handling of forced hash collisions between datasets.
- Ensured that datasets with identical hashes but different content are correctly identified, preventing incorrect deduplication.
- Updated unit tests to include scenarios where collisions occur across both training and evaluation datasets, as well as within a single dataset.
* Moved the constants file to the tests folder
- Relocated `constants.py` to the `tests` folder to improve modularity and maintain a clear separation between source and test files.
- Renamed `cicd/tests.py` to `cicd/cicd_tests.py` to resolve a conflict with `tests/__init__.py`, which caused Mypy to fail due to duplicate module names.
- Updated all references to `cicd.tests` in the codebase to `cicd.cicd_tests` to reflect the renaming and ensure compatibility.
- These changes ensure Mypy passes the pre-commit hook and maintain alignment with the project's structure.
* revert some changes from previous commit and fix relative import
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
* Allow using tokenizer's default chat template with fallbacks
Summary of changes:
1. Adds `tokenizer_default` as option for `chat_template` in
`chat_template` prompt strategy that allows using the chat template
from tokenizer's config.json
2. Allows falling back to chat templates available in axolotl if
tokenizer does not have a chat template
3. Adds a mistral chat template which supports system message - taken
from https://github.com/chujiezheng/chat_templates/blob/main/chat_templates/mistral-instruct.jinja
---
Why?
Many popular models are not trained with chatml format. As a result for
the model to correctly learn chatml we have to turn on train_on_inputs
which requires more compute and time. If we can use the model's already
learned chat template we can just learn the output tokens
---
Todo:
- Write tests
* Add tests
* Fix lint and bug post merge from main
* Add option `chat_template_jinja` to provide a jinja template
* remove custom mistral template
* Address review comments and add docs
* Update docs/dataset-formats/conversation.qmd
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* fix: set default to tokenizer template
* Merge branch 'main' into cj_tokenizer_default_prompt_template
* chore: remove redundant function
* fix: re-arrange enum declaration position
* fix: refactor artifact left from main merge
* feat(doc): updated config with chat template options and clarified examples
* chore: clarify doc
* chore: added example for non-default template
* chore: refactor
* fix: test
* fix: config being dropped and unittest to catch that
* chore: lint
* chore: skip duplicate
* fix: rename var after merge
* feat: add test for levy's dpo case
* fix: remove default setting on edge case where chat template overriden in dataset section
* feat: handle sharegpt deprecation better in docs
* feat: add example using fallback
* feat: handles chat_template requiring specific user/assistant order
* fix: update test based on new defaults
* fix: imported name incorrectly updated on merge
* chore: lint
* fix: update dummy message to prevent potential overlap with real content
* fix(doc): formatting
* fix: update bradleyterry to use new chat_template
---------
Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>
* Add first version of a Comet integration
* Remove debug prints
* Add test for Comet Configuration transformation to env variables
* Fix last lint warning
* Update Readme for Comet logging documentation
* Update Comet integration to be optional, update code and tests
* Add documentation for Comet configuration
* Add missing check
* fix 405b with lower cpu ram requirements
* make sure to use doouble quant and only skip output embeddings
* set model attributes
* more fixes for sharded fsdp loading
* update the base model in example to use pre-quantized nf4-bf16 weights
* upstream fixes for qlora+fsdp
* Gradio Configuration Settings
* Making various Gradio variables configurable instead of hardcoded
* Remove overwriting behavour of 'default tokens' that breaks tokenizer for llama3
* Fix type of gradio_temperature
* revert un-necessary change and lint
---------
Co-authored-by: Marijn Stollenga <stollenga@imfusion.de>
Co-authored-by: Marijn Stollenga <stollenga@imfusion.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* add missing evals_per_epoch setting
* more pydantic fixes
* more fixes
* move test from normalization to validation
* increase eval size for sample packing tests
* WIP conversion to use pydantic for config validation
* wip, more fields, add capabilities
* wip
* update pydantic validation to match existing tests
* tweak requirements
* setup deprecated paams pydantic model
* more validations
* wrap up rest of the validations
* flesh out the rest of the options from the readme into pydantic
* fix model validators as class methods
remember to return in validator
missing return
add missing relora attributes
fix test for DictDefault change
fix sys template for mistral from fastchat change in PR 2872
fix test for batch size warning
* more missing attributes for cfg
* updates from PR feedback
* fix validation for datasets and pretrain datasets
* fix test for lora check
* cleanup dpo to be a little more extensible, add zephyr/nectar strategy
* fix eos slash
* support for eval split
* fix kwargs
* handle empty evals
* don't load peft model for dpo
* ensure dpo traning args gets bf16 for peft if applicable
* fix duplicate kwargs for bf16
* make sure to respect the configured lr scheduler
* supprt trainer callback to push config to wandb
* set dataloader preload args
* ensure that we are loading the lora when merging
* Update src/axolotl/utils/data.py
Co-authored-by: Agus <agustin.piqueres@gmail.com>
* support local datasets for dpo
Co-authored-by: Agus <agustin.piqueres@gmail.com>
* chore: lint
* dpo/kto/ipo smoke tests w lora, simplify dpo dataset type names
* add split to dpo tests
* fix rebase/merging error
* handle edge case w logging
* use accelerator for dpo datasets so it doesn't break the logger
* missing args
* validate checkpoint is an adapter for now
* log warning when dataset strategy is not loadable
---------
Co-authored-by: Agus <agustin.piqueres@gmail.com>
* fix: improved memory handling when model is bigger than existing VRAM
* feature: add lora_on_cpu flag to do LoRA loading on CPU (RAM)
For big models where the models are taking up the entire GPU VRAM, the LoRA part will fail unless it is loaded on CPU only.
* doc: add README
* fix: enable progress bars in do_merge_lora()
* doc: mention gpu_memory_limit and lora_on_cpu in merge part of README
* Update src/axolotl/utils/models.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* fix: remove deletion of removed model_kwargs key
* fix: validate that gpu_memory_limit and max_memory are not both set
---------
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* ipo-dpo trainer
* fix missing abstract method
* chatml template, grad checkpointing kwargs support
* fix steps calc for RL and add dataloader kwargs
* wip to fix dpo and start ppo
* more fixes
* refactor to generalize map fn
* fix dataset loop and handle argilla pref dataset
* set training args
* load reference model on seperate gpu if more than one device
* no auto upload to hub for dpo, don't add lora adapters to ref model for dpo
* fixes for rl training
* support for ipo from yaml
* set dpo training args from the config, add tests
* chore: lint
* set sequence_len for model in test
* add RLHF docs
* Determine FSDP/deepspeed settings on device select.
Without this, the OS env check for accelerate will fail.
* rename and move env setup call
* chore: lint
---------
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>