Commit Graph

1761 Commits

Author SHA1 Message Date
Wing Lian
31723ac523 fix whitespace for patch check 2024-12-06 16:43:44 -05:00
Wing Lian
2e9e423dfd detab the code to check 2024-12-06 16:42:29 -05:00
Wing Lian
cbe61186dc patches for llama ga 2024-12-06 16:40:24 -05:00
Wing Lian
2a83580bdc also bump accelerate 2024-12-06 15:24:57 -05:00
Wing Lian
825f66b9fd update HF HUB env var and fix reward trainer log since it doesn't directly override log 2024-12-06 14:52:59 -05:00
Wing Lian
3b44989205 skip parent, call grandparent - yeah, super janky 2024-12-06 12:19:14 -05:00
Wing Lian
811224d7b7 broken 🦥 with latest transformers 2024-12-06 11:34:06 -05:00
Wing Lian
84a14fc604 fix trl trainer.log interfaces 2024-12-06 10:35:29 -05:00
NanoCode012
86cf62ca46 fix: update trainer.log signature 2024-12-06 10:27:18 -05:00
Wing Lian
fc54e10455 bump transformers and trl 2024-12-06 10:27:12 -05:00
Wing Lian
5e9fa33f3d reduce test concurrency to avoid HF rate limiting, test suite parity (#2128)
* reduce test concurrency to avoid HF rate limiting, test suite parity

* make val_set_size smaller to speed up e2e tests

* more retries for pytest fixture downloads

* val_set_size was too small

* move retry_on_request_exceptions to data utils and add retry strategy

* pre-download ultrafeedback as a test fixture

* refactor download retry into its own fn

* don't import from data utils

* use retry mechanism now for fixtures
2024-12-06 10:20:20 -05:00
Dan Saunders
08fa133177 Fix broken CLI; remove duplicate metadata from setup.py (#2136)
* Fix broken CLI; remove duplicate metadata from setup.py

* Adding tests.yml CLI check

* updating

* remove test with requests to github due to rate limiting

---------

Co-authored-by: Dan Saunders <dan@axolotl.ai>
2024-12-06 10:19:54 -05:00
Wing Lian
6b3058b2dc upgrade bnb 0.45.0 and peft 0.14.0 (#2126)
* upgrade bnb to latest release

* update peft to working supporting commit

* bump to latest release of peft==0.14.0
2024-12-06 09:08:55 -05:00
Wing Lian
5726141c4e remove accidentally included symlink (#2131) 2024-12-05 22:37:19 -05:00
Dan Saunders
2f3ebbc44f auto-versioning and adding axolotl.__version__ (#2127)
* auto-versioning and adding axolotl.__version__

* removing file meant for codecov PR

* adding dynamic dependencies, project metadata

* extras/optional-dependencies are dynamic too

---------

Co-authored-by: Dan Saunders <dan@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-12-05 22:12:40 -05:00
Dan Saunders
fc973f4322 CLI Implementation with Click (#2107)
* Initial CLI implementation with click package

* Adding fetch command for pulling examples and deepspeed configs

* Automating default options for CliArgs classes

* Mimicking existing no config behavior

* bugfix in choose_config

* Updating fetch to sync instead of re-download

* bugfix

* isort fix

* fixing yaml isort order

* pre-commit fixes

* simplifying argument parsing -- pass through kwargs to do_cli

* make accelerate launch default for non-preprocess commands

* fixing arg handling

* testing None placeholder approach

* removing hacky --use-gpu argument to preprocess command

* Adding brief README documentation for CLI

* remove (New)

* Initial CLI pytest tests

* progress on CLI pytest

* adding inference CLI tests; cleanup

* Refactor train CLI tests to remove various mocking

* Major CLI test refactor; adding remaining CLI codepath test coverage

* pytest fixes

* remove integration markers

* parallelizing examples, deepspeed config downloads; rename test to match other CLI test naming

* moving cli pytest due to isolation issues; cleanup

* testing fixes; various minor improvements

* fix

* tests fix

* Update tests/cli/conftest.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

---------

Co-authored-by: Dan Saunders <dan@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-12-05 22:11:48 -05:00
Wing Lian
e399ba533e fix license header for fix_untrained_tokens from unsloth-zoo (#2129) [skip ci] 2024-12-05 21:20:40 -05:00
Wing Lian
4baf8e5e96 cleanup the readme, add Modal as sponsor (#2130) [skip ci] 2024-12-05 21:19:52 -05:00
Wing Lian
d7d2fd366e update from unsloth-zoo with additional fixes (#2122)
only update tokens seen in the train dataset, log them out explicitly
2024-12-04 12:26:08 -05:00
Wing Lian
e2882dd749 drop unnecessary BNB_CUDA_VERSION env var from docker as it just results in warnings (#2121) [skip ci]
* drop unnecessary BNB_CUDA_VERSION env var from docker as it just results in warnings

* make sure to run tests when cicd Dockerfile changes
2024-12-04 12:25:47 -05:00
Wing Lian
a1790f2652 replace tensorboard checks with helper function (#2120) [skip ci]
* replace tensorboard checks with helper function

* move helper function

* use relative
2024-12-03 21:06:20 -05:00
Wing Lian
418ad2b586 add missing fixture decorator for predownload dataset (#2117) [skip ci]
* add missing fixture decorator for predownload dataset

* also pre download the tokenizer files
2024-12-03 18:08:46 -05:00
Wing Lian
d87df2c776 prepare plugins needs to happen so registration can occur to build the plugin args (#2119)
* prepare plugins needs to happen so registration can occur to build the plugin args

use yaml.dump

include dataset and more assertions

* attempt to manually register plugins rather than use fn

* fix fixture

* remove fixture

* move cli test to patched dir

* fix cce validation
2024-12-03 15:06:09 -05:00
Wing Lian
1ef70312ba fix optimizer reset for relora sft (#1414)
* fix optimizer reset

* set states to reset for 8bit optimizers and handle quantile runtime error for embeddings

* fix relora test to check grad_norm

* use flash attn for relora and tweak hyperparams for test

* fix messages field for test dataset
2024-12-03 08:58:23 -05:00
NanoCode012
81ef3e45f7 fix(readme): update cuda instructions during preprocess (#2114) [skip ci] 2024-12-03 08:58:03 -05:00
NanoCode012
bd8436bc6e feat: add cut_cross_entropy (#2091)
* feat: add cut_cross_entropy

* fix: add to input

* fix: remove from setup.py

* feat: refactor into an integration

* chore: ignore lint

* feat: add test for cce

* fix: set max_steps for liger test

* chore: Update base model following suggestion

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* chore: update special_tokens following suggestion

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* chore: remove with_temp_dir following comments

* fix: plugins aren't loaded

* chore: update quotes in error message

* chore: lint

* chore: lint

* feat: enable FA on test

* chore: refactor get_pytorch_version

* fix: lock cce commit version

* fix: remove subclassing UT

* fix: downcast even if not using FA and config check

* feat: add test to check different attentions

* feat: add install to CI

* chore: refactor to use parametrize for attention

* fix: pytest not detecting test

* feat: handle torch lower than 2.4

* fix args/kwargs to match docs

* use release version cut-cross-entropy==24.11.4

* fix quotes

* fix: use named params for clarity for modal builder

* fix: handle install from pip

* fix: test check only top level module install

* fix: re-add import check

* uninstall existing version if no transformers submodule in cce

* more dataset fixtures into the cache

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-12-03 08:22:22 -05:00
Wing Lian
fc6188cd76 fix merge conflict of duplicate max_steps in config for relora (#2116) 2024-12-03 07:42:41 -05:00
Wing Lian
b9bb02406a fix so inference can be run against quantized models without adapters (#1834)
* fix so inference can be run against quantized models without adapters

* Update error msg [skip e2e]

Co-authored-by: NanoCode012 <nano@axolotl.ai>

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2024-12-03 00:02:38 -05:00
Sunny Liu
ff4794cd8e Add ds model card, rebased (#2101) [skip ci]
* rebased add_ds_model_card

* manual rebasing

* fix redundancy

* lint

* include case when ds_tag is none

* conform to kwargs in create_model_card
2024-12-03 00:02:02 -05:00
NanoCode012
822c904092 fix(vlm): handle legacy conversation data format and check image in data (#2018) [skip ci]
* fix: handle legacy conversation data format and check image in data

* feat: add test for llama vision

* feat: add max_steps to test

* fix: incorrect indent and return preprocess

* feat: use smaller model and dataset

* chore: add extra config for sharegpt dataset
2024-12-03 00:01:31 -05:00
Sunny Liu
d5f58b6509 Check torch version for ADOPT optimizer + integrating new ADOPT updates (#2104)
* added torch check for adopt, wip

* lint

* gonna put torch version checking somewhere else

* added ENVcapabilities class for torch version checking

* lint + pydantic

* ENVCapabilities -> EnvCapabilities

* forgot to git add v0_4_1/__init__.py

* removed redundancy

* add check if env_capabilities not specified

* make env_capabilities compulsory [skip e2e]

* fixup env_capabilities

* modified test_validation.py to accommodate env_capabilities

* adopt torch version test [skip e2e]

* raise error

* test correct torch version

* test torch version above requirement

* Update src/axolotl/utils/config/models/input/v0_4_1/__init__.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* removed unused is_totch_min

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-12-02 20:15:39 -05:00
Wing Lian
9f6d0b5587 use pytest sugar and verbose for more info during ci (#2112) [skip ci]
* use pytest sugar and verbose for more info during ci

* also run test suite when test requirements or cicd.sh changes

* also on PR too
2024-12-02 20:14:40 -05:00
Wing Lian
53963c792c make the eval size smaller for the resume test (#2111) [skip ci] 2024-12-02 18:32:29 -05:00
Wing Lian
a4f4a56d77 build causal_conv1d and mamba-ssm into the base image (#2113)
* build causal_conv1d and mamba-ssm into the base image

* also build base images on changes to Dockerfile-base and base workflow yaml
2024-12-02 18:27:46 -05:00
Wing Lian
ce5bcff750 various test fixes for flaky tests (#2110)
* add mhenrichsen/alpaca_2k_test with revision dataset download fixture for flaky tests

* log slowest tests

* pin pynvml==11.5.3

* fix load local hub path

* optimize for speed w smaller models and val_set_size

* replace pynvml

* make the resume from checkpoint e2e faster

* make tests smaller
2024-12-02 17:28:58 -05:00
Oliver Molenschot
b620ed94d0 Add Exact Deduplication Feature to Preprocessing Pipeline (#2072)
* Add example YAML file for training Mistral using DPO

* added deduplication code

* Add exact deduplication feature and update examples

* Improve deduplication for train/eval overlap

Changed the deduplication function to use a more memory-efficient hashing method. Applied Git suggestions to improve clarity and maintainability.

The deduplication now handles cases where train and eval datasets have overlapping elements.

* Apply suggestions from code review

To handle the original case where we do not do deduplication

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* Improve false collision detection to ensure dataset integrity

- Added test cases to simulate and verify handling of forced hash collisions between datasets.
- Ensured that datasets with identical hashes but different content are correctly identified, preventing incorrect deduplication.
- Updated unit tests to include scenarios where collisions occur across both training and evaluation datasets, as well as within a single dataset.

* Moved the constants file to the tests folder

- Relocated `constants.py` to the `tests` folder to improve modularity and maintain a clear separation between source and test files.
- Renamed `cicd/tests.py` to `cicd/cicd_tests.py` to resolve a conflict with `tests/__init__.py`, which caused Mypy to fail due to duplicate module names.
- Updated all references to `cicd.tests` in the codebase to `cicd.cicd_tests` to reflect the renaming and ensure compatibility.
- These changes ensure Mypy passes the pre-commit hook and maintain alignment with the project's structure.

* revert some changes from previous commit and fix relative import

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-12-02 08:47:10 -05:00
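The exact deduplication described in the commit above (#2072) could be sketched roughly as follows. This is an illustration only, not the code from the repository: the names `row_hash` and `deduplicate`, and the choice of SHA-256 over a JSON serialization, are assumptions made for the example. It shows the three behaviours the commit message mentions: hashing rows, dropping eval rows that already appear in train, and not treating a hash collision between different rows as a duplicate.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Return a stable digest for a row (hypothetical helper, SHA-256 assumed)."""
    payload = json.dumps(row, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate(train, evaluation=None, hash_fn=row_hash):
    """Drop exact duplicates within train, within eval, and across both.

    Rows are grouped by digest; a row is only discarded if an equal row
    is already stored under the same digest, so two different rows that
    happen to share a digest are both kept.
    """
    seen = {}  # digest -> list of distinct rows that produced it

    def is_new(row):
        bucket = seen.setdefault(hash_fn(row), [])
        if row in bucket:  # true duplicate: same digest and same content
            return False
        bucket.append(row)  # first occurrence, or a mere hash collision
        return True

    # Train is processed first, so overlapping rows are removed from eval.
    unique_train = [row for row in train if is_new(row)]
    unique_eval = [row for row in (evaluation or []) if is_new(row)]
    return unique_train, unique_eval


if __name__ == "__main__":
    train_rows = [{"text": "a"}, {"text": "b"}, {"text": "a"}]
    eval_rows = [{"text": "b"}, {"text": "c"}]
    print(deduplicate(train_rows, eval_rows))
```

Keeping the rows in each bucket makes the collision check simple but costs memory; a real pipeline would more likely store row indices and compare against the dataset on demand.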
Wing Lian
5f1d98e8fc add e2e tests for Unsloth qlora and test the builds (#2093)
* see if unsloth installs cleanly in ci

* check unsloth install on regular tests, not sdist

* fix ampere check exception for ci

* use cached_property instead

* add an e2e test for unsloth qlora

* reduce seq len and mbsz to prevent oom in ci

* add checks for fp16 and sdp_attention

* pin unsloth to a specific release

* add unsloth to docker image too

* fix flash attn xentropy patch

* fix loss, add check for loss when using fa_xentropy

* fix special tokens for test

* typo

* test fa xentropy with and without gradient accum

* pr feedback changes
2024-11-29 20:38:49 -05:00
Wing Lian
1cf7075d18 support separate lr for embeddings, similar to loraplus (#1910) [skip ci]
* support separate lr for embeddings, similar to loraplus

* add test case for train w lr embedding scale

* use kwarg for optimizer

* make sure to handle the optimizer creation

* make sure to handle for embedding_lr too

* use smollm for e2e, check for embeddings lr first before wdecay
2024-11-29 20:38:20 -05:00
NanoCode012
f4cabc2351 fix: ds3 and fsdp lmbench eval (#2102) [skip ci]
* fix: ds3 and fsdp lmbench eval

* chore: update comment

* fix: test signature
2024-11-29 20:37:49 -05:00
Wing Lian
6e0fb4a6b2 add finetome dataset to fixtures, check eval_loss in test (#2106) [skip ci]
* add finetome dataset to fixtures, check eval_loss in test

* add qwen 0.5b to pytest session fixture
2024-11-29 20:37:32 -05:00
Wing Lian
724b660d56 move shared pytest conftest to top level tests (#2099) [skip ci]
* move shared pytest conftest to top level tests

* add __init__ so mypy doesn't choke on multiple conftests
2024-11-22 15:05:42 -05:00
Aman Karmani
51c9e1a035 .gitignore improvements (#349) [skip ci] 2024-11-22 11:08:54 -05:00
Sunny Liu
45c0825587 updated colab notebook (#2074)
* updated colab notebook

* update pip installation

* cleared cell output

* Update examples/colab-notebooks/colab-axolotl-example.ipynb

Co-authored-by: NanoCode012 <nano@axolotl.ai>

* Update examples/colab-notebooks/colab-axolotl-example.ipynb

Co-authored-by: NanoCode012 <nano@axolotl.ai>

* Update examples/colab-notebooks/colab-axolotl-example.ipynb

Co-authored-by: NanoCode012 <nano@axolotl.ai>

* Update examples/colab-notebooks/colab-axolotl-example.ipynb

Co-authored-by: NanoCode012 <nano@axolotl.ai>

* modified notebook

* Update examples/colab-notebooks/colab-axolotl-example.ipynb

Co-authored-by: NanoCode012 <nano@axolotl.ai>

* Update examples/colab-notebooks/colab-axolotl-example.ipynb

Co-authored-by: NanoCode012 <nano@axolotl.ai>

* Update examples/colab-notebooks/colab-axolotl-example.ipynb

Co-authored-by: NanoCode012 <nano@axolotl.ai>

* Update examples/colab-notebooks/colab-axolotl-example.ipynb

Co-authored-by: NanoCode012 <nano@axolotl.ai>

* Update examples/colab-notebooks/colab-axolotl-example.ipynb

Co-authored-by: NanoCode012 <nano@axolotl.ai>

* Update examples/colab-notebooks/colab-axolotl-example.ipynb

Co-authored-by: NanoCode012 <nano@axolotl.ai>

* cleared cell output

* cleared unnecessary logs

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2024-11-22 10:09:10 -05:00
Wing Lian
94fc223f6c actions/create-release is unmaintained, and doesn't create proper release notes (#2098) [skip ci] 2024-11-21 14:32:41 -05:00
Sunny Liu
151abb7a67 fix None-type not iterable error when deepspeed is left blank w/ use_… (#2087)
* fix None-type not iterable error when deepspeed is left blank w/ use_reentrant: false and qlora

* added unit test[skip e2e]

* corrected test case[skip e2e]

* assert warning message [skip e2e]

* assert warning message [skip e2e]

* corrected test cases [skip e2e]

* lint
2024-11-21 13:36:51 -05:00
Sunny Liu
bf416bdfd0 bump_liger_0.4.2 (#2096) 2024-11-21 13:24:52 -05:00
Mengqing Cao
838b74d05b Add Ascend NPU support (#1758) 2024-11-20 21:28:41 -05:00
Wing Lian
2e99bb303e fix inference when no chat_template is set, fix unsloth dora check (#2092)
* fix inference when no chat_template is set, fix unsloth dora check

* remove old unsloth version check

* update docs on installing unsloth
2024-11-20 14:07:54 -05:00
Chirag Jain
68a26f1005 Fix duplication of plugin callbacks (#2090) 2024-11-20 14:06:08 -05:00
Wing Lian
db51a9e4cb use pep440 instead of semver (#2088) [skip ci] 2024-11-19 15:02:10 -05:00