axolotl

Author	SHA1	Message	Date
Wing Lian	5e0124e2ab	update modal version for ci (#2242 )	2025-01-09 21:01:02 +00:00
Wing Lian	3c1921e400	add hf cache caching for GHA (#2247 ) * add hf cache caching for GHA * use modal volume to cache hf data * make sure to update the cache as we add new fixtures in conftest	2025-01-09 20:59:54 +00:00
Wing Lian	7faf2b6e8e	Merge group queue (#2248 ) * add support for merge groups * also lint merge groups	2025-01-09 15:49:00 -05:00
Wing Lian	d009ead101	fix build w pyproject to respect insalled torch version (#2168 ) * fix build w pyproject to respect insalled torch version * include in manifest * disable duplicate code check for now * move parser so it can be found * add checks for correct pytorch version so this doesn't slip by again	2024-12-10 16:25:25 -05:00
Wing Lian	0c25bc07a2	use manual version for now (#2156 )	2024-12-08 21:09:12 -05:00
Wing Lian	5e9fa33f3d	reduce test concurrency to avoid HF rate limiting, test suite parity (#2128 ) * reduce test concurrency to avoid HF rate limiting, test suite parity * make val_set_size smaller to speed up e2e tests * more retries for pytest fixture downloads * val_set_size was too small * move retry_on_request_exceptions to data utils and add retry strategy * pre-download ultrafeedback as a test fixture * refactor download retry into it's own fn * don't import from data utils * use retry mechanism now for fixtures	2024-12-06 10:20:20 -05:00
Dan Saunders	08fa133177	Fix broken CLI; remove duplicate metadata from setup.py (#2136 ) * Fix broken CLI; remove duplicate metadata from setup.py * Adding tests.yml CLI check * updating * remove test with requests to github due to rate limiting --------- Co-authored-by: Dan Saunders <dan@axolotl.ai>	2024-12-06 10:19:54 -05:00
Dan Saunders	fc973f4322	CLI Implementation with Click (#2107 ) * Initial CLI implementation with click package * Adding fetch command for pulling examples and deepspeed configs * Automating default options for CliArgs classes * Mimicking existing no config behavior * bugfix in choose_config * Updating fetch to sync instead of re-download * bugfix * isort fix * fixing yaml isort order * pre-commit fixes * simplifying argument parsing -- pass through kwargs to do_cli * make accelerate launch default for non-preprocess commands * fixing arg handling * testing None placeholder approach * removing hacky --use-gpu argument to preprocess command * Adding brief README documentation for CLI * remove (New) * Initial CLI pytest tests * progress on CLI pytest * adding inference CLI tests; cleanup * Refactor train CLI tests to remove various mocking * Major CLI test refator; adding remaining CLI codepath test coverage * pytest fixes * remove integration markers * parallelizing examples, deepspeed config downloads; rename test to match other CLI test naming * moving cli pytest due to isolation issues; cleanup * testing fixes; various minor improvements * fix * tests fix * Update tests/cli/conftest.py Co-authored-by: Wing Lian <wing.lian@gmail.com> --------- Co-authored-by: Dan Saunders <dan@axolotl.ai> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-12-05 22:11:48 -05:00
Wing Lian	e2882dd749	drop unnecessary BNB_CUDA_VERSION env var from docker as it just results in warnings (#2121 ) [skip ci] * drop unnecessary BNB_CUDA_VERSION env var from docker as it just results in warnings * make sure to run tests when cicd Dockerfile changes	2024-12-04 12:25:47 -05:00
NanoCode012	bd8436bc6e	feat: add cut_cross_entropy (#2091 ) * feat: add cut_cross_entropy * fix: add to input * fix: remove from setup.py * feat: refactor into an integration * chore: ignore lint * feat: add test for cce * fix: set max_steps for liger test * chore: Update base model following suggestion Co-authored-by: Wing Lian <wing.lian@gmail.com> * chore: update special_tokens following suggestion Co-authored-by: Wing Lian <wing.lian@gmail.com> * chore: remove with_temp_dir following comments * fix: plugins aren't loaded * chore: update quotes in error message * chore: lint * chore: lint * feat: enable FA on test * chore: refactor get_pytorch_version * fix: lock cce commit version * fix: remove subclassing UT * fix: downcast even if not using FA and config check * feat: add test to check different attentions * feat: add install to CI * chore: refactor to use parametrize for attention * fix: pytest not detecting test * feat: handle torch lower than 2.4 * fix args/kwargs to match docs * use release version cut-cross-entropy==24.11.4 * fix quotes * fix: use named params for clarity for modal builder * fix: handle install from pip * fix: test check only top level module install * fix: re-add import check * uninstall existing version if no transformers submodule in cce * more dataset fixtures into the cache --------- Co-authored-by: Wing Lian <wing.lian@gmail.com> Co-authored-by: Wing Lian <wing@axolotl.ai>	2024-12-03 08:22:22 -05:00
Wing Lian	9f6d0b5587	use pytest sugar and verbose for more info during ci (#2112 ) [skip ci] * use pytest sugar and verbose for more info during ci * also run test suite when test requirements or cicd.sh changes * also on PR too	2024-12-02 20:14:40 -05:00
Wing Lian	5f1d98e8fc	add e2e tests for Unsloth qlora and test the builds (#2093 ) * see if unsloth installs cleanly in ci * check unsloth install on regular tests, not sdist * fix ampere check exception for ci * use cached_property instead * add an e2e test for unsloth qlora * reduce seq len and mbsz to prevent oom in ci * add checks for fp16 and sdp_attention * pin unsloth to a specific release * add unsloth to docker image too * fix flash attn xentropy patch * fix loss, add check for loss when using fa_xentropy * fix special tokens for test * typo * test fa xentropy with and without gradient accum * pr feedback changes	2024-11-29 20:38:49 -05:00
Wing Lian	e9c3a2aec0	add missing dunder-init for monkeypatches and add tests for install from sdist (#2085 ) * add missing dunder-init for monkeypatches and add tests for install from sdist * fix gha name * reduce matrix for sdist test	2024-11-19 12:43:30 -05:00
Wing Lian	a77c8a71cf	fix brackets on docker ci builds, add option to skip e2e builds [skip e2e] (#2080 ) [skip ci]	2024-11-19 10:29:31 -05:00
Wing Lian	c06b8f0243	increase worker count to 8 for basic pytests (#2075 ) [skip ci]	2024-11-18 11:52:35 -05:00
Wing Lian	659ee5d723	don't cancel the tests on main automatically for concurrency (#2055 ) [skip ci]	2024-11-13 17:07:41 -05:00
NanoCode012	28924fc791	feat: cancel ongoing tests if new CI is triggered (#2046 ) [skip ci]	2024-11-13 10:06:59 -05:00
Wing Lian	f68fb71005	update actions version for node16 deprecation (#2037 ) [skip ci] * update actions version for node16 deprecation * update pre-commit/action to use 3.0.1 for actions/cache@v4 dep * update docker/setup-buildx-action too to v3	2024-11-11 15:09:11 -05:00
Wing Lian	3cb2d75de1	upgrade pytorch to 2.5.1 (#2024 )	2024-11-08 10:46:24 -05:00
Wing Lian	052a9a79b4	only run the remainder of the gpu test suite if one case passes first (#2009 ) [skip ci] * only run the remainder of the gpu test suite if one case passes first * also reduce the test matrix	2024-10-31 13:45:01 -04:00
NanoCode012	2501c1a6a3	Fix: Gradient Accumulation issue (#1980 ) * feat: support new arg num_items_in_batch * use kwargs to manage extra unknown kwargs for now * upgrade against upstream transformers main * make sure trl is on latest too * fix for upgraded trl * fix: handle trl and transformer signature change * feat: update trl to handle transformer signature * RewardDataCollatorWithPadding no longer has max_length * handle updated signature for tokenizer vs processor class * invert logic for tokenizer vs processor class * processing_class, not processor class * also handle processing class in dpo * handle model name w model card creation * upgrade transformers and add a loss check test * fix install of tbparse requirements * make sure to add tbparse to req * feat: revert kwarg to positional kwarg to be explicit --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-10-25 11:28:23 -04:00
Wing Lian	955cca41fc	don't explicitly set cpu pytorch version (#1986 ) use a constraint file use min version of xformers don't install autoawq with pytorch 2.5.0 debugging for errors upgrade pip first fix action yml add back try/except retry w/o constraint use --no-build-isolation show torch version install setuptools and wheel add back try/except	2024-10-21 19:50:50 -04:00
Wing Lian	e12a2130e9	first pass at pytorch 2.5.0 support (#1982 ) * first pass at pytorch 2.5.0 support * attempt to install causal_conv1d with mamba * gracefully handle missing xformers * fix import * fix incorrect version, add 2.5.0 * increase tests timeout	2024-10-21 11:00:45 -04:00
Wing Lian	e8d3da0081	upgrade pytorch from 2.4.0 => 2.4.1 (#1950 ) * upgrade pytorch from 2.4.0 => 2.4.1 * update xformers for updated pytorch version * handle xformers version case for torch==2.3.1	2024-10-09 11:53:56 -04:00
Wing Lian	3c6b9eda2e	run pytests with varied pytorch versions too (#1883 )	2024-08-31 22:49:35 -04:00
Wing Lian	70978467a0	skip no commit to main on ci (#1814 )	2024-08-06 15:25:54 -04:00
Wing Lian	9a63884597	update test and main/nightly builds (#1797 ) * update test and main/nightly builds * don't install mamba-ssm on 2.4.0 since it has no wheels yet	2024-07-30 12:37:40 -04:00
Wing Lian	e1725aef2b	update modal package and don't cache pip install (#1757 ) * update modal package and cleanup pip cache * more verbosity on the test	2024-07-16 14:45:38 -04:00
Wing Lian	1e57b4c562	update to pytorch 2.3.1 (#1746 ) [skip ci]	2024-07-13 13:28:17 -04:00
Wing Lian	a159724e44	bump trl and accelerate for latest releases (#1730 ) * bump trl and accelerate for latest releases * ensure that the CI runs on new gh org * drop kto_pair support since removed upstream	2024-07-10 11:15:44 -04:00
Wing Lian	ef223519c9	update deps (#1663 ) [skip ci] * update deps and tweak logic so axolotl is pip installable * use vcs url format * using dependency_links isn't supported per docs)	2024-05-28 11:23:34 -04:00
Wing Lian	3319780300	update torch 2.2.1 -> 2.2.2 (#1622 )	2024-05-15 09:45:27 -04:00
Wing Lian	8cb127abeb	configure nightly docker builds (#1454 ) [skip ci] * configure nightly docker builds * also test update pytorch in modal ci	2024-03-29 08:25:45 -04:00
Wing Lian	05b398a072	fix some of the edge cases for Jamba (#1452 ) * fix some of the edge cases for Jamba * update requirements for jamba	2024-03-29 02:38:02 -04:00
Wing Lian	7803f0934f	fixes for dpo and orpo template loading (#1424 )	2024-03-20 11:36:24 -04:00
Wing Lian	00018629e7	run tests again on Modal (#1289 ) [skip ci] * run tests again on Modal * make sure to run the full suite of tests on modal * run cicd steps via shell script * run tests in different runs * increase timeout * split tests into steps on modal * increase workflow timeout * retry doing this with only a single script * fix yml launch for modal ci * reorder tests to run on modal * skip dpo tests on modal * run on L4s, A10G takes too long * increase CPU and RAM for modal test * run modal tests on A100s * skip phi test on modal * env not arg in modal dockerfile * upgrade pydantic and fastapi for modal tests * cleanup stray character * use A10s instead of A100 for modal	2024-02-29 14:26:26 -05:00
Wing Lian	6d4bbb877f	deprecate py 3.9 support, set min pytorch version (#1343 ) [skip ci]	2024-02-28 12:58:05 -05:00
Wing Lian	5894f0e57e	make mlflow optional (#1317 ) * make mlflow optional * fix xformers don't patch swiglu if xformers not working fix the check for xformers swiglu * fix install of xformers with extra index url for docker builds * fix docker build arg quoting	2024-02-26 11:41:33 -05:00
NanoCode012	a359579371	deprecate: pytorch 2.0.1 image (#1315 ) [skip ci] * deprecate: pytorch 2.0.1 image * deprecate from main image * Update main.yml * Update tests.yml	2024-02-22 11:39:47 +09:00
Wing Lian	8da1633124	Revert "run PR e2e docker CI tests in Modal" (#1220 ) [skip ci]	2024-01-26 16:50:44 -05:00
Wing Lian	36d053f6f0	run PR e2e docker CI tests in Modal (#1217 ) [skip ci] * wip modal for ci * handle falcon layernorms better * update * rebuild the template each time with the pseudo-ARGS * fix ref * update tests to use modal * cleanup ci script * make sure to install jinja2 also * kickoff the gh action on gh hosted runners and specify num gpus	2024-01-26 16:13:27 -05:00
Wing Lian	1b180034c7	ensure the tests use the same version of torch as the latest base docker images (#1215 ) [skip ci]	2024-01-26 10:38:30 -05:00
Wing Lian	badda3783b	make sure to register the base chatml template even if no system message is provided (#1207 )	2024-01-25 10:38:08 -05:00
Wing Lian	6c19e9302a	add python 3.11 to the matrix for unit tests (#1085 ) [skip ci]	2024-01-10 13:02:01 -05:00
Wing Lian	9032e610b1	use tags again for test image, only run docker e2e after pre-commit checks (#1081 )	2024-01-10 09:04:56 -05:00
Wing Lian	40a6362c92	support for mamba (#915 ) * support for mamba * more mamba fixes * use fork for mamba kwargs fix * grad checkpointing doesn't work * fix extras for mamaba * mamba loss fix * use fp32 and remove verbose logging * mamba fixes * fix collator for mamba * set model_type on training_args * don't save safetensors for mamba * update mamba config to disable safetensor checkpooints, install for tests * no evals for mamba tests * handle save_pretrained * handle unused safetensors arg	2023-12-09 12:10:41 -05:00
Wing Lian	0de1457189	try #2 : pin hf transformers and accelerate to latest release, don't reinstall pytorch (#867 ) * isolate torch from the requirements.txt * fix typo for removed line ending * pin transformers and accelerate to latest releases * try w auto-gptq==0.5.1 * update README to remove manual peft install * pin xformers to 0.0.22 * bump flash-attn to 2.3.3 * pin flash attn to exact version	2023-11-16 10:42:36 -05:00
Wing Lian	f4868d733c	make sure we also run CI tests when requirements.txt changes (#663 )	2023-10-02 08:43:40 -04:00
Wing Lian	5b0bc48fbc	add mistral e2e tests (#649 ) * mistral e2e tests * make sure to enable flash attention for the e2e tests * use latest transformers full sha * uninstall first	2023-09-29 00:22:40 -04:00
Wing Lian	b6ab8aad62	Mistral flash attn packing (#646 ) * add mistral monkeypatch * add arg for decoder attention masl * fix lint for duplicate code * make sure to update transformers too * tweak install for e2e * move mistral patch to conditional	2023-09-27 18:41:00 -04:00

67 Commits