Wing Lian
e9c3a2aec0
add missing dunder-init for monkeypatches and add tests for install from sdist ( #2085 )
...
* add missing dunder-init for monkeypatches and add tests for install from sdist
* fix gha name
* reduce matrix for sdist test
2024-11-19 12:43:30 -05:00
Wing Lian
5f6f9186e4
make sure action has permission to create release ( #2083 ) [skip ci]
2024-11-19 10:43:02 -05:00
Wing Lian
a77c8a71cf
fix brackets on docker ci builds, add option to skip e2e builds [skip e2e] ( #2080 ) [skip ci]
2024-11-19 10:29:31 -05:00
Wing Lian
8403c67156
don't build bdist ( #2076 ) [skip ci]
2024-11-18 12:36:03 -05:00
Wing Lian
c06b8f0243
increase worker count to 8 for basic pytests ( #2075 ) [skip ci]
2024-11-18 11:52:35 -05:00
Wing Lian
2f20cb7ebf
upgrade datasets==3.1.0 and add upstream check ( #2067 ) [skip ci]
2024-11-15 19:08:38 -05:00
Wing Lian
71d4030b79
gradient accumulation tests, embeddings w pad_token fix, smaller models ( #2059 )
...
* add more test cases for gradient accumulation and fix zero3
* swap out for smaller model
* fix missing return
* fix missing pad_token in config
* support concurrency for multigpu testing
* cast empty deepspeed to empty string for zero3 check
* fix temp_dir as fixture so parametrize works properly
* fix test file for multigpu evals
* don't use default
* don't use default for fsdp_state_dict_type
* don't use llama tokenizer w smollm
* also automatically cancel multigpu for concurrency
2024-11-14 12:59:00 -05:00
Wing Lian
ba219b51a5
fix duplicate base build ( #2061 ) [skip ci]
2024-11-14 10:31:19 -05:00
Wing Lian
5be8e13d35
make sure to add tags for versioned tag on cloud docker images ( #2060 )
2024-11-14 10:24:49 -05:00
Wing Lian
659ee5d723
don't cancel the tests on main automatically for concurrency ( #2055 ) [skip ci]
2024-11-13 17:07:41 -05:00
Wing Lian
c5eb9ea2c2
fix push to main and tag semver build for docker ci ( #2054 )
2024-11-13 14:04:28 -05:00
Wing Lian
01881c3113
make sure to tag images in docker for tagged releases ( #2051 ) [skip ci]
...
* make sure to tag images in docker for tagged releases
* fix tag event
2024-11-13 13:15:49 -05:00
Wing Lian
0e8eb96e07
run pypi release action on tag create w version ( #2047 )
2024-11-13 10:21:48 -05:00
NanoCode012
28924fc791
feat: cancel ongoing tests if new CI is triggered ( #2046 ) [skip ci]
2024-11-13 10:06:59 -05:00
Wing Lian
f68fb71005
update actions version for node16 deprecation ( #2037 ) [skip ci]
...
* update actions version for node16 deprecation
* update pre-commit/action to use 3.0.1 for actions/cache@v4 dep
* update docker/setup-buildx-action too to v3
2024-11-11 15:09:11 -05:00
Wing Lian
9bc3ee6c75
add axolotlai docker hub org to publish list ( #2031 )
...
* add axolotlai docker hub org to publish list
* fix to use latest actions docker metadata version
* fix list in yaml for expected format for action
* missed a change
2024-11-11 09:48:19 -05:00
Wing Lian
e20b15bee3
make publish to pypi manually dispatchable as a workflow ( #2026 ) [skip ci]
2024-11-08 14:18:16 -05:00
Wing Lian
3cb2d75de1
upgrade pytorch to 2.5.1 ( #2024 )
2024-11-08 10:46:24 -05:00
Wing Lian
052a9a79b4
only run the remainder of the gpu test suite if one case passes first ( #2009 ) [skip ci]
...
* only run the remainder of the gpu test suite if one case passes first
* also reduce the test matrix
2024-10-31 13:45:01 -04:00
Wing Lian
3591bcfaf9
add torch 2.5.1 for base image ( #2010 )
2024-10-31 13:27:49 -04:00
NanoCode012
2501c1a6a3
Fix: Gradient Accumulation issue ( #1980 )
...
* feat: support new arg num_items_in_batch
* use kwargs to manage extra unknown kwargs for now
* upgrade against upstream transformers main
* make sure trl is on latest too
* fix for upgraded trl
* fix: handle trl and transformer signature change
* feat: update trl to handle transformer signature
* RewardDataCollatorWithPadding no longer has max_length
* handle updated signature for tokenizer vs processor class
* invert logic for tokenizer vs processor class
* processing_class, not processor class
* also handle processing class in dpo
* handle model name w model card creation
* upgrade transformers and add a loss check test
* fix install of tbparse requirements
* make sure to add tbparse to req
* feat: revert kwarg to positional kwarg to be explicit
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-10-25 11:28:23 -04:00
Wing Lian
718cfb2dd1
revert image tagged as main-latest ( #1990 )
2024-10-22 13:54:24 -04:00
Wing Lian
5c629ee444
use torch 2.4.1 images as latest now that torch 2.5.0 is out ( #1987 )
2024-10-21 19:51:06 -04:00
Wing Lian
955cca41fc
don't explicitly set cpu pytorch version ( #1986 )
...
use a constraint file
use min version of xformers
don't install autoawq with pytorch 2.5.0
debugging for errors
upgrade pip first
fix action yml
add back try/except
retry w/o constraint
use --no-build-isolation
show torch version
install setuptools and wheel
add back try/except
2024-10-21 19:50:50 -04:00
Wing Lian
e12a2130e9
first pass at pytorch 2.5.0 support ( #1982 )
...
* first pass at pytorch 2.5.0 support
* attempt to install causal_conv1d with mamba
* gracefully handle missing xformers
* fix import
* fix incorrect version, add 2.5.0
* increase tests timeout
2024-10-21 11:00:45 -04:00
Wing Lian
67f744dc8c
add pytorch 2.5.0 base images ( #1979 )
...
* add pytorch 2.5.0 base images
* make sure num examples for debug is zero and fix comparison
2024-10-18 03:36:51 -04:00
Wing Lian
e8d3da0081
upgrade pytorch from 2.4.0 => 2.4.1 ( #1950 )
...
* upgrade pytorch from 2.4.0 => 2.4.1
* update xformers for updated pytorch version
* handle xformers version case for torch==2.3.1
2024-10-09 11:53:56 -04:00
Wing Lian
4ca0a47cfb
add 2.4.1 to base models ( #1953 )
2024-10-09 08:43:11 -04:00
Wing Lian
3853ab7ae9
bump accelerate to 0.34.2 ( #1901 )
...
* bump accelerate
* add fixture to predownload the test model
* change fixture
2024-09-07 14:39:31 -04:00
Wing Lian
93b769a979
lint fix and update gha regex ( #1899 )
2024-09-05 09:58:21 -04:00
Wing Lian
3c6b9eda2e
run pytests with varied pytorch versions too ( #1883 )
2024-08-31 22:49:35 -04:00
Wing Lian
e8ff5d5738
don't mess with bnb since it needs compiled wheels ( #1859 )
2024-08-23 12:18:47 -04:00
Wing Lian
b33dc07a77
rename nightly test and add badge ( #1853 )
2024-08-22 13:13:33 -04:00
Wing Lian
dcbff16983
run nightly ci builds against upstream main ( #1851 )
...
* run nightly ci builds against upstream main
* add test badges
* run the multigpu tests against nightly main builds too
2024-08-22 13:10:54 -04:00
Wing Lian
54392ac8a6
Attempt to run multigpu in PR CI for now to ensure it works ( #1815 ) [skip ci]
...
* Attempt to run multigpu in PR CI for now to ensure it works
* fix yaml file
* forgot to include multigpu tests
* fix call to cicd.multigpu
* dump dictdefault to dict for yaml conversion
* use to_dict instead of casting
* 16bit-lora w flash attention, 8bit lora seems problematic
* add llama fsdp test
* more tests
* Add test for qlora + fsdp with prequant
* limit accelerate to 2 processes and disable broken qlora+fsdp+bnb test
* move multigpu tests to biweekly
2024-08-09 11:50:13 -04:00
Wing Lian
70978467a0
skip no commit to main on ci ( #1814 )
2024-08-06 15:25:54 -04:00
Wing Lian
dbf8fb549e
publish axolotl images without extras in the tag name ( #1798 )
2024-07-30 13:36:19 -04:00
Wing Lian
9a63884597
update test and main/nightly builds ( #1797 )
...
* update test and main/nightly builds
* don't install mamba-ssm on 2.4.0 since it has no wheels yet
2024-07-30 12:37:40 -04:00
Wing Lian
c5587b45ac
use 12.4.1 instead of 12.4 [skip-ci] ( #1796 )
2024-07-30 08:50:23 -04:00
Wing Lian
d4f6a6b103
fix dockerfile and base builder ( #1795 ) [skip-ci]
2024-07-30 08:34:37 -04:00
Wing Lian
d8d1788ffc
move to supporting mostly 12.1 w 2.3.1 and add new 12.4 with 2.4.0 ( #1793 )
2024-07-30 08:06:11 -04:00
Wing Lian
e1725aef2b
update modal package and don't cache pip install ( #1757 )
...
* update modal package and cleanup pip cache
* more verbosity on the test
2024-07-16 14:45:38 -04:00
Wing Lian
1e57b4c562
update to pytorch 2.3.1 ( #1746 ) [skip ci]
2024-07-13 13:28:17 -04:00
Wing Lian
137d84d1b4
add torch 2.3.1 base image ( #1745 )
2024-07-13 09:41:51 -04:00
mhenrichsen
1194c2e0b1
github urls ( #1734 )
...
Co-authored-by: Henrichsen, Mads (ext) <mads.henrichsen.ext@siemens-energy.com>
2024-07-11 09:19:29 -04:00
Wing Lian
a159724e44
bump trl and accelerate for latest releases ( #1730 )
...
* bump trl and accelerate for latest releases
* ensure that the CI runs on new gh org
* drop kto_pair support since removed upstream
2024-07-10 11:15:44 -04:00
Wing Lian
ef223519c9
update deps ( #1663 ) [skip ci]
...
* update deps and tweak logic so axolotl is pip installable
* use vcs url format
* using dependency_links isn't supported (per docs)
2024-05-28 11:23:34 -04:00
Wing Lian
60113437e4
cloud image w/o tmux ( #1628 )
2024-05-15 22:27:40 -04:00
Wing Lian
3319780300
update torch 2.2.1 -> 2.2.2 ( #1622 )
2024-05-15 09:45:27 -04:00
Wing Lian
70185763f6
add torch 2.3.0 to builds ( #1593 )
2024-05-05 18:45:45 -04:00