128 Commits

Author SHA1 Message Date
Wing Lian
e8d3da0081 upgrade pytorch from 2.4.0 => 2.4.1 (#1950)
* upgrade pytorch from 2.4.0 => 2.4.1

* update xformers for updated pytorch version

* handle xformers version case for torch==2.3.1
2024-10-09 11:53:56 -04:00
Wing Lian
4ca0a47cfb add 2.4.1 to base models (#1953) 2024-10-09 08:43:11 -04:00
Wing Lian
3853ab7ae9 bump accelerate to 0.34.2 (#1901)
* bump accelerate

* add fixture to predownload the test model

* change fixture
2024-09-07 14:39:31 -04:00
Wing Lian
93b769a979 lint fix and update gha regex (#1899) 2024-09-05 09:58:21 -04:00
Wing Lian
3c6b9eda2e run pytests with varied pytorch versions too (#1883) 2024-08-31 22:49:35 -04:00
Wing Lian
e8ff5d5738 don't mess with bnb since it needs compiled wheels (#1859) 2024-08-23 12:18:47 -04:00
Wing Lian
b33dc07a77 rename nightly test and add badge (#1853) 2024-08-22 13:13:33 -04:00
Wing Lian
dcbff16983 run nightly ci builds against upstream main (#1851)
* run nightly ci builds against upstream main

* add test badges

* run the multigpu tests against nightly main builds too
2024-08-22 13:10:54 -04:00
Wing Lian
54392ac8a6 Attempt to run multigpu in PR CI for now to ensure it works (#1815) [skip ci]
* Attempt to run multigpu in PR CI for now to ensure it works

* fix yaml file

* forgot to include multigpu tests

* fix call to cicd.multigpu

* dump dictdefault to dict for yaml conversion

* use to_dict instead of casting

* 16bit-lora w flash attention, 8bit lora seems problematic

* add llama fsdp test

* more tests

* Add test for qlora + fsdp with prequant

* limit accelerate to 2 processes and disable broken qlora+fsdp+bnb test

* move multigpu tests to biweekly
2024-08-09 11:50:13 -04:00
Wing Lian
70978467a0 skip no commit to main on ci (#1814) 2024-08-06 15:25:54 -04:00
Wing Lian
dbf8fb549e publish axolotl images without extras in the tag name (#1798) 2024-07-30 13:36:19 -04:00
Wing Lian
9a63884597 update test and main/nightly builds (#1797)
* update test and main/nightly builds

* don't install mamba-ssm on 2.4.0 since it has no wheels yet
2024-07-30 12:37:40 -04:00
Wing Lian
c5587b45ac use 12.4.1 instead of 12.4 [skip-ci] (#1796) 2024-07-30 08:50:23 -04:00
Wing Lian
d4f6a6b103 fix dockerfile and base builder (#1795) [skip-ci] 2024-07-30 08:34:37 -04:00
Wing Lian
d8d1788ffc move to supporting mostly 12.1 w 2.3.1 and add new 12.4 with 2.4.0 (#1793) 2024-07-30 08:06:11 -04:00
Wing Lian
e1725aef2b update modal package and don't cache pip install (#1757)
* update modal package and cleanup pip cache

* more verbosity on the test
2024-07-16 14:45:38 -04:00
Wing Lian
1e57b4c562 update to pytorch 2.3.1 (#1746) [skip ci] 2024-07-13 13:28:17 -04:00
Wing Lian
137d84d1b4 add torch 2.3.1 base image (#1745) 2024-07-13 09:41:51 -04:00
Wing Lian
a159724e44 bump trl and accelerate for latest releases (#1730)
* bump trl and accelerate for latest releases

* ensure that the CI runs on new gh org

* drop kto_pair support since removed upstream
2024-07-10 11:15:44 -04:00
Wing Lian
ef223519c9 update deps (#1663) [skip ci]
* update deps and tweak logic so axolotl is pip installable

* use vcs url format

* using dependency_links isn't supported per docs)
2024-05-28 11:23:34 -04:00
Wing Lian
60113437e4 cloud image w/o tmux (#1628) 2024-05-15 22:27:40 -04:00
Wing Lian
3319780300 update torch 2.2.1 -> 2.2.2 (#1622) 2024-05-15 09:45:27 -04:00
Wing Lian
70185763f6 add torch 2.3.0 to builds (#1593) 2024-05-05 18:45:45 -04:00
Wing Lian
c10563c444 fix broken linting (#1541)
* chore: lint

* include examples in yaml check

* mistral decided to gate their models...

* more mistral models that were gated
2024-04-19 01:03:04 -04:00
Wing Lian
4a92a3b9ee Nightlies fix v4 (#1458) [skip ci]
* another attempt at github actions

* try again
2024-03-29 11:04:34 -04:00
Wing Lian
46a73e3d1a fix yaml parsing for workflow (#1457) [skip ci] 2024-03-29 10:21:08 -04:00
Wing Lian
da3415bb5a fix how nightly tag is generated (#1456) [skip ci] 2024-03-29 09:29:17 -04:00
Wing Lian
8cb127abeb configure nightly docker builds (#1454) [skip ci]
* configure nightly docker builds

* also test update pytorch in modal ci
2024-03-29 08:25:45 -04:00
Wing Lian
05b398a072 fix some of the edge cases for Jamba (#1452)
* fix some of the edge cases for Jamba

* update requirements for jamba
2024-03-29 02:38:02 -04:00
Wing Lian
da265dd796 fix for accelerate env var for auto bf16, add new base image and expand torch_cuda_arch_list support (#1413) 2024-03-26 16:46:19 -04:00
Hamel Husain
4e69aa48ab Update docs.yml 2024-03-21 22:36:57 -07:00
Hamel Husain
629450cecd Bootstrap Hosted Axolotl Docs w/Quarto (#1429)
* precommit

* mv styes.css

* fix links
2024-03-21 22:28:36 -07:00
Wing Lian
7803f0934f fixes for dpo and orpo template loading (#1424) 2024-03-20 11:36:24 -04:00
Wing Lian
00018629e7 run tests again on Modal (#1289) [skip ci]
* run tests again on Modal

* make sure to run the full suite of tests on modal

* run cicd steps via shell script

* run tests in different runs

* increase timeout

* split tests into steps on modal

* increase workflow timeout

* retry doing this with only a single script

* fix yml launch for modal ci

* reorder tests to run on modal

* skip dpo tests on modal

* run on L4s, A10G takes too long

* increase CPU and RAM for modal test

* run modal tests on A100s

* skip phi test on modal

* env not arg in modal dockerfile

* upgrade pydantic and fastapi for modal tests

* cleanup stray character

* use A10s instead of A100 for modal
2024-02-29 14:26:26 -05:00
Wing Lian
6d4bbb877f deprecate py 3.9 support, set min pytorch version (#1343) [skip ci] 2024-02-28 12:58:05 -05:00
Wing Lian
5894f0e57e make mlflow optional (#1317)
* make mlflow optional

* fix xformers

don't patch swiglu if xformers not working
fix the check for xformers swiglu

* fix install of xformers with extra index url for docker builds

* fix docker build arg quoting
2024-02-26 11:41:33 -05:00
NanoCode012
a359579371 deprecate: pytorch 2.0.1 image (#1315) [skip ci]
* deprecate: pytorch 2.0.1 image

* deprecate from main image

* Update main.yml

* Update tests.yml
2024-02-22 11:39:47 +09:00
Wing Lian
ea00dd0852 don't use load and push together (#1284) 2024-02-09 14:54:31 -05:00
Wing Lian
aaf54dc730 run the docker image builds and push on gh action gpu runners (#1218) 2024-02-09 10:32:54 -05:00
Wing Lian
8da1633124 Revert "run PR e2e docker CI tests in Modal" (#1220) [skip ci] 2024-01-26 16:50:44 -05:00
Wing Lian
36d053f6f0 run PR e2e docker CI tests in Modal (#1217) [skip ci]
* wip modal for ci

* handle falcon layernorms better

* update

* rebuild the template each time with the pseudo-ARGS

* fix ref

* update tests to use modal

* cleanup ci script

* make sure to install jinja2 also

* kickoff the gh action on gh hosted runners and specify num gpus
2024-01-26 16:13:27 -05:00
Wing Lian
1b180034c7 ensure the tests use the same version of torch as the latest base docker images (#1215) [skip ci] 2024-01-26 10:38:30 -05:00
Wing Lian
74c72ca5eb drop py39 docker images, add py311, upgrade pytorch to 2.1.2 (#1205)
* drop py39 docker images, add py311, upgrade pytorch to 2.1.2

* also allow the main build to be manually triggered

* fix workflow_dispatch in yaml
2024-01-26 00:38:49 -05:00
Wing Lian
badda3783b make sure to register the base chatml template even if no system message is provided (#1207) 2024-01-25 10:38:08 -05:00
Wing Lian
0f77b8d798 add commit message option to skip docker image builds in ci (#1168) [skip ci] 2024-01-22 19:55:36 -05:00
Wing Lian
ece0211996 Agnostic cloud gpu docker image and Jupyter lab (#1097) 2024-01-15 22:37:54 -05:00
Hamel Husain
2dc431078c Add link on README to Docker Debugging (#1107)
* add docker debug

* Update docs/debugging.md

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* explain editable install

* explain editable install

* upload new video

* add link to README

* Update README.md

* Update README.md

* chore: lint

* make sure to lint markdown too

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-12 08:51:35 -05:00
Mark Saroufim
44ba616da2 Fix broken pypi.yml (#1099) [skip ci] 2024-01-11 12:35:31 -05:00
Wing Lian
6c19e9302a add python 3.11 to the matrix for unit tests (#1085) [skip ci] 2024-01-10 13:02:01 -05:00
Wing Lian
9032e610b1 use tags again for test image, only run docker e2e after pre-commit checks (#1081) 2024-01-10 09:04:56 -05:00