Wing Lian
8cd75cff9f
use cuda 12.9.1 and add python 3.12 to base images ( #3367 )
2026-01-21 13:34:14 -05:00
Wing Lian
8ab9d9ea88
Version dev ( #3365 )
2026-01-20 22:58:29 -05:00
Wing Lian
8f25124269
upgrade transformers to 4.57.5 ( #3358 )
...
* upgrade transformers to 4.57.5
* explicitly set versions for fbgemm-gpu
* handle index url for cuda version
* explicitly set cu version for fbgemm deps, skip for 130
* cu suffix not needed on version if using whl subpath
2026-01-16 11:17:43 -05:00
Wing Lian
6331e4a130
fix amd64 and set 2.9.1 as latest cloud image ( #3356 )
2026-01-14 11:56:36 -05:00
salman
1410e4474e
update PR template ( #3349 ) [skip ci]
2026-01-14 09:39:21 -05:00
Wing Lian
dc77b5bf42
fix arm64 builds ( #3355 )
...
* fix syntax for secrets in gha yaml
* setup env for uv too
* arm64 for base uv too
* don't build causal-conv1d or mamba for arm64 and use arm64 wheels
* fix dockerfile syntax
* fix shell syntax
2026-01-14 09:38:48 -05:00
@TT
3e0bbd33ec
feat: add ARM64/AArch64 build support to Dockerfile-base ( #3346 )
...
* Add support for capability to build arm64 image
* Fixing wrong variable TARGETPLATFORM bug
* Adding missing semicolons
* skip docker hub login if PR (no push) or no credentials
* Enabling arm64 builds for Dockerfile-base in Github actions
* TARGETARCH automatically default to platform arch under build
* Enabling arm64 builds for axolotl docker builds
* Enabling arm64 builds for axolotl-cloud docker build Github actions
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2026-01-12 12:00:02 -05:00
Wing Lian
ee59e4de97
add cu130 + torch 2.9.1 to test matrices ( #3343 )
...
* add cu130 + torch 2.9.1 to test matrices
* uv can't use pip3 directly
2026-01-05 15:24:29 -05:00
Wing Lian
4e61b8aa23
use updated version of prebuilt wheels for flash attention for cu130 ( #3342 )
...
* use updated version of prebuilt wheels for flash attention for cu130
* use elif
* fix the uv base installs of FA also
* make wget less verbose
2026-01-05 13:48:12 -05:00
Wing Lian
b26ba3a5cb
don't build images w cuda 130 since we don't have flash attention wheels ( #3341 )
2026-01-03 18:08:28 -05:00
Wing Lian
afe18ace35
deprecate torch 2.7.1 ( #3339 )
2026-01-01 06:52:45 -05:00
Wing Lian
e73dab6df9
support pydantic 2.12 ( #3328 )
...
* upgrade pydantic to 2.12
* use latest modal version
* upgrade modal
* update modal in requirements and loosen pydantic
* upgrade modal too
2025-12-30 12:41:07 -05:00
Wing Lian
11c0b5b256
batch upgrade dependencies ( #3299 )
...
* upgrade dependencies
* don't use reset sessions
* downgrade transformers, upgrade other deps
* upgrade bnb to 0.49.0
* restore s3 cache
* explicit use local files w hub
* decompress and strip top level dir
* use 2 levels for strip components
* try to preserve permissions for symlinks
* use updated tar
* fix #3293 for distributed
* downgrade bnb
* fast fail after 4
* fix total tokens device
* patch accelerate CP/SP (#3309 )
---------
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-12-30 09:02:49 -05:00
Wing Lian
efeb5a4e41
fix check for fp8 capability ( #3324 )
...
* fix check for fp8 capability
* handle non-cuda compute
* reduce concurrency of tests
2025-12-22 13:58:25 -05:00
Wing Lian
07c41a6c2a
fix preview docs failing due to running out of disk ( #3326 ) [skip ci]
...
* fix preview docs failing due to running out of disk
* fix docs publish too
2025-12-19 11:34:55 -05:00
Wing Lian
2a664dc8ad
support for xformers wheels for torch 2.9 ( #3308 )
...
* support for xformers wheels for torch 2.9
* fix hf cache?
* don't use hf cache from s3
* show disk free space in ci
2025-12-11 11:56:40 -05:00
Wing Lian
0b635e69c5
build docker images for 2.9.x ( #3273 )
2025-11-20 09:26:24 -05:00
Wing Lian
0d27e14e45
Torch 2.9.1 base images ( #3268 )
...
* update torch 2.9.1 base images
* update base dockerfile image check
2025-11-20 09:04:37 -05:00
Wing Lian
a6bafb55cb
upgrade datasets to 4.4.1 ( #3266 )
...
* upgrade datasets
* cleanup pip cache earlier
* cleanup unused things from worker
* also cleanup sdist
2025-11-14 09:52:14 -08:00
Wing Lian
0fbde69e9c
only push axolotl images, personal repo is deprecated ( #3262 )
...
* only push axolotl images, personal repo is deprecated
* cleanup
2025-11-14 07:50:03 -08:00
Wing Lian
301e22849f
upgrade to latest deepspeed and make sure latest tagged axolotl images are using torch 2.8.0 ( #3261 )
2025-11-13 13:03:01 -05:00
salman
c37decb073
update pre-commit cadence ( #3245 )
2025-11-04 13:43:40 +00:00
Wing Lian
633afffacb
add torch 2.9.0 to ci ( #3223 )
2025-10-30 18:50:26 -04:00
Wing Lian
4b1b4fa6d8
upgrade numpy ( #3236 )
...
* upgrade numpy to 2.3.4
* bump contribs for numpy
* fix vllm versions
* bump numba
* make sure psutil is installed
* add psutil to cicd dockerfile jinja
* lower dep versions of numba + numpy for vllm
* bump datasets version
* resolve pydantic conflict too
2025-10-30 10:03:24 -04:00
Wing Lian
a4b921135b
build cuda 13.0.0 base image with 2.9.0 ( #3229 )
...
* build cuda 13.0.0 base image with 2.9.0
* upgrade causal-conv1d
* 1.5.4 not in pypi yet
* pin to 1.3.0
* use github release instead of pypi
* split the logic for incompatible packages
* fix bash in dockerfile
2025-10-29 18:07:29 -04:00
Wing Lian
383f220cfd
build torch 2.9.0 base images ( #3221 )
2025-10-20 08:53:49 -04:00
Wing Lian
409cfb8a87
deprecate torch 2.6.0 support ( #3197 ) [skip ci]
2025-10-07 11:23:41 -04:00
Wing Lian
ce74c20109
don't cache pip install ( #3194 )
...
* don't cache pip install
* no cache dir for disk space for sdist too
2025-10-01 11:11:39 -04:00
salman
58d67bf98d
Migrate QAT API; fix axolotl quantize for QAT-ed models; add NVFP4 ( #3107 )
2025-09-12 10:55:50 +01:00
Wing Lian
06bebcb65f
run cu128-2.8.0 e2e tests on B200 ( #3126 )
...
* run cu128-2.8.0 e2e tests on B200
* not an int 🤦
* fix yaml
2025-09-02 13:13:23 -04:00
Wing Lian
6afba3871d
Add support for PyTorch 2.8.0 ( #3106 )
...
* Add support for PyTorch 2.8.0
* loosen triton requirements
* handle torch 2.8.0 in setup.py
* fix versions
* no vllm for torch 2.8.0
* remove comment
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-08-28 09:10:40 -04:00
salman
d1de6f5f3d
Add option to skip slow tests in PRs ( #3060 ) [skip ci]
...
* testing e2e skip [skip-e2e]
* testing e2e skip [skip-e2e]
* testing e2e skip [skip-e2e]
* testing e2e skip [skip-e2e]
* testing e2e skip [skip-e2e]
* testing e2e skip [skip-e2e]
* testing e2e skip [skip-e2e]
* testing e2e skip [skip-e2e]
* testing e2e skip [skip-e2e]
* testing e2e skip [skip-e2e]
* testing e2e skip [skip-e2e]
* stop running multigpu [skip-e2e]
* should work now [skip-e2e]
* reverting [skip-e2e]
* testing [skip-e2e]
* debug [skip-e2e]
* debug [skip-e2e]
* round 2 [skip-e2e]
* removing debug [skip-e2e]
* support skipping whole PR [skip-e2e]
* use script for e2e skip [skip-e2e]
* contributing [skip-e2e]
* contributing [skip-e2e]
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-08-13 22:57:51 -04:00
Wing Lian
686933194e
fix vllm tagging and add cloud images w/o tmux ( #3049 ) [skip ci]
2025-08-10 20:21:56 -04:00
Wing Lian
05f1b4b2e8
run monkeypatch tests in separate runner ( #3047 )
2025-08-09 14:34:07 -04:00
Wing Lian
c5e5aba547
Add 2.8.0 base images and uv images ( #3034 )
2025-08-08 02:30:16 -04:00
Wing Lian
10946afae7
fixes for spinning up vllm service for grpo ( #3001 )
2025-08-02 11:19:24 -04:00
salman
09dda462ab
Fix don't preview docs for contributors ( #2994 ) [skip ci]
...
* checking against fork vs. main repo
* force doc preview
2025-07-31 11:12:41 -04:00
Wing Lian
1d2aa1e467
upgrade to support latest transformers release ( #2984 )
...
* upgrade to support latest transformers release
* bump mistral common too
* Fix dependencies
2025-07-27 17:05:12 -04:00
Wing Lian
add3e5076b
don't publish to netlify on contributor submissions since it requires auth tokens ( #2985 ) [skip ci]
...
* don't publish to netlify on contributor submissions since it requires auth tokens
* fix no-tmux build and add contact to motd
2025-07-27 17:04:27 -04:00
salman
1407aac779
Skip CI for draft PRs ( #2970 )
2025-07-24 09:11:46 +01:00
Wing Lian
d32058e149
include torchvision in build for upstream changes requiring it now ( #2953 ) [skip ci]
2025-07-22 04:19:16 -04:00
Wing Lian
8a4bcacdb2
cu126-torch271 for cloud docker image should be tagged with main-latest ( #2935 )
2025-07-17 00:01:23 -04:00
Wing Lian
d2c3d5a954
run nightly-vs-upstream-main on 2.7.1 and multi-gpu also ( #2929 ) [skip ci]
2025-07-16 21:45:42 -04:00
Wing Lian
942005f526
use modal==1.0.2 for nightlies and for cli ( #2925 ) [skip ci]
...
* use modal==1.0.2 for nightlies and for cli
* use latest cce fork for upstream changes
* increase timeout
2025-07-15 20:31:23 -04:00
Wing Lian
7dc3ac6cb3
update nightlies builds ( #2921 ) [skip ci]
2025-07-14 20:10:43 -04:00
Wing Lian
5081db7f8a
upgrade trl==0.19.1 ( #2892 ) [skip ci]
...
* upgrade trl==0.19.1
* add vllm for tests for grpo
* fixes to work with latest trl
* need data_parallel_size config too
* support for vllm_mode for server / colocate
* vllm settings for colocate
* relax vllm version
* bump min hf hub for latest vllm support
* add hints on string literal for vllm mode
* use latest transformers 4.53.2
* tweak acceptable loss on flaky test_ds_zero3_packed test
* don't run flaky vllm/grpo tests for now
2025-07-14 09:23:42 -04:00
salman
03b2a113fe
Update doc preview workflow to use sticky comments ( #2873 )
2025-07-11 14:08:35 +01:00
Wing Lian
c6d69d5c1b
release v0.11.0 ( #2875 )
...
* release v0.11.0
* don't build vllm into release for now
* remove 2.5.1 references
* smollm3 multipack support
* fix ordering of e2e tests
2025-07-09 09:22:35 -04:00
Wing Lian
4ff96a2526
fix xformers version ( #2888 )
2025-07-09 08:43:40 -04:00
salman
89e99eaaa7
slowest durations ( #2887 ) [skip ci]
2025-07-09 08:43:26 -04:00