Commit Graph

61 Commits

Author SHA1 Message Date
Wing Lian
34b68ddaae curl with apt instead of pip 2025-05-18 15:17:38 -07:00
Wing Lian
9a3d0c919b make sure curl is installed 2025-05-18 15:17:38 -07:00
Wing Lian
bd34d0b861 install for hopper from pre-built wheel 2025-05-18 15:17:38 -07:00
Wing Lian
37220ab90a install pybind11 for fa3 build 2025-05-18 15:17:38 -07:00
Wing Lian
e1b74d710b update docker args to minimums used and use MAX_JOBS already set as arg 2025-05-18 15:17:38 -07:00
Wing Lian
79daf5b934 reduce max jobs for build of fa3 2025-05-18 15:17:38 -07:00
Wing Lian
65c6c98a76 whitespace fix in dockerfile 2025-05-18 15:17:37 -07:00
Wing Lian
4ef2e8293f fix the bash in docker base 2025-05-18 15:17:37 -07:00
Wing Lian
9b0be4f15c fix 12.8 image and add flash-attn v3 hopper base image 2025-05-18 15:17:37 -07:00
Wing Lian
0d691cc2a7 add base docker image with pytorch 2.7.0 and variant for cuda 12.8 (#2551)
* add base docker image with pytorch 2.7.0 and variant for cuda 12.8

* my bash is terrible
2025-04-23 14:59:03 -04:00
NanoCode012
e0e5d9b1d6 feat: add llama4 multimodal (#2499)
* feat: add llama4 multimodal

* feat: add torchvision to base docker

* just use latest torchvision

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-04-07 10:49:29 -04:00
Wing Lian
aae4337f40 add 12.8.1 cuda to the base matrix (#2426)
* add 12.8.1 cuda to the base matrix

* use nightly

* bump deepspeed and set no binary

* deepspeed binary fixes hopefully

* install deepspeed by itself

* multiline fix

* make sure ninja is installed

* try with reversion of packaging/setuptools/wheel install

* use license instead of license-file

* try rolling back packaging and setuptools versions

* comment out license for validation for now

* make sure packaging version is consistent

* more parity across tests and docker images for packaging/setuptools
2025-03-21 10:17:25 -04:00
Wing Lian
38df5a36ea bump HF versions except for trl (#2427) 2025-03-20 10:22:05 -04:00
Wing Lian
1302e31049 Transformers version flexibility and FSDP optimizer patch (#2155)
* allow flexibility in transformers version for FSDP

* more flexibility with dev versions of 4.47.0.dev0

* add patch for fsdp

* fix typo

* correct fn name

* stray character

* fix patch

* reset Trainer too

* also reset Trainer.training_step

* allow tests/patched to run more than one process on e2e runner

* skip tests/patched in e2e for now since it's run in regular pytest
2024-12-08 14:50:40 -05:00
Wing Lian
a4f4a56d77 build causal_conv1d and mamba-ssm into the base image (#2113)
* build causal_conv1d and mamba-ssm into the base image

* also build base images on changes to Dockerfile-base and base workflow yaml
2024-12-02 18:27:46 -05:00
Wing Lian
2d7830fda6 upgrade to flash-attn 2.7.0 (#2048) 2024-11-14 06:59:25 -05:00
Wing Lian
035e9f9dd7 janky workaround to install FA2 on torch 2.5.1 base image since it takes forever to build (#2022) 2024-11-07 17:54:29 -05:00
Wing Lian
d4f6a6b103 fix dockerfile and base builder (#1795) [skip-ci] 2024-07-30 08:34:37 -04:00
Wing Lian
6d4bbb877f deprecate py 3.9 support, set min pytorch version (#1343) [skip ci] 2024-02-28 12:58:05 -05:00
Wing Lian
8a49309489 upgrade deepspeed to 0.13.1 for mixtral fixes (#1189) [skip ci]
* upgrade deepspeed to 0.13.1 for mixtral fixes

* move deepspeed-kernels install to setup.py
2024-01-24 14:26:40 -05:00
Wing Lian
f544ab2bed don't compile deepspeed or bitsandbytes from source (#837) 2023-11-08 19:49:55 -05:00
Fabian Preiß
8056ecd30e add deepspeed-kernels dependency for deepspeed>=0.12.0 (#827) 2023-11-05 07:52:56 -05:00
Wing Lian
aca0398315 apex not needed as amp is part of pytorch (#696) 2023-10-07 12:20:45 -04:00
Wing Lian
de87ea68f6 fix multiline for docker (#694) 2023-10-06 22:38:15 -04:00
Maxime
923eb91304 tweak: improve base builder for smaller layers (#500) 2023-09-22 16:17:50 -04:00
Wing Lian
e85d2eb06b let MAX_JOBS use the default since we're not resource constrained on our self-hosted runners (#427) 2023-09-21 20:36:30 -04:00
Wing Lian
b53e77775b update dockerfile to not build evoformer since it fails the build (#607) 2023-09-19 16:28:29 -04:00
mhenrichsen
cf6654769a flash attn pip install (#426)
* flash attn pip

* add packaging

* add packaging to apt get

* install flash attn in dockerfile

* remove unused whls

* add wheel

* clean up pr

fix packaging requirement for ci
upgrade pip for ci
skip build isolation for requiremnents to get flash-attn working
install flash-attn seperately

* install wheel for ci

* no flash-attn for basic cicd

* install flash-attn as pip extras

---------

Co-authored-by: Ubuntu <mgh@mgh-vm.wsyvwcia0jxedeyrchqg425tpb.ax.internal.cloudapp.net>
Co-authored-by: mhenrichsen <some_email@hey.com>
Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-08-18 19:00:27 -04:00
Wing Lian
ffac902c1b bump flash-attn to 2.0.4 for the base docker image (#382) 2023-08-13 17:55:04 -04:00
Wing Lian
2c37bf6c21 Prune cuda117 (#327)
* drop cuda117/torch 1.13.1 from support, pin flash attention to v2.0.1, rm torchvision/torchaudio install

* gptq base build not needed. add sm 9.0 support
2023-07-26 16:27:49 -04:00
Wing Lian
cdf85fdbd5 pin flash attention 2 to the fix for backwards pass 2023-07-21 08:18:53 -04:00
Wing Lian
9b790d359b flash attention 2 2023-07-21 08:17:46 -04:00
Wing Lian
b06d3e3645 explicitly pin flash attention 1 to v1.0.9 2023-07-20 01:02:08 -04:00
Wing Lian
71456955f5 pin pydantic so deepspeed isn't broken 2023-07-02 22:26:51 -04:00
Wing Lian
530809fd74 update pip install command for apex 2023-06-28 22:36:28 -04:00
Wing Lian
c43c5c84ff py310, fix cuda arg in deepspeed 2023-05-30 18:02:34 -04:00
Wing Lian
bbc5bc5791 Merge pull request #108 from OpenAccess-AI-Collective/docker-gptq
default to qlora support, make gptq specific image
2023-05-30 15:07:04 -04:00
NanoCode012
392dfd9b07 Lint and format 2023-05-31 02:53:22 +09:00
Wing Lian
48612f8376 cleanup from pr feedback 2023-05-30 09:56:30 -04:00
Wing Lian
e43bcc6c4f move CUDA_VERSION_BNB arg inside of stage build scope 2023-05-29 13:30:15 -04:00
Wing Lian
00323f0a6f fix CUDA_VERSION_BNB env var 2023-05-29 08:06:22 -04:00
Wing Lian
21f17cca69 bnb fixes 2023-05-29 00:06:35 -04:00
Wing Lian
809ccebb38 use python setup install, bdist wheel is unreliable in installing extension 2023-05-28 15:49:13 -04:00
Wing Lian
a798ba1659 ensure libbitsandbytes*.so gets included with wheel 2023-05-28 12:28:37 -04:00
Wing Lian
cf37980395 fix missing run coninuation 2023-05-27 15:28:54 -04:00
Wing Lian
312b8d51d6 update docker to compile latest bnb to properly support qlora 2023-05-27 12:36:53 -04:00
Wing Lian
c3d256271e fix wheel install glob 2023-05-26 10:37:02 -04:00
Wing Lian
1fc9b44e3d fix wheel blobs in dockerfile 2023-05-26 07:40:11 -04:00
Wing Lian
259262bf42 fix xentropy wheel name typo 2023-05-25 17:25:38 -04:00
Wing Lian
8d6a28953f fix relative path in flash-attn build: 2023-05-25 12:18:28 -04:00