Commit Graph

80 Commits

Author SHA1 Message Date
Wing Lian
cb811f8bf1 upgrade to flash-attn 2.8.0.post2 (#2828)
* upgrade to flash-attn 2.8.0.post2

* use cu126 with torch 2.6

* vllm 0.8.5.post1 seems incompatible with CUDA 12.6.3 and torch 2.6

* cu126 + torch 2.6 as the default

* use cu126 for multigpu with torch 2.6 too

* drop vllm from ci for now
2025-06-29 22:11:16 -04:00
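The bullets above pair torch 2.6 with CUDA 12.6 (cu126) wheels. A minimal sketch of what that pairing means in practice, assuming PyTorch's public wheel index URL scheme; the only torch/CUDA pairing taken from the commit is cu126 + torch 2.6, the function itself is illustrative:

```python
# Hedged sketch: build the pip command that installs torch from a specific
# CUDA wheel index (e.g. cu126 for torch 2.6, as chosen in the commit above).
def pip_install_cmd(torch_version: str, cuda_tag: str = "cu126") -> str:
    """pip command installing `torch_version` from the `cuda_tag` wheel index."""
    index = f"https://download.pytorch.org/whl/{cuda_tag}"
    return f"pip install torch=={torch_version} --extra-index-url {index}"
```

Keeping the CUDA tag as an explicit argument makes the default ("cu126 + torch 2.6 as the default") a one-line change when the build matrix moves on.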
Wing Lian
bcc108efc1 build 2.7.1 images too (#2784) [skip ci] 2025-06-12 13:22:20 -04:00
Wing Lian
8f8a7afb05 Add ci and images for CUDA 12.8 for B200s (#2683) [skip ci]
* Add ci and images for CUDA 12.8 for B200s

* add comments explaining CI [skip e2e]
2025-05-16 13:06:08 -04:00
Wing Lian
6ba5c0ed2c use latest hf-xet and don't install vllm for torch 2.7.0 (#2603)
* use latest hf-xet and don't install vllm for torch 2.7.0

* fix runpod hub tests
2025-04-30 18:27:39 -04:00
Wing Lian
89ca14d9a0 ensure we pass axolotl extras to the Dockerfile so vllm is included in shipped images (#2599) 2025-04-30 11:35:45 -04:00
Wing Lian
fedbcc0254 remove torch 2.4.1 CI as part of support deprecation (#2582) 2025-04-29 08:28:32 -04:00
Wing Lian
dc4da4a7e2 update trl to 0.17.0 (#2560)
* update trl to 0.17.0

* grpo + vllm no longer supported with 2.5.1 due to vllm constraints

* disable VLLM_USE_V1 for ci

* improve handling of killing off the multiprocessing vllm service

* debug why this doesn't run in CI

* increase vllm wait time

* increase timeout to 5min

* upgrade to vllm 0.8.4

* dump out the vllm log for debugging

* use debug logging

* increase vllm start timeout

* use NVL instead

* disable torch compile cache

* revert some commented checks now that grpo tests are fixed

* increase vllm timeout back to 5min
2025-04-27 19:19:53 -04:00
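Several bullets above are about waiting longer for the vllm service to come up in CI. A minimal sketch of that readiness-polling pattern, assuming only that some zero-arg `check` callable reports whether the service is up; the 5-minute default mirrors the timeout the commit settles on, everything else is illustrative:

```python
# Hedged sketch of the "increase vllm wait time / timeout" pattern above:
# poll a readiness check until it succeeds or the deadline passes.
import time

def wait_for_service(check, timeout: float = 300.0, interval: float = 1.0) -> bool:
    """Return True as soon as `check()` is truthy, False once `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False
```

Using `time.monotonic()` rather than `time.time()` keeps the deadline immune to wall-clock adjustments on the CI runner.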
Wing Lian
a4d5112ae1 builds for torch 2.7.0 (#2552)
* builds for torch==2.7.0

* use xformers==0.0.29.post3

* no vllm support with torch 2.7

* update default, fix conditional

* no xformers for 2.7.0

* no vllm on 2.7.0 for multigpu test too

* remove deprecated verbose arg from scheduler

* 2.7.0 tests on cpu
2025-04-24 00:39:31 -04:00
NanoCode012
682a9cf79b Fix: add delinearization and make qlora work with fsdp2 (#2515)
* fixes for delinearization, and make qlora work with fsdp2

* Add back mistakenly removed lm_eval

* typo [skip ci]

* patch evals for torch.compile + fsdp2

* also check torch_compile w fsdp2

* lots of fixes for flex attn with llama4

* fix patch check and patch llama4 too

* attempt to make the patches stick

* use transformers 4.51.2

* update configs and README for llama4

* remove torch.compile for CI test

* cleanup any existing singletons

* set singleton cache to None instead of deleting

* use importlib reload with monkeypatch

* don't worry about transformers version, mark inputs with grads, fix regex

* make sure embeds aren't on cpu

* logging and mem improvements

* vllm version and add to docker, make sure to save processor on conversion

* fix ambiguous tensor bool check

* fix vllm to not use v1, upgrade hf transformers

* fix tests

* make flex_attn_compile_kwargs configurable, since this depends on model params

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>
2025-04-15 23:31:39 -07:00
Wing Lian
e0aba74dd0 Release update 20250331 (#2460) [skip ci]
* make torch 2.6.0 the default image

* fix tests against upstream main

* fix attribute access

* use fixture dataset

* fix dataset load

* correct the fixtures + tests

* more fixtures

* add accidentally removed shakespeare fixture

* fix conversion from unittest to pytest class

* nightly main ci caches

* build 12.6.3 cuda base image

* override for fix from huggingface/transformers#37162

* address PR feedback
2025-04-01 08:47:50 -04:00
Wing Lian
04f6324833 build cloud images with torch 2.6.0 (#2413)
* build cloud images with torch 2.6.0

* nightlies too
2025-03-13 23:28:51 -04:00
Wing Lian
ffae8d6a95 GRPO (#2307) 2025-02-13 16:01:01 -05:00
NanoCode012
5bbad5ef93 feat: add torch2.6 to ci (#2311) 2025-02-07 07:28:54 -05:00
Wing Lian
1063d82b51 match the cuda version for 2.4.1 build w/o tmux (#2299) 2025-01-30 11:46:09 -05:00
salman
c071a530f7 removing 2.3.1 (#2294) 2025-01-28 23:23:44 -05:00
Wing Lian
d8b4027200 use 2.5.1 docker images as latest tag as it seems stable (#2198) 2025-01-10 08:35:25 -05:00
Wing Lian
db51a9e4cb use pep440 instead of semver (#2088) [skip ci] 2024-11-19 15:02:10 -05:00
Wing Lian
a77c8a71cf fix brackets on docker ci builds, add option to skip e2e builds [skip e2e] (#2080) [skip ci] 2024-11-19 10:29:31 -05:00
Wing Lian
5be8e13d35 make sure to add tags for versioned tag on cloud docker images (#2060) 2024-11-14 10:24:49 -05:00
Wing Lian
c5eb9ea2c2 fix push to main and tag semver build for docker ci (#2054) 2024-11-13 14:04:28 -05:00
Wing Lian
01881c3113 make sure to tag images in docker for tagged releases (#2051) [skip ci]
* make sure to tag images in docker for tagged releases

* fix tag event
2024-11-13 13:15:49 -05:00
Wing Lian
f68fb71005 update actions version for node16 deprecation (#2037) [skip ci]
* update actions version for node16 deprecation

* update pre-commit/action to use 3.0.1 for actions/cache@v4 dep

* update docker/setup-buildx-action too to v3
2024-11-11 15:09:11 -05:00
Wing Lian
9bc3ee6c75 add axolotlai docker hub org to publish list (#2031)
* add axolotlai docker hub org to publish list

* fix to use latest actions docker metadata version

* fix list in yaml for expected format for action

* missed a change
2024-11-11 09:48:19 -05:00
Wing Lian
3cb2d75de1 upgrade pytorch to 2.5.1 (#2024) 2024-11-08 10:46:24 -05:00
Wing Lian
718cfb2dd1 revert image tagged as main-latest (#1990) 2024-10-22 13:54:24 -04:00
Wing Lian
5c629ee444 use torch 2.4.1 images as latest now that torch 2.5.0 is out (#1987) 2024-10-21 19:51:06 -04:00
Wing Lian
e12a2130e9 first pass at pytorch 2.5.0 support (#1982)
* first pass at pytorch 2.5.0 support

* attempt to install causal_conv1d with mamba

* gracefully handle missing xformers

* fix import

* fix incorrect version, add 2.5.0

* increase tests timeout
2024-10-21 11:00:45 -04:00
Wing Lian
e8d3da0081 upgrade pytorch from 2.4.0 => 2.4.1 (#1950)
* upgrade pytorch from 2.4.0 => 2.4.1

* update xformers for updated pytorch version

* handle xformers version case for torch==2.3.1
2024-10-09 11:53:56 -04:00
Wing Lian
dbf8fb549e publish axolotl images without extras in the tag name (#1798) 2024-07-30 13:36:19 -04:00
Wing Lian
9a63884597 update test and main/nightly builds (#1797)
* update test and main/nightly builds

* don't install mamba-ssm on 2.4.0 since it has no wheels yet
2024-07-30 12:37:40 -04:00
Wing Lian
1e57b4c562 update to pytorch 2.3.1 (#1746) [skip ci] 2024-07-13 13:28:17 -04:00
Wing Lian
a159724e44 bump trl and accelerate for latest releases (#1730)
* bump trl and accelerate for latest releases

* ensure that the CI runs on new gh org

* drop kto_pair support since removed upstream
2024-07-10 11:15:44 -04:00
Wing Lian
60113437e4 cloud image w/o tmux (#1628) 2024-05-15 22:27:40 -04:00
Wing Lian
3319780300 update torch 2.2.1 -> 2.2.2 (#1622) 2024-05-15 09:45:27 -04:00
Wing Lian
70185763f6 add torch 2.3.0 to builds (#1593) 2024-05-05 18:45:45 -04:00
Wing Lian
8cb127abeb configure nightly docker builds (#1454) [skip ci]
* configure nightly docker builds

* also test update pytorch in modal ci
2024-03-29 08:25:45 -04:00
Wing Lian
5894f0e57e make mlflow optional (#1317)
* make mlflow optional

* fix xformers

don't patch swiglu if xformers not working
fix the check for xformers swiglu

* fix install of xformers with extra index url for docker builds

* fix docker build arg quoting
2024-02-26 11:41:33 -05:00
NanoCode012
a359579371 deprecate: pytorch 2.0.1 image (#1315) [skip ci]
* deprecate: pytorch 2.0.1 image

* deprecate from main image

* Update main.yml

* Update tests.yml
2024-02-22 11:39:47 +09:00
Wing Lian
ea00dd0852 don't use load and push together (#1284) 2024-02-09 14:54:31 -05:00
Wing Lian
aaf54dc730 run the docker image builds and push on gh action gpu runners (#1218) 2024-02-09 10:32:54 -05:00
Wing Lian
74c72ca5eb drop py39 docker images, add py311, upgrade pytorch to 2.1.2 (#1205)
* drop py39 docker images, add py311, upgrade pytorch to 2.1.2

* also allow the main build to be manually triggered

* fix workflow_dispatch in yaml
2024-01-26 00:38:49 -05:00
Wing Lian
0f77b8d798 add commit message option to skip docker image builds in ci (#1168) [skip ci] 2024-01-22 19:55:36 -05:00
Wing Lian
ece0211996 Agnostic cloud gpu docker image and Jupyter lab (#1097) 2024-01-15 22:37:54 -05:00
Wing Lian
37820f6540 support for cuda 12.1 (#989) 2023-12-22 11:08:22 -05:00
Hamel Husain
2e61dc3180 Add tests to Docker (#993) 2023-12-22 06:37:20 -08:00
Hamel Husain
62ba1609b6 bump actions versions 2023-12-21 08:54:08 -08:00
Wing Lian
161bcb6517 Dockerfile torch fix (#987)
* add torch to requirements.txt at build time to force version to stick

* fix xformers check

* better handling of xformers based on installed torch version

* fix for ci w/o torch
2023-12-21 09:38:20 -05:00
Wing Lian
70157ccb8f add a latest tag for regular axolotl image, cleanup extraneous print statement (#746) 2023-10-19 12:28:29 -04:00
Wing Lian
2aa1f71464 fix pytorch 2.1.0 build, add multipack docs (#722) 2023-10-13 08:57:28 -04:00
Wing Lian
7f2618b5f4 add docker images for pytorch 2.1.0 (#697) 2023-10-07 12:23:31 -04:00