axolotl

Author	SHA1	Message	Date
Hamel Husain	9ca358b671	Simplify Docker Unit Test CI (#1055 ) [skip ci] * Update tests-docker.yml * Update tests-docker.yml * run ci tests on ci yaml updates --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-06 08:20:33 -05:00
JinK	553c80f79a	streaming multipack for pretraining dataset (#959 ) * [Feat] streaming multipack * WIP make continued pretraining work w multipack * fix up hadrcoding, lint * fix dict check * update test for updated pretraining multipack code * fix hardcoded data collator fix for multipack pretraining * fix the collator to be the max length for multipack pretraining * don't bother with latest tag for test * cleanup docker build/test --------- Co-authored-by: jinwonkim93@github.com <jinwonkim> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-05 22:13:21 -05:00
Hamel Husain	eb4c99431b	Update tests-docker.yml (#1052 ) [skip ci]	2024-01-05 14:26:18 -05:00
Wing Lian	bcc78d8fa3	bump transformers and update attention class map name (#1023 ) * bump transformers and update attention class map name * also run the tests in docker * add mixtral e2e smoke test * fix base name for docker image in test * mixtral lora doesn't seem to work, at least check qlora * add testcase for mixtral w sample packing * check monkeypatch for flash attn multipack * also run the e2e tests in docker * use all gpus to run tests in docker ci * use privileged mode too for docker w gpus * rename the docker e2e actions for gh ci * set privileged mode for docker and update mixtral model self attn check * use fp16/bf16 for mixtral w fa2 * skip e2e tests on docker w gpus for now * tests to validate mistral and mixtral patches * fix rel import	2024-01-03 12:11:04 -08:00
Wing Lian	37820f6540	support for cuda 12.1 (#989 )	2023-12-22 11:08:22 -05:00
Hamel Husain	2e61dc3180	Add tests to Docker (#993 )	2023-12-22 06:37:20 -08:00
Hamel Husain	62ba1609b6	bump actions versions	2023-12-21 08:54:08 -08:00
Wing Lian	161bcb6517	Dockerfile torch fix (#987 ) * add torch to requirements.txt at build time to force version to stick * fix xformers check * better handling of xformers based on installed torch version * fix for ci w/o torch	2023-12-21 09:38:20 -05:00
Wing Lian	40a6362c92	support for mamba (#915 ) * support for mamba * more mamba fixes * use fork for mamba kwargs fix * grad checkpointing doesn't work * fix extras for mamaba * mamba loss fix * use fp32 and remove verbose logging * mamba fixes * fix collator for mamba * set model_type on training_args * don't save safetensors for mamba * update mamba config to disable safetensor checkpooints, install for tests * no evals for mamba tests * handle save_pretrained * handle unused safetensors arg	2023-12-09 12:10:41 -05:00
Wing Lian	0de1457189	try #2 : pin hf transformers and accelerate to latest release, don't reinstall pytorch (#867 ) * isolate torch from the requirements.txt * fix typo for removed line ending * pin transformers and accelerate to latest releases * try w auto-gptq==0.5.1 * update README to remove manual peft install * pin xformers to 0.0.22 * bump flash-attn to 2.3.3 * pin flash attn to exact version	2023-11-16 10:42:36 -05:00
Wing Lian	70157ccb8f	add a latest tag for regular axolotl image, cleanup extraneous print statement (#746 )	2023-10-19 12:28:29 -04:00
Wing Lian	2aa1f71464	fix pytorch 2.1.0 build, add multipack docs (#722 )	2023-10-13 08:57:28 -04:00
Wing Lian	7f2618b5f4	add docker images for pytorch 2.10 (#697 )	2023-10-07 12:23:31 -04:00
Wing Lian	f4868d733c	make sure we also run CI tests when requirements.txt changes (#663 )	2023-10-02 08:43:40 -04:00
Wing Lian	5b0bc48fbc	add mistral e2e tests (#649 ) * mistral e2e tests * make sure to enable flash attention for the e2e tests * use latest transformers full sha * uninstall first	2023-09-29 00:22:40 -04:00
Wing Lian	b6ab8aad62	Mistral flash attn packing (#646 ) * add mistral monkeypatch * add arg for decoder attention masl * fix lint for duplicate code * make sure to update transformers too * tweak install for e2e * move mistral patch to conditional	2023-09-27 18:41:00 -04:00
Maxime	5d931cc042	Only run tests when a change to python files is made (#614 ) * Update tests.yml * Update .github/workflows/tests.yml --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-09-20 22:02:04 -04:00
Wing Lian	62a774140b	Fix for check with cfg and merge_lora (#600 )	2023-09-18 21:14:32 -04:00
Wing Lian	1078d3eae7	E2e passing tests (#576 ) * run e2e tests after all other checks have passed * tweak tests so they get run on PRs or push to main * change dependent action for chcecking * one test workflow to rule them all * no need for custom action, just use needs * whoops, python version should be a string * e2e tests can run on any available gpu	2023-09-15 01:03:49 -04:00
Wing Lian	24146733db	E2e device cuda (#575 ) * use torch.cuda.current_device() instead of local_rank * ignore NVML errors for gpu stats * llama lora packing e2e tests	2023-09-14 22:49:27 -04:00
Wing Lian	9218ebecd2	e2e testing (#574 )	2023-09-14 21:56:11 -04:00
Wing Lian	772cd870d4	fix the sed command to replace the version w the tag	2023-09-11 13:44:19 -04:00
Wing Lian	bcbc9597e9	replace tags, build dist for pypi publish (#553 ) * replace tags, build dist for pypi publish * missing trailing comma	2023-09-11 13:25:41 -04:00
Wing Lian	20ed4c1f9e	pypi on tag push (#552 )	2023-09-11 10:33:42 -04:00
Wing Lian	c5dedb17ad	remove with section, doesn't seem to work (#551 )	2023-09-11 10:27:17 -04:00
Wing Lian	b56503d423	publish to pypi workflow on tagged release (#549 )	2023-09-11 09:44:47 -04:00
Wing Lian	34c0a86a11	update readme to point to direct link to runpod template, cleanup install instrucitons (#532 ) * update readme to point to direct link to runpod template, cleanup install instrucitons * default install flash-attn and auto-gptq now too * update readme w flash-attn extra * fix version in setup	2023-09-08 11:58:54 -04:00
Wing Lian	3355706e22	Add support for GPTQ using native transformers/peft (#468 ) * auto gptq support * more tweaks and add yml * remove old gptq docker * don't need explicit peft install for tests * fix setup.py to use extra index url install torch for tests fix cuda version for autogptq index set torch in requirements so that it installs properly move gptq install around to work with github cicd * gptq doesn't play well with sample packing * address pr feedback * remove torch install for now * set quantization_config from model config * Fix the implementation for getting quant config from model config	2023-09-05 12:43:22 -04:00
Wing Lian	96deb6bd67	recast loralayer, norm, lmhead + embed token weights per original qlora (#393 ) * recast loralayer, norm, lmhead + embed token weights per original qlora * try again for the fix * refactor torch dtype picking * linter fixes * missing import for LoraLayer * fix install for tests now that peft is involved	2023-08-21 18:41:12 -04:00
mhenrichsen	cf6654769a	flash attn pip install (#426 ) * flash attn pip * add packaging * add packaging to apt get * install flash attn in dockerfile * remove unused whls * add wheel * clean up pr fix packaging requirement for ci upgrade pip for ci skip build isolation for requiremnents to get flash-attn working install flash-attn seperately * install wheel for ci * no flash-attn for basic cicd * install flash-attn as pip extras --------- Co-authored-by: Ubuntu <mgh@mgh-vm.wsyvwcia0jxedeyrchqg425tpb.ax.internal.cloudapp.net> Co-authored-by: mhenrichsen <some_email@hey.com> Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-08-18 19:00:27 -04:00
Wing Lian	d3d6fd6ae6	just resort to tags ans use main-latest (#424 )	2023-08-16 00:39:57 -04:00
Wing Lian	5f80b3560b	use inputs for image rather than outputs for docker metadata (#420 )	2023-08-15 18:26:59 -04:00
Wing Lian	7af816699e	tag with latest as well for axolotl-runpod (#418 ) * tag with latest as well for axolotl-runpod * no dev branch for now	2023-08-15 15:30:41 -04:00
Wing Lian	918f1b0dfb	revert previous change and build ax images w docker on gpu (#371 )	2023-08-12 20:23:00 -04:00
Wing Lian	c3fde36ada	attempt to run non-base docker builds on regular cpu hosts (#369 )	2023-08-12 19:07:38 -04:00
Wing Lian	2c37bf6c21	Prune cuda117 (#327 ) * drop cuda117/torch 1.13.1 from support, pin flash attention to v2.0.1, rm torchvision/torchaudio install * gptq base build not needed. add sm 9.0 support	2023-07-26 16:27:49 -04:00
Wing Lian	ff7f18d1ed	disable gh cache for first step of docker builds too	2023-07-22 11:46:37 -04:00
Wing Lian	cf62cfd661	add runpod envs to .bashrc, fix bnb env (#316 ) * hopper support for base dockerfile, add runpod envs to .bashrc * set BNB_CUDA_VERSION env for latest bnb * don't support hopper yet w 118	2023-07-22 10:09:38 -04:00
Wing Lian	c5df969262	don't use the gha cache w docker	2023-07-22 08:46:21 -04:00
Wing Lian	c58034d48c	use pytorch 2.0.1	2023-07-20 00:47:13 -04:00
Wing Lian	a10da1caff	11.7.0 nvidia/cuda docker images are deprecated, move to 11.7.1	2023-07-01 00:29:07 -04:00
Wing Lian	d35278aaf1	don't fail fast	2023-06-15 16:01:27 -04:00
Wing Lian	e0011fdf55	Fix base builder, missing tags	2023-05-31 09:52:03 -04:00
Wing Lian	e3d03745ba	add py310 support from base image	2023-05-31 09:07:28 -04:00
Wing Lian	c5b0af1a7e	define python version (3.10) explicitly as string in yaml	2023-05-30 22:23:35 -04:00
Wing Lian	c43c5c84ff	py310, fix cuda arg in deepspeed	2023-05-30 18:02:34 -04:00
Wing Lian	bbc5bc5791	Merge pull request #108 from OpenAccess-AI-Collective/docker-gptq default to qlora support, make gptq specific image	2023-05-30 15:07:04 -04:00
NanoCode012	36596adaf7	Add pre-commit: black+flake8+pylint	2023-05-31 02:53:22 +09:00
Wing Lian	48612f8376	cleanup from pr feedback	2023-05-30 09:56:30 -04:00
Wing Lian	6ef96f569b	default to qlora support, make gptq specific image	2023-05-29 20:34:41 -04:00

77 Commits