axolotl

Author	SHA1	Message	Date
Wing Lian	e2882dd749	drop unnecessary BNB_CUDA_VERSION env var from docker as it just results in warnings (#2121 ) [skip ci] * drop unnecessary BNB_CUDA_VERSION env var from docker as it just results in warnings * make sure to run tests when cicd Dockerfile changes	2024-12-04 12:25:47 -05:00
NanoCode012	bd8436bc6e	feat: add cut_cross_entropy (#2091 ) * feat: add cut_cross_entropy * fix: add to input * fix: remove from setup.py * feat: refactor into an integration * chore: ignore lint * feat: add test for cce * fix: set max_steps for liger test * chore: Update base model following suggestion Co-authored-by: Wing Lian <wing.lian@gmail.com> * chore: update special_tokens following suggestion Co-authored-by: Wing Lian <wing.lian@gmail.com> * chore: remove with_temp_dir following comments * fix: plugins aren't loaded * chore: update quotes in error message * chore: lint * chore: lint * feat: enable FA on test * chore: refactor get_pytorch_version * fix: lock cce commit version * fix: remove subclassing UT * fix: downcast even if not using FA and config check * feat: add test to check different attentions * feat: add install to CI * chore: refactor to use parametrize for attention * fix: pytest not detecting test * feat: handle torch lower than 2.4 * fix args/kwargs to match docs * use release version cut-cross-entropy==24.11.4 * fix quotes * fix: use named params for clarity for modal builder * fix: handle install from pip * fix: test check only top level module install * fix: re-add import check * uninstall existing version if no transformers submodule in cce * more dataset fixtures into the cache --------- Co-authored-by: Wing Lian <wing.lian@gmail.com> Co-authored-by: Wing Lian <wing@axolotl.ai>	2024-12-03 08:22:22 -05:00
Wing Lian	a4f4a56d77	build causal_conv1d and mamba-ssm into the base image (#2113 ) * build causal_conv1d and mamba-ssm into the base image * also build base images on changes to Dockerfile-base and base workflow yaml	2024-12-02 18:27:46 -05:00
Wing Lian	5f1d98e8fc	add e2e tests for Unsloth qlora and test the builds (#2093 ) * see if unsloth installs cleanly in ci * check unsloth install on regular tests, not sdist * fix ampere check exception for ci * use cached_property instead * add an e2e test for unsloth qlora * reduce seq len and mbsz to prevent oom in ci * add checks for fp16 and sdp_attention * pin unsloth to a specific release * add unsloth to docker image too * fix flash attn xentropy patch * fix loss, add check for loss when using fa_xentropy * fix special tokens for test * typo * test fa xentropy with and without gradient accum * pr feedback changes	2024-11-29 20:38:49 -05:00
Wing Lian	2d7830fda6	upgrade to flash-attn 2.7.0 (#2048 )	2024-11-14 06:59:25 -05:00
Wing Lian	234e94e9dd	replace references to personal docker hub to org docker hub (#2036 ) [skip ci]	2024-11-11 15:09:29 -05:00
Wing Lian	035e9f9dd7	janky workaround to install FA2 on torch 2.5.1 base image since it takes forever to build (#2022 )	2024-11-07 17:54:29 -05:00
Wing Lian	e12a2130e9	first pass at pytorch 2.5.0 support (#1982 ) * first pass at pytorch 2.5.0 support * attempt to install causal_conv1d with mamba * gracefully handle missing xformers * fix import * fix incorrect version, add 2.5.0 * increase tests timeout	2024-10-21 11:00:45 -04:00
Wing Lian	3ebf22464b	qlora-fsdp ram efficient loading with hf trainer (#1791 ) * fix 405b with lower cpu ram requirements * make sure to use double quant and only skip output embeddings * set model attributes * more fixes for sharded fsdp loading * update the base model in example to use pre-quantized nf4-bf16 weights * upstream fixes for qlora+fsdp	2024-07-30 19:21:38 -04:00
Wing Lian	9a63884597	update test and main/nightly builds (#1797 ) * update test and main/nightly builds * don't install mamba-ssm on 2.4.0 since it has no wheels yet	2024-07-30 12:37:40 -04:00
Wing Lian	d4f6a6b103	fix dockerfile and base builder (#1795 ) [skip-ci]	2024-07-30 08:34:37 -04:00
Wing Lian	78e12f8ca5	add basic support for the optimi adamw optimizer (#1727 ) * add support for optimi_adamw optimizer w kahan summation * pydantic validator for optimi_adamw * workaround for setting optimizer for fsdp * make sure to install optimizer packages * make sure to have parity for model parameters passed to optimizer * add smoke test for optimi_adamw optimizer * don't use foreach optimi by default	2024-07-14 19:12:57 -04:00
mhenrichsen	1194c2e0b1	github urls (#1734 ) Co-authored-by: Henrichsen, Mads (ext) <mads.henrichsen.ext@siemens-energy.com>	2024-07-11 09:19:29 -04:00
Wing Lian	891ae8aa13	fix ray install (#1630 )	2024-05-16 01:25:42 -04:00
Wing Lian	0c49ecc429	more fixes to work with runpod + skypilot (#1629 )	2024-05-16 00:05:56 -04:00
Wing Lian	60113437e4	cloud image w/o tmux (#1628 )	2024-05-15 22:27:40 -04:00
Wing Lian	419b2a6a98	install rsync too (#1627 )	2024-05-15 21:36:00 -04:00
Wing Lian	e6937e884b	fix symlinks for axolotl outputs (#1625 )	2024-05-15 19:41:45 -04:00
Wing Lian	039e2a0370	bump versions of deps (#1621 ) * bump versions of deps * bump transformers too * fix xformers deps and include s3fs install	2024-05-15 13:27:44 -04:00
Wing Lian	4fde300e5f	update outputs path so that we can mount workspace to /workspace/data (#1623 ) * update outputs path so that we can mount workspace to /workspace/data * fix ln order	2024-05-15 12:44:13 -04:00
Wing Lian	89134f2143	make sure to install causal_conv1d in docker (#1459 )	2024-03-29 16:43:25 -04:00
Wing Lian	dd449c5cd8	support galore once upstreamed into transformers (#1409 ) * support galore once upstreamed into transformers * update module name for llama in readme and fix typing for all linear * bump trl for deprecation fixes from newer transformers * include galore as an extra and install in docker image * fix optim_args type * fix optim_args * update dependencies for galore * add galore to cicd dockerfile	2024-03-19 09:26:35 -04:00
Wing Lian	6d4bbb877f	deprecate py 3.9 support, set min pytorch version (#1343 ) [skip ci]	2024-02-28 12:58:05 -05:00
Wing Lian	5894f0e57e	make mlflow optional (#1317 ) * make mlflow optional * fix xformers don't patch swiglu if xformers not working fix the check for xformers swiglu * fix install of xformers with extra index url for docker builds * fix docker build arg quoting	2024-02-26 11:41:33 -05:00
Wing Lian	d113331e9a	add a helpful motd for cloud image (#1235 ) [skip ci]	2024-01-31 10:26:02 -05:00
Wing Lian	8da1633124	Revert "run PR e2e docker CI tests in Modal" (#1220 ) [skip ci]	2024-01-26 16:50:44 -05:00
Wing Lian	36d053f6f0	run PR e2e docker CI tests in Modal (#1217 ) [skip ci] * wip modal for ci * handle falcon layernorms better * update * rebuild the template each time with the pseudo-ARGS * fix ref * update tests to use modal * cleanup ci script * make sure to install jinja2 also * kickoff the gh action on gh hosted runners and specify num gpus	2024-01-26 16:13:27 -05:00
Wing Lian	8a49309489	upgrade deepspeed to 0.13.1 for mixtral fixes (#1189 ) [skip ci] * upgrade deepspeed to 0.13.1 for mixtral fixes * move deepspeed-kernels install to setup.py	2024-01-24 14:26:40 -05:00
Wing Lian	eaaeefce55	jupyter lab fixes (#1139 ) [skip ci] * add a basic notebook for lab users in the root * update notebook and fix cors for jupyter * cell is code * fix eval batch size check * remove intro notebook	2024-01-22 18:42:40 -05:00
Wing Lian	729740df81	Dockerfile cloud ports (#1148 ) * explicitly expose ports 8888 and 22 * support for SSH_KEY from latitude	2024-01-18 22:04:25 -05:00
Wing Lian	ece0211996	Agnostic cloud gpu docker image and Jupyter lab (#1097 )	2024-01-15 22:37:54 -05:00
Wing Lian	23495a80af	misc fixes from #943 (#1086 ) [skip ci]	2024-01-10 22:31:36 -05:00
NanoCode012	d69ba2b0b7	fix: warn user to install mamba_ssm package (#1019 )	2024-01-10 02:50:56 -05:00
Wing Lian	788649fe95	attempt to also run e2e tests that needs gpus (#1070 ) * attempt to also run e2e tests that needs gpus * fix stray quote * checkout specific github ref * dockerfile for tests with proper checkout ensure wandb is disabled for docker pytests clear wandb env after testing clear wandb env after testing make sure to provide a default val for pop tryin skipping wandb validation tests explicitly disable wandb in the e2e tests explicitly report_to None to see if that fixes the docker e2e tests split gpu from non-gpu unit tests skip bf16 check in test for now build docker w/o cache since it uses branch name ref revert some changes now that caching is fixed skip bf16 check if on gpu w support * pytest skip for auto-gptq requirements * skip mamba tests for now, split multipack and non packed lora llama tests * split tests that use monkeypatches * fix relative import for prev commit * move other tests using monkeypatches to the correct run	2024-01-09 21:23:23 -05:00
Hamel Husain	2e61dc3180	Add tests to Docker (#993 )	2023-12-22 06:37:20 -08:00
Wing Lian	161bcb6517	Dockerfile torch fix (#987 ) * add torch to requirements.txt at build time to force version to stick * fix xformers check * better handling of xformers based on installed torch version * fix for ci w/o torch	2023-12-21 09:38:20 -05:00
Wing Lian	85de004dd4	fix for build for nccl in dockerfile (#970 )	2023-12-16 19:12:01 -05:00
Wing Lian	80ec7af358	update to latest nccl in docker image (#965 )	2023-12-16 18:31:25 -05:00
Wing Lian	68b227a7d8	Mixtral multipack (#928 ) * mixtral multipack * use mixtral model * sample yml * calculate cu_seqlens properly * use updated flash attention setting * attn var checks * force use of flash attention 2 for packing * lint * disable future fix for now * update support table	2023-12-09 21:26:30 -05:00
Wing Lian	f544ab2bed	don't compile deepspeed or bitsandbytes from source (#837 )	2023-11-08 19:49:55 -05:00
Fabian Preiß	8056ecd30e	add deepspeed-kernels dependency for deepspeed>=0.12.0 (#827 )	2023-11-05 07:52:56 -05:00
Wing Lian	2aa1f71464	fix pytorch 2.1.0 build, add multipack docs (#722 )	2023-10-13 08:57:28 -04:00
Wing Lian	aca0398315	apex not needed as amp is part of pytorch (#696 )	2023-10-07 12:20:45 -04:00
Wing Lian	de87ea68f6	fix multiline for docker (#694 )	2023-10-06 22:38:15 -04:00
NanoCode012	133e676bcc	Feat: Set WORKDIR to /workspace/axolotl (#679 )	2023-10-06 04:09:14 +09:00
Maxime	923eb91304	tweak: improve base builder for smaller layers (#500 )	2023-09-22 16:17:50 -04:00
Wing Lian	e85d2eb06b	let MAX_JOBS use the default since we're not resource constrained on our self-hosted runners (#427 )	2023-09-21 20:36:30 -04:00
Wing Lian	b53e77775b	update dockerfile to not build evoformer since it fails the build (#607 )	2023-09-19 16:28:29 -04:00
Wing Lian	34c0a86a11	update readme to point to direct link to runpod template, cleanup install instructions (#532 ) * update readme to point to direct link to runpod template, cleanup install instructions * default install flash-attn and auto-gptq now too * update readme w flash-attn extra * fix version in setup	2023-09-08 11:58:54 -04:00
Wing Lian	3355706e22	Add support for GPTQ using native transformers/peft (#468 ) * auto gptq support * more tweaks and add yml * remove old gptq docker * don't need explicit peft install for tests * fix setup.py to use extra index url install torch for tests fix cuda version for autogptq index set torch in requirements so that it installs properly move gptq install around to work with github cicd * gptq doesn't play well with sample packing * address pr feedback * remove torch install for now * set quantization_config from model config * Fix the implementation for getting quant config from model config	2023-09-05 12:43:22 -04:00

103 Commits