axolotl

Author	SHA1	Message	Date
Wing Lian	8779997ba5	native support for modal cloud from CLI (#2237 ) * native support for modal cloud from CLI * do lm_eval in cloud too * Fix the sub call to lm-eval * lm_eval option to not post eval, and append not extend * cache bust when using branch, grab sha of latest image tag, update lm-eval dep * allow minimal yaml for lm eval * include modal in requirements * update link in README to include utm * pr feedback * use chat template * revision support * apply chat template as arg * add wandb name support, allow explicit a100-40gb * cloud is optional * handle accidental setting of tasks with a single task str * document the modal cloud yaml for clarity [skip ci] * cli docs * support spawn vs remote for lm-eval * Add support for additional docker commands in modal image build * cloud config shouldn't be a dir * Update README.md Co-authored-by: Charles Frye <cfrye59@gmail.com> * fix annotation args --------- Co-authored-by: Charles Frye <cfrye59@gmail.com>	2025-01-30 11:34:02 -05:00
Wing Lian	8fb72cbc0b	use the extracted field_messages to parse the role fields (#2265 )	2025-01-21 15:39:30 -05:00
Dan Saunders	1ed4de73b6	CLI cleanup and documentation (#2244 ) * CLI init refactor * fix * cleanup and (partial) docs * Adding documentation and continuing cleanup (in progress) * remove finetune.py script * continued cleanup and documentation * pytest fixes * review comments * fix * Fix * typing fixes * make sure the batch dataset patcher for multipack is always loaded when handling datasets * review comments * fix --------- Co-authored-by: Dan Saunders <dan@axolotl.ai> Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-01-13 17:55:29 +00:00
Wing Lian	1f623e6cc8	transformers 4.47.1 (#2187 ) * transformers 4.47.1 * drop monkeypatches * can't remove patches yet * make flash attention forward ignore the loss kwargs * patch the flash attention in the modeling arch too * remove fsdp and deepspeed patches * cleanup PR * bump accelerate and torchao, also logically reorder/group requirements * meant to include torchao * use official patch release	2024-12-17 11:01:21 -05:00
Wing Lian	d009ead101	fix build w pyproject to respect insalled torch version (#2168 ) * fix build w pyproject to respect insalled torch version * include in manifest * disable duplicate code check for now * move parser so it can be found * add checks for correct pytorch version so this doesn't slip by again	2024-12-10 16:25:25 -05:00
Wing Lian	d87df2c776	prepare plugins needs to happen so registration can occur to build the plugin args (#2119 ) * prepare plugins needs to happen so registration can occur to build the plugin args use yaml.dump include dataset and more assertions * attempt to manually register plugins rather than use fn * fix fixture * remove fixture * move cli test to patched dir * fix cce validation	2024-12-03 15:06:09 -05:00
NanoCode012	bd8436bc6e	feat: add cut_cross_entropy (#2091 ) * feat: add cut_cross_entropy * fix: add to input * fix: remove from setup.py * feat: refactor into an integration * chore: ignore lint * feat: add test for cce * fix: set max_steps for liger test * chore: Update base model following suggestion Co-authored-by: Wing Lian <wing.lian@gmail.com> * chore: update special_tokens following suggestion Co-authored-by: Wing Lian <wing.lian@gmail.com> * chore: remove with_temp_dir following comments * fix: plugins aren't loaded * chore: update quotes in error message * chore: lint * chore: lint * feat: enable FA on test * chore: refactor get_pytorch_version * fix: lock cce commit version * fix: remove subclassing UT * fix: downcast even if not using FA and config check * feat: add test to check different attentions * feat: add install to CI * chore: refactor to use parametrize for attention * fix: pytest not detecting test * feat: handle torch lower than 2.4 * fix args/kwargs to match docs * use release version cut-cross-entropy==24.11.4 * fix quotes * fix: use named params for clarity for modal builder * fix: handle install from pip * fix: test check only top level module install * fix: re-add import check * uninstall existing version if no transformers submodule in cce * more dataset fixtures into the cache --------- Co-authored-by: Wing Lian <wing.lian@gmail.com> Co-authored-by: Wing Lian <wing@axolotl.ai>	2024-12-03 08:22:22 -05:00
Wing Lian	5f1d98e8fc	add e2e tests for Unsloth qlora and test the builds (#2093 ) * see if unsloth installs cleanly in ci * check unsloth install on regular tests, not sdist * fix ampere check exception for ci * use cached_property instead * add an e2e test for unsloth qlora * reduce seq len and mbsz to prevent oom in ci * add checks for fp16 and sdp_attention * pin unsloth to a specific release * add unsloth to docker image too * fix flash attn xentropy patch * fix loss, add check for loss when using fa_xentropy * fix special tokens for test * typo * test fa xentropy with and without gradient accum * pr feedback changes	2024-11-29 20:38:49 -05:00
Wing Lian	2e99bb303e	fix inference when no chat_template is set, fix unsloth dora check (#2092 ) * fix inference when no chat_template is set, fix unsloth dora check * remove old unsloth version check * update docs on installing unsloth	2024-11-20 14:07:54 -05:00
Wing Lian	f3a5d119af	fix env var extraction (#2043 ) [skip ci]	2024-11-14 12:58:06 -05:00
Wing Lian	76883851d2	add warning that sharegpt will be deprecated (#1957 ) * add warning that sharegpt will be deprecated * add helper script for chat_templates and document deprecation * Update src/axolotl/prompt_strategies/sharegpt.py Co-authored-by: NanoCode012 <nano@axolotl.ai> --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2024-10-11 13:33:20 -04:00
mhenrichsen	1194c2e0b1	github urls (#1734 ) Co-authored-by: Henrichsen, Mads (ext) <mads.henrichsen.ext@siemens-energy.com>	2024-07-11 09:19:29 -04:00
Wing Lian	0c49ecc429	more fixes to work with runpod + skypilot (#1629 )	2024-05-16 00:05:56 -04:00
Wing Lian	2501a371c6	fix setting the authorized keys when there are more than one in the env var (#1626 )	2024-05-15 20:48:56 -04:00
Wing Lian	e6937e884b	fix symlinks for axolotl outputs (#1625 )	2024-05-15 19:41:45 -04:00
Wing Lian	120b809465	fix for jupyterlab on cloud start (#1594 )	2024-05-05 10:08:43 -04:00
Wing Lian	d113331e9a	add a helpful motd for cloud image (#1235 ) [skip ci]	2024-01-31 10:26:02 -05:00
Wing Lian	eaaeefce55	jupyter lab fixes (#1139 ) [skip ci] * add a basic notebook for lab users in the root * update notebook and fix cors for jupyter * cell is code * fix eval batch size check * remove intro notebook	2024-01-22 18:42:40 -05:00
Wing Lian	cbecf3e62a	fix check for env var (#1151 )	2024-01-18 23:58:11 -05:00
Wing Lian	729740df81	Dockerfile cloud ports (#1148 ) * explicitly expose ports 8888 and 22 * support for SSH_KEY from latitude	2024-01-18 22:04:25 -05:00
Wing Lian	ece0211996	Agnostic cloud gpu docker image and Jupyter lab (#1097 )	2024-01-15 22:37:54 -05:00
Casper	e50ab072e2	Create preprocess CLI (#785 ) * Create preprocess CLI * Print prompt template if debugging * Add print for unsupported prompters * Formatting * Formatting * Refactor variables * Formatting * Formatting * Formatting * Formatting	2023-10-26 09:35:42 -04:00
Napuh	85b0be2ba7	Warn users to login to HuggingFace (#645 ) * added warning if user is not logged in HF * updated doc to suggest logging in to HF	2023-09-27 17:43:35 -04:00
Wing Lian	861cecac2a	refactor scripts/finetune.py into new cli modules (#550 ) * refactor scripts/finetune.py into new cli modules * continue to support scripts/finetune.py * update readme with updated cli commands * Update scripts/finetune.py Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2023-09-15 01:43:52 -04:00
Wing Lian	fdb777bc06	check for the existence of the default accelerate config that can create headaches (#561 )	2023-09-13 10:38:28 -04:00
Wing Lian	bf0804447c	fix wandb so mypy doesn't complain (#562 ) * fix wandb so mypy doesn't complain * fix wandb so mypy doesn't complain * no need for mypy override anymore	2023-09-13 10:36:16 -04:00
Aman Gupta Karmani	44454ae4c4	move is_llama_derived_model into normalize_config (#524 )	2023-09-04 00:19:03 -04:00
Tom Jobbins	48434bec54	Debug tokenization output: Add ability to output text only (no tokens), and/or specify num samples to see (#511 )	2023-08-31 14:26:52 -07:00
Wing Lian	b21e4a20fe	split train from other cli options (#503 )	2023-08-30 22:01:47 -07:00
Wing Lian	548787daae	customizable ascii art (#506 )	2023-08-29 10:13:42 -07:00
Maxime	36b2e1cfee	tweak: use default config file when only one file is present (#501 )	2023-08-29 06:17:10 -07:00
Wing Lian	125cccb786	Refactor train cfg cli (#499 ) * wip to cleanup cfg cli options * fix launcher * fix cli args	2023-08-29 05:37:53 -07:00
Maxime	17605b85d8	fix: inference did not move the model to the correct device (#483 )	2023-08-26 16:40:56 -04:00
Charles O. Goddard	bde3c5a478	ReLoRA implementation (with quantization) (#322 ) * Experimental ReLoRA (+qlora) implementation * Add CPU offload * Remove local config * Fix saving logic * Remove redundant assert * Fix logic errors * Move ReLoRA into its own trainer class with a method override to create the proper scheduler * Formatting & typing fixes * Use safe_serialization * Don't allow fsdp/deepspeed with ReLoRA * Fix cpu-offload logic, enable multi gpu * Document parameters and add comment * Fix merge issue * Smooth over some sharp edges * Implement resume from checkpoint for relora * Address review comments * Fix saving logic * Add necessary metadata to safetensors --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-08-23 23:07:18 -04:00
NanoCode012	04a42b6db1	feat(docs): improve user customized prompts (#443 ) * feat(docs): improve user customized prompts * feat(doc): add custom pretokenized instructions * chore: clean old data folder * chore: add new line	2023-08-20 23:59:43 -04:00
Wing Lian	29241cf1e4	Ax art (#405 ) * axolotl text art :D * only print art on rank0 * lint and pr feedback	2023-08-15 08:34:30 -04:00
Aman Karmani	2e22404d2d	add utils.data.prepare_dataset	2023-08-14 21:28:29 -07:00
Wing Lian	fc2d6be96d	use context manager to run things on rank0 before others (#397 )	2023-08-15 00:10:47 -04:00
Gabriel Puliatti	3c2ad00d07	Feat(config): add max steps (#387 )	2023-08-14 11:19:29 -04:00
Wing Lian	86a91e260b	save tokenizer before training starts (#380 )	2023-08-13 11:28:58 -04:00
Aman Karmani	efb3b2c95e	simplify `load_tokenizer`	2023-08-12 18:55:06 -07:00
Aman Karmani	7b55fe6419	improve GPU logging to break out pytorch cache and system mem	2023-08-12 18:52:57 -07:00
Aman Karmani	8cec513447	extract module for working with cfg	2023-08-12 18:25:27 -07:00
Wing Lian	2bb0b78975	Attention mask and position id fixes for packing (#285 ) * fix attetion mask with packing * set position ids and use block diagonal attn mask * fix expand mask for multiple batch items, make sure we pad position_ids * don't move masks to cpu * use multi pack dataloader w random sampler * add position_ids back * more fixes for dataloader integration * est total tokens, fix field loop * more fixes, position_ids seems broken * more fixes for sample packing * use distributed sampler, avoid accelerate prepare * use accelerator prepare for dataloader * fix for position_ids w packing * Update src/axolotl/utils/dataloader.py * validation for sample packing and doc * more fixes for 4k and optimizations * optimized expand mask fn * better handling of variance in multipack dataloader length and trainer hanging when it runs out of data * fix rounding of len of batches to int * better handling so that all devices have the same dataloader len * fix step calc for packing * pass sample packing efficiency to training args * add a test for the mask expansion for sequence packing * only process eval dataset for packing if not None * don't split batches when packing * weighted CE losses * weighted CEL fixes * limit packing to sequences of max seq len * seq_len_multiple for packing * make sure the chunk size is an int * sample_packing_seq_len_multiplier config * use cumulative seq len with var len flash attn v2 w packing * properly calculate max len * fix flash-attn, xformers, packing, support chatml * fix chatml system prompt for openorca, legacy tokenizer opts * add chatml * add unit tests for cum seq lens, add ability to build cu_seq_lens from positional ids, fix prompt test * fix test and pylint checks * more packing and dataset optimizations and fixes * filter w multiple cpus * more fixes and optimizations * fixes and go back to distributed sampler since batch sampler won't work * fix counts by accounting for num devices * fix steps calculation * previous accelerate is still most performant * add numba to requirements. * use custom distributed checks * fix sampler to prevent overfit w new epochs * let's not cleanup the cached datasets * calculate cum seq lens with pos_ids instead of mask, simplify packing params, fix distributed barrier * speed optimizations and set accelerate fsdp env vars * optimize dataset concatenation? * more optimizations for dataset handling * fix import for annotation * manual pre-commit fixes * another sum optimization and bug fix for calc steps * fix packing estimations * fix formatting * pylint problems * add back flash attention branch for handling unpacked sequences seperately * Address PR feedback * add optional sample packing config params to readme	2023-08-12 15:14:56 -04:00
NanoCode012	a276c9c88d	Fix(save): Save as safetensors (#363 )	2023-08-13 01:22:52 +09:00
NanoCode012	289d5c403d	feat(merge): save tokenizer on merge (#362 )	2023-08-13 00:18:10 +09:00
Aman Gupta Karmani	11ddccb80f	Merge pull request #356 from tmm1/load_model-args simplify `load_model` signature	2023-08-09 18:24:34 -07:00
Aman Karmani	718102271f	simplify load_model signature	2023-08-09 22:36:02 +00:00
Aman Karmani	e303d64728	log GPU memory usage	2023-08-09 18:26:28 +00:00
Wing Lian	894cba09f3	fix FSDP save of final model (#329 )	2023-07-30 21:46:44 -04:00

156 Commits