axolotl

Author	SHA1	Message	Date
Hamel Husain	63fb3eb426	set default for merge (#1044 )	2024-01-04 18:14:20 -08:00
Hamel Husain	31d23504a5	fix model card upload for PEFT models (#1043 )	2024-01-04 18:13:54 -08:00
Wing Lian	f243c2186d	RL/DPO (#935 ) * ipo-dpo trainer * fix missing abstract method * chatml template, grad checkpointing kwargs support * fix steps calc for RL and add dataloader kwargs * wip to fix dpo and start ppo * more fixes * refactor to generalize map fn * fix dataset loop and handle argilla pref dataset * set training args * load reference model on seperate gpu if more than one device * no auto upload to hub for dpo, don't add lora adapters to ref model for dpo * fixes for rl training * support for ipo from yaml * set dpo training args from the config, add tests * chore: lint * set sequence_len for model in test * add RLHF docs	2024-01-04 18:22:55 -05:00
xaviviro	59b2d302c8	Added chatglm3 conversation type for training models like TinyLLama (#1036 ) * Added chatgml3 conversation type for training models like TinyLLama * Added chatgml3 conversation type for training models like TinyLLama with lint * Added chatgml3 conversation type for training models like TinyLLama with lint	2024-01-04 21:03:04 +09:00
Wing Lian	bcc78d8fa3	bump transformers and update attention class map name (#1023 ) * bump transformers and update attention class map name * also run the tests in docker * add mixtral e2e smoke test * fix base name for docker image in test * mixtral lora doesn't seem to work, at least check qlora * add testcase for mixtral w sample packing * check monkeypatch for flash attn multipack * also run the e2e tests in docker * use all gpus to run tests in docker ci * use privileged mode too for docker w gpus * rename the docker e2e actions for gh ci * set privileged mode for docker and update mixtral model self attn check * use fp16/bf16 for mixtral w fa2 * skip e2e tests on docker w gpus for now * tests to validate mistral and mixtral patches * fix rel import	2024-01-03 12:11:04 -08:00
NanoCode012	74532ddc45	chore(config): clean up old log for Qwen (#1034 )	2024-01-04 01:19:52 +09:00
NanoCode012	8ba27f3bde	fix: lint (#1037 )	2024-01-03 10:23:44 -05:00
Hamel Husain	a3e8783328	[Docs] delete unused cfg value `lora_out_dir` (#1029 ) * Update README.md * Update README.md * Update README.md Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2024-01-02 21:35:20 -08:00
NanoCode012	b31038aae9	chore(readme): update instruction to set config to load from cache (#1030 )	2024-01-03 11:56:19 +09:00
Tim Dolan	c75f916745	added tiny llama examples for lora and qlora (#1027 ) * added tiny llama examples for lora and qlora * corrected yml files and removed tiny-llama.yml from llama-2 example	2024-01-02 20:00:37 -05:00
Wing Lian	4d2e842e46	use recommended setting for use_reentrant w gradient checkpointing (#1021 ) * use recommended setting for use_reentrant w gradient checkpointing * add doc for gradient_checkpointing_kwargs	2024-01-01 22:17:27 -05:00
Tazik Shahjahan	3678a6c41d	Fix: bf16 support for inference (#981 ) * Fix: bf16 torch dtype * simplify casting to device and dtype --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-12-29 16:15:53 -06:00
mhenrichsen	f8ae59b0a8	Adds chat templates (#1022 )	2023-12-29 15:44:23 -06:00
Hamel Husain	4f4d638b84	[WandB] Push axolotl config to top level wandb files (#1014 )	2023-12-29 10:52:12 -08:00
Wing Lian	ba043a361e	add ultrachat prompt strategies (#996 )	2023-12-29 12:23:29 -06:00
NanoCode012	41353d2ea0	feat: expose bnb kwargs (#1018 ) * feat: expose bnb kwargs * chore: added examples and link per suggestion * Uncomment defaults per suggestion for readability Co-authored-by: Hamel Husain <hamel.husain@gmail.com> --------- Co-authored-by: Hamel Husain <hamel.husain@gmail.com>	2023-12-29 18:16:26 +09:00
NanoCode012	f6ecf14dd4	feat: remove need to add load_in* during merge (#1017 )	2023-12-29 18:15:30 +09:00
Hamel Husain	dec66d7c53	[Docs] Nit: Remind people to auth to wandb if they are going to use it (#1013 )	2023-12-28 18:00:16 -08:00
Hamel Husain	76357dc5da	Update README.md (#1012 )	2023-12-28 18:00:02 -08:00
Wing Lian	70b46ca4f4	remove landmark attn and xpos rope implementations (#1010 )	2023-12-27 21:07:27 -08:00
Hamel Husain	85dd4d525b	add config to model card (#1005 ) * add config to model card * rm space * apply black formatting * apply black formatting * fix formatting * check for cfg attribute * add version * add version * put the config in a collapsible element * put the config in a collapsible element	2023-12-27 21:25:33 -06:00
Kevin Sydney	384b817dc0	Set eval_sample_packing to false in mistral config.yaml (#1003 ) Without eval_sampling_packing set to false, ValueError occurs with eval dataset split is too small for sample_packing.	2023-12-27 16:11:55 -08:00
Younes Belkada	db9094df0f	FEAT: add tagging support to axolotl (#1004 ) * add tagging support to axolotl * chore: lint * fix method w self --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-12-27 16:25:20 -06:00
Evan Griffiths	6ef46f8dca	Add an example config for finetuning a 34B model on a 24GB GPU (#1000 ) * Add an example config for finetuning a 34B model on a 24GB GPU * Remore wandb project	2023-12-25 10:29:55 -08:00
Wing Lian	628b754824	set output_router_logits for mixtral config: (#995 )	2023-12-22 12:57:02 -05:00
Wing Lian	37820f6540	support for cuda 12.1 (#989 )	2023-12-22 11:08:22 -05:00
NanoCode012	7d4185ffcb	chore: Update transformers to latest (#986 )	2023-12-23 00:29:36 +09:00
mhenrichsen	93ebec1ac5	change val size (#992 )	2023-12-22 16:18:16 +01:00
Hamel Husain	2e61dc3180	Add tests to Docker (#993 )	2023-12-22 06:37:20 -08:00
NanoCode012	1ffa3866f2	Feat: Warns to add to modules_to_save when adding tokens or switching special_tokens (#787 ) * Feat: Auto add to modules_to_save when adding tokens * fix: swap to error instead of warning * feat: add check when special_tokens differ and add test	2023-12-22 21:49:07 +09:00
Hamel Husain	62ba1609b6	bump actions versions	2023-12-21 08:54:08 -08:00
Hamel Husain	7bbaac98f7	fix mistral prompt assembly (#982 ) * fix mistral prompts * fix spacing * remove elif	2023-12-21 08:00:55 -08:00
Wing Lian	161bcb6517	Dockerfile torch fix (#987 ) * add torch to requirements.txt at build time to force version to stick * fix xformers check * better handling of xformers based on installed torch version * fix for ci w/o torch	2023-12-21 09:38:20 -05:00
Ikko Eltociear Ashimine	d25c34caa6	Update README.md (#966 )	2023-12-17 09:51:25 -05:00
NanoCode012	13e938149d	fix: add lr scheduler kwargs to Trainer (#972 )	2023-12-17 18:48:28 +09:00
Wing Lian	85de004dd4	fix for build for nccl in dockerfile (#970 )	2023-12-16 19:12:01 -05:00
Wing Lian	80ec7af358	update to latest nccl in docker image (#965 )	2023-12-16 18:31:25 -05:00
dumpmemory	f28e75513b	update transformers to fix checkpoint saving (#963 )	2023-12-15 21:03:17 -05:00
Hamel Husain	5ada140ff0	Fix prompt assembly for llama (#952 ) * start at index 0 * add test to check for missing turns * apply black * Update test_prompt_tokenizers.py * Update src/axolotl/monkeypatch/fastchat_conversation_turns.py Co-authored-by: Motoki Wu <tokestermw@gmail.com> * fix linting * apply black * add more tests for llama/sharegpt * make logic clearer --------- Co-authored-by: Motoki Wu <tokestermw@gmail.com>	2023-12-14 10:03:59 -08:00
Hamel Husain	712fd27b3f	Add docs (#947 ) * move section * update README * update README * update README * update README * update README * Update README.md Co-authored-by: Wing Lian <wing.lian@gmail.com> --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-12-13 14:22:52 -08:00
kallewoof	ef24342538	fix: switch to using the HuggingFace Transformers NEFT implementation (#941 ) * fix: switch to using the HuggingFace Transformers NEFT implementation * linter * add support for noisy_embedding_alpha with a warning about it being renamed * restore pre/posttrain_hooks * move validation of NEFT noise alpha into validate_config() * linter	2023-12-13 17:15:34 -05:00
Wing Lian	5ea3aa31f0	Fix Deepspeed loading (#950 ) * add check for zero3 * freeze parameters * fixes for deepspeed loading * fix model parameter check * unfrozen parameters in example mixtral and logging when unfreezing	2023-12-13 16:03:23 -05:00
Wing Lian	f1f60cb5b2	Flash attn hotfix (#951 ) * use previous arg * use eager to use legacy attention that can be patched	2023-12-13 13:42:23 -05:00
kallewoof	450e04d3c4	fix: remove excessive newlines in system prompt(s) for alpaca (#936 )	2023-12-13 16:40:02 +09:00
Juraj Bednar	b0cf397ecb	More hints on what to do with CUDA Out of memory errors (#925 )	2023-12-13 16:38:38 +09:00
Wing Lian	5f79b8242f	new evals_per_epoch and saves_per_epoch to make things cleaner (#944 ) * new evals_per_epoch and saves_per_epoch to make things cleaner * update per PR feedback	2023-12-12 15:35:23 -05:00
Hamel Husain	f1de29dd1e	Respect sequence_len in config for `type: llama2_chat` (#926 ) * Respect sequence_len in config for `type: llama2_chat` It was hardcoded to `4096` I am not sure why? This updates it to pull from the config. cc: @winglian * Update llama2_chat.py * apply black formatting * fix tokenizer * update test data * lint fixtures	2023-12-12 09:39:22 -08:00
Wing Lian	7fabc4d95e	Mixtral official (#942 ) * multipack support for official mixtral implementation * fix patch to load multipack for mixtral * chore: lint	2023-12-11 23:44:33 -05:00
Motoki Wu	9a5eb3990c	Update requirements.txt (#940 )	2023-12-11 22:57:28 -05:00
Casper	86487c2e96	Mixtral: More correct MoE, lower loss (#932 ) * More correct MoE * Fix formatting	2023-12-10 10:34:25 -05:00

1145 Commits