* fix: `train_on_inputs: true` ignored for sharegpt
* enable unit test for train_on_inputs for sharegpt
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
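For the `train_on_inputs` + sharegpt fix above, a minimal config sketch (the dataset path is hypothetical, shown only to illustrate the combination of keys):

```yaml
# hypothetical dataset path; illustrates train_on_inputs with a sharegpt-formatted dataset
datasets:
  - path: ./data/sharegpt_conversations.json
    type: sharegpt
train_on_inputs: true  # previously ignored for sharegpt; prompt turns now contribute to the loss
```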
* attempt to also run e2e tests that need gpus
* fix stray quote
* checkout specific github ref
* dockerfile for tests with proper checkout
ensure wandb is disabled for docker pytests
clear wandb env after testing
make sure to provide a default val for pop
try skipping wandb validation tests
explicitly disable wandb in the e2e tests
explicitly report_to None to see if that fixes the docker e2e tests
split gpu from non-gpu unit tests
skip bf16 check in test for now
build docker w/o cache since it uses branch name ref
revert some changes now that caching is fixed
skip bf16 check if on gpu w support
* pytest skip for auto-gptq requirements
* skip mamba tests for now, split multipack and non-packed lora llama tests
* split tests that use monkeypatches
* fix relative import for prev commit
* move other tests using monkeypatches to the correct run
* fix double eos token for chatml
* isolate fix to chatml conversation
* fix add special tokens to include rstrip
* add test for train_on_inputs for sharegpt
* don't use rstrip for chatml
* Cosine min lr
* Cosine min lr - warn if using deepspeed
* cosine_min_lr_ratio readme
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
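A hedged sketch of the new scheduler option, assuming `cosine_min_lr_ratio` is paired with the cosine scheduler as described in the README entry above (with deepspeed a warning is emitted since the scheduler may be overridden):

```yaml
lr_scheduler: cosine
learning_rate: 0.0002
cosine_min_lr_ratio: 0.1  # decay to 10% of the peak learning rate instead of to 0
```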
* restore to current phi modeling code from phi-2
* enable gradient checkpointing
* don't cast everything to float32 all the time
* gradient checkpointing for phi2 ParallelBlock module too
* fix enabling flash attn for phi2
* add comment about import
* fix phi2 example
* fix model type check for tokenizer
* revert float32 -> bf16 casting changes
* support fused dense flash attn
* fix the repo for flash-attn
* add package name for subdir pkg
* fix the data collator when not using sample packing
* install packaging for pytests in ci
* also fix setup to not install flash attn fused dense subdir if not extras
* split out the fused-dense-lib in extra requires
* don't train w group_by_length for phi
* update integration test to use phi2
* set max steps and save steps for phi e2e tests
* try to work around save issue in ci
* skip phi2 e2e test for now
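A sketch of a phi-2 fine-tuning config reflecting the fixes above (flash attention enabled, `group_by_length` left off); the model name and the need for remote code are assumptions, not taken from the example file itself:

```yaml
base_model: microsoft/phi-2
trust_remote_code: true   # phi-2 ships custom modeling code
sequence_len: 2048
flash_attention: true     # flash attn enabling for phi2 was fixed above
group_by_length: false    # per the note above, don't train phi with group_by_length
```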
* [Feat] streaming multipack
* WIP make continued pretraining work w multipack
* fix up hardcoding, lint
* fix dict check
* update test for updated pretraining multipack code
* fix the hardcoded data collator for multipack pretraining
* fix the collator to be the max length for multipack pretraining
* don't bother with latest tag for test
* cleanup docker build/test
---------
Co-authored-by: jinwonkim93@github.com <jinwonkim>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
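A rough sketch of how the streaming multipack path is driven from the config, assuming the standard `pretraining_dataset` key streams the corpus while `sample_packing` packs sequences up to `sequence_len`; the dataset name is only an example:

```yaml
pretraining_dataset: togethercomputer/RedPajama-Data-1T-Sample  # example corpus, streamed rather than downloaded up front
sample_packing: true
sequence_len: 2048
max_steps: 1000  # a streamed dataset has no known length, so training is bounded by steps
```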
* fix: improved memory handling when model is bigger than existing VRAM
* feature: add lora_on_cpu flag to do LoRA loading on CPU (RAM)
For big models that take up the entire GPU VRAM, loading the LoRA weights will fail unless they are loaded on CPU (RAM) only.
* doc: add README
* fix: enable progress bars in do_merge_lora()
* doc: mention gpu_memory_limit and lora_on_cpu in merge part of README
* Update src/axolotl/utils/models.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* fix: remove deletion of removed model_kwargs key
* fix: validate that gpu_memory_limit and max_memory are not both set
---------
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
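A hedged config sketch combining the two new options; the value format for `gpu_memory_limit` is an assumption, and the validation added above means it cannot be set together with a manual `max_memory` map:

```yaml
adapter: lora
lora_on_cpu: true        # load the LoRA weights in system RAM when the base model fills VRAM
gpu_memory_limit: 20GiB  # cap per-GPU usage so overflow is offloaded; do not combine with max_memory
```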
* ipo-dpo trainer
* fix missing abstract method
* chatml template, grad checkpointing kwargs support
* fix steps calc for RL and add dataloader kwargs
* wip to fix dpo and start ppo
* more fixes
* refactor to generalize map fn
* fix dataset loop and handle argilla pref dataset
* set training args
* load reference model on separate gpu if more than one device
* no auto upload to hub for dpo, don't add lora adapters to ref model for dpo
* fixes for rl training
* support for ipo from yaml
* set dpo training args from the config, add tests
* chore: lint
* set sequence_len for model in test
* add RLHF docs
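A minimal RLHF config sketch based on the YAML support added here; the `rl` key selects the trainer variant, and the dataset entry is a hypothetical preference dataset (exact dataset `type` names vary):

```yaml
rl: ipo                                  # or dpo; selects the ipo-dpo trainer added above
datasets:
  - path: ./data/preference_pairs.jsonl  # hypothetical prompt/chosen/rejected pairs
    split: train
```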
* Added chatglm3 conversation type for training models like TinyLlama
* lint fixes for the chatglm3 conversation type
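For the new conversation type, the sharegpt loader's `conversation` override would select it; a hedged sketch assuming the FastChat template name `chatglm3` and a hypothetical dataset path:

```yaml
datasets:
  - path: ./data/tinyllama_chats.json  # hypothetical sharegpt-formatted data
    type: sharegpt
    conversation: chatglm3             # use the newly added conversation template
```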
* bump transformers and update attention class map name
* also run the tests in docker
* add mixtral e2e smoke test
* fix base name for docker image in test
* mixtral lora doesn't seem to work, at least check qlora
* add testcase for mixtral w sample packing
* check monkeypatch for flash attn multipack
* also run the e2e tests in docker
* use all gpus to run tests in docker ci
* use privileged mode too for docker w gpus
* rename the docker e2e actions for gh ci
* set privileged mode for docker and update mixtral model self attn check
* use fp16/bf16 for mixtral w fa2
* skip e2e tests on docker w gpus for now
* tests to validate mistral and mixtral patches
* fix rel import
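A hedged sketch of the Mixtral setup the smoke tests exercise: QLoRA (since plain LoRA "doesn't seem to work" per the note above) with sample packing and flash attention; the model name and dtype flag are assumptions:

```yaml
base_model: mistralai/Mixtral-8x7B-v0.1
adapter: qlora          # plain lora was flaky in the smoke tests, so qlora is checked instead
sample_packing: true
flash_attention: true
bf16: true              # fp16/bf16 is used for mixtral with flash attention 2
```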
* add config to model card
* rm space
* apply black formatting
* fix formatting
* check for cfg attribute
* add version
* put the config in a collapsible element