axolotl

Author	SHA1	Message	Date
Hamel Husain	85dd4d525b	add config to model card (#1005 ) * add config to model card * rm space * apply black formatting * apply black formatting * fix formatting * check for cfg attribute * add version * add version * put the config in a collapsible element * put the config in a collapsible element	2023-12-27 21:25:33 -06:00
Kevin Sydney	384b817dc0	Set eval_sample_packing to false in mistral config.yaml (#1003 ) Without eval_sample_packing set to false, a ValueError occurs when the eval dataset split is too small for sample_packing.	2023-12-27 16:11:55 -08:00
Younes Belkada	db9094df0f	FEAT: add tagging support to axolotl (#1004 ) * add tagging support to axolotl * chore: lint * fix method w self --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-12-27 16:25:20 -06:00
Evan Griffiths	6ef46f8dca	Add an example config for finetuning a 34B model on a 24GB GPU (#1000 ) * Add an example config for finetuning a 34B model on a 24GB GPU * Remove wandb project	2023-12-25 10:29:55 -08:00
Wing Lian	628b754824	set output_router_logits for mixtral config: (#995 )	2023-12-22 12:57:02 -05:00
Wing Lian	37820f6540	support for cuda 12.1 (#989 )	2023-12-22 11:08:22 -05:00
NanoCode012	7d4185ffcb	chore: Update transformers to latest (#986 )	2023-12-23 00:29:36 +09:00
mhenrichsen	93ebec1ac5	change val size (#992 )	2023-12-22 16:18:16 +01:00
Hamel Husain	2e61dc3180	Add tests to Docker (#993 )	2023-12-22 06:37:20 -08:00
NanoCode012	1ffa3866f2	Feat: Warns to add to modules_to_save when adding tokens or switching special_tokens (#787 ) * Feat: Auto add to modules_to_save when adding tokens * fix: swap to error instead of warning * feat: add check when special_tokens differ and add test	2023-12-22 21:49:07 +09:00
Hamel Husain	62ba1609b6	bump actions versions	2023-12-21 08:54:08 -08:00
Hamel Husain	7bbaac98f7	fix mistral prompt assembly (#982 ) * fix mistral prompts * fix spacing * remove elif	2023-12-21 08:00:55 -08:00
Wing Lian	161bcb6517	Dockerfile torch fix (#987 ) * add torch to requirements.txt at build time to force version to stick * fix xformers check * better handling of xformers based on installed torch version * fix for ci w/o torch	2023-12-21 09:38:20 -05:00
Ikko Eltociear Ashimine	d25c34caa6	Update README.md (#966 )	2023-12-17 09:51:25 -05:00
NanoCode012	13e938149d	fix: add lr scheduler kwargs to Trainer (#972 )	2023-12-17 18:48:28 +09:00
Wing Lian	85de004dd4	fix for build for nccl in dockerfile (#970 )	2023-12-16 19:12:01 -05:00
Wing Lian	80ec7af358	update to latest nccl in docker image (#965 )	2023-12-16 18:31:25 -05:00
dumpmemory	f28e75513b	update transformers to fix checkpoint saving (#963 )	2023-12-15 21:03:17 -05:00
Hamel Husain	5ada140ff0	Fix prompt assembly for llama (#952 ) * start at index 0 * add test to check for missing turns * apply black * Update test_prompt_tokenizers.py * Update src/axolotl/monkeypatch/fastchat_conversation_turns.py Co-authored-by: Motoki Wu <tokestermw@gmail.com> * fix linting * apply black * add more tests for llama/sharegpt * make logic clearer --------- Co-authored-by: Motoki Wu <tokestermw@gmail.com>	2023-12-14 10:03:59 -08:00
Hamel Husain	712fd27b3f	Add docs (#947 ) * move section * update README * update README * update README * update README * update README * Update README.md Co-authored-by: Wing Lian <wing.lian@gmail.com> --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-12-13 14:22:52 -08:00
kallewoof	ef24342538	fix: switch to using the HuggingFace Transformers NEFT implementation (#941 ) * fix: switch to using the HuggingFace Transformers NEFT implementation * linter * add support for noisy_embedding_alpha with a warning about it being renamed * restore pre/posttrain_hooks * move validation of NEFT noise alpha into validate_config() * linter	2023-12-13 17:15:34 -05:00
Wing Lian	5ea3aa31f0	Fix Deepspeed loading (#950 ) * add check for zero3 * freeze parameters * fixes for deepspeed loading * fix model parameter check * unfrozen parameters in example mixtral and logging when unfreezing	2023-12-13 16:03:23 -05:00
Wing Lian	f1f60cb5b2	Flash attn hotfix (#951 ) * use previous arg * use eager to use legacy attention that can be patched	2023-12-13 13:42:23 -05:00
kallewoof	450e04d3c4	fix: remove excessive newlines in system prompt(s) for alpaca (#936 )	2023-12-13 16:40:02 +09:00
Juraj Bednar	b0cf397ecb	More hints on what to do with CUDA Out of memory errors (#925 )	2023-12-13 16:38:38 +09:00
Wing Lian	5f79b8242f	new evals_per_epoch and saves_per_epoch to make things cleaner (#944 ) * new evals_per_epoch and saves_per_epoch to make things cleaner * update per PR feedback	2023-12-12 15:35:23 -05:00
Hamel Husain	f1de29dd1e	Respect sequence_len in config for `type: llama2_chat` (#926 ) * Respect sequence_len in config for `type: llama2_chat` It was hardcoded to `4096` I am not sure why? This updates it to pull from the config. cc: @winglian * Update llama2_chat.py * apply black formatting * fix tokenizer * update test data * lint fixtures	2023-12-12 09:39:22 -08:00
Wing Lian	7fabc4d95e	Mixtral official (#942 ) * multipack support for official mixtral implementation * fix patch to load multipack for mixtral * chore: lint	2023-12-11 23:44:33 -05:00
Motoki Wu	9a5eb3990c	Update requirements.txt (#940 )	2023-12-11 22:57:28 -05:00
Casper	86487c2e96	Mixtral: More correct MoE, lower loss (#932 ) * More correct MoE * Fix formatting	2023-12-10 10:34:25 -05:00
Wing Lian	35f9b0f149	update to latest transformers for mixtral support (#929 ) * update to latest transformers for mixtral support * pin transformers * fix typo	2023-12-10 10:32:27 -05:00
Wing Lian	68b227a7d8	Mixtral multipack (#928 ) * mixtral multipack * use mixtral model * sample yml * calculate cu_seqlens properly * use updated flash attention setting * attn var checks * force use of flash attention 2 for packing * lint * disable future fix for now * update support table	2023-12-09 21:26:30 -05:00
Timothy Lim	03c6318ba3	fixing prompt template of chatml by removal of linebreak (#922 ) Co-authored-by: Timothy Lim <timothyyonglee.lim@kxrdev.com>	2023-12-09 13:07:44 -05:00
Wing Lian	40a6362c92	support for mamba (#915 ) * support for mamba * more mamba fixes * use fork for mamba kwargs fix * grad checkpointing doesn't work * fix extras for mamba * mamba loss fix * use fp32 and remove verbose logging * mamba fixes * fix collator for mamba * set model_type on training_args * don't save safetensors for mamba * update mamba config to disable safetensor checkpoints, install for tests * no evals for mamba tests * handle save_pretrained * handle unused safetensors arg	2023-12-09 12:10:41 -05:00
NanoCode012	d339beb9d9	chore: clarify Readme on sharegpt system role	2023-12-08 11:35:53 +09:00
NanoCode012	fde091cb12	fix(tokenizer): handle fast tokenizer properly for bos/eos (#914 )	2023-12-08 11:31:13 +09:00
Casper	06ae39200b	Pin flash-attn to 2.3.3 (#919 )	2023-12-07 07:36:52 +01:00
NanoCode012	a581e9f8f6	feat: add check for quantized model (#913 ) * feat: add check for quantized model * chore: refactor and add another check * Update src/axolotl/utils/models.py --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-12-05 01:20:06 +09:00
Bryan Thornbury	992e742cdc	Support device_map=sequential & max_memory config parameters (#903 ) * Support device_map sequential (and others). Support max_memory in cfg. * Update documentation in README accordingly. * Update README.md --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-12-04 09:29:21 -05:00
NanoCode012	a1da39cd48	Feat(wandb): Refactor to be more flexible (#767 ) * Feat: Update to handle wandb env better * chore: rename wandb_run_id to wandb_name * feat: add new recommendation and update config * fix: indent and pop disabled env if project passed * feat: test env set for wandb and recommendation * feat: update to use wandb_name and allow id * chore: add info to readme	2023-12-04 22:17:25 +09:00
kallewoof	58ec8b1113	feature: loss watchdog for terminating training runs that are failing (#899 ) Co-authored-by: Karl-Johan Alm <kalle@gmail.com>	2023-12-04 07:54:34 -05:00
Haoxiang Wang	476a205cea	Remove learning rate scheduler in deepspeed config to avoid conflict (#909 )	2023-12-04 05:17:38 -05:00
Wing Lian	3e3229e2d9	fix for qwen w lora (#906 )	2023-11-30 12:45:50 -05:00
Wing Lian	1d21aa6b0a	ensure merged model matches the training dtype (#902 ) * ensure merged model matches the training dtype * Update src/axolotl/cli/__init__.py * Update src/axolotl/cli/__init__.py	2023-11-29 09:55:19 -05:00
kallewoof	71b7ea3c05	Determine FSDP/deepspeed settings on device select. (#883 ) * Determine FSDP/deepspeed settings on device select. Without this, the OS env check for accelerate will fail. * rename and move env setup call * chore: lint --------- Co-authored-by: Karl-Johan Alm <kalle@gmail.com> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-11-29 08:36:35 -05:00
NanoCode012	a48dbf6561	fix: remove FA for qwen examples (#900 ) * fix: remove FA for qwen lora * fix: remove FA for qlora	2023-11-27 21:23:54 +09:00
Wing Lian	6a4562ac08	update datasets version to cut down the warnings due to pyarrow arg change (#897 ) * update datasets to cut down the warnings * set versions for tokenizers and gradio * upgrade transformers to latest version	2023-11-25 16:30:00 -05:00
NanoCode012	1115c501b8	Feat: Add Qwen (#894 ) * Feat: Add Qwen * feat: add qwen lora example * feat: update matrix * fix: add trust_remote_code * fix: disable gradient checkpointing * chore: add warning about gradient checkpointing * fix: config * fix: turn off sample packing for this example and reduce seq len * chore: add comment on seq len	2023-11-26 00:05:01 +09:00
NanoCode012	7ee3c4cacb	fix: warning should not show if eval_batch_size not provided (#896 )	2023-11-25 16:04:00 +09:00
NanoCode012	fb12895a17	Feat: Add warmup_ratio (#893 ) * Feat: Add warmup_ratio * fix: update readme with more details on conflict	2023-11-25 12:15:43 +09:00

1125 Commits