NanoCode012
9cd27b2f91
fix(readme): clarify custom user prompt [no-ci] (#1124)
* fix(readme): clarify custom user prompt
* chore: update example to show use case of setting field
2024-01-16 09:47:33 +09:00
Hamel Husain
2dc431078c
Add link on README to Docker Debugging (#1107)
* add docker debug
* Update docs/debugging.md
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* explain editable install
* explain editable install
* upload new video
* add link to README
* Update README.md
* Update README.md
* chore: lint
* make sure to lint markdown too
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-12 08:51:35 -05:00
Hamel Husain
b502392e82
Update README.md (#1103)
* Update README.md
* Update README.md
2024-01-11 16:41:58 -08:00
Hamel Husain
7512c3ad20
Add Debugging Guide (#1089)
* add debug guide
* add background
* add .gitignore
* Update devtools/dev_sharegpt.yml
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Update docs/debugging.md
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* simplify example axolotl config
* add additional comments
* add video and TOC
* try jsonc for better md rendering
* style video thumbnail better
* fix footnote
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-10 20:49:24 -08:00
Wing Lian
d7057ccd36
paired kto support (#1069)
2024-01-09 13:30:45 -05:00
Johan Hansson
090c24dcb0
Add: mlflow for experiment tracking (#1059) [skip ci]
* Update requirements.txt
adding mlflow
* Update __init__.py
Imports for mlflow
* Update README.md
* Create mlflow_.py (#1)
* Update README.md
* fix precommits
* Update README.md
Update mlflow_tracking_uri
* Update trainer_builder.py
update trainer building
* chore: lint
* make ternary a bit more readable
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-09 09:34:09 -05:00
Ricardo Dominguez-Olmedo
04b978b428
Cosine learning rate schedule - minimum learning rate (#1062)
* Cosine min lr
* Cosine min lr - warn if using deepspeed
* cosine_min_lr_ratio readme
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-09 09:29:56 -05:00
Wing Lian
14964417ee
Sponsors (#1065)
* wip sponsors section in readme
* add ko-fi and contributors list
2024-01-08 18:52:00 -05:00
kallewoof
bdfefaf054
feature: better device mapping for large models (#918)
* fix: improved memory handling when model is bigger than existing VRAM
* feature: add lora_on_cpu flag to do LoRA loading on CPU (RAM)
For big models where the models are taking up the entire GPU VRAM, the LoRA part will fail unless it is loaded on CPU only.
* doc: add README
* fix: enable progress bars in do_merge_lora()
* doc: mention gpu_memory_limit and lora_on_cpu in merge part of README
* Update src/axolotl/utils/models.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* fix: remove deletion of removed model_kwargs key
* fix: validate that gpu_memory_limit and max_memory are not both set
---------
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-05 22:22:21 +09:00
Hamel Husain
63fb3eb426
set default for merge (#1044)
2024-01-04 18:14:20 -08:00
Hamel Husain
a3e8783328
[Docs] delete unused cfg value lora_out_dir (#1029)
* Update README.md
* Update README.md
* Update README.md
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-01-02 21:35:20 -08:00
NanoCode012
b31038aae9
chore(readme): update instruction to set config to load from cache (#1030)
2024-01-03 11:56:19 +09:00
Wing Lian
4d2e842e46
use recommended setting for use_reentrant w gradient checkpointing (#1021)
* use recommended setting for use_reentrant w gradient checkpointing
* add doc for gradient_checkpointing_kwargs
2024-01-01 22:17:27 -05:00
mhenrichsen
f8ae59b0a8
Adds chat templates (#1022)
2023-12-29 15:44:23 -06:00
NanoCode012
41353d2ea0
feat: expose bnb kwargs (#1018)
* feat: expose bnb kwargs
* chore: added examples and link per suggestion
* Uncomment defaults per suggestion for readability
Co-authored-by: Hamel Husain <hamel.husain@gmail.com>
---------
Co-authored-by: Hamel Husain <hamel.husain@gmail.com>
2023-12-29 18:16:26 +09:00
NanoCode012
f6ecf14dd4
feat: remove need to add load_in* during merge (#1017)
2023-12-29 18:15:30 +09:00
Hamel Husain
dec66d7c53
[Docs] Nit: Remind people to auth to wandb if they are going to use it (#1013)
2023-12-28 18:00:16 -08:00
Hamel Husain
76357dc5da
Update README.md (#1012)
2023-12-28 18:00:02 -08:00
Wing Lian
70b46ca4f4
remove landmark attn and xpos rope implementations (#1010)
2023-12-27 21:07:27 -08:00
Ikko Eltociear Ashimine
d25c34caa6
Update README.md (#966)
2023-12-17 09:51:25 -05:00
Hamel Husain
712fd27b3f
Add docs (#947)
* move section
* update README
* update README
* update README
* update README
* update README
* Update README.md
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-13 14:22:52 -08:00
kallewoof
ef24342538
fix: switch to using the HuggingFace Transformers NEFT implementation (#941)
* fix: switch to using the HuggingFace Transformers NEFT implementation
* linter
* add support for noisy_embedding_alpha with a warning about it being renamed
* restore pre/posttrain_hooks
* move validation of NEFT noise alpha into validate_config()
* linter
2023-12-13 17:15:34 -05:00
Juraj Bednar
b0cf397ecb
More hints on what to do with CUDA Out of memory errors (#925)
2023-12-13 16:38:38 +09:00
Wing Lian
5f79b8242f
new evals_per_epoch and saves_per_epoch to make things cleaner (#944)
* new evals_per_epoch and saves_per_epoch to make things cleaner
* update per PR feedback
2023-12-12 15:35:23 -05:00
Wing Lian
68b227a7d8
Mixtral multipack (#928)
* mixtral multipack
* use mixtral model
* sample yml
* calculate cu_seqlens properly
* use updated flash attention setting
* attn var checks
* force use of flash attention 2 for packing
* lint
* disable future fix for now
* update support table
2023-12-09 21:26:30 -05:00
NanoCode012
d339beb9d9
chore: clarify Readme on sharegpt system role
2023-12-08 11:35:53 +09:00
Bryan Thornbury
992e742cdc
Support device_map=sequential & max_memory config parameters (#903)
* Support device_map sequential (and others). Support max_memory in cfg.
* Update documentation in README accordingly.
* Update README.md
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-04 09:29:21 -05:00
NanoCode012
a1da39cd48
Feat(wandb): Refactor to be more flexible (#767)
* Feat: Update to handle wandb env better
* chore: rename wandb_run_id to wandb_name
* feat: add new recommendation and update config
* fix: indent and pop disabled env if project passed
* feat: test env set for wandb and recommendation
* feat: update to use wandb_name and allow id
* chore: add info to readme
2023-12-04 22:17:25 +09:00
kallewoof
58ec8b1113
feature: loss watchdog for terminating training runs that are failing (#899)
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
2023-12-04 07:54:34 -05:00
NanoCode012
1115c501b8
Feat: Add Qwen (#894)
* Feat: Add Qwen
* feat: add qwen lora example
* feat: update matrix
* fix: add trust_remote_code
* fix: disable gradient checkpointing
* chore: add warning about gradient checkpointing
* fix: config
* fix: turn off sample packing for this example and reduce seq len
* chore: add comment on seq len
2023-11-26 00:05:01 +09:00
NanoCode012
fb12895a17
Feat: Add warmup_ratio (#893)
* Feat: Add warmup_ratio
* fix: update readme with more details on conflict
2023-11-25 12:15:43 +09:00
NanoCode012
9fc29e082b
chore(doc): Add info on changing role in sharegpt (#886)
2023-11-22 15:32:50 +09:00
Mark Saroufim
ddf815022a
Install from git url (#874)
* Install from git url
* Update README.md
2023-11-17 12:50:51 -05:00
Wing Lian
0de1457189
try #2: pin hf transformers and accelerate to latest release, don't reinstall pytorch (#867)
* isolate torch from the requirements.txt
* fix typo for removed line ending
* pin transformers and accelerate to latest releases
* try w auto-gptq==0.5.1
* update README to remove manual peft install
* pin xformers to 0.0.22
* bump flash-attn to 2.3.3
* pin flash attn to exact version
2023-11-16 10:42:36 -05:00
NanoCode012
3cc67d2cdd
Feat: Add dataset loading from S3, GCS (#765)
* Feat: Add dataset loading from S3, GCS
* chore: update docs
* chore: add more info on cloud loading
2023-11-16 14:33:58 +09:00
Wing Lian
1bc11868eb
allow overriding of model_config parameters from the YML (#853)
* allow overriding of model_config parameters from the YML
* remove old logging, update readme
* move the updating of model config to the load_model_config function
* add warning for deprecated rope_scaling in the root of the YML config
2023-11-15 23:47:08 -05:00
Wing Lian
8a8d1c4023
make docker command more robust (#861)
* make docker command more robust
* update readme with more info
2023-11-15 23:03:54 -05:00
Wing Lian
332984db18
lint fix that didn't get caught by linter (#866)
2023-11-15 14:36:40 -05:00
Zongheng Yang
b33c1d55a2
Docs: add instructions to 1-click launching on public clouds (#862)
* Update README.md
* Update ToC
2023-11-15 14:11:27 -05:00
NanoCode012
501b4d1379
chore(doc): Separate section on runpod (#860)
2023-11-16 01:06:51 +09:00
NanoCode012
306fe19c54
feat(doc): add more info on train_on_split (#855)
2023-11-15 23:42:26 +09:00
Jason Stillerman
738a057674
Feat: Added Gradio support (#812)
* Added gradio support
* queuing and title
* pre-commit run
2023-11-04 23:59:22 -04:00
Wing Lian
cdc71f73c8
update table for rwkv4 support, fix process count for dataset (#822)
2023-11-04 23:45:44 -04:00
Wing Lian
8b79ff0e94
fix eval_steps to be a sane default (#797)
* fix eval_steps to be a sane default
* update docs for fractional eval_steps
2023-10-27 22:36:30 -04:00
Aleksa Gordić
2e71ff03a6
Add docker advanced instruction to README (#792)
2023-10-27 09:24:04 -04:00
Casper
e50ab072e2
Create preprocess CLI (#785)
* Create preprocess CLI
* Print prompt template if debugging
* Add print for unsupported prompters
* Formatting
* Formatting
* Refactor variables
* Formatting
* Formatting
* Formatting
* Formatting
2023-10-26 09:35:42 -04:00
NanoCode012
20aa4b57d2
chore(readme): Improve documentation on conversation field (#782)
* chore(readme): Improve documentation on conversation field
* fix: clarify where the option is
2023-10-24 12:52:32 +09:00
NanoCode012
afedc470bd
Fix: Cannot tokenize with bf16 and on cpu (#766)
2023-10-23 01:32:26 +09:00
Casper
15d3a654bf
Implement fused modules (#747)
* MLP: Memory saving
* Remove RMSNorm restrictions
* Map packed weights to original
* FusedAttention module
* Simplify code
* Move fused modules
* Fix critical typo
* Split inplace
* Add FFT config
* Add validation of fused arguments
* Add fused arguments to config
* Update docs
* Fix validation logic
* Add fused modules to flash attn
* Only fuse during training
* Remove timing
* Formatting
* Formatting
* Formatting
* chore: lint
* chore: lint
* add e2e tests for fused llama
* no lora for tests
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-21 16:08:25 -04:00
Wing Lian
a21935f07a
add to docs (#703)
2023-10-19 21:32:30 -04:00