Wing Lian
4d2e842e46
use recommended setting for use_reentrant w gradient checkpointing ( #1021 )
...
* use recommended setting for use_reentrant w gradient checkpointing
* add doc for gradient_checkpointing_kwargs
2024-01-01 22:17:27 -05:00
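HF Transformers emits a warning when `use_reentrant` is left unset for gradient checkpointing, and upstream recommends the non-reentrant implementation. A minimal sketch of the resulting config, with the `gradient_checkpointing_kwargs` key taken from this commit's description (exact nesting assumed):

```yaml
gradient_checkpointing: true
# passed through to the HF Transformers gradient-checkpointing call;
# use_reentrant: false selects the upstream-recommended non-reentrant path
gradient_checkpointing_kwargs:
  use_reentrant: false
```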
mhenrichsen
f8ae59b0a8
Adds chat templates ( #1022 )
2023-12-29 15:44:23 -06:00
NanoCode012
41353d2ea0
feat: expose bnb kwargs ( #1018 )
...
* feat: expose bnb kwargs
* chore: added examples and link per suggestion
* Uncomment defaults per suggestion for readability
Co-authored-by: Hamel Husain <hamel.husain@gmail.com>
---------
Co-authored-by: Hamel Husain <hamel.husain@gmail.com>
2023-12-29 18:16:26 +09:00
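For the exposed bitsandbytes kwargs, a plausible config fragment; the key name and values below are assumptions mirroring HF Transformers' `BitsAndBytesConfig` parameters, not confirmed by the commit itself:

```yaml
# forwarded to BitsAndBytesConfig when quantized loading is enabled
bnb_config_kwargs:
  llm_int8_has_fp16_weight: false
  bnb_4bit_quant_type: nf4
  bnb_4bit_use_double_quant: true
```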
NanoCode012
f6ecf14dd4
feat: remove need to add load_in* during merge ( #1017 )
2023-12-29 18:15:30 +09:00
Hamel Husain
dec66d7c53
[Docs] Nit: Remind people to auth to wandb if they are going to use it ( #1013 )
2023-12-28 18:00:16 -08:00
Hamel Husain
76357dc5da
Update README.md ( #1012 )
2023-12-28 18:00:02 -08:00
Wing Lian
70b46ca4f4
remove landmark attn and xpos rope implementations ( #1010 )
2023-12-27 21:07:27 -08:00
Ikko Eltociear Ashimine
d25c34caa6
Update README.md ( #966 )
2023-12-17 09:51:25 -05:00
Hamel Husain
712fd27b3f
Add docs ( #947 )
...
* move section
* update README
* update README
* update README
* update README
* update README
* Update README.md
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-13 14:22:52 -08:00
kallewoof
ef24342538
fix: switch to using the HuggingFace Transformers NEFT implementation ( #941 )
...
* fix: switch to using the HuggingFace Transformers NEFT implementation
* linter
* add support for noisy_embedding_alpha with a warning about it being renamed
* restore pre/posttrain_hooks
* move validation of NEFT noise alpha into validate_config()
* linter
2023-12-13 17:15:34 -05:00
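After this switch, NEFT noise is injected by Transformers' own trainer rather than a local patch. A sketch of the config, assuming the commit's rename warning points at the upstream option name (`noisy_embedding_alpha` → `neftune_noise_alpha`; the new name is an assumption):

```yaml
# enables NEFTune noisy-embedding training;
# 5 is one of the alpha values used in the NEFTune paper
neftune_noise_alpha: 5
```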
Juraj Bednar
b0cf397ecb
More hints on what to do with CUDA Out of memory errors ( #925 )
2023-12-13 16:38:38 +09:00
Wing Lian
5f79b8242f
new evals_per_epoch and saves_per_epoch to make things cleaner ( #944 )
...
* new evals_per_epoch and saves_per_epoch to make things cleaner
* update per PR feedback
2023-12-12 15:35:23 -05:00
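The two new options trade absolute step counts for per-epoch frequencies. A minimal sketch (the idea that they replace `eval_steps`/`save_steps` is inferred from "make things cleaner"):

```yaml
# evaluate 4 times and save a checkpoint once per epoch,
# instead of specifying absolute counts via eval_steps / save_steps
evals_per_epoch: 4
saves_per_epoch: 1
```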
Wing Lian
68b227a7d8
Mixtral multipack ( #928 )
...
* mixtral multipack
* use mixtral model
* sample yml
* calculate cu_seqlens properly
* use updated flash attention setting
* attn var checks
* force use of flash attention 2 for packing
* lint
* disable future fix for now
* update support table
2023-12-09 21:26:30 -05:00
NanoCode012
d339beb9d9
chore: clarify Readme on sharegpt system role
2023-12-08 11:35:53 +09:00
Bryan Thornbury
992e742cdc
Support device_map=sequential & max_memory config parameters ( #903 )
...
* Support device_map sequential (and others). Support max_memory in cfg.
* Update documentation in README accordingly.
* Update README.md
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-04 09:29:21 -05:00
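A hedged sketch of the two options this PR adds; `device_map` values and the `max_memory` mapping follow the HF Accelerate conventions, and the exact YAML shape here is an assumption:

```yaml
# fill GPU 0 first, then GPU 1, then CPU,
# instead of the default balanced placement
device_map: sequential
# max_memory maps device index (or "cpu") to a memory cap
max_memory:
  0: 20GiB
  cpu: 60GiB
```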
NanoCode012
a1da39cd48
Feat(wandb): Refactor to be more flexible ( #767 )
...
* Feat: Update to handle wandb env better
* chore: rename wandb_run_id to wandb_name
* feat: add new recommendation and update config
* fix: indent and pop disabled env if project passed
* feat: test env set for wandb and recommendation
* feat: update to use wandb_name and allow id
* chore: add info to readme
2023-12-04 22:17:25 +09:00
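The key names below come from the commit bullets (the `wandb_run_id` → `wandb_name` rename, with the id still allowed); the values are hypothetical:

```yaml
wandb_project: my-project      # hypothetical project name
wandb_name: llama-lora-run-1   # renamed from wandb_run_id
wandb_entity: my-team          # hypothetical team/entity
# wandb_run_id may still be set explicitly, e.g. to resume a run
```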
kallewoof
58ec8b1113
feature: loss watchdog for terminating training runs that are failing ( #899 )
...
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
2023-12-04 07:54:34 -05:00
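A sketch of what a loss-watchdog config might look like; the option names and semantics below are assumptions, since the commit message only names the feature:

```yaml
# abort the run if training loss stays above the threshold
# for several consecutive logged steps (names/semantics assumed)
loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3
```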
NanoCode012
1115c501b8
Feat: Add Qwen ( #894 )
...
* Feat: Add Qwen
* feat: add qwen lora example
* feat: update matrix
* fix: add trust_remote_code
* fix: disable gradient checkpointing
* chore: add warning about gradient checkpointing
* fix: config
* fix: turn off sample packing for this example and reduce seq len
* chore: add comment on seq len
2023-11-26 00:05:01 +09:00
NanoCode012
fb12895a17
Feat: Add warmup_ratio ( #893 )
...
* Feat: Add warmup_ratio
* fix: update readme with more details on conflict
2023-11-25 12:15:43 +09:00
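A minimal sketch of the new option; the conflict noted in the second bullet is with the existing `warmup_steps`:

```yaml
# warm the learning rate up over the first 10% of total training steps;
# conflicts with warmup_steps, so set only one of the two
warmup_ratio: 0.1
```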
NanoCode012
9fc29e082b
chore(doc): Add info on changing role in sharegpt ( #886 )
2023-11-22 15:32:50 +09:00
Mark Saroufim
ddf815022a
Install from git url ( #874 )
...
* Install from git url
* Update README.md
2023-11-17 12:50:51 -05:00
Wing Lian
0de1457189
try #2 : pin hf transformers and accelerate to latest release, don't reinstall pytorch ( #867 )
...
* isolate torch from the requirements.txt
* fix typo for removed line ending
* pin transformers and accelerate to latest releases
* try w auto-gptq==0.5.1
* update README to remove manual peft install
* pin xformers to 0.0.22
* bump flash-attn to 2.3.3
* pin flash attn to exact version
2023-11-16 10:42:36 -05:00
NanoCode012
3cc67d2cdd
Feat: Add dataset loading from S3, GCS ( #765 )
...
* Feat: Add dataset loading from S3, GCS
* chore: update docs
* chore: add more info on cloud loading
2023-11-16 14:33:58 +09:00
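Cloud-hosted datasets plug into the existing `datasets` list via URI schemes; bucket names and dataset types below are hypothetical:

```yaml
datasets:
  - path: s3://my-bucket/data/train.jsonl   # hypothetical S3 bucket
    type: alpaca
  - path: gs://my-bucket/data/extra.jsonl   # GCS uses the gs:// scheme
    type: alpaca
```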
Wing Lian
1bc11868eb
allow overriding of model_config parameters from the YML ( #853 )
...
* allow overriding of model_config parameters from the YML
* remove old logging, update readme
* move the updating of model config to the load_model_config function
* add warning for deprecated rope_scaling in the root of the YML config
2023-11-15 23:47:08 -05:00
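A sketch of the override mechanism, with `rope_scaling` as the example since this commit deprecates it at the YML root; the `model_config` key name is taken from the commit title and the nesting is an assumption:

```yaml
# entries are merged into the model's config.json before loading;
# rope_scaling at the root of the YML is deprecated by this commit
model_config:
  rope_scaling:
    type: linear
    factor: 2.0
```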
Wing Lian
8a8d1c4023
make docker command more robust ( #861 )
...
* make docker command more robust
* update readme with more info
2023-11-15 23:03:54 -05:00
Wing Lian
332984db18
lint fix that didn't get caught by linter ( #866 )
2023-11-15 14:36:40 -05:00
Zongheng Yang
b33c1d55a2
Docs: add instructions to 1-click launching on public clouds ( #862 )
...
* Update README.md
* Update ToC
2023-11-15 14:11:27 -05:00
NanoCode012
501b4d1379
chore(doc): Separate section on runpod ( #860 )
2023-11-16 01:06:51 +09:00
NanoCode012
306fe19c54
feat(doc): add more info on train_on_split ( #855 )
2023-11-15 23:42:26 +09:00
Jason Stillerman
738a057674
Feat: Added Gradio support ( #812 )
...
* Added gradio support
* queuing and title
* pre-commit run
2023-11-04 23:59:22 -04:00
Wing Lian
cdc71f73c8
update table for rwkv4 support, fix process count for dataset ( #822 )
2023-11-04 23:45:44 -04:00
Wing Lian
8b79ff0e94
fix eval_steps to be a sane default ( #797 )
...
* fix eval_steps to be a sane default
* update docs for fractional eval_steps
2023-10-27 22:36:30 -04:00
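The "fractional eval_steps" mentioned in the docs bullet leans on HF Transformers behavior, where values below 1.0 are read as a fraction of total training steps:

```yaml
# evaluate every 5% of total training steps, i.e. 20 evals per run
eval_steps: 0.05
```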
Aleksa Gordić
2e71ff03a6
Add docker advanced instruction to README ( #792 )
2023-10-27 09:24:04 -04:00
Casper
e50ab072e2
Create preprocess CLI ( #785 )
...
* Create preprocess CLI
* Print prompt template if debugging
* Add print for unsupported prompters
* Formatting
* Formatting
* Refactor variables
* Formatting
* Formatting
* Formatting
* Formatting
2023-10-26 09:35:42 -04:00
NanoCode012
20aa4b57d2
chore(readme): Improve documentation on conversation field ( #782 )
...
* chore(readme): Improve documentation on conversation field
* fix: clarify where the option is
2023-10-24 12:52:32 +09:00
NanoCode012
afedc470bd
Fix: Cannot tokenize with bf16 and on cpu ( #766 )
2023-10-23 01:32:26 +09:00
Casper
15d3a654bf
Implement fused modules ( #747 )
...
* MLP: Memory saving
* Remove RMSNorm restrictions
* Map packed weights to original
* FusedAttention module
* Simplify code
* Move fused modules
* Fix critical typo
* Split inplace
* Add FFT config
* Add validation of fused arguments
* Add fused arguments to config
* Update docs
* Fix validation logic
* Add fused modules to flash attn
* Only fuse during training
* Remove timing
* Formatting
* Formatting
* Formatting
* chore: lint
* chore: lint
* add e2e tests for fused llama
* no lora for tests
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-21 16:08:25 -04:00
Wing Lian
a21935f07a
add to docs ( #703 )
2023-10-19 21:32:30 -04:00
Casper
e1b214c62b
Clarify custom format example ( #729 )
...
* Clarify custom prompt format
* Simplify format
2023-10-14 09:28:12 -04:00
Maxime
3bd9528390
add noisy embedding ( #721 )
...
* add noisy embedding
* fix format
* Update README.md
* Update README.md
* linter issues
* caseus fixes
---------
Co-authored-by: Maxime <maxime@nope.no>
2023-10-13 10:00:42 -04:00
NanoCode012
5855dded3d
fix(doc): update default doc according to arg ( #714 )
2023-10-10 21:51:56 +09:00
NanoCode012
11c48c5e03
fix(doc): Add note on inference w sample packing ( #712 )
2023-10-10 21:08:17 +09:00
seungduk.kim.2304
77c84e02fd
Update README with some explanations ( #700 )
...
* Update README with some explanations
* revert commit-hook change
* add more explanation about batch size and gradient accum
* not use latex format
* decorate
* git hook again
* Attach a link that explains about LoRA hyperparameters
* update table of content
* Explanation about lora_modules_to_save
2023-10-08 13:37:54 -04:00
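The batch-size and gradient-accumulation explanation added here reduces to one product; a minimal sketch using the standard config keys (the two-GPU example is hypothetical):

```yaml
micro_batch_size: 2              # per-GPU batch per optimizer micro-step
gradient_accumulation_steps: 4
# effective (global) batch size =
#   micro_batch_size * gradient_accumulation_steps * number_of_gpus
# e.g. 2 * 4 * 2 GPUs = 16
```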
Wing Lian
2642caedf2
refactor to set eval_batch_size earlier if unset, so we can warn if mismatched ( #662 )
2023-10-02 21:08:07 -04:00
Kyle Corbitt
9ec20777ba
Make dataset_processes configurable ( #651 )
...
I'm using the Axolotl script to train models on https://modal.com serverless GPUs. Unfortunately, their environment seems to have some kind of bug where if I try to run `datasets.filter` with too high a `num_proc`, it throws an error and dies.
This PR adds a new configuration option `dataset_processes`, which lets you explicitly set the number of processes used to map/filter the dataset. If not included, this defaults to the current behavior of setting that to `os.cpu_count()`.
2023-09-29 00:22:22 -04:00
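Per the PR description above, the new option caps the worker count used for dataset map/filter and falls back to `os.cpu_count()` when omitted:

```yaml
# limit the number of processes used for datasets.map / datasets.filter;
# omit to keep the default of os.cpu_count()
dataset_processes: 8
```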
ich
590d6032fd
Fix bug when using pretokenized datasets ( #652 )
...
* fix pretokenized datasets readme
* check if dataset type is not set to handle pretokenized datasets
2023-09-28 22:54:10 -04:00
Wing Lian
409ca0f21c
add support for defined train split ( #654 )
2023-09-28 20:14:14 -04:00
NanoCode012
eb41f76f92
Feat: Add example for Mistral ( #644 )
...
* Feat: Add example for Mistral
* chore: turn off flash
* chore: add is_mistral_derived_model
* chore: update following PR
2023-09-28 20:15:00 +09:00
Napuh
85b0be2ba7
Warn users to login to HuggingFace ( #645 )
...
* added warning if user is not logged in HF
* updated doc to suggest logging in to HF
2023-09-27 17:43:35 -04:00
Wing Lian
895f0a0723
skip some flash attn patches unless explicitly enabled ( #643 )
...
* skip some flash attn patches if explicitly disabled
* make the other patches optional
2023-09-27 12:11:07 -04:00