Wing Lian
22810c97b7
use warmup_ratio as a better default than warmup steps since it's data dependent (#2897) [skip ci]
* use warmup_ratio as a better default than warmup steps since it's data dependent
* replace remainder of warmup_steps
2025-07-30 06:44:06 -04:00
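The reasoning behind this default can be sketched in a few lines of Python (a hypothetical illustration; `warmup_steps_from_ratio` is a made-up name, not an axolotl function):

```python
# Hypothetical sketch of why a fixed warmup_steps is data dependent:
# the same step count covers a very different fraction of training
# depending on dataset size, while warmup_ratio scales with it.
def warmup_steps_from_ratio(num_samples: int, batch_size: int,
                            epochs: int, warmup_ratio: float) -> int:
    total_steps = (num_samples // batch_size) * epochs
    return int(total_steps * warmup_ratio)

# Same ratio, proportional warmup on small vs. large datasets:
print(warmup_steps_from_ratio(1_000, 8, 3, 0.05))    # 18
print(warmup_steps_from_ratio(100_000, 8, 3, 0.05))  # 1875
```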
Wing Lian
af8d257aa2
make pad_to_sequence_len default to the same value as sample_packing (#2941) [skip ci]
* make pad_to_sequence_len default to the same value as sample_packing
* remove duplicate validation
* fix test
* update description meta
Co-authored-by: NanoCode012 <nano@axolotl.ai>
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-07-21 11:40:56 -04:00
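The new default described above can be sketched as follows (illustrative only; `resolve_pad_to_sequence_len` and the dict-style config are assumptions, not axolotl's actual internals):

```python
# Illustrative sketch: pad_to_sequence_len, when left unset, inherits
# the value of sample_packing instead of a fixed default.
def resolve_pad_to_sequence_len(cfg: dict) -> dict:
    if cfg.get("pad_to_sequence_len") is None:
        cfg["pad_to_sequence_len"] = bool(cfg.get("sample_packing", False))
    return cfg

cfg = resolve_pad_to_sequence_len({"sample_packing": True})
print(cfg["pad_to_sequence_len"])  # True
```

An explicit `pad_to_sequence_len` in the config still wins over the inherited default.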
Dan Saunders
10ba1622f7
checkpoint model on first step callback (#2906)
* checkpoint model on first step callback
* remove debug
* add test cases; update existing tests not to save on first step
* move test out of solo
* delete
* default to False
* typo
2025-07-15 15:00:48 -04:00
Wing Lian
dd8bad06d0
remove strict=false from example yamls (#2523) [skip ci]
2025-04-12 07:25:11 -07:00
Wing Lian
9f824ef76a
simplify the example configs to be more minimal and less daunting (#2486) [skip ci]
* simplify the example configs to be more minimal and less daunting
* drop empty s2_attention from example yamls
2025-04-04 13:47:26 -04:00
Sunny Liu
1c14c4a15c
Add hub model id config options to all example yml files (#2196) [skip ci]
* added hub model_id in example yml
* add hub model id to example yml
2024-12-17 11:24:30 -05:00
Wing Lian
4fde300e5f
update outputs path so that we can mount workspace to /workspace/data (#1623)
* update outputs path so that we can mount workspace to /workspace/data
* fix ln order
2024-05-15 12:44:13 -04:00
NanoCode012
a7a9a1433a
fix(examples): remove is_*_derived as it's parsed automatically (#1297)
2024-02-22 00:52:46 +09:00
Wing Lian
4cb7900a56
PEFT LoftQ (#1222)
* loftq support for lora
* fix loftq check
* update readme for loftq
* readability cleanup
* use peft main for loftq fixes, remove unnecessary special tokens
* remove unused test from older deprecation
2024-01-28 18:50:08 -05:00
Wing Lian
782b6a4216
set fp16 to false if bf16, update bf16: auto in example YAMLs (#1122) [skip ci]
* set fp16 to false if bf16, update bf16: auto in example YAMLs
* unset fp16 so that it fallsback properly if bf16 isn't available
* Update README.md [skip-ci]
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* test that bf16 disables fp16
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-01-22 18:44:01 -05:00
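The precision fallback above can be sketched like this (a minimal illustration; `resolve_precision` is a made-up name and the dict config is an assumption):

```python
# Illustrative sketch: when bf16 is enabled, force fp16 off so the
# config falls back cleanly if bf16 turns out to be unavailable.
def resolve_precision(cfg: dict) -> dict:
    if cfg.get("bf16"):
        cfg["fp16"] = False
    return cfg

print(resolve_precision({"bf16": True, "fp16": True})["fp16"])  # False
```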
Wing Lian
5f79b8242f
new evals_per_epoch and saves_per_epoch to make things cleaner (#944)
* new evals_per_epoch and saves_per_epoch to make things cleaner
* update per PR feedback
2023-12-12 15:35:23 -05:00
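Why per-epoch counts are cleaner can be sketched as follows (a hypothetical illustration; `interval_from_per_epoch` is not axolotl code):

```python
# Illustrative sketch: a per-epoch count translates to a step interval
# at runtime, so configs no longer hard-code dataset-size-specific
# step values for eval/save frequency.
def interval_from_per_epoch(steps_per_epoch: int, per_epoch: int) -> int:
    return max(1, steps_per_epoch // per_epoch)

print(interval_from_per_epoch(1_000, 4))  # 250
```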
NanoCode012
a1da39cd48
Feat(wandb): Refactor to be more flexible (#767)
* Feat: Update to handle wandb env better
* chore: rename wandb_run_id to wandb_name
* feat: add new recommendation and update config
* fix: indent and pop disabled env if project passed
* feat: test env set for wandb and recommendation
* feat: update to use wandb_name and allow id
* chore: add info to readme
2023-12-04 22:17:25 +09:00
Wing Lian
f544ab2bed
don't compile deepspeed or bitsandbytes from source (#837)
2023-11-08 19:49:55 -05:00
Wing Lian
8b79ff0e94
fix eval_steps to be a sane default (#797)
* fix eval_steps to be a sane default
* update docs for fractional eval_steps
2023-10-27 22:36:30 -04:00
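The fractional eval_steps semantics can be sketched as follows (illustrative only; `resolve_eval_steps` is a made-up helper, though the ratio interpretation matches the HF Trainer convention for values below 1):

```python
# Illustrative sketch: a fractional eval_steps (0 < x < 1) is read as
# a ratio of total training steps; integer values are literal steps.
def resolve_eval_steps(eval_steps: float, total_steps: int) -> int:
    if 0 < eval_steps < 1:
        return max(1, int(total_steps * eval_steps))
    return int(eval_steps)

print(resolve_eval_steps(0.25, 1_000))  # 250
print(resolve_eval_steps(50, 1_000))   # 50
```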
Wing Lian
2d8def68dc
simplify by removing duplicate base_model_config (#772)
2023-10-23 01:42:38 -04:00
Wing Lian
e50a64e85e
prepared dataset caching, other misc fixes (#665)
* prepared dataset caching, other misc fixes
* also don't load from disk cache unless explicit
2023-10-02 21:07:24 -04:00
Wing Lian
d887ad86c3
eval_table isn't quite stable enough to be in default llama configs (#637)
2023-09-26 10:13:20 -04:00
mhenrichsen
4fecbfe5e1
default model changed
2023-09-24 18:52:53 +02:00
Glavin Wiechert
5b67ea98a6
Add training callback to send predictions to WandB table (#521)
* WIP Add training callback to send predictions to WandB table
* WIP improve wandb table reporting callback
* WIP improve wandb table reporting callback (cont)
* Add VSCode launching for debugging
* Add tiny llama example
* WIP attempt to improve post-eval prediction generation for table
* WIP attempt to improve post-eval prediction generation for table - part 2
* WIP batch generation
* WIP attempt to handle sample_packing using position_ids for wandb prediction table
* WIP add code for debugging
* Fix sample_packing support for wandb prediction table
* Clean up code for PR review
* Add eval_table_size, eval_table_max_new_tokens configs & clean up code
* Clean up PR, delete VSCode config, add tiny-llama example
* Add eval_table_size, eval_table_max_new_tokens documentation. Fix linting/formatting
2023-09-13 09:51:08 -04:00
Wing Lian
343714972b
recommend padding when using sample packing (#531)
2023-09-06 17:00:21 -04:00
Wing Lian
1687be6a35
don't use mask expansion for inference (#392)
2023-08-14 20:52:54 -04:00
mhenrichsen
fdffef5940
new llama-2 default settings (#370)
* new default settings
* fix whitespace
* rm max packed sequence length
---------
Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan>
2023-08-14 17:39:09 +09:00
Morgan McGuire
7019509daa
Add wandb_entity to wandb options, update example configs, update README (#361)
* Update wandb_entity and add wandb descriptions
* add wandb to config section
* remove trailing whitespace for pre-commit hook
* remove trailing whitespace for pre-commit hook
---------
Co-authored-by: Morgan McGuire <morganmcguire@Morgans-MacBook-Pro.local>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-08-12 12:17:11 -04:00
Aman Karmani
36fefcf94b
set group_by_length to false in examples
2023-08-06 23:59:09 -07:00
mhenrichsen
dc71d8872a
feat/llama-2 examples (#319)
* qlora llama-2
* qlora llama-2
* linting
* readme
* lora added
* linting
* change group_by_length
* 13b fitting on 24gb
* grouped lengths true
* add pad token
* change out dir
---------
Co-authored-by: Mads Henrichsen <mads@Brbar-tilhrende-Mads.local>
2023-08-03 19:22:48 +09:00