axolotl

Author	SHA1	Message	Date
sunny	1f09f48d8f	fixed small typo	2024-08-16 11:27:56 -04:00
sunny	b44546df6f	fixedtypo	2024-08-16 11:22:50 -04:00
Sunny	967fbf8152	fixedtypo	2024-08-16 11:17:52 -04:00
Sunny	c144a1ae65	fixed small typo	2024-08-16 10:57:42 -04:00
NanoCode012	68a3c7678a	fix: parse model_kwargs (#1825 )	2024-08-16 07:51:19 -04:00
NanoCode012	f18925fb4b	fix: parse eager_attention (#1824 )	2024-08-14 09:46:46 -04:00
Wing Lian	1853d6021d	bump hf dependencies (#1823 ) * bump hf dependencies * revert optimum version change * don't bump tokenizers all the way to 0.20 yet since transformers doesn't support that	2024-08-11 16:27:41 -04:00
Chiwan Park	0801f239cc	fix the incorrect `max_length` for chat template (#1818 )	2024-08-09 11:50:31 -04:00
Wing Lian	54392ac8a6	Attempt to run multigpu in PR CI for now to ensure it works (#1815 ) [skip ci] * Attempt to run multigpu in PR CI for now to ensure it works * fix yaml file * forgot to include multigpu tests * fix call to cicd.multigpu * dump dictdefault to dict for yaml conversion * use to_dict instead of casting * 16bit-lora w flash attention, 8bit lora seems problematic * add llama fsdp test * more tests * Add test for qlora + fsdp with prequant * limit accelerate to 2 processes and disable broken qlora+fsdp+bnb test * move multigpu tests to biweekly	2024-08-09 11:50:13 -04:00
Wing Lian	3e2b269d06	update tinyllama to use final instead of checkpoints (#1820 ) [skip ci]	2024-08-09 10:58:19 -04:00
Wing Lian	5ee4b7325f	fix z3 leaf configuration when not using lists (#1817 ) [skip ci]	2024-08-09 10:54:52 -04:00
Wing Lian	70978467a0	skip no commit to main on ci (#1814 )	2024-08-06 15:25:54 -04:00
Wing Lian	850f999a76	update peft and transformers (#1811 )	2024-08-06 10:32:05 -04:00
Wing Lian	c56e0a79a5	logging improvements (#1808 ) [skip ci] * logging improvements * fix sort	2024-08-06 10:31:50 -04:00
Wing Lian	35d5e59d78	set z3 leaf for deepseek v2 (#1809 ) [skip ci] * set z3 leaf for deepseek v2 * add deepseek v2 chat template	2024-08-06 09:30:46 -04:00
Wing Lian	fbbeb4fee0	remove un-necessary zero-first guard as it's already only called in a parent fn (#1810 ) [skip ci]	2024-08-06 09:29:23 -04:00
Wing Lian	ecdda006de	One cycle lr (#1803 ) * refactor one_cycle lr scheduler so it's reusable in more situations * fix validation for lr_scheduler * default to cosine anneal strategy * one cycle lr expects cos	2024-08-05 13:12:05 -04:00
Ben Feuer	b7665c26c8	Update conversation.qmd (#1788 ) [skip ci]	2024-08-05 12:44:26 -04:00
Aaditya Ura (looking for PhD Fall’24)	cb023c70db	Update instruct-lora-8b.yml (#1789 ) [skip ci] Config is giving an error if not using the end of the token as the `pad_to_sequence_len` is true.	2024-08-05 12:43:20 -04:00
ripes	7402eb9dcb	Fix setting correct repo id when pushing dataset to hub (#1657 ) * use the ds hash as the dataset's config_name * improve logging for loading/pushing ds to hub * fix missing f string	2024-08-05 12:42:15 -04:00
Sri Kainkaryam	203816f7b4	Fix colab example notebook (#1805 ) [skip ci]	2024-08-04 13:24:26 -04:00
Wing Lian	78b42a3fe1	fix roles to train defaults and make logging less verbose (#1801 )	2024-07-30 20:58:17 -04:00
Wing Lian	3ebf22464b	qlora-fsdp ram efficient loading with hf trainer (#1791 ) * fix 405b with lower cpu ram requirements * make sure to use double quant and only skip output embeddings * set model attributes * more fixes for sharded fsdp loading * update the base model in example to use pre-quantized nf4-bf16 weights * upstream fixes for qlora+fsdp	2024-07-30 19:21:38 -04:00
Wing Lian	dbf8fb549e	publish axolotl images without extras in the tag name (#1798 )	2024-07-30 13:36:19 -04:00
Wing Lian	9a63884597	update test and main/nightly builds (#1797 ) * update test and main/nightly builds * don't install mamba-ssm on 2.4.0 since it has no wheels yet	2024-07-30 12:37:40 -04:00
Wing Lian	c5587b45ac	use 12.4.1 instead of 12.4 [skip-ci] (#1796 )	2024-07-30 08:50:23 -04:00
Wing Lian	d4f6a6b103	fix dockerfile and base builder (#1795 ) [skip-ci]	2024-07-30 08:34:37 -04:00
Wing Lian	d8d1788ffc	move to supporting mostly 12.1 w 2.3.1 and add new 12.4 with 2.4.0 (#1793 )	2024-07-30 08:06:11 -04:00
mhenrichsen	3bc8e64557	Update README.md (#1792 )	2024-07-30 07:59:53 +02:00
Adam Brusselback	55cc214c76	Add flexible configuration options for `chat_template` dataset training (#1756 ) * Add flexible configuration options for chat dataset training - Introduce roles_to_train parameter to set training labels by role - Add train_on_eos option to configure training on end-of-sequence tokens - Implement per-message training configuration in dataset - Allow fine-grained control over training specific portions of messages - Add message_field_training and message_field_training_detail settings - Implement mapping between dataset character offsets and tokenized prompt - Enhance test suite to cover new functionality * Fix missing field inits, things weren't working from yaml. * Add flexible configuration options for chat dataset training - Introduce roles_to_train parameter to set training labels by role - Add train_on_eos option to configure training on end-of-sequence tokens - Implement per-message training configuration in dataset - Allow fine-grained control over training specific portions of messages - Add message_field_training and message_field_training_detail settings - Implement mapping between dataset character offsets and tokenized prompt - Enhance test suite to cover new functionality * Fix missing field inits, things weren't working from yaml. * chore: lint * Revert test repo back to NousResearch after opening PR to fix the tokenizer_config.json. --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-07-28 21:48:57 -04:00
Wing Lian	94ba93259f	various batch of fixes (#1785 ) * various batch of fixes * more tweaks * fix autoawq requirement for torch flexibility * simplify conditionals * multi-node fixes wip * bump transformers and include 405b qlora+fsdp yaml	2024-07-28 07:25:54 -04:00
Wing Lian	22680913f3	Bump deepspeed 20240727 (#1790 ) * pin deepspeed to 0.14.4 otherwise it doesn't play nice with trl * Add test to import to try to trigger import dependencies	2024-07-27 10:24:11 -04:00
Wing Lian	6a9cfec222	add support for simpo via cpo trainer (#1772 ) * add support for simpo via cpo trainer * add cpo_alpha / sft_weight from the paper * make sure to use the right builder for simpo	2024-07-23 21:22:16 -04:00
Wing Lian	fe250ada78	fix fsdp loading of models, esp 70b (#1780 )	2024-07-23 19:54:28 -04:00
Wing Lian	e6b299dd79	bump flash attention to 2.6.2 (#1781 ) [skip ci]	2024-07-23 19:54:15 -04:00
Wing Lian	608a2f3180	bump transformers for updated llama 3.1 (#1778 ) * bump transformers for updated llama 3.1 * bump for patch fix	2024-07-23 13:21:03 -04:00
Wing Lian	87455e7f32	swaps to use newer sample packing for mistral (#1773 ) * swaps to use newer sample packing for mistral * fix multipack patch test * patch the common fa utils * update for refactor of flash attn unpad * remove un-needed drop attn mask for mistral * bump transformers to main to pick up latest mistral fix for 12b and refactor of fa2 * update test	2024-07-23 01:41:11 -04:00
Keith Stevens	985819d89b	Add a `chat_template` prompt strategy for DPO (#1725 ) * Implementing a basic chat_template strategy for DPO datasets This mimics the sft chat_template strategy such that users can: * Specify the messages field * Specify the per message role and content fields * Specify the chosen and rejected fields * Let the tokenizer construct the raw prompt * Ensure the chosen and rejected fields don't have any prefix tokens * Adding additional dpo chat template unittests * Rename test class	2024-07-21 09:10:42 -04:00
Wing Lian	fa91b698e9	Fix untrained tokens (#1771 ) * fix untrained reserved tokens * save model after fixing untrained embeddings * don't need fsdp conditional here	2024-07-19 12:21:37 -04:00
Wing Lian	e4063d60a7	bump transformers and set roundup_power2_divisions for more VRAM improvements, low bit ao optimizers (#1769 ) * bump transformers and set roundup_power2_divisions for more VRAM improvements * support for low bit optimizers from torch ao * fix check for alternate optimizers and use nous models on hf for llama3 * add missing check for ao_adamw_fp8 * fix check when using custom optimizers w adamw	2024-07-19 00:47:07 -04:00
Wing Lian	7830fe04b5	Unsloth rope (#1767 ) * Add unsloth rope embeddings support * support for models weights in 4bit and do some memory gc * use accelerate logger * add unsloth llama rms norm optims * update docs for unsloth * more docs info	2024-07-18 14:54:41 -04:00
Wing Lian	c86c32a627	set the number of dataset processes on the DPO Config rather than the trainer (#1762 )	2024-07-17 15:38:37 -04:00
Wing Lian	8731b95d04	re-enable PYTORCH_CUDA_ALLOC_CONF expandable_segments (#1765 ) [skip ci]	2024-07-17 15:38:26 -04:00
Wing Lian	8619b2d855	add torch_compile_mode options (#1763 ) [skip ci] * add torch_compile_mode options * make sure n_gpu is an int	2024-07-17 15:38:07 -04:00
Wing Lian	976f85195a	fixes to accelerator so that iterable pretraining datasets work (#1759 ) * fixes to accelerator so that iterable pretraining datasets work * fix the pretraining test params * split batches, not dispatch batches needs to be set * update c4 datasets * set epochs in pretrain config test * need to set both split_batches and dispatch_batches to false for pretraining * fix bool val in comment	2024-07-17 10:58:38 -04:00
Wing Lian	152ab76623	fix num gpu check (#1760 )	2024-07-17 10:58:14 -04:00
Wing Lian	5f58555bd0	support for llama multipack using updated code/patches (#1754 ) * support for llama multipack using updated code/patches * also support unsloth patches * incorrect arg * add config validation for unsloth * add missing return to validation * add another missing return to validation	2024-07-16 17:36:29 -04:00
Wing Lian	cfc533a7f7	torch compile and cuda alloc improvements (#1755 ) * enable experimental expandable_segments * hf trainer seems to be missing torch compile * disable PYTORCH_CUDA_ALLOC_CONF to see if that fixes cicd	2024-07-16 16:00:23 -04:00
Wing Lian	e1725aef2b	update modal package and don't cache pip install (#1757 ) * update modal package and cleanup pip cache * more verbosity on the test	2024-07-16 14:45:38 -04:00
Wing Lian	78e12f8ca5	add basic support for the optimi adamw optimizer (#1727 ) * add support for optimi_adamw optimizer w kahan summation * pydantic validator for optimi_adamw * workaround for setting optimizer for fsdp * make sure to install optimizer packages * make sure to have parity for model parameters passed to optimizer * add smoke test for optimi_adamw optimizer * don't use foreach optimi by default	2024-07-14 19:12:57 -04:00

1556 Commits