axolotl

Author	SHA1	Message	Date
Chiwan Park	2dac1edf72	Fix `drop_long_seq` bug due to truncation in prompt tokenization strategies when using `chat_template` (#1867 )	2024-08-26 12:56:12 -04:00
Wing Lian	6819c12cee	update spectrum authors (#1869 )	2024-08-26 12:00:36 -04:00
Wing Lian	8e29bdefdd	Spectrum plugin (#1866 )	2024-08-25 17:54:02 -04:00
Wing Lian	f245964f22	better handling of llama-3 tool role (#1782 )	2024-08-25 12:31:40 -04:00
Wing Lian	22f4eafa55	simplify logic (#1856 )	2024-08-23 20:23:08 -04:00
Wing Lian	77a4b9cda2	change up import to prevent AttributeError (#1863 ) * change up import to prevent AttributeError * tweak patching check for updated upstream	2024-08-23 17:00:01 -04:00
Wing Lian	1f686c576c	Liger Kernel integration (#1861 ) * add initial plugin support w Liger kernel patches * integrate the input args classes * fix liger plugin and dynamic configuration class * drop untrainable samples and refactor config plugins integration * fix incorrect inputs and circular imports * fix bool comparison * fix for dropping untrainable tokens * fix licensing so liger integration is Apache 2.0 * add jamba support * pylint ignore	2024-08-23 12:21:51 -04:00
Wing Lian	328fd4b3b7	add axolotl community license (#1862 )	2024-08-23 11:40:21 -04:00
Wing Lian	fefa95e350	most model types now support flash attention 2 regardless of multipack support (#1854 )	2024-08-22 16:39:23 -04:00
Wing Lian	2f8037fee6	ensure that the hftrainer deepspeed config is set before the trainer class is ever init'ed (#1850 ) [skip ci]	2024-08-22 13:10:40 -04:00
JohanWork	7ed92e61c2	fix: prompt phi (#1845 ) [skip ci] * correcting phi system prompt * phi test * update * add test	2024-08-22 11:46:57 -04:00
Wing Lian	9caa3eb699	make the train_on_eos default to turn so all eos tokens are treated the same (#1847 ) [skip ci]	2024-08-22 11:45:37 -04:00
Wing Lian	5b0b774e38	ensure that the bias is also in the correct dtype (#1848 ) [skip ci] * ensure that the bias is also in the correct dtype * add nightly for dpo-qlora-fsdp	2024-08-22 11:45:00 -04:00
Gal Cohen (galco)	9f917245f6	feat: add jamba chat_template (#1843 ) * feat: add jamba chat_template * fix: black * feat: jamba fsdp+qlora --------- Co-authored-by: Gal Cohen <galc@ai21.com>	2024-08-21 13:37:17 -04:00
Aman Gupta Karmani	649c19aba3	pretrain: fix with sample_packing=false (#1841 )	2024-08-21 13:36:51 -04:00
Gal Cohen (galco)	5aac4bc284	fix: dont change quant storage dtype in case of fsdp (#1837 ) * fix: dont change quant storage dtype in case of fsdp * fix black --------- Co-authored-by: Gal Cohen <galc@ai21.com>	2024-08-20 12:41:48 -04:00
Wing Lian	e29931259b	optionally save the final FSDP model as a sharded state dict (#1828 ) * efficiently save very large llms when using FSDP * fix parsing and index of sharded chunks * only save fsdp on main process * debugging for rename * save sharded state dict * remove unused new param * get state dict directly * tweak acc merge fsdp to shard the weight files * sharded_state_dict alongside save_safetensors seems to hang on checkpoint save	2024-08-19 14:59:24 -04:00
Wing Lian	b1d2921222	add validation to prevent 8bit lora finetuning on H100s (#1827 )	2024-08-16 21:32:00 -04:00
Wing Lian	803fed3e90	update sklearn version, torch compile env vars, don't worry about failure on preprocess load model (#1821 ) * update sklearn version, torch compile env vars, don't worry about failure on preprocess load model * There is already a condition check within the function. This outer one is not necessary Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2024-08-16 10:41:51 -04:00
NanoCode012	68a3c7678a	fix: parse model_kwargs (#1825 )	2024-08-16 07:51:19 -04:00
NanoCode012	f18925fb4b	fix: parse eager_attention (#1824 )	2024-08-14 09:46:46 -04:00
Chiwan Park	0801f239cc	fix the incorrect `max_length` for chat template (#1818 )	2024-08-09 11:50:31 -04:00
Wing Lian	5ee4b7325f	fix z3 leaf configuration when not using lists (#1817 ) [skip ci]	2024-08-09 10:54:52 -04:00
Wing Lian	c56e0a79a5	logging improvements (#1808 ) [skip ci] * logging improvements * fix sort	2024-08-06 10:31:50 -04:00
Wing Lian	35d5e59d78	set z3 leaf for deepseek v2 (#1809 ) [skip ci] * set z3 leaf for deepseek v2 * add deepseek v2 chat template	2024-08-06 09:30:46 -04:00
Wing Lian	fbbeb4fee0	remove un-necessary zero-first guard as it's already only called in a parent fn (#1810 ) [skip ci]	2024-08-06 09:29:23 -04:00
Wing Lian	ecdda006de	One cycle lr (#1803 ) * refactor one_cycle lr scheduler so it's reusable in more situations * fix validation for lr_scheduler * default to cosine anneal strategy * one cycle lr expects cos	2024-08-05 13:12:05 -04:00
ripes	7402eb9dcb	Fix setting correct repo id when pushing dataset to hub (#1657 ) * use the ds hash as the dataset's config_name * improve logging for loading/pushing ds to hub * fix missing f string	2024-08-05 12:42:15 -04:00
Wing Lian	78b42a3fe1	fix roles to train defaults and make logging less verbose (#1801 )	2024-07-30 20:58:17 -04:00
Wing Lian	3ebf22464b	qlora-fsdp ram efficient loading with hf trainer (#1791 ) * fix 405b with lower cpu ram requirements * make sure to use double quant and only skip output embeddings * set model attributes * more fixes for sharded fsdp loading * update the base model in example to use pre-quantized nf4-bf16 weights * upstream fixes for qlora+fsdp	2024-07-30 19:21:38 -04:00
Adam Brusselback	55cc214c76	Add flexible configuration options for `chat_template` dataset training (#1756 ) * Add flexible configuration options for chat dataset training - Introduce roles_to_train parameter to set training labels by role - Add train_on_eos option to configure training on end-of-sequence tokens - Implement per-message training configuration in dataset - Allow fine-grained control over training specific portions of messages - Add message_field_training and message_field_training_detail settings - Implement mapping between dataset character offsets and tokenized prompt - Enhance test suite to cover new functionality * Fix missing field inits, things weren't working from yaml. * chore: lint * Revert test repo back to NousResearch after opening PR to fix the tokenizer_config.json. --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-07-28 21:48:57 -04:00
Wing Lian	94ba93259f	various batch of fixes (#1785 ) * various batch of fixes * more tweaks * fix autoawq requirement for torch flexibility * simplify conditionals * multi-node fixes wip * bump transformers and include 405b qlora+fsdp yaml	2024-07-28 07:25:54 -04:00
Wing Lian	6a9cfec222	add support for simpo via cpo trainer (#1772 ) * add support for simpo via cpo trainer * add cpo_alpha / sft_weight from the paper * make sure to use the right builder for simpo	2024-07-23 21:22:16 -04:00
Wing Lian	fe250ada78	fix fsdp loading of models, esp 70b (#1780 )	2024-07-23 19:54:28 -04:00
Wing Lian	87455e7f32	swaps to use newer sample packing for mistral (#1773 ) * swaps to use newer sample packing for mistral * fix multipack patch test * patch the common fa utils * update for refactor of flash attn unpad * remove un-needed drop attn mask for mistral * bump transformers to main to pick up latest mistral fix for 12b and refactor of fa2 * update test	2024-07-23 01:41:11 -04:00
Keith Stevens	985819d89b	Add a `chat_template` prompt strategy for DPO (#1725 ) * Implementing a basic chat_template strategy for DPO datasets This mimics the sft chat_template strategy such that users can: * Specify the messages field * Specify the per message role and content fields * Specify the chosen and rejected fields * Let the tokenizer construct the raw prompt * Ensure the chosen and rejected fields don't have any prefix tokens * Adding additional dpo chat template unittests * Rename test class	2024-07-21 09:10:42 -04:00
Wing Lian	fa91b698e9	Fix untrained tokens (#1771 ) * fix untrained reserved tokens * save model after fixing untrained embeddings * don't need fsdp conditional here	2024-07-19 12:21:37 -04:00
Wing Lian	e4063d60a7	bump transformers and set roundup_power2_divisions for more VRAM improvements, low bit ao optimizers (#1769 ) * bump transformers and set roundup_power2_divisions for more VRAM improvements * support for low bit optimizers from torch ao * fix check for alternate optimizers and use nous models on hf for llama3 * add missing check for ao_adamw_fp8 * fix check when using custom optimizers w adamw	2024-07-19 00:47:07 -04:00
Wing Lian	7830fe04b5	Unsloth rope (#1767 ) * Add unsloth rope embeddings support * support for models weights in 4bit and do some memory gc * use accelerate logger * add unsloth llama rms norm optims * update docs for unsloth * more docs info	2024-07-18 14:54:41 -04:00
Wing Lian	c86c32a627	set the number of dataset processes on the DPO Config rather than the trainer (#1762 )	2024-07-17 15:38:37 -04:00
Wing Lian	8731b95d04	re-enable PYTORCH_CUDA_ALLOC_CONF expandable_segments (#1765 ) [skip ci]	2024-07-17 15:38:26 -04:00
Wing Lian	8619b2d855	add torch_compile_mode options (#1763 ) [skip ci] * add torch_compile_mode options * make sure n_gpu is an int	2024-07-17 15:38:07 -04:00
Wing Lian	976f85195a	fixes to accelerator so that iterable pretraining datasets work (#1759 ) * fixes to accelerator so that iterable pretraining datasets work * fix the pretraining test params * split batches, not dispatch batches needs to be set * update c4 datasets * set epochs in pretrain config test * need to set both split_batches and dispatch_batches to false for pretraining * fix bool val in comment	2024-07-17 10:58:38 -04:00
Wing Lian	152ab76623	fix num gpu check (#1760 )	2024-07-17 10:58:14 -04:00
Wing Lian	5f58555bd0	support for llama multipack using updated code/patches (#1754 ) * support for llama multipack using updated code/patches * also support unsloth patches * incorrect arg * add config validation for unsloth * add missing return to validation * add another missing return to validation	2024-07-16 17:36:29 -04:00
Wing Lian	cfc533a7f7	torch compile and cuda alloc improvements (#1755 ) * enable experimental expandable_segments * hf trainer seems to be missing torch compile * disable PYTORCH_CUDA_ALLOC_CONF to see if that fixes cicd	2024-07-16 16:00:23 -04:00
Wing Lian	78e12f8ca5	add basic support for the optimi adamw optimizer (#1727 ) * add support for optimi_adamw optimizer w kahan summation * pydantic validator for optimi_adamw * workaround for setting optimizer for fsdp * make sure to install optimizer packages * make sure to have parity for model parameters passed to optimizer * add smoke test for optimi_adamw optimizer * don't use foreach optimi by default	2024-07-14 19:12:57 -04:00
Wing Lian	98af5388ba	bump flash attention 2.5.8 -> 2.6.1 (#1738 ) * bump flash attention 2.5.8 -> 2.6.1 * use triton implementation of cross entropy from flash attn * add smoke test for flash attn cross entropy patch * fix args to xentropy.apply * handle tuple from triton loss fn * ensure the patch tests run independently * use the wrapper already built into flash attn for cross entropy * mark pytest as forked for patches * use pytest xdist instead of forked, since cuda doesn't like forking * limit to 1 process and use dist loadfile for pytest * change up pytest for fixture to reload transformers w monkeypatch	2024-07-14 19:11:31 -04:00
Wing Lian	a4a5bf057f	fixes to prevent vram spike when train starts (#1742 )	2024-07-13 09:53:13 -04:00
Wing Lian	47e1916484	add tests so CI can catch updates where patches will break with unsloth (#1737 ) [skip ci]	2024-07-11 16:43:19 -04:00

944 Commits