Wing Lian | b5b3452b2b | improve iterable support | 2025-01-09 18:57:27 -05:00
Wing Lian | 6bbe3ac641 | support streaming for processing sft datasts? | 2025-01-09 18:57:27 -05:00
Wing Lian | 9ed455ef8c | make loss torch script compat | 2025-01-09 18:57:26 -05:00
Wing Lian | 66823c113c | kd sample packing | 2025-01-09 18:57:26 -05:00
Wing Lian | e976de4d8f | be a bit pickier about loading dynamic prompt strategies | 2025-01-09 18:57:26 -05:00
Wing Lian | 8eb82bba40 | more info on preprocess for kd and fix import | 2025-01-09 18:57:26 -05:00
Wing Lian | 9fe36db215 | remove duplicate code | 2025-01-09 18:57:26 -05:00
Wing Lian | 9dcc879e04 | add copyrights | 2025-01-09 18:57:26 -05:00
Wing Lian | 1e577a29a8 | increase logging around loading plugins | 2025-01-09 18:57:26 -05:00
Wing Lian | 4037fdb43a | make plugin setup concise | 2025-01-09 18:57:26 -05:00
Wing Lian | 385c60cd9b | remove moved class from import | 2025-01-09 18:57:26 -05:00
Wing Lian | 06370b386a | move more things to kd plugin | 2025-01-09 18:57:26 -05:00
Wing Lian | 3da6a652fa | refactor kd chat template loader | 2025-01-09 18:57:25 -05:00
Wing Lian | 84547c724d | support for custom trainer classes from plugins | 2025-01-09 18:57:25 -05:00
Wing Lian | 51547c656a | handle token/logprob shifting | 2025-01-09 18:57:25 -05:00
Wing Lian | 7c4ae15942 | remove references to triton kd for now | 2025-01-09 18:57:25 -05:00
Wing Lian | cdb167e7f7 | add license block | 2025-01-09 18:57:25 -05:00
Wing Lian | 52f1d7aee2 | refactor so we can easily add new loss functions | 2025-01-09 18:57:25 -05:00
Wing Lian | 319c3531e7 | chore: lint | 2025-01-09 18:57:25 -05:00
Wing Lian | 87eb6a3324 | var naming and add todo | 2025-01-09 18:57:25 -05:00
Wing Lian | f03fa703b7 | fix kd loss so it's causal (fixes repeating tokens) | 2025-01-09 18:57:25 -05:00
Wing Lian | 53ec07d44c | use kd_alpha in the correct loss method | 2025-01-09 18:57:25 -05:00
Wing Lian | 8d77dc385e | hash for temperature too | 2025-01-09 18:57:24 -05:00
Wing Lian | 8b0104fa7c | better rescaling for temperatures | 2025-01-09 18:57:24 -05:00
Wing Lian | 546ad007ec | don't use triton for now | 2025-01-09 18:57:24 -05:00
Wing Lian | 868a49cb96 | fix kwarg | 2025-01-09 18:57:24 -05:00
Wing Lian | 4a12b1b22e | v3 | 2025-01-09 18:57:24 -05:00
Wing Lian | 973ed841cd | no torch.tensor | 2025-01-09 18:57:24 -05:00
Wing Lian | 9c0470130b | no log etc | 2025-01-09 18:57:24 -05:00
Wing Lian | 0da2b7c7cc | no torch.exp inside triton kernel | 2025-01-09 18:57:24 -05:00
Wing Lian | 7c813a1d27 | v2 trial | 2025-01-09 18:57:24 -05:00
Wing Lian | 0a08bb4f78 | no where support | 2025-01-09 18:57:24 -05:00
Wing Lian | 8075a92a33 | triton wip | 2025-01-09 18:57:23 -05:00
Wing Lian | ba6eacd167 | chore: lint | 2025-01-09 18:57:23 -05:00
Wing Lian | e2fae47114 | make sure to multiply against the correct loss | 2025-01-09 18:57:23 -05:00
Wing Lian | 7d281b71dc | cross entropy loss coefficient during KD | 2025-01-09 18:57:23 -05:00
Wing Lian | b080c53afc | flipped the slice | 2025-01-09 18:57:23 -05:00
Wing Lian | 1ea225129f | make it work | 2025-01-09 18:57:23 -05:00
Wing Lian | e2aba41939 | handle padding/collation for KD datasets | 2025-01-09 18:57:23 -05:00
Wing Lian | 21caaaa2e9 | make batch smaller | 2025-01-09 18:57:23 -05:00
Wing Lian | 08d9f582e4 | filter bad rows | 2025-01-09 18:57:23 -05:00
Wing Lian | 39daeb2c79 | KD dataset loading and KD with logprobs | 2025-01-09 18:57:22 -05:00
Wing Lian | 02c9898a95 | refactor trainer to prevent circular dependencies later | 2025-01-09 18:57:19 -05:00
    fix loader default
Wing Lian | fb3352e21c | rename liger test so it properly runs in ci (#2246) | 2025-01-09 17:31:43 -05:00
NanoCode012 | ed77e7001e | feat: add support for data_files in pretraining (#2238) | 2025-01-09 21:04:13 +00:00
Wing Lian | 7669a03fb4 | update upstream HF deps (#2239) | 2025-01-09 21:01:59 +00:00
    * bump axolotl contribs for upstream main conflicts:
    * bump datasets, tokenizer, trl
    * remove log workarounds in trl
    * bump lm-eval
    * remove unsloth_ import from critical path
    * remove llama fa2 from conftest
    * unsloth breaks with latest upstream
Vincenzo di Cicco | 6553683170 | Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235) | 2025-01-09 21:01:22 +00:00
Wing Lian | 5e0124e2ab | update modal version for ci (#2242) | 2025-01-09 21:01:02 +00:00
NanoCode012 | 2e8d7c1adb | fix: mistral nemo does not recognize token_type_ids in forward (#2233) | 2025-01-09 21:00:36 +00:00
Wing Lian | 3c1921e400 | add hf cache caching for GHA (#2247) | 2025-01-09 20:59:54 +00:00
    * add hf cache caching for GHA
    * use modal volume to cache hf data
    * make sure to update the cache as we add new fixtures in conftest