| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Wing Lian | 06370b386a | move more things to kd plugin | 2025-01-09 18:57:26 -05:00 |
| Wing Lian | 3da6a652fa | refactor kd chat template loader | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | 84547c724d | support for custom trainer classes from plugins | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | 51547c656a | handle token/logprob shifting | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | 7c4ae15942 | remove references to triton kd for now | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | cdb167e7f7 | add license block | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | 52f1d7aee2 | refactor so we can easily add new loss functions | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | 319c3531e7 | chore: lint | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | 87eb6a3324 | var naming and add todo | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | f03fa703b7 | fix kd loss so it's causal (fixes repeating tokens) | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | 53ec07d44c | use kd_alpha in the correct loss method | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | 8d77dc385e | hash for temperature too | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 8b0104fa7c | better rescaling for temperatures | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 546ad007ec | don't use triton for now | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 868a49cb96 | fix kwarg | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 4a12b1b22e | v3 | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 973ed841cd | no torch.tensor | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 9c0470130b | no log etc | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 0da2b7c7cc | no torch.exp inside triton kernel | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 7c813a1d27 | v2 trial | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 0a08bb4f78 | no where support | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 8075a92a33 | triton wip | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | ba6eacd167 | chore: lint | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | e2fae47114 | make sure to multiply against the correct loss | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | 7d281b71dc | cross entropy loss coefficient during KD | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | b080c53afc | flipped the slice | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | 1ea225129f | make it work | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | e2aba41939 | handle padding/collation for KD datasets | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | 21caaaa2e9 | make batch smaller | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | 08d9f582e4 | filter bad rows | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | 39daeb2c79 | KD dataset loading and KD with logprobs | 2025-01-09 18:57:22 -05:00 |
| Wing Lian | 02c9898a95 | refactor trainer to prevent circular dependencies later; fix loader default | 2025-01-09 18:57:19 -05:00 |
| Wing Lian | fb3352e21c | rename liger test so it properly runs in ci (#2246) | 2025-01-09 17:31:43 -05:00 |
| NanoCode012 | ed77e7001e | feat: add support for data_files in pretraining (#2238) | 2025-01-09 21:04:13 +00:00 |
| Wing Lian | 7669a03fb4 | update upstream HF deps (#2239): bump axolotl contribs for upstream main conflicts; bump datasets, tokenizer, trl; remove log workarounds in trl; bump lm-eval; remove unsloth_ import from critical path; remove llama fa2 from conftest; unsloth breaks with latest upstream | 2025-01-09 21:01:59 +00:00 |
| Vincenzo di Cicco | 6553683170 | Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235) | 2025-01-09 21:01:22 +00:00 |
| Wing Lian | 5e0124e2ab | update modal version for ci (#2242) | 2025-01-09 21:01:02 +00:00 |
| NanoCode012 | 2e8d7c1adb | fix: mistral nemo does not recognize token_type_ids in forward (#2233) | 2025-01-09 21:00:36 +00:00 |
| Wing Lian | 3c1921e400 | add hf cache caching for GHA (#2247): use modal volume to cache hf data; make sure to update the cache as we add new fixtures in conftest | 2025-01-09 20:59:54 +00:00 |
| Wing Lian | 7faf2b6e8e | Merge group queue (#2248): add support for merge groups; also lint merge groups | 2025-01-09 15:49:00 -05:00 |
| salman | c1b920f291 | Fixing OSX installation (#2231): bumping version, removing non-osx compatible deps; updating pylintrc; fixing linters; reverting changes | 2025-01-07 13:42:01 +00:00 |
| Wing Lian | 3915abee4c | make sure padding is labeled as -100 for pretraining (#2227) | 2024-12-31 15:22:18 -05:00 |
| NJordan72 | 7a38dbe674 | fix: allow trainer builder to use custom jinja chat template (#2219); chore: use get_chat_template_from_config; fix: swap imports (Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>) | 2024-12-24 16:18:50 -05:00 |
| Wing Lian | e0a2eb2ebd | fix untrained tokens if specified explicitly from a list (#2210) | 2024-12-23 09:08:28 -05:00 |
| Wing Lian | d852d7af7a | inference - don't default w accelerate, fix base model (#2216) [skip ci] | 2024-12-23 07:48:41 -05:00 |
| Wing Lian | 3742deb1de | add deepspeed example with torch compile enabled (#2212) [skip ci] | 2024-12-22 12:11:39 -05:00 |
| Wing Lian | 2312caaa98 | GC every n steps (#2209) | 2024-12-21 17:38:33 -05:00 |
| Wing Lian | 307cf7c685 | move the dataset loading from remote/disk to a shared function so we can re-use for RL (#2204) | 2024-12-20 21:43:52 -05:00 |
| Dan Saunders | 70541145f1 | adding test_datasets compat with pretraining_dataset (streaming) (#2206) [skip ci] | 2024-12-20 21:43:33 -05:00 |
| Wing Lian | 42bd32a233 | add outputs (symlink) to gitignore [skip ci] (#2205) | 2024-12-19 20:14:43 -05:00 |