axolotl

Author	SHA1	Message	Date
Wing Lian	f60c623af0	remove references to triton kd for now	2025-01-14 22:47:43 -05:00
Wing Lian	746891eb5c	add license block	2025-01-14 22:47:43 -05:00
Wing Lian	f09b5da60b	refactor so we can easily add new loss functions	2025-01-14 22:47:43 -05:00
Wing Lian	689e1c10ba	chore: lint	2025-01-14 22:47:43 -05:00
Wing Lian	a5c085e003	var naming and add todo	2025-01-14 22:47:43 -05:00
Wing Lian	63146300b7	fix kd loss so it's causal (fixes repeating tokens)	2025-01-14 22:47:43 -05:00
Wing Lian	ca5e397fc5	use kd_alpha in the correct loss method	2025-01-14 22:47:42 -05:00
Wing Lian	3416302b0d	hash for temperature too	2025-01-14 22:47:42 -05:00
Wing Lian	7366efc4ca	better rescaling for temperatures	2025-01-14 22:47:42 -05:00
Wing Lian	d8d817eaed	don't use triton for now	2025-01-14 22:47:42 -05:00
Wing Lian	c0757e8a20	fix kwarg	2025-01-14 22:47:42 -05:00
Wing Lian	e565694914	v3	2025-01-14 22:47:42 -05:00
Wing Lian	081928e55b	no torch.tensor	2025-01-14 22:47:42 -05:00
Wing Lian	dc90c93894	no log etc	2025-01-14 22:47:41 -05:00
Wing Lian	18a46c338a	no torch.exp inside triton kernel	2025-01-14 22:47:41 -05:00
Wing Lian	119d586cf4	v2 trial	2025-01-14 22:47:41 -05:00
Wing Lian	c73acd7de0	no where support	2025-01-14 22:47:41 -05:00
Wing Lian	0b59a242d4	triton wip	2025-01-14 22:47:41 -05:00
Wing Lian	ed490517da	chore: lint	2025-01-14 22:47:41 -05:00
Wing Lian	00ce77e7ef	make sure to multiply against the correct loss	2025-01-14 22:47:41 -05:00
Wing Lian	ae545e0165	cross entropy loss coefficient during KD	2025-01-14 22:47:40 -05:00
Wing Lian	b592c05b93	flipped the slice	2025-01-14 22:47:40 -05:00
Wing Lian	7fe0ad088b	make it work	2025-01-14 22:47:40 -05:00
Wing Lian	ddcf5c68b3	handle padding/collation for KD datasets	2025-01-14 22:47:40 -05:00
Wing Lian	e633a12dbe	make batch smaller	2025-01-14 22:47:40 -05:00
Wing Lian	d584354ee4	filter bad rows	2025-01-14 22:47:40 -05:00
Wing Lian	303cfa71aa	KD dataset loading and KD with logprobs	2025-01-14 22:47:40 -05:00
Wing Lian	88b3198894	refactor trainer to prevent circular dependencies later fix loader default	2025-01-14 22:47:39 -05:00
jwongTensora	8606093921	fix for indexing error from token/embeddings mismatch (#2257) Co-authored-by: jwong <jwongTensora@gmail.com>	2025-01-14 22:09:29 -05:00
NanoCode012	cba5a457d9	fix: use text_column even when not packing for pretraining (#2254) * fix: use text_column even when not packing for pretraining * feat: update test to check when not packing * chore: lint * Update src/axolotl/utils/data/pretraining.py Co-authored-by: Wing Lian <wing.lian@gmail.com> --------- Co-authored-by: Wing Lian <wing@axolotl.ai> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2025-01-14 22:08:56 -05:00
Wing Lian	19cd83d408	rename references to dpo dataset prep to pref data (#2258)	2025-01-14 22:07:55 -05:00
Dan Saunders	1ed4de73b6	CLI cleanup and documentation (#2244) * CLI init refactor * fix * cleanup and (partial) docs * Adding documentation and continuing cleanup (in progress) * remove finetune.py script * continued cleanup and documentation * pytest fixes * review comments * fix * Fix * typing fixes * make sure the batch dataset patcher for multipack is always loaded when handling datasets * review comments * fix --------- Co-authored-by: Dan Saunders <dan@axolotl.ai> Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-01-13 17:55:29 +00:00
Wing Lian	f89e962119	skip over rows in pretraining dataset (#2223) * skip over rows in pretraining dataset * update docs	2025-01-13 10:44:45 -05:00
Wing Lian	bc1c9c20e3	assume empty lora dropout means 0.0 and add tests (#2243) * assume empty lora dropout means 0.0 and add tests * remove un-necessary arg * refactor based on pr feedback: * chore: lint	2025-01-13 10:44:11 -05:00
Wing Lian	dd26cc3c0f	add helper to verify the correct model output file exists (#2245) * add helper to verify the correct model output file exists * more checks using helper * chore: lint * fix import and relora model check * workaround for trl trainer saves * remove stray print	2025-01-13 10:43:29 -05:00
Wing Lian	d8b4027200	use 2.5.1 docker images as latest tag as it seems stable (#2198)	2025-01-10 08:35:25 -05:00
Wing Lian	fb3352e21c	rename liger test so it properly runs in ci (#2246)	2025-01-09 17:31:43 -05:00
NanoCode012	ed77e7001e	feat: add support for data_files in pretraining (#2238)	2025-01-09 21:04:13 +00:00
Wing Lian	7669a03fb4	update upstream HF deps (#2239) * bump axolotl contribs for upstream main conflicts: * bump datasets, tokenizer, trl * remove log workarounds in trl * bump lm-eval * remove unsloth_ import from critical path * remove llama fa2 from conftest * unsloth breaks with latest upstream	2025-01-09 21:01:59 +00:00
Vincenzo di Cicco	6553683170	Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235)	2025-01-09 21:01:22 +00:00
Wing Lian	5e0124e2ab	update modal version for ci (#2242)	2025-01-09 21:01:02 +00:00
NanoCode012	2e8d7c1adb	fix: mistral nemo does not recognize token_type_ids in forward (#2233)	2025-01-09 21:00:36 +00:00
Wing Lian	3c1921e400	add hf cache caching for GHA (#2247) * add hf cache caching for GHA * use modal volume to cache hf data * make sure to update the cache as we add new fixtures in conftest	2025-01-09 20:59:54 +00:00
Wing Lian	7faf2b6e8e	Merge group queue (#2248) * add support for merge groups * also lint merge groups	2025-01-09 15:49:00 -05:00
salman	c1b920f291	Fixing OSX installation (#2231) * bumping version, removing non-osx compatible deps * updating pylintrc * fixing linters * reverting changes	2025-01-07 13:42:01 +00:00
Wing Lian	3915abee4c	make sure padding is labeled as -100 for pretraining (#2227)	2024-12-31 15:22:18 -05:00
NJordan72	7a38dbe674	fix: allow trainer builder to use custom jinja chat template (#2219) * fix: allow trainer builder to use custom jinja chat template * chore: use get_chat_template_from_config Co-authored-by: Chirag Jain <jain.chirag925@gmail.com> * fix: swap imports --------- Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>	2024-12-24 16:18:50 -05:00
Wing Lian	e0a2eb2ebd	fix untrained tokens if specified explicitly from a list (#2210)	2024-12-23 09:08:28 -05:00
Wing Lian	d852d7af7a	inference - don't default w accelerate, fix base model (#2216) [skip ci]	2024-12-23 07:48:41 -05:00
Wing Lian	3742deb1de	add deepspeed example with torch compile enabled (#2212) [skip ci]	2024-12-22 12:11:39 -05:00

1841 Commits