Wing Lian
ed922796b7
include multipack support for qwen3 family ( #2622 )
2025-05-03 12:02:39 -04:00
NanoCode012
a6d28d19b1
feat: add glm and glm4 multipack and cce ( #2546 )
...
* feat: add glm and glm4 multipack
* feat: add glm4 example
* feat: add cce for glm
2025-04-23 10:27:51 -04:00
Wing Lian
8bbad21bfd
llama4 support ( #2493 )
...
* llama4 support
* add xet support [skip ci]
* be flexible on transformers version and skip test on version
* don't use deepspeed for the fix_untrained_tokens test
* reordering to trigger torch 2.6.0 tests first
* slightly smaller train set
* use 4.51.0 for now
* remove stray print, add llama4 chat template to schema, bump peft to 0.15.1
* patches to make llama4 performant
* add preliminary fp8 support
2025-04-07 10:49:15 -04:00
Wing Lian
328d598114
gemma3 packing fixes ( #2449 )
...
* make gemma3 work with packing
* multi-gpu e2e for ci
* update gemma3 model namespace to use mirror
* add gradient checkpointing to multigpu e2e ci
* update gemma3 examples for use_reentrant and fix ddp find unused params
* fix tests for gemma3
* fix import for test utils
* set correct train loss for gemma3 e2e
2025-03-31 17:15:23 -04:00
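The use_reentrant change called out in the gemma3 examples matters for DDP: reentrant activation checkpointing can leave parameters out of a backward pass, which is what forces the find_unused_params workaround also mentioned above. A minimal sketch of the non-reentrant form, with an illustrative checkpoint name:

```python
from transformers import AutoModelForCausalLM

# Illustrative checkpoint; any gemma3 text variant applies.
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")

# Non-reentrant checkpointing recomputes activations without replaying the
# autograd graph, avoiding the DDP unused-parameter issue noted above.
model.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": False}
)
```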
NanoCode012
2c34a4634e
feat: add CCE for gemma3, cohere, and cohere2 ( #2443 )
...
* feat: add CCE for gemma3 and cohere1/2
* fix: change from relative import to absolute
* feat: add multipack for cohere & cohere2
* chore: improve comments
* fix: add gemma3_text
* feat: add cohere2 example
* fix: cohere forward
* fix: patch for cohere2
* feat: add command r v01 qlora sample
* chore: lint
* feat: upgrade gemma3 and gemma2 patch to use logits_to_keep
* chore: lint
* fix: add deprecate_kwarg decorator
* fix: add cce for gemma3 conditionalgeneration
* fix: gemma3 patch to defer logits calculation
* fix: patch gemma3 if given as model
* fix: remove not working config
* fix: update comments to clarify changes
* feat(doc): add supported models to readme
* fix: address difference in our cohere patch
* feat: add mistral3
* feat: add gemma
* feat(doc): update README to include gemma and mistral3 in supported models
* fix: gemma patch
* fix: import
* fix: gemma patch to be standalone
* fix: gemma3 warn about unsupported final_logit_softcapping
* feat: add mllama CCE
* chore: add abbreviation to doc
* fix: remove unneeded gemma3 eager warning
* fix: save processor if available
* fix: enable save processor on merge
* fix: wrong env meaning
2025-03-26 18:13:51 -04:00
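CCE here is Cut Cross Entropy: the LM loss is computed directly from the final hidden states and the lm_head weight, without materializing the full [tokens, vocab_size] logits tensor that dominates memory on large-vocabulary models like gemma and cohere. A hedged sketch of the core substitution using Apple's cut_cross_entropy package; the shift handling and wiring are illustrative, not this patch's exact code:

```python
import torch
from cut_cross_entropy import linear_cross_entropy

def cce_loss(hidden_states: torch.Tensor, lm_head_weight: torch.Tensor,
             labels: torch.Tensor) -> torch.Tensor:
    # Standard causal shift: hidden state at position t predicts token t+1.
    shift_hidden = hidden_states[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    # The logits matrix is never formed in full; log-sum-exp and the label
    # logit are computed blockwise inside a fused kernel.
    return linear_cross_entropy(shift_hidden, lm_head_weight, shift_labels)
```

The final_logit_softcapping warning in the bullets follows from the same design: the logits never exist as a tensor, so any transform of them has to happen inside the fused kernel rather than after the fact.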
NanoCode012
9f00465a5c
Feat: Add support for gemma3_text and add e2e for gemma2 ( #2406 )
2025-03-22 20:33:21 -04:00
NanoCode012
1110a37e21
feat: add deepseek_v3 sample packing ( #2230 )
2025-02-24 15:03:15 -05:00
Chirag Jain
0c8b1d824a
Update get_unpad_data patching for multipack ( #2013 )
...
* Update `get_unpad_data` patching for multipack
* Update src/axolotl/utils/models.py
* Update src/axolotl/utils/models.py
* Add test case
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-11-15 20:35:50 -05:00
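`get_unpad_data` is the hook multipack leans on: with sample packing, the attention mask labels each packed sub-sequence 1, 2, 3, ... instead of holding plain 0/1 values, and the patched helper turns those labels into the `cu_seqlens` boundaries flash attention's varlen kernels need. A minimal sketch assuming that mask convention; this is not axolotl's exact implementation:

```python
import torch
import torch.nn.functional as F

def get_unpad_data(attention_mask: torch.Tensor):
    """Packed-aware unpad data: mask values 1, 2, 3, ... mark sub-sequences."""
    # Token count per packed sub-sequence, row by row (0 marks padding).
    seqlens_in_batch = torch.cat(
        [row[row > 0].bincount()[1:] for row in attention_mask]
    ).to(torch.int32)
    indices = torch.nonzero(attention_mask.flatten() > 0).flatten()
    max_seqlen_in_batch = int(seqlens_in_batch.max())
    # Cumulative boundaries, e.g. lengths [5, 4, 5] -> cu_seqlens [0, 5, 9, 14].
    cu_seqlens = F.pad(seqlens_in_batch.cumsum(0, dtype=torch.int32), (1, 0))
    return indices, cu_seqlens, max_seqlen_in_batch
```

Flash attention then attends only within each `[cu_seqlens[i], cu_seqlens[i+1])` span, so packed samples never see across their boundaries.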
Wing Lian
e1915f5625
Multimodal Vision Llama - rudimentary support ( #1940 )
...
---------
Co-authored-by: Sunny <sunny@Sunnys-MacBook-Air.local>
Co-authored-by: sunny <sunnyliu19981005@gmail.com>
2024-10-02 21:02:48 -04:00
Aman Gupta Karmani
159b8b9a74
monkey-patch transformers to simplify monkey-patching modeling code ( #1877 )
...
* monkey-patch transformers so that monkey-patched modeling code doesn't get overwritten
* unnecessary now
* add comment
2024-08-27 17:22:26 -07:00
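The subtlety behind this commit: a patch applied after a caller has already imported a symbol by value never takes effect, and transformers' lazy module loading can hand back the unpatched object. Patching the attribute on the class itself, which every instance resolves at call time, sidesteps both failure modes. An illustrative sketch, not the commit's actual mechanism:

```python
import transformers.models.llama.modeling_llama as modeling_llama

_original_forward = modeling_llama.LlamaAttention.forward

def patched_forward(self, *args, **kwargs):
    # Custom behavior (e.g. packed flash attention) would go here.
    return _original_forward(self, *args, **kwargs)

# Existing and future instances look up `forward` through the class, so the
# patch survives later `from ... import LlamaAttention` statements.
modeling_llama.LlamaAttention.forward = patched_forward
```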
Wing Lian
1e43660701
Sample pack trust remote code v2 ( #1873 )
...
* fix the multipack patch for remote code models
* add deepseek v2 lite example w fsdp
2024-08-27 13:39:24 -04:00
Wing Lian
fefa95e350
most model types now support flash attention 2 regardless of multipack support ( #1854 )
2024-08-22 16:39:23 -04:00
Wing Lian
35d5e59d78
set z3 leaf for deepseek v2 ( #1809 ) [skip ci]
...
* set z3 leaf for deepseek v2
* add deepseek v2 chat template
2024-08-06 09:30:46 -04:00
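ZeRO stage 3 partitions parameters and gathers them on demand, which misbehaves inside MoE blocks whose expert branches execute data-dependently. Marking the MoE module as a z3 "leaf" tells DeepSpeed to gather its parameters as one indivisible unit. A hedged sketch; the class lookup is illustrative, since DeepSeek-V2 ships as trust_remote_code:

```python
import importlib

from deepspeed.utils import set_z3_leaf_modules

def set_deepseek_v2_leaf(model) -> None:
    # Resolve DeepseekV2MoE from the remote-code modeling module this model
    # instance was built from, then flag every matching submodule as a leaf.
    modeling = importlib.import_module(type(model).__module__)
    set_z3_leaf_modules(model, [modeling.DeepseekV2MoE])
```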
Wing Lian
87455e7f32
swaps to use newer sample packing for mistral ( #1773 )
...
* swaps to use newer sample packing for mistral
* fix multipack patch test
* patch the common fa utils
* update for refactor of flash attn unpad
* remove un-needed drop attn mask for mistral
* bump transformers to main to pick up latest mistral fix for 12b and refactor of fa2
* update test
2024-07-23 01:41:11 -04:00
Wing Lian
5f58555bd0
support for llama multipack using updated code/patches ( #1754 )
...
* support for llama multipack using updated code/patches
* also support unsloth patches
* incorrect arg
* add config validation for unsloth
* add missing return to validation
* add another missing return to validation
2024-07-16 17:36:29 -04:00
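The validation bullets above guard the unsloth fast paths, which assume llama-style attention and MLP modules. A hedged sketch of that kind of check; the flag names mirror axolotl's unsloth options, but the wiring is illustrative:

```python
def validate_unsloth(cfg: dict) -> None:
    # Unsloth kernel patches only apply to llama-derived architectures.
    unsloth_flags = ("unsloth_lora_mlp", "unsloth_lora_qkv", "unsloth_lora_o")
    if any(cfg.get(flag) for flag in unsloth_flags):
        if not cfg.get("is_llama_derived_model"):
            raise ValueError(
                "unsloth optimizations require a llama-derived model"
            )
```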
Wing Lian
5370cedf0c
support for gemma2 w sample packing ( #1718 )
2024-06-29 01:38:55 -04:00
Wing Lian
4de4b4089f
add support for multipack for deepseek_v2 ( #1712 )
2024-06-20 10:02:55 -04:00
Wing Lian
6086be85f7
qwen2_moe support w multipack ( #1455 )
2024-03-29 11:04:53 -04:00
Wing Lian
05b398a072
fix some of the edge cases for Jamba ( #1452 )
...
* fix some of the edge cases for Jamba
* update requirements for jamba
2024-03-29 02:38:02 -04:00
Wing Lian
8df7b888ff
beta support for multipack with gemmoe ( #1402 )
2024-03-14 15:52:23 -04:00
Eric Hartford
e0f1895408
add starcoder2 ( #1349 )
...
* add starcoder2
* Apply suggestions from code review
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* chore: lint
* Apply suggestions from code review
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-03-05 19:49:17 -05:00
Wing Lian
2752d5f958
multipack for gemma ( #1313 )
...
* multipack for gemma
* chore: lint
* handle cache_position kwarg in updated llama modeling
* add position_ids to rotary embed call for updated llama modeling
2024-02-21 19:24:21 -05:00
Wing Lian
5698943263
simplify handling for newer multipack patches so they can be added in a single place ( #1270 )
2024-02-07 10:46:04 -05:00
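The consolidation this commit begins is what the later model additions above (gemma, qwen2_moe, deepseek_v2, up through qwen3 at the top of this log) build on: one list of supported model types plus one patch point, rather than a per-model copy of the unpadding logic. A minimal sketch of that shape, assuming the centralized flash-attention helper modern transformers exposes; the names are illustrative:

```python
import transformers.modeling_flash_attention_utils as fa_utils

SUPPORTED_MULTIPACK_MODEL_TYPES = ["llama", "mistral", "gemma", "qwen2_moe"]

def patch_for_multipack(model_type: str, packed_get_unpad_data) -> None:
    """Swap the shared unpadding helper for a packed-aware one, such as the
    get_unpad_data sketched earlier in this log."""
    if model_type not in SUPPORTED_MULTIPACK_MODEL_TYPES:
        raise ValueError(f"{model_type} does not support multipack")
    # One patch point: every supported model resolves unpadding through it.
    fa_utils._get_unpad_data = packed_get_unpad_data
```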