Wing Lian
ed922796b7
include multipack support for qwen3 family ( #2622 )
2025-05-03 12:02:39 -04:00
NanoCode012
a6d28d19b1
feat: add glm and glm4 multipack and cce ( #2546 )
...
* feat: add glm and glm4 multipack
* feat: add glm4 example
* feat: add cce for glm
2025-04-23 10:27:51 -04:00
Wing Lian
8bbad21bfd
llama4 support ( #2493 )
...
* llama4 support
* add xet support [skip ci]
* be flexible on transformers version and skip test on version
* don't use deepspeed for the fix_untrained_tokens test
* reordering to trigger torch 2.6.0 tests first
* slightly smaller train set
* use 4.51.0 for now
* remove stray print, add llama4 chat template to schema, bump peft to 0.15.1
* patches to make llama4 performant
* add preliminary fp8 support
2025-04-07 10:49:15 -04:00
Wing Lian
328d598114
gemma3 packing fixes ( #2449 )
...
* make gemma3 work with packing
* multi-gpu e2e for ci
* update gemma3 model namespace to use mirror
* add gradient checkpointing to multigpu e2e ci
* update gemma3 examples for use_reentrant and fix ddp find unused params
* fix tests for gemma3
* fix import for test utils
* set correct train loss for gemma3 e2e
2025-03-31 17:15:23 -04:00
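The use_reentrant change called out in the gemma3 examples matters for DDP: reentrant activation checkpointing can leave parameters out of a backward pass, which is what forces the find_unused_params workaround also mentioned above. A minimal sketch of the non-reentrant form, with an illustrative checkpoint name:

```python
from transformers import AutoModelForCausalLM

# Illustrative checkpoint; any gemma3 text variant applies.
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")

# Non-reentrant checkpointing recomputes activations without replaying the
# autograd graph, avoiding the DDP unused-parameter issue noted above.
model.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": False}
)
```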
NanoCode012
2c34a4634e
feat: add CCE for gemma3, cohere, and cohere2 ( #2443 )
...
* feat: add CCE for gemma3 and cohere1/2
* fix: change from relative import to absolute
* feat: add multipack for cohere & cohere2
* chore: improve comments
* fix: add gemma3_text
* feat: add cohere2 example
* fix: cohere forward
* fix: patch for cohere2
* feat: add command r v01 qlora sample
* chore: lint
* feat: upgrade gemma3 and gemma2 patch to use logits_to_keep
* chore: lint
* fix: add deprecate_kwarg decorator
* fix: add cce for gemma3 conditionalgeneration
* fix: gemma3 patch to defer logits calculation
* fix: patch gemma3 if given as model
* fix: remove not working config
* fix: update comments to clarify changes
* feat(doc): add supported models to readme
* fix: address difference in our cohere patch
* feat: add mistral3
* feat: add gemma
* feat(doc): update README to include gemma and mistral3 in supported models
* fix: gemma patch
* fix: import
* fix: gemma patch to be standalone
* fix: gemma3 warn about unsupported final_logit_softcapping
* feat: add mllama CCE
* chore: add abbreviation to doc
* fix: remove unneeded gemma3 eager warning
* fix: save processor if available
* fix: enable save processor on merge
* fix: wrong env meaning
2025-03-26 18:13:51 -04:00
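CCE here is Cut Cross Entropy: the LM loss is computed directly from the final hidden states and the lm_head weight, without materializing the full [tokens, vocab_size] logits tensor that dominates memory on large-vocabulary models like gemma and cohere. A hedged sketch of the core substitution using Apple's cut_cross_entropy package; the shift handling and wiring are illustrative, not this patch's exact code:

```python
import torch
from cut_cross_entropy import linear_cross_entropy

def cce_loss(hidden_states: torch.Tensor, lm_head_weight: torch.Tensor,
             labels: torch.Tensor) -> torch.Tensor:
    # Standard causal shift: hidden state at position t predicts token t+1.
    shift_hidden = hidden_states[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    # The logits matrix is never formed in full; log-sum-exp and the label
    # logit are computed blockwise inside a fused kernel.
    return linear_cross_entropy(shift_hidden, lm_head_weight, shift_labels)
```

The final_logit_softcapping warning in the bullets follows from the same design: the logits never exist as a tensor, so any transform of them has to happen inside the fused kernel rather than after the fact.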
NanoCode012
9f00465a5c
Feat: Add support for gemma3_text and add e2e for gemma2 ( #2406 )
2025-03-22 20:33:21 -04:00
NanoCode012
1110a37e21
feat: add deepseek_v3 sample packing ( #2230 )
2025-02-24 15:03:15 -05:00
Chirag Jain
0c8b1d824a
Update get_unpad_data patching for multipack ( #2013 )
...
* Update `get_unpad_data` patching for multipack
* Update src/axolotl/utils/models.py
* Update src/axolotl/utils/models.py
* Add test case
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-11-15 20:35:50 -05:00
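`get_unpad_data` is the hook multipack leans on: with sample packing, the attention mask labels each packed sub-sequence 1, 2, 3, ... instead of holding plain 0/1 values, and the patched helper turns those labels into the `cu_seqlens` boundaries flash attention's varlen kernels need. A minimal sketch assuming that mask convention; this is not axolotl's exact implementation:

```python
import torch
import torch.nn.functional as F

def get_unpad_data(attention_mask: torch.Tensor):
    """Packed-aware unpad data: mask values 1, 2, 3, ... mark sub-sequences."""
    # Token count per packed sub-sequence, row by row (0 marks padding).
    seqlens_in_batch = torch.cat(
        [row[row > 0].bincount()[1:] for row in attention_mask]
    ).to(torch.int32)
    indices = torch.nonzero(attention_mask.flatten() > 0).flatten()
    max_seqlen_in_batch = int(seqlens_in_batch.max())
    # Cumulative boundaries, e.g. lengths [5, 4, 5] -> cu_seqlens [0, 5, 9, 14].
    cu_seqlens = F.pad(seqlens_in_batch.cumsum(0, dtype=torch.int32), (1, 0))
    return indices, cu_seqlens, max_seqlen_in_batch
```

Flash attention then attends only within each `[cu_seqlens[i], cu_seqlens[i+1])` span, so packed samples never see across their boundaries.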
Wing Lian
e1915f5625
Multimodal Vision Llama - rudimentary support ( #1940 )
...
---------
Co-authored-by: Sunny <sunny@Sunnys-MacBook-Air.local>
Co-authored-by: sunny <sunnyliu19981005@gmail.com>
2024-10-02 21:02:48 -04:00
Aman Gupta Karmani
159b8b9a74
monkey-patch transformers to simplify monkey-patching modeling code ( #1877 )
...
* monkey-patch transformers so that monkey-patched modeling code doesn't get overwritten
* unnecessary now
* add comment
2024-08-27 17:22:26 -07:00
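The subtlety behind this commit: a patch applied after a caller has already imported a symbol by value never takes effect, and transformers' lazy module loading can hand back the unpatched object. Patching the attribute on the class itself, which every instance resolves at call time, sidesteps both failure modes. An illustrative sketch, not the commit's actual mechanism:

```python
import transformers.models.llama.modeling_llama as modeling_llama

_original_forward = modeling_llama.LlamaAttention.forward

def patched_forward(self, *args, **kwargs):
    # Custom behavior (e.g. packed flash attention) would go here.
    return _original_forward(self, *args, **kwargs)

# Existing and future instances look up `forward` through the class, so the
# patch survives later `from ... import LlamaAttention` statements.
modeling_llama.LlamaAttention.forward = patched_forward
```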
Wing Lian
1e43660701
Sample pack trust remote code v2 ( #1873 )
...
* fix the multipack patch for remote code models
* add deepseek v2 lite example w fsdp
2024-08-27 13:39:24 -04:00
Wing Lian
fefa95e350
most model types now support flash attention 2 regardless of multipack support ( #1854 )
2024-08-22 16:39:23 -04:00
Wing Lian
35d5e59d78
set z3 leaf for deepseek v2 ( #1809 ) [skip ci]
...
* set z3 leaf for deepseek v2
* add deepseek v2 chat template
2024-08-06 09:30:46 -04:00
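ZeRO stage 3 partitions parameters and gathers them on demand, which misbehaves inside MoE blocks whose expert branches execute data-dependently. Marking the MoE module as a z3 "leaf" tells DeepSpeed to gather its parameters as one indivisible unit. A hedged sketch; the class lookup is illustrative, since DeepSeek-V2 ships as trust_remote_code:

```python
import importlib

from deepspeed.utils import set_z3_leaf_modules

def set_deepseek_v2_leaf(model) -> None:
    # Resolve DeepseekV2MoE from the remote-code modeling module this model
    # instance was built from, then flag every matching submodule as a leaf.
    modeling = importlib.import_module(type(model).__module__)
    set_z3_leaf_modules(model, [modeling.DeepseekV2MoE])
```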
Wing Lian
87455e7f32
swaps to use newer sample packing for mistral ( #1773 )
...
* swaps to use newer sample packing for mistral
* fix multipack patch test
* patch the common fa utils
* update for refactor of flash attn unpad
* remove un-needed drop attn mask for mistral
* bump transformers to main to pick up latest mistral fix for 12b and refactor of fa2
* update test
2024-07-23 01:41:11 -04:00
Wing Lian
5f58555bd0
support for llama multipack using updated code/patches ( #1754 )
...
* support for llama multipack using updated code/patches
* also support unsloth patches
* incorrect arg
* add config validation for unsloth
* add missing return to validation
* add another missing return to validation
2024-07-16 17:36:29 -04:00
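The validation bullets above guard the unsloth fast paths, which assume llama-style attention and MLP modules. A hedged sketch of that kind of check; the flag names mirror axolotl's unsloth options, but the wiring is illustrative:

```python
def validate_unsloth(cfg: dict) -> None:
    # Unsloth kernel patches only apply to llama-derived architectures.
    unsloth_flags = ("unsloth_lora_mlp", "unsloth_lora_qkv", "unsloth_lora_o")
    if any(cfg.get(flag) for flag in unsloth_flags):
        if not cfg.get("is_llama_derived_model"):
            raise ValueError(
                "unsloth optimizations require a llama-derived model"
            )
```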
Wing Lian
5370cedf0c
support for gemma2 w sample packing ( #1718 )
2024-06-29 01:38:55 -04:00
Wing Lian
4de4b4089f
add support for multipack for deepseek_v2 ( #1712 )
2024-06-20 10:02:55 -04:00
Wing Lian
6086be85f7
qwen2_moe support w multipack ( #1455 )
2024-03-29 11:04:53 -04:00
Wing Lian
05b398a072
fix some of the edge cases for Jamba ( #1452 )
...
* fix some of the edge cases for Jamba
* update requirements for jamba
2024-03-29 02:38:02 -04:00
Wing Lian
8df7b888ff
beta support for multipack with gemmoe ( #1402 )
2024-03-14 15:52:23 -04:00
Eric Hartford
e0f1895408
add starcoder2 ( #1349 )
...
* add starcoder2
* Apply suggestions from code review
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* chore: lint
* Apply suggestions from code review
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-03-05 19:49:17 -05:00
Wing Lian
2752d5f958
multipack for gemma ( #1313 )
...
* multipack for gemma
* chore: lint
* handle cache_position kwarg in updated llama modeling
* add position_ids to rotary embed call for updated llama modeling
2024-02-21 19:24:21 -05:00
Wing Lian
5698943263
simplify handling for newer multipack patches so they can be added in a single place ( #1270 )
2024-02-07 10:46:04 -05:00
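The consolidation this commit begins is what the later model additions above (gemma, qwen2_moe, deepseek_v2, up through qwen3 at the top of this log) build on: one list of supported model types plus one patch point, rather than a per-model copy of the unpadding logic. A minimal sketch of that shape, assuming the centralized flash-attention helper modern transformers exposes; the names are illustrative:

```python
import transformers.modeling_flash_attention_utils as fa_utils

SUPPORTED_MULTIPACK_MODEL_TYPES = ["llama", "mistral", "gemma", "qwen2_moe"]

def patch_for_multipack(model_type: str, packed_get_unpad_data) -> None:
    """Swap the shared unpadding helper for a packed-aware one, such as the
    get_unpad_data sketched earlier in this log."""
    if model_type not in SUPPORTED_MULTIPACK_MODEL_TYPES:
        raise ValueError(f"{model_type} does not support multipack")
    # One patch point: every supported model resolves unpadding through it.
    fa_utils._get_unpad_data = packed_get_unpad_data
```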