Chirag Jain
0c8b1d824a
Update get_unpad_data patching for multipack ( #2013 )
...
* Update `get_unpad_data` patching for multipack
* Update src/axolotl/utils/models.py
* Update src/axolotl/utils/models.py
* Add test case
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-11-15 20:35:50 -05:00
Wing Lian
e1915f5625
Multimodal Vision Llama - rudimentary support ( #1940 )
...
---------
Co-authored-by: Sunny <sunny@Sunnys-MacBook-Air.local>
Co-authored-by: sunny <sunnyliu19981005@gmail.com>
2024-10-02 21:02:48 -04:00
Aman Gupta Karmani
159b8b9a74
monkey-patch transformers to simplify monkey-patching modeling code ( #1877 )
...
* monkey-patch transformers so that monkey-patched modeling code doesn't get overwritten
* unnecessary now
* add comment
2024-08-27 17:22:26 -07:00
Wing Lian
1e43660701
Sample pack trust remote code v2 ( #1873 )
...
* fix the multipack patch for remote code models
* add deepseek v2 lite example w fsdp
2024-08-27 13:39:24 -04:00
Wing Lian
fefa95e350
most model types now support flash attention 2 regardless of multipack support ( #1854 )
2024-08-22 16:39:23 -04:00
Wing Lian
35d5e59d78
set z3 leaf for deepseek v2 ( #1809 ) [skip ci]
...
* set z3 leaf for deepseek v2
* add deepseek v2 chat template
2024-08-06 09:30:46 -04:00
Wing Lian
87455e7f32
swaps to use newer sample packing for mistral ( #1773 )
...
* swaps to use newer sample packing for mistral
* fix multipack patch test
* patch the common fa utils
* update for refactor of flash attn unpad
* remove un-needed drop attn mask for mistral
* bump transformers to main to pick up latest mistral fix for 12b and refactor of fa2
* update test
2024-07-23 01:41:11 -04:00
Wing Lian
5f58555bd0
support for llama multipack using updated code/patches ( #1754 )
...
* support for llama multipack using updated code/patches
* also support unsloth patches
* incorrect arg
* add config validation for unsloth
* add missing return to validation
* add another missing return to validation
2024-07-16 17:36:29 -04:00
Wing Lian
5370cedf0c
support for gemma2 w sample packing ( #1718 )
2024-06-29 01:38:55 -04:00
Wing Lian
4de4b4089f
add support for multipack for deepseek_v2 ( #1712 )
2024-06-20 10:02:55 -04:00
Wing Lian
6086be85f7
qwen2_moe support w multipack ( #1455 )
2024-03-29 11:04:53 -04:00
Wing Lian
05b398a072
fix some of the edge cases for Jamba ( #1452 )
...
* fix some of the edge cases for Jamba
* update requirements for jamba
2024-03-29 02:38:02 -04:00
Wing Lian
8df7b888ff
beta support for multipack with gemmoe: ( #1402 )
2024-03-14 15:52:23 -04:00
Eric Hartford
e0f1895408
add starcoder2 ( #1349 )
...
* add starcoder2
* Apply suggestions from code review
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* chore: lint
* Apply suggestions from code review
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-03-05 19:49:17 -05:00
Wing Lian
2752d5f958
multipack for gemma ( #1313 )
...
* multipack for gemma
* chore: lint
* handle cache_position kwarg in updated llama modeling
* add position_ids to rotary embed call for updated llama modeling
2024-02-21 19:24:21 -05:00
Wing Lian
5698943263
simplify handling for newer multipack patches so they can be added in a single place ( #1270 )
2024-02-07 10:46:04 -05:00