Sunny Liu
|
cee310dcfa
|
llama sdpa patching WIP
|
2025-01-22 20:15:23 -05:00 |
|
Sunny Liu
|
d1be6e228d
|
llama sdpa patching WIP
|
2025-01-22 20:14:20 -05:00 |
|
Sunny Liu
|
5f9f77f384
|
llama patch
|
2025-01-22 11:29:28 -05:00 |
|
bursteratom
|
b2a34380b3
|
sample packing doc mask creation WIP
|
2025-01-21 09:18:38 -05:00 |
|
Sunny Liu
|
80bfc50d1f
|
get seqlens from position ids for doc masking
|
2025-01-17 17:22:04 -05:00 |
|
Sunny Liu
|
a5360c172c
|
llama hijacking
|
2025-01-17 15:54:03 -05:00 |
|
Sunny Liu
|
013a9b73fc
|
fix transformers version for testing
|
2025-01-16 15:32:57 -05:00 |
|
Sunny
|
aad62428e0
|
not sure if this is necessary actually
|
2025-01-16 15:08:34 -05:00 |
|
Sunny
|
a6f2c5d583
|
flex sample packing WIP
|
2025-01-15 21:12:33 -05:00 |
|
Sunny
|
dbcd11e533
|
revert seq len in multipack sampler
|
2025-01-14 11:45:35 -05:00 |
|
Sunny
|
c06a6be915
|
flex_attn sample packing WIP
|
2025-01-14 00:22:05 -05:00 |
|
bursteratom
|
d3a0cb5edb
|
transformers version
|
2025-01-13 10:33:00 -05:00 |
|
bursteratom
|
8b47e456b0
|
revert to transformers 4.47.1
|
2025-01-13 10:29:27 -05:00 |
|
Sunny Liu
|
2319ac729c
|
Merge branch 'main' into flx_attn_support
|
2025-01-13 09:42:58 -05:00 |
|
Sunny
|
f99cae0e7b
|
llama test
|
2025-01-12 17:30:19 -05:00 |
|
Wing Lian
|
888cd9407f
|
use 2.5.1 docker images as latest tag as it seems stable (#2198)
|
2025-01-12 13:34:17 -05:00 |
|
Wing Lian
|
bd62d6e10a
|
rename liger test so it properly runs in ci (#2246)
|
2025-01-12 13:34:17 -05:00 |
|
NanoCode012
|
5eae134110
|
feat: add support for data_files in pretraining (#2238)
|
2025-01-12 13:34:17 -05:00 |
|
Wing Lian
|
b7d27bdfa4
|
update upstream HF deps (#2239)
* bump axolotl contribs for upstream main conflicts:
* bump datasets, tokenizer, trl
* remove log workarounds in trl
* bump lm-eval
* remove unsloth_ import from critical path
* remove llama fa2 from conftest
* unsloth breaks with latest upstream
|
2025-01-12 13:34:17 -05:00 |
|
Vincenzo di Cicco
|
da97a21bdc
|
Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235)
|
2025-01-12 13:34:17 -05:00 |
|
Wing Lian
|
e0d4b88598
|
update modal version for ci (#2242)
|
2025-01-12 13:34:17 -05:00 |
|
NanoCode012
|
fac059a209
|
fix: mistral nemo does not recognize token_type_ids in forward (#2233)
|
2025-01-12 13:34:17 -05:00 |
|
Wing Lian
|
9c9ac1cf0b
|
add hf cache caching for GHA (#2247)
* add hf cache caching for GHA
* use modal volume to cache hf data
* make sure to update the cache as we add new fixtures in conftest
|
2025-01-12 13:34:17 -05:00 |
|
Wing Lian
|
2346f21b2b
|
Merge group queue (#2248)
* add support for merge groups
* also lint merge groups
|
2025-01-12 13:34:17 -05:00 |
|
salman
|
0b47281f51
|
Fixing OSX installation (#2231)
* bumping version, removing non-osx compatible deps
* updating pylintrc
* fixing linters
* reverting changes
|
2025-01-12 13:34:17 -05:00 |
|
Wing Lian
|
d8b4027200
|
use 2.5.1 docker images as latest tag as it seems stable (#2198)
|
2025-01-10 08:35:25 -05:00 |
|
Wing Lian
|
fb3352e21c
|
rename liger test so it properly runs in ci (#2246)
|
2025-01-09 17:31:43 -05:00 |
|
Sunny
|
543daaf46f
|
llama test
|
2025-01-09 16:08:24 -05:00 |
|
NanoCode012
|
ed77e7001e
|
feat: add support for data_files in pretraining (#2238)
|
2025-01-09 21:04:13 +00:00 |
|
Wing Lian
|
7669a03fb4
|
update upstream HF deps (#2239)
* bump axolotl contribs for upstream main conflicts:
* bump datasets, tokenizer, trl
* remove log workarounds in trl
* bump lm-eval
* remove unsloth_ import from critical path
* remove llama fa2 from conftest
* unsloth breaks with latest upstream
|
2025-01-09 21:01:59 +00:00 |
|
Vincenzo di Cicco
|
6553683170
|
Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235)
|
2025-01-09 21:01:22 +00:00 |
|
Wing Lian
|
5e0124e2ab
|
update modal version for ci (#2242)
|
2025-01-09 21:01:02 +00:00 |
|
NanoCode012
|
2e8d7c1adb
|
fix: mistral nemo does not recognize token_type_ids in forward (#2233)
|
2025-01-09 21:00:36 +00:00 |
|
Wing Lian
|
3c1921e400
|
add hf cache caching for GHA (#2247)
* add hf cache caching for GHA
* use modal volume to cache hf data
* make sure to update the cache as we add new fixtures in conftest
|
2025-01-09 20:59:54 +00:00 |
|
Wing Lian
|
7faf2b6e8e
|
Merge group queue (#2248)
* add support for merge groups
* also lint merge groups
|
2025-01-09 15:49:00 -05:00 |
|
salman
|
c1b920f291
|
Fixing OSX installation (#2231)
* bumping version, removing non-osx compatible deps
* updating pylintrc
* fixing linters
* reverting changes
|
2025-01-07 13:42:01 +00:00 |
|
Sunny
|
bcd9ad44e0
|
flex attention support
|
2025-01-06 19:54:11 -05:00 |
|
bursteratom
|
61ad375bf4
|
config validation for flex attention
|
2025-01-05 23:27:49 -05:00 |
|
Wing Lian
|
3915abee4c
|
make sure padding is labeled as -100 for pretraining (#2227)
|
2024-12-31 15:22:18 -05:00 |
|
NJordan72
|
7a38dbe674
|
fix: allow trainer builder to use custom jinja chat template (#2219)
* fix: allow trainer builder to use custom jinja chat template
* chore: use get_chat_template_from_config
* fix: swap imports
---------
Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>
|
2024-12-24 16:18:50 -05:00 |
|
Wing Lian
|
e0a2eb2ebd
|
fix untrained tokens if specified explicitly from a list (#2210)
|
2024-12-23 09:08:28 -05:00 |
|
Wing Lian
|
d852d7af7a
|
inference - don't default w accelerate, fix base model (#2216) [skip ci]
|
2024-12-23 07:48:41 -05:00 |
|
Wing Lian
|
3742deb1de
|
add deepspeed example with torch compile enabled (#2212) [skip ci]
|
2024-12-22 12:11:39 -05:00 |
|
Wing Lian
|
2312caaa98
|
GC every n steps (#2209)
|
2024-12-21 17:38:33 -05:00 |
|
Wing Lian
|
307cf7c685
|
move the dataset loading from remote/disk to a shared function so we can re-use for RL (#2204)
|
2024-12-20 21:43:52 -05:00 |
|
Dan Saunders
|
70541145f1
|
adding test_datasets compat with pretraining_dataset (streaming) (#2206) [skip ci]
|
2024-12-20 21:43:33 -05:00 |
|
Wing Lian
|
42bd32a233
|
add outputs (symlink) to gitignore [skip ci] (#2205)
|
2024-12-19 20:14:43 -05:00 |
|
Dan Saunders
|
5b8fb5e939
|
remove cicd pytest xdist args (#2201)
* remove cicd pytest xdist args
* Delete outputs
|
2024-12-19 11:44:53 -05:00 |
|
Wing Lian
|
bd2a594b89
|
use DataCollatorWithFlattening when not sample packing (#2167)
|
2024-12-17 17:46:44 -05:00 |
|
Wing Lian
|
3798229d85
|
handle torch_compile set to auto (#2172) [skip ci]
* handle torch_compile set to auto
* update docs [skip ci]
* add tests
|
2024-12-17 16:42:41 -05:00 |
|