Dan Saunders
|
e5fa842ff8
|
update
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
78e0ec0aa5
|
changes
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
3bc568eb27
|
adding registration function
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
eb6611d55f
|
progress on modeling code
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
4ff3328e66
|
updated custom modeling code
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
a3fd5074a9
|
fix duplicate-code warnings
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
5b90da0be3
|
added modeling code; cleanup + refactor
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
fcbfa86373
|
refactor and fixing test isolation issues
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
0d56582090
|
adding yaml dumper preserving input config format
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
390cb5742e
|
removing extra pytest xdist args
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
1d935f65c3
|
moving tests around for flash_attn install
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
66176b3e07
|
adding split_heads argument for retaining original (Q, K) dimensionality
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
505321ac95
|
isolating problematic test
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
0b382c88da
|
fixes post-rebase
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
ea07a7086e
|
plugin implementation
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
d22e1136bc
|
convert-differential-transformer test coverage
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
63b8e42c6b
|
duplicate code ignore
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
bda1eed59e
|
differential flash attention 2; cleanup
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
41ebd93158
|
moving monkeypatch
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
4c050ce807
|
pre-commit fix
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
6665acf63d
|
fix model save / load logic
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
2f9fa4c465
|
various improvements
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
849bc94112
|
various improvements
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
e484ec778d
|
training fixes, patching, minor cleanup
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
df1504ae14
|
adding CLI command for convert-diff-transformer
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
7be0d7496c
|
Adding script for doing conversion; fixes and updates
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
13cdffa91f
|
initial diff attn layer / model conversion implementation (support for llama arch)
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
7a4b296f60
|
Basic evaluate CLI command / codepath (#2188)
* basic evaluate CLI command / codepath
* tests for evaluate CLI command
* fixes and cleanup
* review comments; slightly DRYing up things
---------
Co-authored-by: Dan Saunders <danjsaund@gmail.com>
|
2025-01-10 16:28:51 +00:00 |
|
Wing Lian
|
d8b4027200
|
use 2.5.1 docker images as latest tag as it seems stable (#2198)
|
2025-01-10 08:35:25 -05:00 |
|
Wing Lian
|
fb3352e21c
|
rename liger test so it properly runs in ci (#2246)
|
2025-01-09 17:31:43 -05:00 |
|
NanoCode012
|
ed77e7001e
|
feat: add support for data_files in pretraining (#2238)
|
2025-01-09 21:04:13 +00:00 |
|
Wing Lian
|
7669a03fb4
|
update upstream HF deps (#2239)
* bump axolotl contribs for upstream main conflicts:
* bump datasets, tokenizer, trl
* remove log workarounds in trl
* bump lm-eval
* remove unsloth_ import from critical path
* remove llama fa2 from conftest
* unsloth breaks with latest upstream
|
2025-01-09 21:01:59 +00:00 |
|
Vincenzo di Cicco
|
6553683170
|
Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235)
|
2025-01-09 21:01:22 +00:00 |
|
Wing Lian
|
5e0124e2ab
|
update modal version for ci (#2242)
|
2025-01-09 21:01:02 +00:00 |
|
NanoCode012
|
2e8d7c1adb
|
fix: mistral nemo does not recognize token_type_ids in forward (#2233)
|
2025-01-09 21:00:36 +00:00 |
|
Wing Lian
|
3c1921e400
|
add hf cache caching for GHA (#2247)
* add hf cache caching for GHA
* use modal volume to cache hf data
* make sure to update the cache as we add new fixtures in conftest
|
2025-01-09 20:59:54 +00:00 |
|
Wing Lian
|
7faf2b6e8e
|
Merge group queue (#2248)
* add support for merge groups
* also lint merge groups
|
2025-01-09 15:49:00 -05:00 |
|
salman
|
c1b920f291
|
Fixing OSX installation (#2231)
* bumping version, removing non-osx compatible deps
* updating pylintrc
* fixing linters
* reverting changes
|
2025-01-07 13:42:01 +00:00 |
|
Wing Lian
|
3915abee4c
|
make sure padding is labeled as -100 for pretraining (#2227)
|
2024-12-31 15:22:18 -05:00 |
|
NJordan72
|
7a38dbe674
|
fix: allow trainer builder to use custom jinja chat template (#2219)
* fix: allow trainer builder to use custom jinja chat template
* chore: use get_chat_template_from_config
Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>
* fix: swap imports
---------
Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>
|
2024-12-24 16:18:50 -05:00 |
|
Wing Lian
|
e0a2eb2ebd
|
fix untrained tokens if specified explicitly from a list (#2210)
|
2024-12-23 09:08:28 -05:00 |
|
Wing Lian
|
d852d7af7a
|
inference - don't default w accelerate, fix base model (#2216) [skip ci]
|
2024-12-23 07:48:41 -05:00 |
|
Wing Lian
|
3742deb1de
|
add deepspeed example with torch compile enabled (#2212) [skip ci]
|
2024-12-22 12:11:39 -05:00 |
|
Wing Lian
|
2312caaa98
|
GC every n steps (#2209)
|
2024-12-21 17:38:33 -05:00 |
|
Wing Lian
|
307cf7c685
|
move the dataset loading from remote/disk to a shared function so we can re-use for RL (#2204)
|
2024-12-20 21:43:52 -05:00 |
|
Dan Saunders
|
70541145f1
|
adding test_datasets compat with pretraining_dataset (streaming) (#2206) [skip ci]
|
2024-12-20 21:43:33 -05:00 |
|
Wing Lian
|
42bd32a233
|
add outputs (symlink) to gitignore [skip ci] (#2205)
|
2024-12-19 20:14:43 -05:00 |
|
Dan Saunders
|
5b8fb5e939
|
remove cicd pytest xdist args (#2201)
* remove cicd pytest xdist args
* Delete outputs
|
2024-12-19 11:44:53 -05:00 |
|
Wing Lian
|
bd2a594b89
|
use DataCollatorWithFlattening when not sample packing (#2167)
|
2024-12-17 17:46:44 -05:00 |
|
Wing Lian
|
3798229d85
|
handle torch_compile set to auto (#2172) [skip ci]
* handle torch_compile set to auto
* update docs [skip ci]
* add tests
|
2024-12-17 16:42:41 -05:00 |
|