Wing Lian
|
d28fee7609
|
use autoconfig w rala
|
2025-01-15 23:14:47 -05:00 |
|
Wing Lian
|
c196776996
|
option to not concatenate during pretraining
|
2025-01-15 22:45:02 -05:00 |
|
Wing Lian
|
79ae776102
|
fixup logging layer
|
2025-01-15 21:36:14 -05:00 |
|
Wing Lian
|
145664d82c
|
more fixups
|
2025-01-15 21:27:12 -05:00 |
|
Dan Saunders
|
28694219a5
|
inline comment change
|
2025-01-14 16:59:43 +00:00 |
|
Dan Saunders
|
fd8ad6fcbf
|
fixing negative component mixing
|
2025-01-13 19:21:55 +00:00 |
|
Dan Saunders
|
661d71a14b
|
adding diff attn negative component warmup (in progress)
|
2025-01-10 21:57:31 +00:00 |
|
Dan Saunders
|
6dd47edcb8
|
fire CLI fixes
|
2025-01-10 18:24:16 +00:00 |
|
Dan Saunders
|
7aca08ff60
|
adding guard statements
|
2025-01-10 16:39:21 +00:00 |
|
Dan Saunders
|
4f804f6d88
|
adding diff attn callback, adding documentation
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
443327c585
|
CLI build_command bugfix
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
70c4e6fbe6
|
updates and cleanup
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
2a7f139ad2
|
pre-commit fix
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
332ce0ae85
|
fixes and cleanup
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
e5fa842ff8
|
update
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
78e0ec0aa5
|
changes
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
3bc568eb27
|
adding registration function
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
eb6611d55f
|
progress on modeling code
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
4ff3328e66
|
updated custom modeling code
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
a3fd5074a9
|
fix duplicate-code warnings
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
5b90da0be3
|
added modeling code; cleanup + refactor
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
fcbfa86373
|
refactor and fixing test isolation issues
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
0d56582090
|
adding yaml dumper preserving input config format
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
390cb5742e
|
removing extra pytest xdist args
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
1d935f65c3
|
moving tests around for flash_attn install
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
66176b3e07
|
adding split_heads argument for retaining original (Q, K) dimensionality
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
505321ac95
|
isolating problematic test
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
0b382c88da
|
fixes post-rebase
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
ea07a7086e
|
plugin implementation
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
d22e1136bc
|
convert-differential-transformer test coverage
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
63b8e42c6b
|
duplicate code ignore
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
bda1eed59e
|
differential flash attention 2; cleanup
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
41ebd93158
|
moving monkeypatch
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
4c050ce807
|
pre-commit fix
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
6665acf63d
|
fix model save / load logic
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
2f9fa4c465
|
various improvements
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
849bc94112
|
various improvements
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
e484ec778d
|
training fixes, patching, minor cleanup
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
df1504ae14
|
adding CLI command for convert-diff-transformer
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
7be0d7496c
|
Adding script for doing conversion; fixes and updates
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
13cdffa91f
|
initial diff attn layer / model conversion implementation (support for llama arch)
|
2025-01-10 16:28:51 +00:00 |
|
Dan Saunders
|
7a4b296f60
|
Basic evaluate CLI command / codepath (#2188)
* basic evaluate CLI command / codepath
* tests for evaluate CLI command
* fixes and cleanup
* review comments; slightly DRYing up things
---------
Co-authored-by: Dan Saunders <danjsaund@gmail.com>
|
2025-01-10 16:28:51 +00:00 |
|
Wing Lian
|
d8b4027200
|
use 2.5.1 docker images as latest tag as it seems stable (#2198)
|
2025-01-10 08:35:25 -05:00 |
|
Wing Lian
|
fb3352e21c
|
rename liger test so it properly runs in ci (#2246)
|
2025-01-09 17:31:43 -05:00 |
|
NanoCode012
|
ed77e7001e
|
feat: add support for data_files in pretraining (#2238)
|
2025-01-09 21:04:13 +00:00 |
|
Wing Lian
|
7669a03fb4
|
update upstream HF deps (#2239)
* bump axolotl contribs for upstream main conflicts:
* bump datasets, tokenizer, trl
* remove log workarounds in trl
* bump lm-eval
* remove unsloth_ import from critical path
* remove llama fa2 from conftest
* unsloth breaks with latest upstream
|
2025-01-09 21:01:59 +00:00 |
|
Vincenzo di Cicco
|
6553683170
|
Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235)
|
2025-01-09 21:01:22 +00:00 |
|
Wing Lian
|
5e0124e2ab
|
update modal version for ci (#2242)
|
2025-01-09 21:01:02 +00:00 |
|
NanoCode012
|
2e8d7c1adb
|
fix: mistral nemo does not recognize token_type_ids in forward (#2233)
|
2025-01-09 21:00:36 +00:00 |
|
Wing Lian
|
3c1921e400
|
add hf cache caching for GHA (#2247)
* add hf cache caching for GHA
* use modal volume to cache hf data
* make sure to update the cache as we add new fixtures in conftest
|
2025-01-09 20:59:54 +00:00 |
|