Dan Saunders
2daa94080c
Merge branch 'main' into diff-transformer
2025-01-27 14:46:17 +00:00
Dan Saunders
0e9bfa6dee
small fixes, improvements
2025-01-24 19:53:54 +00:00
Dan Saunders
ef38f10274
merging into main
2025-01-24 18:03:27 +00:00
Wing Lian
887513285d
support for custom lr groups for non-embedding modules ( #2213 )
* support for custom lr groups for non-embedding modules
invert name check for group modules
include lr_groups in training args
additional conditional for creating optimizer
fix regular params as w weight decay
fix lookup and add docs
* address pr feedback
2025-01-24 12:56:28 -05:00
Wing Lian
20620771f1
Pretrain multipack ( #2278 )
* fix for pretrain with packing
* fix model name and loss expected
* make sure to check with micro batch size for pretraining
* change loss thresholds based on parametrization
* make tests smaller for CI
* fix pretrain packing
* fix pretrain packing test
* address pr feedback
2025-01-24 12:55:20 -05:00
Dan Saunders
66262c3092
moving out all diff attn code to plugin repo
2025-01-24 17:46:11 +00:00
NanoCode012
6086162488
chore(doc): improve explanation for *_steps and *_strategy ( #2270 )
2025-01-24 10:07:02 -05:00
mashdragon
b2774af66c
Take split param from config in all load_dataset instances ( #2281 )
2025-01-24 10:06:50 -05:00
NanoCode012
74f9782fc3
chore(doc): fix explanation on gcs creds retrieval ( #2272 )
2025-01-24 10:05:58 -05:00
Wing Lian
8a7a0b07dc
support for latest transformers release 4.48.1 ( #2256 )
2025-01-23 21:17:57 -05:00
Dan Saunders
016ba124e4
README update
2025-01-23 22:11:35 +00:00
Dan Saunders
7145d52d99
moving diff attn code to separate repo
2025-01-23 21:33:53 +00:00
Wing Lian
8fb72cbc0b
use the extracted field_messages to parse the role fields ( #2265 )
2025-01-21 15:39:30 -05:00
Adithya Kamath
bb9d4102c4
Add 5000 line history limit to tmux for docker cloud ( #2268 )
2025-01-21 15:39:17 -05:00
Wing Lian
af727eedf7
option to not concatenate during pretraining ( #2263 )
* option to not concatenate during pretraining
* simplify conditional and add doc to config.qmd
2025-01-20 14:07:34 -05:00
jwongTensora
8606093921
fix for indexing error from token/embeddings mismatch ( #2257 )
Co-authored-by: jwong <jwongTensora@gmail.com>
2025-01-14 22:09:29 -05:00
NanoCode012
cba5a457d9
fix: use text_column even when not packing for pretraining ( #2254 )
* fix: use text_column even when not packing for pretraining
* feat: update test to check when not packing
* chore: lint
* Update src/axolotl/utils/data/pretraining.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2025-01-14 22:08:56 -05:00
Wing Lian
19cd83d408
rename references to dpo dataset prep to pref data ( #2258 )
2025-01-14 22:07:55 -05:00
Dan Saunders
28694219a5
inline comment change
2025-01-14 16:59:43 +00:00
Dan Saunders
fd8ad6fcbf
fixing negative component mixing
2025-01-13 19:21:55 +00:00
Dan Saunders
1ed4de73b6
CLI cleanup and documentation ( #2244 )
* CLI init refactor
* fix
* cleanup and (partial) docs
* Adding documentation and continuing cleanup (in progress)
* remove finetune.py script
* continued cleanup and documentation
* pytest fixes
* review comments
* fix
* Fix
* typing fixes
* make sure the batch dataset patcher for multipack is always loaded when handling datasets
* review comments
* fix
---------
Co-authored-by: Dan Saunders <dan@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-01-13 17:55:29 +00:00
Wing Lian
f89e962119
skip over rows in pretraining dataset ( #2223 )
* skip over rows in pretraining dataset
* update docs
2025-01-13 10:44:45 -05:00
Wing Lian
bc1c9c20e3
assume empty lora dropout means 0.0 and add tests ( #2243 )
* assume empty lora dropout means 0.0 and add tests
* remove unnecessary arg
* refactor based on pr feedback:
* chore: lint
2025-01-13 10:44:11 -05:00
Wing Lian
dd26cc3c0f
add helper to verify the correct model output file exists ( #2245 )
* add helper to verify the correct model output file exists
* more checks using helper
* chore: lint
* fix import and relora model check
* workaround for trl trainer saves
* remove stray print
2025-01-13 10:43:29 -05:00
Dan Saunders
661d71a14b
adding diff attn negative component warmup (in progress)
2025-01-10 21:57:31 +00:00
Dan Saunders
6dd47edcb8
fire CLI fixes
2025-01-10 18:24:16 +00:00
Dan Saunders
7aca08ff60
adding guard statements
2025-01-10 16:39:21 +00:00
Dan Saunders
4f804f6d88
adding diff attn callback, adding documentation
2025-01-10 16:28:51 +00:00
Dan Saunders
443327c585
CLI build_command bugfix
2025-01-10 16:28:51 +00:00
Dan Saunders
70c4e6fbe6
updates and cleanup
2025-01-10 16:28:51 +00:00
Dan Saunders
2a7f139ad2
pre-commit fix
2025-01-10 16:28:51 +00:00
Dan Saunders
332ce0ae85
fixes and cleanup
2025-01-10 16:28:51 +00:00
Dan Saunders
e5fa842ff8
update
2025-01-10 16:28:51 +00:00
Dan Saunders
78e0ec0aa5
changes
2025-01-10 16:28:51 +00:00
Dan Saunders
3bc568eb27
adding registration function
2025-01-10 16:28:51 +00:00
Dan Saunders
eb6611d55f
progress on modeling code
2025-01-10 16:28:51 +00:00
Dan Saunders
4ff3328e66
updated custom modeling code
2025-01-10 16:28:51 +00:00
Dan Saunders
a3fd5074a9
fix duplicate-code warnings
2025-01-10 16:28:51 +00:00
Dan Saunders
5b90da0be3
added modeling code; cleanup + refactor
2025-01-10 16:28:51 +00:00
Dan Saunders
fcbfa86373
refactor and fixing test isolation issues
2025-01-10 16:28:51 +00:00
Dan Saunders
0d56582090
adding yaml dumper preserving input config format
2025-01-10 16:28:51 +00:00
Dan Saunders
390cb5742e
removing extra pytest xdist args
2025-01-10 16:28:51 +00:00
Dan Saunders
1d935f65c3
moving tests around for flash_attn install
2025-01-10 16:28:51 +00:00
Dan Saunders
66176b3e07
adding split_heads argument for retaining original (Q, K) dimensionality
2025-01-10 16:28:51 +00:00
Dan Saunders
505321ac95
isolating problematic test
2025-01-10 16:28:51 +00:00
Dan Saunders
0b382c88da
fixes post-rebase
2025-01-10 16:28:51 +00:00
Dan Saunders
ea07a7086e
plugin implementation
2025-01-10 16:28:51 +00:00
Dan Saunders
d22e1136bc
convert-differential-transformer test coverage
2025-01-10 16:28:51 +00:00
Dan Saunders
63b8e42c6b
duplicate code ignore
2025-01-10 16:28:51 +00:00
Dan Saunders
bda1eed59e
differential flash attention 2; cleanup
2025-01-10 16:28:51 +00:00