Dan Saunders
70c4e6fbe6
updates and cleanup
2025-01-10 16:28:51 +00:00
Dan Saunders
2a7f139ad2
pre-commit fix
2025-01-10 16:28:51 +00:00
Dan Saunders
332ce0ae85
fixes and cleanup
2025-01-10 16:28:51 +00:00
Dan Saunders
e5fa842ff8
update
2025-01-10 16:28:51 +00:00
Dan Saunders
78e0ec0aa5
changes
2025-01-10 16:28:51 +00:00
Dan Saunders
3bc568eb27
adding registration function
2025-01-10 16:28:51 +00:00
Dan Saunders
eb6611d55f
progress on modeling code
2025-01-10 16:28:51 +00:00
Dan Saunders
4ff3328e66
updated custom modeling code
2025-01-10 16:28:51 +00:00
Dan Saunders
a3fd5074a9
fix duplicate-code warnings
2025-01-10 16:28:51 +00:00
Dan Saunders
5b90da0be3
added modeling code; cleanup + refactor
2025-01-10 16:28:51 +00:00
Dan Saunders
fcbfa86373
refactor and fix test isolation issues
2025-01-10 16:28:51 +00:00
Dan Saunders
0d56582090
adding yaml dumper preserving input config format
2025-01-10 16:28:51 +00:00
Dan Saunders
1d935f65c3
moving tests around for flash_attn install
2025-01-10 16:28:51 +00:00
Dan Saunders
66176b3e07
adding split_heads argument for retaining original (Q, K) dimensionality
2025-01-10 16:28:51 +00:00
Dan Saunders
505321ac95
isolating problematic test
2025-01-10 16:28:51 +00:00
Dan Saunders
0b382c88da
fixes post-rebase
2025-01-10 16:28:51 +00:00
Dan Saunders
ea07a7086e
plugin implementation
2025-01-10 16:28:51 +00:00
Dan Saunders
d22e1136bc
convert-differential-transformer test coverage
2025-01-10 16:28:51 +00:00
Dan Saunders
63b8e42c6b
duplicate code ignore
2025-01-10 16:28:51 +00:00
Dan Saunders
bda1eed59e
differential flash attention 2; cleanup
2025-01-10 16:28:51 +00:00
Dan Saunders
41ebd93158
moving monkeypatch
2025-01-10 16:28:51 +00:00
Dan Saunders
4c050ce807
pre-commit fix
2025-01-10 16:28:51 +00:00
Dan Saunders
6665acf63d
fix model save / load logic
2025-01-10 16:28:51 +00:00
Dan Saunders
2f9fa4c465
various improvements
2025-01-10 16:28:51 +00:00
Dan Saunders
849bc94112
various improvements
2025-01-10 16:28:51 +00:00
Dan Saunders
e484ec778d
training fixes, patching, minor cleanup
2025-01-10 16:28:51 +00:00
Dan Saunders
df1504ae14
adding CLI command for convert-diff-transformer
2025-01-10 16:28:51 +00:00
Dan Saunders
7be0d7496c
Adding conversion script; fixes and updates
2025-01-10 16:28:51 +00:00
Dan Saunders
13cdffa91f
initial diff attn layer / model conversion implementation (support for llama arch)
2025-01-10 16:28:51 +00:00
Dan Saunders
7a4b296f60
Basic evaluate CLI command / codepath (#2188)
* basic evaluate CLI command / codepath
* tests for evaluate CLI command
* fixes and cleanup
* review comments; slightly DRYing up things
---------
Co-authored-by: Dan Saunders <danjsaund@gmail.com>
2025-01-10 16:28:51 +00:00
Wing Lian
fb3352e21c
rename liger test so it properly runs in ci (#2246)
2025-01-09 17:31:43 -05:00
NanoCode012
ed77e7001e
feat: add support for data_files in pretraining (#2238)
2025-01-09 21:04:13 +00:00
Wing Lian
7669a03fb4
update upstream HF deps (#2239)
* bump axolotl contribs for upstream main conflicts
* bump datasets, tokenizer, trl
* remove log workarounds in trl
* bump lm-eval
* remove unsloth_ import from critical path
* remove llama fa2 from conftest
* unsloth breaks with latest upstream
2025-01-09 21:01:59 +00:00
Vincenzo di Cicco
6553683170
Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235)
2025-01-09 21:01:22 +00:00
NanoCode012
2e8d7c1adb
fix: mistral nemo does not recognize token_type_ids in forward (#2233)
2025-01-09 21:00:36 +00:00
salman
c1b920f291
Fixing OSX installation (#2231)
* bumping version, removing non-osx compatible deps
* updating pylintrc
* fixing linters
* reverting changes
2025-01-07 13:42:01 +00:00
Wing Lian
3915abee4c
make sure padding is labeled as -100 for pretraining (#2227)
2024-12-31 15:22:18 -05:00
NJordan72
7a38dbe674
fix: allow trainer builder to use custom jinja chat template (#2219)
* fix: allow trainer builder to use custom jinja chat template
* chore: use get_chat_template_from_config
Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>
* fix: swap imports
---------
Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>
2024-12-24 16:18:50 -05:00
Wing Lian
e0a2eb2ebd
fix untrained tokens if specified explicitly from a list (#2210)
2024-12-23 09:08:28 -05:00
Wing Lian
d852d7af7a
inference - don't default w/ accelerate, fix base model (#2216) [skip ci]
2024-12-23 07:48:41 -05:00
Wing Lian
2312caaa98
GC every n steps (#2209)
2024-12-21 17:38:33 -05:00
Wing Lian
307cf7c685
move the dataset loading from remote/disk to a shared function so we can re-use for RL (#2204)
2024-12-20 21:43:52 -05:00
Dan Saunders
70541145f1
adding test_datasets compat with pretraining_dataset (streaming) (#2206) [skip ci]
2024-12-20 21:43:33 -05:00
Wing Lian
bd2a594b89
use DataCollatorWithFlattening when not sample packing (#2167)
2024-12-17 17:46:44 -05:00
Wing Lian
3798229d85
handle torch_compile set to auto (#2172) [skip ci]
* handle torch_compile set to auto
* update docs [skip ci]
* add tests
2024-12-17 16:42:41 -05:00
NanoCode012
10cfecf02e
fix: use apply_chat_template to find turn boundaries and allow tool_calling field (#2179) [skip ci]
* fix: use apply_chat_template to find turn boundaries and allow tool_calling field
* fix: keys to include in turn
* feat(doc): explicitly recommend setting train_on_eos and roles_to_train
* fix: eos not being masked for tool due to template padding
* chore: clear up docs
* fix: default messages format, train_on_eos: turn, and train on all assistant msg
* fix: properly warn if empty content
* feat: parametrize chat_template tests to test different tokenizers
* fix: set proper default for message key
* fix: update defaults to match load function
* fix: change defaults to use new
* feat: add tool_calling dataset
* feat: add tool_calling test
* fix: add handling of edge case of mistral tokenizer with only system prompt
* feat: refactor all test to follow source code
* fix: remove unnecessary eos_token from phi35
* fix test for phi3.5 since eos was dropped from chat_template
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-12-17 16:42:21 -05:00
Wing Lian
339f3c67e2
dataset tags don't support https uris (#2195)
2024-12-17 13:58:53 -05:00
Wing Lian
e246ceffa4
use axolotl contribs for fix_untrained_tokens (#2194) [skip ci]
* use axolotl contribs for fix_untrained_tokens
* remove the module we're replacing
* Add check for using fix_untrained_tokens
2024-12-17 13:57:16 -05:00
Wing Lian
8ddc18ec8d
move the setting of PYTORCH_CUDA_ALLOC_CONF to the cli rather than train module (#2183) [skip ci]
* move the setting of PYTORCH_CUDA_ALLOC_CONF to the cli rather than train module
* move set_pytorch_cuda_alloc_conf to a different module to have fewer loaded dependencies for the CLI
2024-12-17 13:56:48 -05:00
Wing Lian
1f623e6cc8
transformers 4.47.1 (#2187)
* transformers 4.47.1
* drop monkeypatches
* can't remove patches yet
* make flash attention forward ignore the loss kwargs
* patch the flash attention in the modeling arch too
* remove fsdp and deepspeed patches
* cleanup PR
* bump accelerate and torchao, also logically reorder/group requirements
* meant to include torchao
* use official patch release
2024-12-17 11:01:21 -05:00