Wing Lian
6d4bbb877f
deprecate py 3.9 support, set min pytorch version ( #1343 ) [skip ci]
2024-02-28 12:58:05 -05:00
Wing Lian
0f985e12fe
more fixes 20240228 ( #1342 ) [skip ci]
...
* add missing evals_per_epoch setting
* more pydantic fixes
* more fixes
* move test from normalization to validation
* increase eval size for sample packing tests
2024-02-28 12:57:45 -05:00
Wing Lian
c1a7b3dd69
add gemma instruct chat template ( #1341 )
...
* add gemma instruct chat template
* support for chat template strategy too
2024-02-27 17:20:01 -05:00
Ikko Eltociear Ashimine
2b9687f341
Update fastchat_conversation_turns.py ( #1294 ) [skip ci]
...
seperated -> separated
2024-02-27 09:06:10 -05:00
Wing Lian
2c9c88b32a
fix steps check for anneal on first cycle ( #1316 )
2024-02-27 08:56:08 -05:00
Hamel Husain
5265cd6b2c
Update debugging.md ( #1339 ) [skip ci]
2024-02-27 15:47:31 +09:00
NanoCode012
5be8b555a0
fix: checkpoint saving with deepspeed ( #1321 )
2024-02-27 15:46:44 +09:00
Maxime
0f6af36d50
Mps mistral lora ( #1292 ) [skip ci]
...
* Lora example for Mistral on MPS backend
* Add some MPS documentation
* Update examples/mistral/lora-mps.yml
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update examples/mistral/lora-mps.yml
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update README.md
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-02-26 22:39:57 -05:00
Wing Lian
3f69571943
more pydantic fixes ( #1338 )
2024-02-26 22:39:13 -05:00
nopperl
1e3d5305d3
Support user-defined prompt processing strategies for dpo ( #1248 )
...
* support user-defined prompt processing strategies for dpo
* interpret dict dataset types as user-defined
* fix lint errors
* setup pydantic config for validation of User defined DPO
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-02-26 18:49:34 -05:00
Maxime
16482796b0
add lion-pytorch optimizer ( #1299 ) [skip ci]
...
* add lion-pytorch optimizer
* update pydantic to support lion optimizer
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-02-26 18:45:14 -05:00
Nathan Cooper
f30d062b48
Add StableLM 2 Example Scripts ( #1327 ) [skip ci]
...
* Add StableLM examples and configurations
* Add FFT and LORA configuration files and modify readme with usage
2024-02-26 18:44:25 -05:00
Wing Lian
269c5436ea
hotfix to exclude_unset from pydantic config when converting back to a dict ( #1334 )
2024-02-26 15:06:25 -05:00
Wing Lian
e7eed203d8
hotfix for missing outputs params ( #1333 )
2024-02-26 14:36:37 -05:00
Wing Lian
cf002312e0
hotfix for lora rank ( #1332 )
2024-02-26 14:28:43 -05:00
Wing Lian
7de912e097
hotfix for capabilities loading ( #1331 )
2024-02-26 14:24:28 -05:00
JohanWork
d75653407c
ADD: push checkpoints to mlflow artifact registry ( #1295 ) [skip ci]
...
* Add checkpoint logging to mlflow artifact registry
* clean up
* Update README.md
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* update pydantic config from rebase
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-02-26 13:32:39 -05:00
NanoCode012
c6b01e0f4a
chore: update readme to be more clear ( #1326 ) [skip ci]
2024-02-26 13:32:13 -05:00
Wing Lian
cc3cebfa70
Pydantic 2.x cfg ( #1239 )
...
* WIP conversion to use pydantic for config validation
* wip, more fields, add capabilities
* wip
* update pydantic validation to match existing tests
* tweak requirements
* set up deprecated params pydantic model
* more validations
* wrap up rest of the validations
* flesh out the rest of the options from the readme into pydantic
* fix model validators as class methods
remember to return in validator
missing return
add missing relora attributes
fix test for DictDefault change
fix sys template for mistral from fastchat change in PR 2872
fix test for batch size warning
* more missing attributes for cfg
* updates from PR feedback
* fix validation for datasets and pretrain datasets
* fix test for lora check
2024-02-26 12:24:14 -05:00
Wing Lian
5894f0e57e
make mlflow optional ( #1317 )
...
* make mlflow optional
* fix xformers
don't patch swiglu if xformers not working
fix the check for xformers swiglu
* fix install of xformers with extra index url for docker builds
* fix docker build arg quoting
2024-02-26 11:41:33 -05:00
kallewoof
5cf226e177
Use yaml codeblock for config.yaml field ( #1303 ) [skip ci]
2024-02-24 21:59:16 +09:00
NanoCode012
2ed52bd568
fix(readme): Clarify doc for tokenizer_config ( #1323 ) [skip ci]
2024-02-24 21:55:04 +09:00
NanoCode012
a359579371
deprecate: pytorch 2.0.1 image ( #1315 ) [skip ci]
...
* deprecate: pytorch 2.0.1 image
* deprecate from main image
* Update main.yml
* Update tests.yml
2024-02-22 11:39:47 +09:00
Wing Lian
2752d5f958
multipack for gemma ( #1313 )
...
* multipack for gemma
* chore: lint
* handle cache_position kwarg in updated llama modeling
* add position_ids to rotary embed call for updated llama modeling
2024-02-21 19:24:21 -05:00
Monk
9e300aca0c
Adding Google's gemma Model ( #1312 )
2024-02-21 12:56:47 -05:00
NanoCode012
3d2cd804ae
fix(readme): update inference md link ( #1311 ) [skip ci]
2024-02-22 02:48:06 +09:00
Jared Palmer
6ab69ec5f8
Add instructions for playing with qlora model to colab example ( #1290 )
...
* Add instructions for playing with qlora model to colab example
* Update examples/colab-notebooks/colab-axolotl-example.ipynb
Co-authored-by: JohanWork <39947546+JohanWork@users.noreply.github.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: JohanWork <39947546+JohanWork@users.noreply.github.com>
2024-02-22 02:46:27 +09:00
David Meikle
3c00f406d6
Allow load_best_model_at_end to be configured for early stopping on custom evaluation datasets ( #1291 )
...
* Allow load_best_model_at_end when using test_datasets and val_set_size is zero for custom evaluation datasets
* Fixed formatting following failed Lint check
2024-02-22 00:57:18 +09:00
NanoCode012
a7a9a1433a
fix(examples): remove is_*_derived as it's parsed automatically ( #1297 )
2024-02-22 00:52:46 +09:00
Leonardo Emili
e2786cce6a
Validation always happens on first step ( #1300 )
2024-02-22 00:52:24 +09:00
Leonardo Emili
5a5d47458d
Add seq2seq eval benchmark callback ( #1274 )
...
* Add CausalLMBenchEvalCallback for measuring seq2seq performance
* Fix code for pre-commit
* Fix typing and improve logging
* eval_sample_packing must be false with CausalLMBenchEvalCallback
2024-02-13 08:24:30 -08:00
김진원
8430db22e2
Scheduler implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? ( #1273 )
2024-02-12 21:23:28 -08:00
Wing Lian
4b997c3e1a
allow the optimizer prune ratio for ReLoRA to be configurable ( #1287 )
...
* allow the optimizer prune ratio for relora to be configurable
* update docs for relora
* prevent circular imports
2024-02-12 11:39:51 -08:00
Maxime
fac2d98c26
Add MPS support ( #1264 )
...
* add mps support
* linter stuff
* CI fixes
* install packaging for various tests
* Update setup.py
* Revert "install packaging for various tests"
This reverts commit 980e7aa44d.
* Revert "CI fixes"
This reverts commit 4609e3b166.
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-02-12 08:30:32 -05:00
Wing Lian
ea00dd0852
don't use load and push together ( #1284 )
2024-02-09 14:54:31 -05:00
Hamel Husain
b2a4cb4396
Update README.md ( #1281 )
2024-02-09 07:38:08 -08:00
Wing Lian
aaf54dc730
run the docker image builds and push on gh action gpu runners ( #1218 )
2024-02-09 10:32:54 -05:00
Hamel Husain
9bca7db133
add support for https remote yamls ( #1277 )
2024-02-08 20:02:17 -08:00
Hamel Husain
91cf4ee72c
allow remote data paths ( #1278 )
...
* allow remote data paths
* add docs about public url
* only allow https
* better docs
* better docs
2024-02-08 15:02:35 -08:00
Wing Lian
1daecd161e
copy edits ( #1276 )
2024-02-08 09:00:04 -05:00
Wing Lian
4a654b331e
Add link to axolotl cloud image on latitude ( #1275 )
2024-02-08 08:50:11 -05:00
Wing Lian
5698943263
simplify handling for newer multipack patches so they can be added in a single place ( #1270 )
2024-02-07 10:46:04 -05:00
Wing Lian
411293bdca
contributor avatars ( #1269 )
2024-02-07 07:09:01 -08:00
Zac Brannelly
73f1bdaa15
Fix bug preventing model_kwargs being injected ( #1262 )
2024-02-07 09:38:35 -05:00
JohanWork
1c7ed26785
lock pytorch ( #1247 ) [skip ci]
2024-02-06 07:48:26 -05:00
Philip May
13eea21f9b
Add more save strategies for DPO training. ( #1255 )
...
* Set save_strategy and save_steps in HFDPOTrainerBuilder
* fix duplicate save_steps
2024-02-06 00:38:43 -05:00
Chirag Jain
1072f28874
Fix typo bloat16 -> bfloat16 ( #1257 )
2024-02-06 00:38:14 -05:00
Wing Lian
c7cf3810bd
Pretrain transforms ( #1261 )
...
* wip for pretraining/iterable data with arbitrary prompt strategies
* more fixes, wip
* more fixes for custom pretraining
* iterable ds wrapper not needed
* remove extra features
* chore: lint
* update pretraining example yml
* fix order for partials
* fixup for tests
2024-02-06 00:37:03 -05:00
Wing Lian
8c2e05ade3
relora: magnitude pruning of the optimizer ( #1245 )
...
* magnitude pruning of the optimizer
* add alpaca chat template and fix relora patch
* fix handling of lora adapter for relora
* fix merge and save call
* fixes for 8-bit lora merge
* save intermediate checkpoint adapters
* auto merge
* fix eval check
* handle relora annealing
* fix anneal step logic
* chore: lint
* misc fix
* fix types
* Update tests/e2e/test_relora_llama.py
* check for safetensors saved from relora
2024-02-06 00:35:30 -05:00
NanoCode012
2d65f470d5
fix(model): apply gate fp32 only for mixtral ( #1241 )
...
* fix(model): apply gate fp32 only for mixtral
* Update src/axolotl/utils/models.py
* fix gate layer check
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-02-01 13:55:05 -05:00