Commit Graph

280 Commits

Author SHA1 Message Date
Maxime
0f6af36d50 Mps mistral lora (#1292) [skip ci]
* Lora example for Mistral on MPS backend

* Add some MPS documentation

* Update examples/mistral/lora-mps.yml

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* Update examples/mistral/lora-mps.yml

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* Update README.md

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-02-26 22:39:57 -05:00
JohanWork
d75653407c ADD: push checkpoints to mlflow artifact registry (#1295) [skip ci]
* Add checkpoint logging to mlflow artifact registry

* clean up

* Update README.md

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* update pydantic config from rebase

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-02-26 13:32:39 -05:00
NanoCode012
c6b01e0f4a chore: update readme to be more clear (#1326) [skip ci] 2024-02-26 13:32:13 -05:00
Wing Lian
cc3cebfa70 Pydantic 2.x cfg (#1239)
* WIP conversion to use pydantic for config validation

* wip, more fields, add capabilities

* wip

* update pydantic validation to match existing tests

* tweak requirements

* setup deprecated paams pydantic model

* more validations

* wrap up rest of the validations

* flesh out the rest of the options from the readme into pydantic

* fix model validators as class methods

remember to return in validator
missing return
add missing relora attributes
fix test for DictDefault change
fix sys template for mistral from fastchat change in PR 2872
fix test for batch size warning

* more missing attributes for cfg

* updates from PR feedback

* fix validation for datasets and pretrain datasets

* fix test for lora check
2024-02-26 12:24:14 -05:00
NanoCode012
2ed52bd568 fix(readme): Clarify doc for tokenizer_config (#1323) [skip ci] 2024-02-24 21:55:04 +09:00
NanoCode012
3d2cd804ae fix(readme): update inference md link (#1311) [skip ci] 2024-02-22 02:48:06 +09:00
Leonardo Emili
5a5d47458d Add seq2seq eval benchmark callback (#1274)
* Add CausalLMBenchEvalCallback for measuring seq2seq performance

* Fix code for pre-commit

* Fix typing and improve logging

* eval_sample_packing must be false with CausalLMBenchEvalCallback
2024-02-13 08:24:30 -08:00
김진원
8430db22e2 Scheduler implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? (#1273) 2024-02-12 21:23:28 -08:00
Wing Lian
4b997c3e1a allow the optimizer prune ratio for ReLoRA to be configurable (#1287)
* allow the optimizer prune ration for relora to be configurable

* update docs for relora

* prevent circular imports
2024-02-12 11:39:51 -08:00
Hamel Husain
b2a4cb4396 Update README.md (#1281) 2024-02-09 07:38:08 -08:00
Hamel Husain
9bca7db133 add support for https remote yamls (#1277) 2024-02-08 20:02:17 -08:00
Hamel Husain
91cf4ee72c allow remote data paths (#1278)
* allow remote data paths

* add docs about public url

* only allow https

* better docs

* better docs
2024-02-08 15:02:35 -08:00
Wing Lian
1daecd161e copy edits (#1276) 2024-02-08 09:00:04 -05:00
Wing Lian
4a654b331e Add link to axolotl cloud image on latitude (#1275) 2024-02-08 08:50:11 -05:00
Wing Lian
411293bdca contributor avatars (#1269) 2024-02-07 07:09:01 -08:00
Wing Lian
dfd188502a add contact info for dedicated support for axolotl [skip ci] (#1243) 2024-02-01 12:59:07 -05:00
Wing Lian
00568c1539 support for true batches with multipack (#1230)
* support for true batches with multipack

* patch the map dataset fetcher to handle batches with packed indexes

* patch 4d mask creation for sdp attention

* better handling for BetterTransformer

* patch general case for 4d mask

* setup forward patch. WIP

* fix patch file

* support for multipack w/o flash attention for llama

* cleanup

* add warning about bf16 vs fp16 for multipack with sdpa

* bugfixes

* add 4d multipack tests, refactor patches

* update tests and add warnings

* fix e2e file check

* skip sdpa test if not at least torch 2.1.1, update docs
2024-02-01 10:18:42 -05:00
DreamGenX
5787e1a23f Fix and document test_datasets (#1228)
* Make sure test_dataset are used and treat val_set_size.

* Add test_datasets docs.

* Apply suggestions from code review

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-31 06:48:57 -05:00
Wing Lian
4cb7900a56 Peft lotfq (#1222)
* loftq support for lora

* fix loftq check

* update readme for loftq

* readability cleanup

* use peft main for loftq fixes, remove unnecessary special tokens

* remove unused test from older deprecation
2024-01-28 18:50:08 -05:00
mhenrichsen
98b4762077 Feat/chatml add system message (#1117)
* add system message to template

* readme update

* added code to register new system message

* register chatml template for test

---------

Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-25 08:24:27 +01:00
Wing Lian
54d2ac155b Mixtral fixes 20240124 (#1192) [skip ci]
* mixtral nccl fixes

* make sure to patch for z3
2024-01-24 14:59:57 -05:00
Wing Lian
b715cd549a update docs [skip ci] (#1176) 2024-01-23 11:14:52 -05:00
Tilemachos Chatzipapas
cc250391a0 Fine-Tuning Mistral-7b for Real-World Chatbot Applications Using Axolotl (Lora used) (#1155)
* Mistral-7b finetune example using axolotl with code,config,data

* Corrected the path for huggingface dataset

* Update data.jsonl

* chore: lint

---------

Co-authored-by: twenty8th <twenty8th@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-23 07:32:21 -05:00
Ayush Singh
9135b9e2aa Update README.md (#1169) [skip ci]
Fix typo
2024-01-23 07:25:44 -05:00
Wing Lian
782b6a4216 set fp16 to false if bf16, update bf16: auto in example YAMLs (#1122) [skip ci]
* set fp16 to false if bf16, update bf16: auto in example YAMLs

* unset fp16 so that it fallsback properly if bf16 isn't available

* Update README.md [skip-ci]

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* test that bf16 disables fp16

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-01-22 18:44:01 -05:00
Wing Lian
2ce5c0d68a Deprecate max packed sequence len (#1141) 2024-01-20 05:11:50 -05:00
NanoCode012
3db5f2fd17 feat(dataset): add config to keep processed dataset in memory (#1152) 2024-01-20 13:19:28 +09:00
Joe Cummings
08b8ba09a5 Fix link for Minotaur model (#1146) [skip-ci] 2024-01-18 17:22:04 -05:00
Joe Cummings
1d70f24b50 Add shifted sparse attention (#973) [skip-ci]
* Add s2_attn to hijack flash code

* Refactor code to account for s2_attn

* Add test for models utils

* Add ``s2_attention`` option to llama configs

* Add ``s2_attention`` option to README config

* Format code to appease linter

* chore: lint

* Remove xpos and llama-landmark [bad merge]

* add e2e smoke tests for shifted sparse attention

* remove stray patch from merge

* update yml with link to paper for s2_attention/longlora

* fix assertion check for full fine tune

* increase sequence len for tests and PR feedback updates

* reduce context len to 16k for tests

* reduce context len to 16k for tests

* reduce batch size for larger context len and udpate test to check message

* fix test for message

---------

Co-authored-by: joecummings <jrcummings@devvm050.nha0.facebook.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-18 10:16:07 -05:00
Wing Lian
ece0211996 Agnostic cloud gpu docker image and Jupyter lab (#1097) 2024-01-15 22:37:54 -05:00
xzuyn
8487b97cf3 Add layers_to_transform for lora_config (#1118) 2024-01-15 21:29:55 -05:00
NanoCode012
9cd27b2f91 fix(readme): clarify custom user prompt [no-ci] (#1124)
* fix(readme): clarify custom user prompt

* chore: update example to show use case of setting field
2024-01-16 09:47:33 +09:00
Hamel Husain
2dc431078c Add link on README to Docker Debugging (#1107)
* add docker debug

* Update docs/debugging.md

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* explain editable install

* explain editable install

* upload new video

* add link to README

* Update README.md

* Update README.md

* chore: lint

* make sure to lint markdown too

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-12 08:51:35 -05:00
Hamel Husain
b502392e82 Update README.md (#1103)
* Update README.md

* Update README.md
2024-01-11 16:41:58 -08:00
Hamel Husain
7512c3ad20 Add Debugging Guide (#1089)
* add debug guide

* add background

* add .gitignore

* Update devtools/dev_sharegpt.yml

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* Update docs/debugging.md

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* simplify example axolotl config

* add additional comments

* add video and TOC

* try jsonc for better md rendering

* style video thumbnail better

* fix footnote

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-10 20:49:24 -08:00
Wing Lian
d7057ccd36 paired kto support (#1069) 2024-01-09 13:30:45 -05:00
Johan Hansson
090c24dcb0 Add: mlflow for experiment tracking (#1059) [skip ci]
* Update requirements.txt

adding mlflow

* Update __init__.py

Imports for mlflow

* Update README.md

* Create mlflow_.py (#1)

* Update README.md

* fix precommits

* Update README.md

Update mlflow_tracking_uri

* Update trainer_builder.py

update trainer building

* chore: lint

* make ternary a bit more readable

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-09 09:34:09 -05:00
Ricardo Dominguez-Olmedo
04b978b428 Cosine learning rate schedule - minimum learning rate (#1062)
* Cosine min lr

* Cosine min lr - warn if using deepspeed

* cosine_min_lr_ratio readme

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-09 09:29:56 -05:00
Wing Lian
14964417ee Sponsors (#1065)
* wip sponsors section in readme

* add ko-fi and contributors list
2024-01-08 18:52:00 -05:00
kallewoof
bdfefaf054 feature: better device mapping for large models (#918)
* fix: improved memory handling when model is bigger than existing VRAM

* feature: add lora_on_cpu flag to do LoRA loading on CPU (RAM)

For big models where the models are taking up the entire GPU VRAM, the LoRA part will fail unless it is loaded on CPU only.

* doc: add README

* fix: enable progress bars in do_merge_lora()

* doc: mention gpu_memory_limit and lora_on_cpu in merge part of README

* Update src/axolotl/utils/models.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* fix: remove deletion of removed model_kwargs key

* fix: validate that gpu_memory_limit and max_memory are not both set

---------

Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-05 22:22:21 +09:00
Hamel Husain
63fb3eb426 set default for merge (#1044) 2024-01-04 18:14:20 -08:00
Hamel Husain
a3e8783328 [Docs] delete unused cfg value lora_out_dir (#1029)
* Update README.md

* Update README.md

* Update README.md

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-01-02 21:35:20 -08:00
NanoCode012
b31038aae9 chore(readme): update instruction to set config to load from cache (#1030) 2024-01-03 11:56:19 +09:00
Wing Lian
4d2e842e46 use recommended setting for use_reentrant w gradient checkpointing (#1021)
* use recommended setting for use_reentrant w gradient checkpointing

* add doc for gradient_checkpointing_kwargs
2024-01-01 22:17:27 -05:00
mhenrichsen
f8ae59b0a8 Adds chat templates (#1022) 2023-12-29 15:44:23 -06:00
NanoCode012
41353d2ea0 feat: expose bnb kwargs (#1018)
* feat: expose bnb kwargs

* chore: added examples and link per suggestion

* Uncomment defaults per suggestion for readability

Co-authored-by: Hamel Husain <hamel.husain@gmail.com>

---------

Co-authored-by: Hamel Husain <hamel.husain@gmail.com>
2023-12-29 18:16:26 +09:00
NanoCode012
f6ecf14dd4 feat: remove need to add load_in* during merge (#1017) 2023-12-29 18:15:30 +09:00
Hamel Husain
dec66d7c53 [Docs] Nit: Remind people to auth to wandb if they are going to use it (#1013) 2023-12-28 18:00:16 -08:00
Hamel Husain
76357dc5da Update README.md (#1012) 2023-12-28 18:00:02 -08:00
Wing Lian
70b46ca4f4 remove landmark attn and xpos rope implementations (#1010) 2023-12-27 21:07:27 -08:00