Commit Graph

298 Commits

Author SHA1 Message Date
NanoCode012
c2b64e4dcf Feat: update doc (#1475) [skip ci]
* feat: update doc contents

* chore: move batch vs ga docs

* feat: update lambdalabs instructions

* fix: refactor dev instructions
2024-04-04 13:43:40 +09:00
James Melvin Ebenezer
cae608f587 Added pip install ninja to accelerate installation of flash-attn (#1461)
* Added pip install ninja to accelerate installation of flash-attn

* doc: cleanup
2024-04-02 17:36:41 +09:00
Hamel Husain
86b7d22f35 Reorganize Docs (#1468) 2024-04-01 08:00:52 -07:00
Phuc Van Phan
324d59ea0d docs: update link to docs of advance topic in README.md (#1437) 2024-03-24 21:49:27 -07:00
NanoCode012
f1ebaa07c6 chore(config): refactor old mistral config (#1435)
* chore(config): refactor old mistral config

* chore: add link to colab on readme
2024-03-25 12:00:44 +09:00
Hamel Husain
629450cecd Bootstrap Hosted Axolotl Docs w/Quarto (#1429)
* precommit

* mv styles.css

* fix links
2024-03-21 22:28:36 -07:00
Wing Lian
dd449c5cd8 support galore once upstreamed into transformers (#1409)
* support galore once upstreamed into transformers

* update module name for llama in readme and fix typing for all linear

* bump trl for deprecation fixes from newer transformers

* include galore as an extra and install in docker image

* fix optim_args type

* fix optim_args

* update dependencies for galore

* add galore to cicd dockerfile
2024-03-19 09:26:35 -04:00
NanoCode012
40a88e8c4a Feat: Add sharegpt multirole (#1137)
* feat(prompt): support multiple roles for sharegpt

* fix: add handling of empty role back

* feat: rebased and allowed more dynamic roles via config

* fix: variable

* chore: update message

* feat: add vicuna format

* fix: JSON serializable error

* fix: typing

* fix: don't remap for unknown keys

* fix: add roles to pydantic

* feat: add test

* chore: remove leftover print

* chore: remove leftover comment

* chore: remove print

* fix: update test to use chatml
2024-03-19 20:51:49 +09:00
Seungduk Kim
43bdc5d3de Add a config not to shuffle merged dataset (#1394) [skip ci]
* Add a config not to shuffle merged dataset

* Update README.md

* Update src/axolotl/utils/config/models/input/v0_4_1/__init__.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* invert the condition name

* update README

* info -> debug

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-19 20:51:00 +09:00
NanoCode012
b1e3e1b25f fix(config): passing gradient_checkpoint_kwargs (#1412)
* fix(config): change default use_reentrant to true

* Update trainer_builder.py

* fix: make sure to pass kwargs to enable checkpoint

* chore: lint
2024-03-19 12:57:43 +09:00
jbl
e8c8ea64b3 Update README.md (#1418)
Add Phorm AI Badge
2024-03-17 23:47:46 -04:00
NanoCode012
f083aed2c7 Fix(readme): Improve README QuickStart info (#1408)
* Fix(readme): Improve README QuickStart info

* chore: add to toc
2024-03-16 21:10:22 +09:00
NanoCode012
868c33954d Feat(readme): Add instructions for Google GPU VM instances (#1410) 2024-03-16 21:10:05 +09:00
Hamel Husain
8b12468230 Add QLoRA + FSDP Docs (#1403)
* pre commit

* Update fsdp_qlora.md
2024-03-14 11:04:51 -04:00
Wing Lian
638c2dafb5 JarvisLabs (#1372)
* add Jarvis cloud gpu and sponsorship

* whitespace
2024-03-07 10:47:32 -05:00
Hamel Husain
ed70a08348 add docs for input_output format (#1367) [skip ci]
* add docs

* add docs

* run linter
2024-03-06 09:09:49 -05:00
Nicolas Rojas
37657473c8 Remove unsupported python version 3.9 from README (#1364) [skip ci] 2024-03-05 21:19:36 -05:00
Wing Lian
6b3b271925 fix for protected model_ namespace w pydantic (#1345) 2024-02-28 15:07:49 -05:00
Maxime
0f6af36d50 Mps mistral lora (#1292) [skip ci]
* Lora example for Mistral on MPS backend

* Add some MPS documentation

* Update examples/mistral/lora-mps.yml

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* Update examples/mistral/lora-mps.yml

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* Update README.md

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-02-26 22:39:57 -05:00
JohanWork
d75653407c ADD: push checkpoints to mlflow artifact registry (#1295) [skip ci]
* Add checkpoint logging to mlflow artifact registry

* clean up

* Update README.md

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* update pydantic config from rebase

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-02-26 13:32:39 -05:00
NanoCode012
c6b01e0f4a chore: update readme to be more clear (#1326) [skip ci] 2024-02-26 13:32:13 -05:00
Wing Lian
cc3cebfa70 Pydantic 2.x cfg (#1239)
* WIP conversion to use pydantic for config validation

* wip, more fields, add capabilities

* wip

* update pydantic validation to match existing tests

* tweak requirements

* setup deprecated params pydantic model

* more validations

* wrap up rest of the validations

* flesh out the rest of the options from the readme into pydantic

* fix model validators as class methods

remember to return in validator
missing return
add missing relora attributes
fix test for DictDefault change
fix sys template for mistral from fastchat change in PR 2872
fix test for batch size warning

* more missing attributes for cfg

* updates from PR feedback

* fix validation for datasets and pretrain datasets

* fix test for lora check
2024-02-26 12:24:14 -05:00
NanoCode012
2ed52bd568 fix(readme): Clarify doc for tokenizer_config (#1323) [skip ci] 2024-02-24 21:55:04 +09:00
NanoCode012
3d2cd804ae fix(readme): update inference md link (#1311) [skip ci] 2024-02-22 02:48:06 +09:00
Leonardo Emili
5a5d47458d Add seq2seq eval benchmark callback (#1274)
* Add CausalLMBenchEvalCallback for measuring seq2seq performance

* Fix code for pre-commit

* Fix typing and improve logging

* eval_sample_packing must be false with CausalLMBenchEvalCallback
2024-02-13 08:24:30 -08:00
김진원
8430db22e2 Scheduler implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? (#1273) 2024-02-12 21:23:28 -08:00
Wing Lian
4b997c3e1a allow the optimizer prune ratio for ReLoRA to be configurable (#1287)
* allow the optimizer prune ratio for relora to be configurable

* update docs for relora

* prevent circular imports
2024-02-12 11:39:51 -08:00
Hamel Husain
b2a4cb4396 Update README.md (#1281) 2024-02-09 07:38:08 -08:00
Hamel Husain
9bca7db133 add support for https remote yamls (#1277) 2024-02-08 20:02:17 -08:00
Hamel Husain
91cf4ee72c allow remote data paths (#1278)
* allow remote data paths

* add docs about public url

* only allow https

* better docs

* better docs
2024-02-08 15:02:35 -08:00
Wing Lian
1daecd161e copy edits (#1276) 2024-02-08 09:00:04 -05:00
Wing Lian
4a654b331e Add link to axolotl cloud image on latitude (#1275) 2024-02-08 08:50:11 -05:00
Wing Lian
411293bdca contributor avatars (#1269) 2024-02-07 07:09:01 -08:00
Wing Lian
dfd188502a add contact info for dedicated support for axolotl [skip ci] (#1243) 2024-02-01 12:59:07 -05:00
Wing Lian
00568c1539 support for true batches with multipack (#1230)
* support for true batches with multipack

* patch the map dataset fetcher to handle batches with packed indexes

* patch 4d mask creation for sdp attention

* better handling for BetterTransformer

* patch general case for 4d mask

* setup forward patch. WIP

* fix patch file

* support for multipack w/o flash attention for llama

* cleanup

* add warning about bf16 vs fp16 for multipack with sdpa

* bugfixes

* add 4d multipack tests, refactor patches

* update tests and add warnings

* fix e2e file check

* skip sdpa test if not at least torch 2.1.1, update docs
2024-02-01 10:18:42 -05:00
DreamGenX
5787e1a23f Fix and document test_datasets (#1228)
* Make sure test_datasets are used and treat val_set_size.

* Add test_datasets docs.

* Apply suggestions from code review

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-31 06:48:57 -05:00
Wing Lian
4cb7900a56 Peft loftq (#1222)
* loftq support for lora

* fix loftq check

* update readme for loftq

* readability cleanup

* use peft main for loftq fixes, remove unnecessary special tokens

* remove unused test from older deprecation
2024-01-28 18:50:08 -05:00
mhenrichsen
98b4762077 Feat/chatml add system message (#1117)
* add system message to template

* readme update

* added code to register new system message

* register chatml template for test

---------

Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-25 08:24:27 +01:00
Wing Lian
54d2ac155b Mixtral fixes 20240124 (#1192) [skip ci]
* mixtral nccl fixes

* make sure to patch for z3
2024-01-24 14:59:57 -05:00
Wing Lian
b715cd549a update docs [skip ci] (#1176) 2024-01-23 11:14:52 -05:00
Tilemachos Chatzipapas
cc250391a0 Fine-Tuning Mistral-7b for Real-World Chatbot Applications Using Axolotl (Lora used) (#1155)
* Mistral-7b finetune example using axolotl with code,config,data

* Corrected the path for huggingface dataset

* Update data.jsonl

* chore: lint

---------

Co-authored-by: twenty8th <twenty8th@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-23 07:32:21 -05:00
Ayush Singh
9135b9e2aa Update README.md (#1169) [skip ci]
Fix typo
2024-01-23 07:25:44 -05:00
Wing Lian
782b6a4216 set fp16 to false if bf16, update bf16: auto in example YAMLs (#1122) [skip ci]
* set fp16 to false if bf16, update bf16: auto in example YAMLs

* unset fp16 so that it falls back properly if bf16 isn't available

* Update README.md [skip-ci]

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* test that bf16 disables fp16

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-01-22 18:44:01 -05:00
Wing Lian
2ce5c0d68a Deprecate max packed sequence len (#1141) 2024-01-20 05:11:50 -05:00
NanoCode012
3db5f2fd17 feat(dataset): add config to keep processed dataset in memory (#1152) 2024-01-20 13:19:28 +09:00
Joe Cummings
08b8ba09a5 Fix link for Minotaur model (#1146) [skip-ci] 2024-01-18 17:22:04 -05:00
Joe Cummings
1d70f24b50 Add shifted sparse attention (#973) [skip-ci]
* Add s2_attn to hijack flash code

* Refactor code to account for s2_attn

* Add test for models utils

* Add ``s2_attention`` option to llama configs

* Add ``s2_attention`` option to README config

* Format code to appease linter

* chore: lint

* Remove xpos and llama-landmark [bad merge]

* add e2e smoke tests for shifted sparse attention

* remove stray patch from merge

* update yml with link to paper for s2_attention/longlora

* fix assertion check for full fine tune

* increase sequence len for tests and PR feedback updates

* reduce context len to 16k for tests

* reduce context len to 16k for tests

* reduce batch size for larger context len and update test to check message

* fix test for message

---------

Co-authored-by: joecummings <jrcummings@devvm050.nha0.facebook.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-18 10:16:07 -05:00
Wing Lian
ece0211996 Agnostic cloud gpu docker image and Jupyter lab (#1097) 2024-01-15 22:37:54 -05:00
xzuyn
8487b97cf3 Add layers_to_transform for lora_config (#1118) 2024-01-15 21:29:55 -05:00
NanoCode012
9cd27b2f91 fix(readme): clarify custom user prompt [no-ci] (#1124)
* fix(readme): clarify custom user prompt

* chore: update example to show use case of setting field
2024-01-16 09:47:33 +09:00