Commit Graph

992 Commits

Author SHA1 Message Date
Wing Lian
aca0398315 apex not needed as amp is part of pytorch (#696) 2023-10-07 12:20:45 -04:00
mhenrichsen
29b8f46aed Merge pull request #693 from OpenAccess-AI-Collective/update-mistral-example
update mistral lr, sample pack
2023-10-07 11:04:58 +02:00
mhenrichsen
83a950bb87 lint 2023-10-07 11:04:35 +02:00
Wing Lian
de87ea68f6 fix multiline for docker (#694) 2023-10-06 22:38:15 -04:00
mhenrichsen
4c8ddf2c6f new lr, sample pack 2023-10-06 22:58:13 +02:00
NanoCode012
669f1d052c Fix: Higher vram usage for mistral and sample_packing (#691)
* Fix: Higher vram usage for mistral and sample_packing

* chore: update comment

* chore: lint
2023-10-06 12:33:43 -04:00
Abhishek Mishra
d4a88e4eca Adding qlora config for Mistral (#675)
* Adding qlora config for Mistral

Contains fix for Mistral FA issue - ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to  call tokenizer.padding_side  = 'left' before tokenizing the input.

Fix for now is to set sample_packing: true and pad_to_sequence_len: true

* Renamed to qlora.yml
2023-10-06 21:05:56 +09:00
Wing Lian
2d60ba3a6e flash_attention + sample packing for stablelm 3b (#671)
* stablelm epoch fa patch

* is causal for fa

* working stablelm fa w packing

* chore: pre-commit linting
2023-10-05 16:03:43 -04:00
NanoCode012
eb480dfd68 Fix: ValueError when FA + Mistral when padding_side=right (#681)
* Fix: ValueError when FA + Mistral when padding_side=right

* fix: remove tokenizer class check
2023-10-06 04:12:54 +09:00
NanoCode012
133e676bcc Feat: Set WORKDIR to /workspace/axolotl (#679) 2023-10-06 04:09:14 +09:00
NanoCode012
69fac9a020 Fix: Future deprecation warning with use_auth_token (#680) 2023-10-06 03:56:18 +09:00
NanoCode012
e0b7eeabfd Fix(tokenizer): Set rstrip,lstrip,norm to False (#678) 2023-10-06 03:50:49 +09:00
NanoCode012
43856c0a39 Fix(version): Update FA to work with Mistral SWA (#673) 2023-10-04 21:32:19 +09:00
NanoCode012
e62d5901b5 chore: Clean up repetitive model kwargs (#670) 2023-10-04 20:41:26 +09:00
NanoCode012
697c50d408 Feat: Allow usage of native Mistral FA when no sample_packing (#669)
* Allow usage of native Mistral FA when no sample_packing

* fix: do not apply custom patch when sample_pack off

* chore: lint

* chore: pin transformer to v4.35.0.dev0

* fix: split sample_packing to separate test
2023-10-04 20:40:47 +09:00
NanoCode012
90e0d673f7 Feat: Add config yaml to section for reprod in bug-report.yaml (#667)
* Update bug-report.yaml

* Update bug-report.yaml

* Update bug-report.yaml
2023-10-03 23:38:42 +09:00
Wing Lian
2642caedf2 refactor to set eval_batch_size earlier if unset, so we can warn if mismatched (#662) 2023-10-02 21:08:07 -04:00
Wing Lian
f34648c8b9 remove patch fix for phi (#664) 2023-10-02 21:07:41 -04:00
Wing Lian
e50a64e85e prepared dataset caching, other misc fixes (#665)
* prepared dataset caching, other misc fixes

* also don't load from disk cache unless explicit
2023-10-02 21:07:24 -04:00
Wing Lian
f4868d733c make sure we also run CI tests when requirements.txt changes (#663) 2023-10-02 08:43:40 -04:00
Napuh
a7e56d83c2 removed duplicate on requirements.txt (#661) 2023-10-02 08:40:05 -04:00
Wing Lian
5b0bc48fbc add mistral e2e tests (#649)
* mistral e2e tests

* make sure to enable flash attention for the e2e tests

* use latest transformers full sha

* uninstall first
2023-09-29 00:22:40 -04:00
Kyle Corbitt
9ec20777ba Make dataset_processes configurable (#651)
I'm using the Axolotl script to train models on https://modal.com serverless GPUs. Unfortunately, their environment seems to have some kind of bug where if I try to run `datasets.filter` with too high a `num_proc`, it throws an error and dies.

This PR adds a new configuration option `dataset_processes`, which lets you explicitly set the number of processes used to map/filter the dataset. If not included, this defaults to the current behavior of setting that to `os.cpu_count()`.
2023-09-29 00:22:22 -04:00
ich
590d6032fd Fix bug when using pretokenized datasets (#652)
* fix pretokenized datasets readme

* check if dataset type is not set to handle pretokenized datasets
2023-09-28 22:54:10 -04:00
Wing Lian
409ca0f21c add support for defined train split (#654) 2023-09-28 20:14:14 -04:00
Wing Lian
8662e8ffe8 don't strip the prompt for check since we don't strip to tokenize anymore (#650) 2023-09-28 12:21:51 -04:00
Wing Lian
b2edaaeff6 fix for flash attn w mistral w/o sammple packing (#648) 2023-09-28 10:57:37 -04:00
Adarsh Shirawalmath
b88f51512a Update mistral/README.md (#647) 2023-09-28 10:24:56 -04:00
NanoCode012
eb41f76f92 Feat: Add example for Mistral (#644)
* Feat: Add example for Mistral

* chore: turn off flash

* chore: add is_mistral_derived_model

* chore: update following PR
2023-09-28 20:15:00 +09:00
NanoCode012
383f88d7a7 Fix(cfg): Add validation for save_strategy and eval_strategy (#633)
* Fix(cfg): Check save_strategy cfg conflict with save_steps

* Fix(cfg): Check evaluation_strategy cfg conflict with eval_steps

* chore: add extra check for steps only
2023-09-28 10:14:41 +09:00
Wing Lian
b6ab8aad62 Mistral flash attn packing (#646)
* add mistral monkeypatch

* add arg for decoder attention masl

* fix lint for duplicate code

* make sure to update transformers too

* tweak install for e2e

* move mistral patch to conditional
2023-09-27 18:41:00 -04:00
Napuh
85b0be2ba7 Warn users to login to HuggingFace (#645)
* added warning if user is not logged in HF

* updated doc to suggest logging in to HF
2023-09-27 17:43:35 -04:00
Ethan Smith
8fe0e633d2 Fix bug in dataset loading (#284)
* Fix bug in dataset loading

This fixes a bug when loading datasets. `d.data_files` is a list, so it cannot be directly passed to `hf_hub_download`

* Check type of data_files, and load accordingly
2023-09-27 13:41:31 -04:00
Felix Yan
d1236f2c41 Correct typos in datasets.py (#639) 2023-09-27 12:12:10 -04:00
Wing Lian
895f0a0723 skip some flash attn patches unless explicitly enabled (#643)
* skip some flash attn patches if explicitly disabled

* make the other patches optional
2023-09-27 12:11:07 -04:00
Wing Lian
e7d3e2dbb6 use fastchat conversations template (#578)
* use fastchat conversations template

* require fastchat (fschat) pip install

* handle roles dynamically from conversation

* tweak fastchat conversation with a monkeypatch to get individual turns

* fix up so it works with multiple conversation styles, and don't strip the turns

* fix sharegpt fixture now that we're using a more correct tokenization

* use a new prompter and support fastchat conversation type

* use sharegpt from prompt strategies now

* update docs, add chatml template

* add a newline after im_end token

* ensure we correctly set system message

* update per PR feedback to handle deprecated sharegpt types

* don't add duplicate wandb req

* make sharegpt fields configurable from yml

* llama2 fixes

* don't fail fatally when turns are improper
2023-09-27 12:10:45 -04:00
Wing Lian
60c7c48c97 update for recent transformers updates (#636)
* update for recent transformers updates

* fix checkpoint forward kwargs

* just pass args into torch checkpoint
2023-09-27 12:10:32 -04:00
Wing Lian
e8cbf50be6 attention_mask not needed for training (#642)
* attention_mask not needed for training

* specifically don't use attention mask for phi

* use a different check for phi

* small fixes since phi removed some values from their config
2023-09-27 11:12:08 -04:00
Wing Lian
d887ad86c3 eval_table isn't quite stable enough to be in default llama configs (#637) 2023-09-26 10:13:20 -04:00
NanoCode012
19a600a8b8 Feat: Add support for upstream FA2 (#626)
* Feat: Add support for upstream FA2

* chore: add is_falcon_derived_model: true to examples

* chore: add config to readme for documentation

* feat: add extra model types

* fix: remove old falcon flash patch

* chore: pin transformers and accelerate
2023-09-26 09:53:28 -04:00
Fernando Tarin Morales
5e5296a77c Added quotes to the pip install -e command to fix an incompatibility with shells that do glob expansion like zsh (#632) 2023-09-25 11:50:14 -04:00
mhenrichsen
f3d939016a Merge pull request #629 from OpenAccess-AI-Collective/chore/-change-default-model
default model changed
2023-09-25 09:32:01 +02:00
NanoCode012
cfbce020e9 Fix: Fail bf16 check when running on cpu during merge (#631) 2023-09-25 13:48:18 +09:00
mhenrichsen
4fecbfe5e1 default model changed 2023-09-24 18:52:53 +02:00
NanoCode012
67b9888630 Feat(doc): Add eval_sample_packing to doc (#625) 2023-09-23 13:11:27 +09:00
Maxime
923eb91304 tweak: improve base builder for smaller layers (#500) 2023-09-22 16:17:50 -04:00
Wing Lian
a363604dcf better handling and logging of empty sharegpt turns (#603) 2023-09-22 16:13:42 -04:00
Wing Lian
501958bb6f create a model card with axolotl badge (#624) 2023-09-22 16:13:26 -04:00
Wing Lian
c25ba7939b update README w deepspeed info (#605) 2023-09-22 00:15:52 -04:00
NanoCode012
d5f8589021 chore(callback): Remove old peft saving code (#510) 2023-09-22 12:31:33 +09:00