Commit Graph

968 Commits

Author SHA1 Message Date
Wing Lian
409ca0f21c add support for defined train split (#654) 2023-09-28 20:14:14 -04:00
Wing Lian
8662e8ffe8 don't strip the prompt for check since we don't strip to tokenize anymore (#650) 2023-09-28 12:21:51 -04:00
Wing Lian
b2edaaeff6 fix for flash attn w mistral w/o sammple packing (#648) 2023-09-28 10:57:37 -04:00
Adarsh Shirawalmath
b88f51512a Update mistral/README.md (#647) 2023-09-28 10:24:56 -04:00
NanoCode012
eb41f76f92 Feat: Add example for Mistral (#644)
* Feat: Add example for Mistral

* chore: turn off flash

* chore: add is_mistral_derived_model

* chore: update following PR
2023-09-28 20:15:00 +09:00
NanoCode012
383f88d7a7 Fix(cfg): Add validation for save_strategy and eval_strategy (#633)
* Fix(cfg): Check save_strategy cfg conflict with save_steps

* Fix(cfg): Check evaluation_strategy cfg conflict with eval_steps

* chore: add extra check for steps only
2023-09-28 10:14:41 +09:00
Wing Lian
b6ab8aad62 Mistral flash attn packing (#646)
* add mistral monkeypatch

* add arg for decoder attention masl

* fix lint for duplicate code

* make sure to update transformers too

* tweak install for e2e

* move mistral patch to conditional
2023-09-27 18:41:00 -04:00
Napuh
85b0be2ba7 Warn users to login to HuggingFace (#645)
* added warning if user is not logged in HF

* updated doc to suggest logging in to HF
2023-09-27 17:43:35 -04:00
Ethan Smith
8fe0e633d2 Fix bug in dataset loading (#284)
* Fix bug in dataset loading

This fixes a bug when loading datasets. `d.data_files` is a list, so it cannot be directly passed to `hf_hub_download`

* Check type of data_files, and load accordingly
2023-09-27 13:41:31 -04:00
Felix Yan
d1236f2c41 Correct typos in datasets.py (#639) 2023-09-27 12:12:10 -04:00
Wing Lian
895f0a0723 skip some flash attn patches unless explicitly enabled (#643)
* skip some flash attn patches if explicitly disabled

* make the other patches optional
2023-09-27 12:11:07 -04:00
Wing Lian
e7d3e2dbb6 use fastchat conversations template (#578)
* use fastchat conversations template

* require fastchat (fschat) pip install

* handle roles dynamically from conversation

* tweak fastchat conversation with a monkeypatch to get individual turns

* fix up so it works with multiple conversation styles, and don't strip the turns

* fix sharegpt fixture now that we're using a more correct tokenization

* use a new prompter and support fastchat conversation type

* use sharegpt from prompt strategies now

* update docs, add chatml template

* add a newline after im_end token

* ensure we correctly set system message

* update per PR feedback to handle deprecated sharegpt types

* don't add duplicate wandb req

* make sharegpt fields configurable from yml

* llama2 fixes

* don't fail fatally when turns are improper
2023-09-27 12:10:45 -04:00
Wing Lian
60c7c48c97 update for recent transformers updates (#636)
* update for recent transformers updates

* fix checkpoint forward kwargs

* just pass args into torch checkpoint
2023-09-27 12:10:32 -04:00
Wing Lian
e8cbf50be6 attention_mask not needed for training (#642)
* attention_mask not needed for training

* specifically don't use attention mask for phi

* use a different check for phi

* small fixes since phi removed some values from their config
2023-09-27 11:12:08 -04:00
Wing Lian
d887ad86c3 eval_table isn't quite stable enough to be in default llama configs (#637) 2023-09-26 10:13:20 -04:00
NanoCode012
19a600a8b8 Feat: Add support for upstream FA2 (#626)
* Feat: Add support for upstream FA2

* chore: add is_falcon_derived_model: true to examples

* chore: add config to readme for documentation

* feat: add extra model types

* fix: remove old falcon flash patch

* chore: pin transformers and accelerate
2023-09-26 09:53:28 -04:00
Fernando Tarin Morales
5e5296a77c Added quotes to the pip install -e command to fix an incompatibility with shells that do glob expansion like zsh (#632) 2023-09-25 11:50:14 -04:00
mhenrichsen
f3d939016a Merge pull request #629 from OpenAccess-AI-Collective/chore/-change-default-model
default model changed
2023-09-25 09:32:01 +02:00
NanoCode012
cfbce020e9 Fix: Fail bf16 check when running on cpu during merge (#631) 2023-09-25 13:48:18 +09:00
mhenrichsen
4fecbfe5e1 default model changed 2023-09-24 18:52:53 +02:00
NanoCode012
67b9888630 Feat(doc): Add eval_sample_packing to doc (#625) 2023-09-23 13:11:27 +09:00
Maxime
923eb91304 tweak: improve base builder for smaller layers (#500) 2023-09-22 16:17:50 -04:00
Wing Lian
a363604dcf better handling and logging of empty sharegpt turns (#603) 2023-09-22 16:13:42 -04:00
Wing Lian
501958bb6f create a model card with axolotl badge (#624) 2023-09-22 16:13:26 -04:00
Wing Lian
c25ba7939b update README w deepspeed info (#605) 2023-09-22 00:15:52 -04:00
NanoCode012
d5f8589021 chore(callback): Remove old peft saving code (#510) 2023-09-22 12:31:33 +09:00
Wing Lian
03e59077a0 misc fixes to add gptq tests (#621)
* misc fixes to add gptq tests

* set bf16 needed for fa2
2023-09-21 21:52:12 -04:00
Wing Lian
97d3776ce6 split completion text to sequence_len (#616) 2023-09-21 21:51:25 -04:00
Wing Lian
2844eb22b6 run eval on the first step to get a baseline (#617)
* run eval on the first step to get a baseline

* wandb kleeps getting moved around by pre-commit ...
2023-09-21 21:51:09 -04:00
Wing Lian
e85d2eb06b let MAX_JOBS use the default since we're not resource constrained on our self-hosted runners (#427) 2023-09-21 20:36:30 -04:00
Wing Lian
196ff1181e skip the gpu memory checks if the device is set to 'auto' (#609)
* skip the gpu memory checks if the device is set to 'auto'

* skip gpu mem logging if cpu too

* don't worry about log_gpu_memory_usage since it calls another annotated fn

* rename decorator internal
2023-09-21 15:20:31 -04:00
Wing Lian
92512c390b ignore wandb to resolve isort headaches (#619) 2023-09-21 11:50:09 -04:00
Maxime
2fe95cdcc1 fix distributed devices (#612)
* fix distributed devices

* Update distributed.py

* Update distributed.py
2023-09-21 09:11:34 -04:00
Maxime
c1382e79b6 Create multi-node.md (#613)
* Create multi-node.md

* Update multi-node.md

* Update multi-node.md
2023-09-20 22:02:16 -04:00
Maxime
5d931cc042 Only run tests when a change to python files is made (#614)
* Update tests.yml

* Update .github/workflows/tests.yml

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-09-20 22:02:04 -04:00
Javier
ec0958f4f8 Update requirements.txt (#610) 2023-09-20 08:40:49 -04:00
Wing Lian
faecff9798 support to disable exllama for gptq (#604)
* support to disable exllama for gptq

* update property instead of item

* fix config key
2023-09-19 17:51:08 -04:00
bofeng huang
aa656e04bd Delete duplicate lines (#606) 2023-09-19 16:40:05 -04:00
Wing Lian
b53e77775b update dockerfile to not build evoformer since it fails the build (#607) 2023-09-19 16:28:29 -04:00
Wing Lian
674c57692d more sane defaults for openllama 3b used for quickstarts (#602)
* more sane defaults for openllama 3b used for quickstarts

* don't use bf16 for quickstart to simplify gpu compatibility

* use the update openlm-research/open_llama_3b_v2 models
2023-09-19 09:15:10 -04:00
Wing Lian
1eebbd09c3 improve handling for empty text on the tokenization step (#502) 2023-09-19 08:09:56 -04:00
Wing Lian
62a774140b Fix for check with cfg and merge_lora (#600) 2023-09-18 21:14:32 -04:00
Wing Lian
31b9e0c6e8 minor tweaks to simplify (#597) 2023-09-18 11:45:44 -04:00
Wing Lian
6b9b229356 btlm and falcon monkey patches for flash attn (#566) 2023-09-17 13:49:18 -04:00
Wing Lian
131afdbd89 add bf16 check (#587) 2023-09-17 13:49:03 -04:00
NanoCode012
00dce35fb2 Feat(data): Allow loading local csv and text (#594)
* Feat(data): Allow loading local csv and text

* chore: update readme for loading data
2023-09-17 11:32:27 -04:00
Wing Lian
b15b19eb8d gather/broadcast the max value of the packing efficiency automatically (#463) 2023-09-17 11:08:18 -04:00
Wing Lian
ab534d75ba don't add position_ids for evals (#591) 2023-09-16 16:11:57 -04:00
Wing Lian
21ec195c9f optionally configure sample packing for evals (#589) 2023-09-16 00:09:48 -04:00
Wing Lian
62eaee7649 make phi training work with Loras (#588)
* valdiation for phi loras

* fix model config class check

* update readme for phi traiing
2023-09-15 20:51:55 -04:00