Compare commits

...

133 Commits

Author SHA1 Message Date
Wing Lian
6fcb73faaa more gpt-neox long ctx fixes 2023-06-01 08:20:08 -04:00
Wing Lian
a32cc1d021 fix bettertransformers save, force it to skip after saving correctly in callback 2023-06-01 00:33:13 -04:00
Wing Lian
86bd9fcff4 more tweaks to do pre-training with bettertransformers 2023-05-31 21:59:15 -04:00
Wing Lian
ed7531abb8 experimental expansion of ctx len 2023-05-31 16:51:19 -04:00
Wing Lian
bdb547b830 add validation/warning for bettertransformers and torch version 2023-05-31 16:41:24 -04:00
Wing Lian
8a37b43678 use pythia-12b, neox-20b is flaky 2023-05-31 16:41:21 -04:00
Wing Lian
28acebac36 add flash attn context for efficient training and attempt setting model to train mode: 2023-05-31 16:40:38 -04:00
Wing Lian
adea682316 add support for opimum bettertransformers 2023-05-31 16:39:35 -04:00
Wing Lian
a6f5e5eaec Merge pull request #134 from OpenAccess-AI-Collective/gas-batch-fix
fix batch size calculation
2023-05-31 14:24:48 -04:00
Wing Lian
5a631b305b fix batch size calculation 2023-05-31 14:11:32 -04:00
Wing Lian
f94dd626f0 Merge pull request #130 from OpenAccess-AI-Collective/gas
swap batch size for gradient accumulation steps to decouple from num gpu
2023-05-31 13:03:51 -04:00
Wing Lian
5079753b7a Merge pull request #131 from OpenAccess-AI-Collective/fix-packing-mask
fix packing so that concatenated sequences reset the attention
2023-05-31 13:03:37 -04:00
Wing Lian
0136f510f2 don't worry about duplicate code here 2023-05-31 12:05:43 -04:00
Wing Lian
9b8585dc70 fix packing so that concatenated sequences reset the attention 2023-05-31 11:38:52 -04:00
Wing Lian
8eb5811d4e Merge pull request #129 from OpenAccess-AI-Collective/builder-badge
add badge info to readme
2023-05-31 10:37:59 -04:00
Wing Lian
e0011fdf55 Fix base builder, missing tags 2023-05-31 09:52:03 -04:00
Wing Lian
6e9e98720e Merge pull request #127 from OpenAccess-AI-Collective/py310-docker-runpod
add py310 support from base image
2023-05-31 09:39:42 -04:00
Wing Lian
c2a0792680 swap batch size for gradient accumulation steps to decouple from num gpu 2023-05-31 09:38:12 -04:00
Wing Lian
b267d24a2b add badge info to readme 2023-05-31 09:28:44 -04:00
Wing Lian
5c3f5db38b Add files via upload 2023-05-31 09:22:54 -04:00
Wing Lian
e3d03745ba add py310 support from base image 2023-05-31 09:07:28 -04:00
NanoCode012
fac46002d4 Merge pull request #119 from NanoCode012/feat/update-inference
Feat(inference): Swap to GenerationConfig
2023-05-31 14:09:18 +09:00
NanoCode012
33d40179ba Increase max_new_tokens
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-05-31 14:04:49 +09:00
Wing Lian
dcb03d6da4 Merge pull request #114 from OpenAccess-AI-Collective/accelerate-dep
Add accelerate dep
2023-05-31 00:47:17 -04:00
NanoCode012
0e4be625ae Merge pull request #118 from NanoCode012/feat/torch-readme
Fix(readme): Fix torch missing from readme
2023-05-31 13:29:41 +09:00
NanoCode012
bdc4bd7d4e Update README.md 2023-05-31 13:24:28 +09:00
Wing Lian
2d0ba3b818 Merge pull request #124 from OpenAccess-AI-Collective/xformers-fix
copy xformers attn from ooba since we removed dep on alpaca_lora_4bit
2023-05-31 00:11:40 -04:00
Wing Lian
c7021e191f Merge pull request #120 from OpenAccess-AI-Collective/model-from-path
split up llama model loading so config can be loaded from base config and models can be loaded from a path
2023-05-31 00:08:38 -04:00
Wing Lian
c56818b119 don't worry about dupes 2023-05-31 00:06:47 -04:00
Wing Lian
2675fb756e update readme for SDP 2023-05-31 00:04:54 -04:00
Wing Lian
1076bcbbca Update src/axolotl/monkeypatch/llama_attn_hijack_xformers.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2023-05-31 00:00:19 -04:00
Wing Lian
2daa6835f0 Update src/axolotl/monkeypatch/llama_attn_hijack_xformers.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2023-05-30 23:59:05 -04:00
Wing Lian
e3c494ca7b remove unused import and update readme 2023-05-30 23:55:45 -04:00
Wing Lian
ad0ea6aaab black formatting
ignore copied file
fix linting
2023-05-30 23:50:29 -04:00
Wing Lian
876edd83d0 Merge pull request #123 from OpenAccess-AI-Collective/bas-batch
add support for gradient accumulation steps
2023-05-30 23:45:29 -04:00
Wing Lian
6cb2310592 copy xformers attn from ooba since we removed dep on alpaca_lora_4bit 2023-05-30 23:34:36 -04:00
Wing Lian
6fa40bf8ad black formatting 2023-05-30 23:33:37 -04:00
Wing Lian
3aad5f3b3e add support for gradient accumulation steps 2023-05-30 23:24:37 -04:00
Wing Lian
39a208c2bc fix up tokenizer config, isort fix 2023-05-30 23:00:02 -04:00
Wing Lian
2520ecd6df split up llama model loading so config can be loaded from base config and models can be loaded from a path 2023-05-30 22:32:44 -04:00
Wing Lian
c5b0af1a7e define python version (3.10) explicitly as string in yaml 2023-05-30 22:23:35 -04:00
NanoCode012
988aeb9c34 Feat: Swap to GenerationConfig 2023-05-31 10:48:19 +09:00
NanoCode012
cf61f14bff FIx(readme): Fix torch missing from readme 2023-05-31 10:28:49 +09:00
Wing Lian
0abcd71a85 Merge pull request #115 from OpenAccess-AI-Collective/docker-version-fixes
docker fixes: py310, fix cuda arg in deepspeed
2023-05-30 18:11:26 -04:00
Wing Lian
c43c5c84ff py310, fix cuda arg in deepspeed 2023-05-30 18:02:34 -04:00
Wing Lian
36ec6e1a0e Add accelerate dep 2023-05-30 16:36:13 -04:00
Wing Lian
13b80937f9 add release draft template for gh 2023-05-30 15:10:19 -04:00
Wing Lian
bbc5bc5791 Merge pull request #108 from OpenAccess-AI-Collective/docker-gptq
default to qlora support, make gptq specific image
2023-05-30 15:07:04 -04:00
Wing Lian
4df9da74e3 Merge pull request #105 from viktoriussuwandi/viktoriussuwandi-patch
Viktoriussuwandi patch
2023-05-30 15:05:23 -04:00
Wing Lian
2531ea24c1 Merge pull request #106 from fearnworks/qlora-openllama-3b-example
Qlora openllama 3b example
2023-05-30 15:05:05 -04:00
Wing Lian
01a75fd027 Merge pull request #98 from NanoCode012/feat/pre-commit
Add pre-commit: black+flake8+pylint+mypy+isort+bandit
2023-05-30 14:57:15 -04:00
NanoCode012
b81c97ff76 Fix pre-commit for rebased files 2023-05-31 03:01:38 +09:00
NanoCode012
594e72b6e8 Fix incorrect rebase 2023-05-31 02:58:50 +09:00
NanoCode012
25eeeeba0b Fix sharegpt prompt 2023-05-31 02:55:21 +09:00
Wing Lian
cfcc549f6b fix relative path for fixtures 2023-05-31 02:55:21 +09:00
NanoCode012
a1f9850b91 Fix security issue or ignore false positives 2023-05-31 02:53:53 +09:00
NanoCode012
83d29209f7 Add bandit 2023-05-31 02:53:53 +09:00
NanoCode012
d011422200 Add isort 2023-05-31 02:53:53 +09:00
NanoCode012
b1cc54b14a Update pip install to also setup tests 2023-05-31 02:53:53 +09:00
NanoCode012
c17dae6d07 Update src/axolotl/prompt_strategies/alpaca_instruct.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-05-31 02:53:53 +09:00
NanoCode012
37293dce07 Apply isort then black 2023-05-31 02:53:53 +09:00
NanoCode012
96e8378692 Delete extract_lora.py 2023-05-31 02:53:53 +09:00
NanoCode012
e9650d3ae4 Fix mypy typing 2023-05-31 02:53:53 +09:00
NanoCode012
f1232b35ba Update mypy dependencies 2023-05-31 02:53:53 +09:00
NanoCode012
741a3f2edc Add mypy 2023-05-31 02:53:53 +09:00
NanoCode012
0dd35c74af Ignore unsupported-binary-operation 2023-05-31 02:53:53 +09:00
NanoCode012
db288e9b13 Set python version 2023-05-31 02:53:53 +09:00
NanoCode012
be22551435 Fix unsupported operand type(s) for | 2023-05-31 02:53:53 +09:00
NanoCode012
b832a0ac62 Black formatting 2023-05-31 02:53:53 +09:00
NanoCode012
afb31e13a3 Add badge and update contribution section 2023-05-31 02:53:53 +09:00
NanoCode012
1bf1f59a41 Move black to dev requirements 2023-05-31 02:53:53 +09:00
NanoCode012
8e46c0fb0d Refactor duplicate code between Prompter and Pygmalion 2023-05-31 02:53:53 +09:00
NanoCode012
1f3c3f5ea0 Lint validation 2023-05-31 02:53:53 +09:00
NanoCode012
0e952889dc Lint test_dict 2023-05-31 02:53:53 +09:00
NanoCode012
9c6750a075 Lint wandb 2023-05-31 02:53:53 +09:00
NanoCode012
c2dbf2c526 Lint validation 2023-05-31 02:53:53 +09:00
NanoCode012
e6b57decbd Lint tokenization 2023-05-31 02:53:53 +09:00
NanoCode012
fe1f4c4e7d Lint schedulers 2023-05-31 02:53:53 +09:00
NanoCode012
dae14e5951 Ignore too-many-instance-attributes 2023-05-31 02:53:53 +09:00
NanoCode012
633ff2150f Lint dict 2023-05-31 02:53:53 +09:00
NanoCode012
5d86137f70 Lint prompt_tokenizers 2023-05-31 02:53:53 +09:00
NanoCode012
01c8a333b3 Lint pygmalion 2023-05-31 02:53:53 +09:00
NanoCode012
7eb33a77dd Lint test_prompters 2023-05-31 02:53:53 +09:00
NanoCode012
1645a4ddd5 Lint creative_acr 2023-05-31 02:53:53 +09:00
NanoCode012
145b060cbe Lint alpaca_instruct 2023-05-31 02:53:53 +09:00
NanoCode012
8cc0aadcb8 Lint alpaca_chat 2023-05-31 02:53:53 +09:00
NanoCode012
6abb7f6a16 Lint datasets 2023-05-31 02:53:53 +09:00
NanoCode012
de2406c488 Lint convert.py 2023-05-31 02:53:53 +09:00
NanoCode012
8b617cc7f6 Lint setup.py 2023-05-31 02:53:53 +09:00
NanoCode012
ddb86ea821 Lint trainer.py 2023-05-31 02:53:53 +09:00
NanoCode012
1a2bd7ff62 Ignore too-few-public-methods 2023-05-31 02:53:23 +09:00
NanoCode012
82971e1565 Lint finetune.py 2023-05-31 02:53:23 +09:00
NanoCode012
f4e5d86268 Lint models.py 2023-05-31 02:53:23 +09:00
NanoCode012
daf47ccf45 Refactor disable pylint 2023-05-31 02:53:23 +09:00
NanoCode012
545cfeb5c7 Refactor error code to use full error message 2023-05-31 02:53:23 +09:00
NanoCode012
69722aeef4 Remove fixme disable 2023-05-31 02:53:23 +09:00
NanoCode012
5658717dbd Remove disable too many arg 2023-05-31 02:53:23 +09:00
NanoCode012
e8717d3bef Remove disable 2023-05-31 02:53:23 +09:00
NanoCode012
54c3b5b25f Ignore too-many-arguments 2023-05-31 02:53:23 +09:00
NanoCode012
5062eca069 Lint callbacks.py 2023-05-31 02:53:23 +09:00
NanoCode012
cb4f0e9342 Lint prompters.py 2023-05-31 02:53:23 +09:00
NanoCode012
4c0eddb3f8 Refactor 2023-05-31 02:53:23 +09:00
NanoCode012
1c60c10e00 Lint flash_attn.py 2023-05-31 02:53:23 +09:00
NanoCode012
903ea3080d Fix lint 2023-05-31 02:53:23 +09:00
NanoCode012
cb7cd3429f Fix data.py lint 2023-05-31 02:53:23 +09:00
NanoCode012
d57ba56746 Ignore import and too many * pylint errors 2023-05-31 02:53:23 +09:00
NanoCode012
c3a4697016 Update ignores 2023-05-31 02:53:22 +09:00
NanoCode012
392dfd9b07 Lint and format 2023-05-31 02:53:22 +09:00
NanoCode012
a98deb31a6 Add config files 2023-05-31 02:53:22 +09:00
NanoCode012
36596adaf7 Add pre-commit: black+flake8+pylint 2023-05-31 02:53:22 +09:00
jphillips
6cee881d64 Update examples/qlora-openllama-3b/README.md
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-05-30 09:33:33 -05:00
Wing Lian
48612f8376 cleanup from pr feedback 2023-05-30 09:56:30 -04:00
Wing Lian
d91a769b88 update docs 2023-05-29 20:37:32 -04:00
Wing Lian
6ef96f569b default to qlora support, make gptq specific image 2023-05-29 20:34:41 -04:00
jphillips
ac85c0ed36 Add Readme, Clean up comments 2023-05-29 14:35:58 -05:00
jphillips
f1fbf666f7 Merge branch 'main' of https://github.com/OpenAccess-AI-Collective/axolotl into qlora-openllama-3b-example 2023-05-29 09:09:43 -05:00
jphillips
370d057096 Add qlora-openllama-3b example 2023-05-29 09:07:46 -05:00
Viktorius Suwandi
e0ccaccce2 Update wandb_log_model on vicuna_13B_4bit_reflect.yml 2023-05-29 16:34:13 +07:00
Viktorius Suwandi
15e57ba6ee Update wandb_log_model on config.yml 2023-05-29 16:33:20 +07:00
Viktorius Suwandi
4eb68ac3f7 Update wandb_log_model on config-3b.yml 2023-05-29 16:32:49 +07:00
Viktorius Suwandi
b6a539b53c Update wandb_log_model on cerebras_1_3B_alpaca.yml 2023-05-29 16:32:20 +07:00
Viktorius Suwandi
abddcf4dfe Update wandb_log_model on pythia_1_2B_alpaca.yml 2023-05-29 16:31:53 +07:00
Viktorius Suwandi
15aabd2903 Update wandb_log_model on llama_7B_jeopardy.yml 2023-05-29 15:44:01 +07:00
Viktorius Suwandi
232b931081 Update wandb_log_model on llama_65B_alpaca.yml 2023-05-29 15:43:43 +07:00
Viktorius Suwandi
0736f4f9c1 Update wandb_log_model on llama_13B_alpaca.yml 2023-05-29 15:43:20 +07:00
Viktorius Suwandi
d77d736631 Update wandb_log_model on llama_7B_alpaca.yml 2023-05-29 15:43:01 +07:00
Viktorius Suwandi
fad06befee Update wandb_log_model on config.yml 2023-05-29 15:42:38 +07:00
Viktorius Suwandi
2aacf75ee1 Update wandb_log_model on galactica_1_3B.yml 2023-05-29 15:42:19 +07:00
Viktorius Suwandi
71871345a6 Update wandb_log_model on llama_7B_4bit.yml 2023-05-29 15:41:59 +07:00
Viktorius Suwandi
0d14e951a8 Update wandb_log_model on stability_3b.yml 2023-05-29 15:41:42 +07:00
Viktorius Suwandi
84fc217f79 Update wandb_log_model on gpt_neox_20b.yml 2023-05-29 15:41:24 +07:00
Viktorius Suwandi
f317296259 Update wandb_log_model on quickstart.yml 2023-05-29 15:40:58 +07:00
Viktorius Suwandi
42a971df32 Update wandb_log_model on sample.yml 2023-05-29 15:39:42 +07:00
70 changed files with 1756 additions and 569 deletions

.bandit (new file)

@@ -0,0 +1,3 @@
[bandit]
exclude = tests
skips = B101

.flake8 (new file)

@@ -0,0 +1,5 @@
[flake8]
max-line-length = 88
select = C,E,F,W,B,B950
extend-ignore = E203, E501, W503

.github/release-drafter.yml (new file)

@@ -0,0 +1,31 @@
name-template: 'v$RESOLVED_VERSION'
tag-template: 'v$RESOLVED_VERSION'
categories:
- title: '🚀 Features'
labels:
- 'feature'
- 'enhancement'
- title: '🐛 Bug Fixes'
labels:
- 'fix'
- 'bugfix'
- 'bug'
- title: '🧰 Maintenance'
label: 'chore'
change-template: '- $TITLE @$AUTHOR (#$NUMBER)'
change-title-escapes: '\<*_&' # You can add # and @ to disable mentions, and add ` to disable code blocks.
version-resolver:
major:
labels:
- 'major'
minor:
labels:
- 'minor'
patch:
labels:
- 'patch'
default: patch
template: |
## Whats Changed
$CHANGES
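This file drives the release-drafter GitHub Action referenced in the "add release draft template for gh" commit above: draft release notes are grouped under the Features / Bug Fixes / Maintenance headings by PR label, and the version bump is likewise picked from the major/minor/patch labels, falling back to a patch bump.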

View File

@@ -14,14 +14,26 @@ jobs:
strategy:
matrix:
include:
- cuda: cu118
- cuda: "118"
cuda_version: 11.8.0
cuda_version_bnb: "118"
python_version: "3.9"
pytorch: 2.0.0
- cuda: cu117
axolotl_extras:
- cuda: "118"
cuda_version: 11.8.0
python_version: "3.10"
pytorch: 2.0.0
axolotl_extras:
- cuda: "117"
cuda_version: 11.7.0
cuda_version_bnb: "117"
python_version: "3.9"
pytorch: 1.13.1
axolotl_extras:
- cuda: "118"
cuda_version: 11.8.0
python_version: "3.9"
pytorch: 2.0.0
axolotl_extras: gptq
steps:
- name: Checkout
uses: actions/checkout@v3
@@ -43,12 +55,13 @@ jobs:
context: .
file: ./docker/Dockerfile-base
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.metadata.outputs.tags }}-${{ matrix.cuda }}-${{ matrix.pytorch }}
tags: ${{ steps.metadata.outputs.tags }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
labels: ${{ steps.metadata.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
build-args: |
CUDA_VERSION=${{ matrix.cuda_version }}
CUDA_VERSION_BNB=${{ matrix.cuda_version_bnb }}
CUDA=${{ matrix.cuda }}
PYTHON_VERSION=${{ matrix.python_version }}
PYTORCH_VERSION=${{ matrix.pytorch }}
AXOLOTL_EXTRAS=${{ matrix.axolotl_extras }}

View File

@@ -15,10 +15,24 @@ jobs:
include:
- cuda: cu118
cuda_version: 11.8.0
python_version: "3.9"
pytorch: 2.0.0
axolotl_extras:
- cuda: cu118
cuda_version: 11.8.0
python_version: "3.10"
pytorch: 2.0.0
axolotl_extras:
- cuda: cu118
cuda_version: 11.8.0
python_version: "3.9"
pytorch: 2.0.0
axolotl_extras: gptq
- cuda: cu117
cuda_version: 11.7.0
python_version: "3.9"
pytorch: 1.13.1
axolotl_extras:
runs-on: self-hosted
steps:
- name: Checkout
@@ -40,10 +54,10 @@ jobs:
with:
context: .
build-args: |
BASE_TAG=${{ github.ref_name }}-base-${{ matrix.cuda }}-${{ matrix.pytorch }}
BASE_TAG=${{ github.ref_name }}-base-py${{ matrix.python_version }}-${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
file: ./docker/Dockerfile
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.metadata.outputs.tags }}-${{ matrix.cuda }}-${{ matrix.pytorch }}
tags: ${{ steps.metadata.outputs.tags }}-py${{ matrix.python_version }}-${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
labels: ${{ steps.metadata.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
@@ -56,10 +70,24 @@ jobs:
include:
- cuda: cu118
cuda_version: 11.8.0
python_version: "3.9"
pytorch: 2.0.0
axolotl_extras:
- cuda: cu118
cuda_version: 11.8.0
python_version: "3.10"
pytorch: 2.0.0
axolotl_extras:
- cuda: cu118
cuda_version: 11.8.0
python_version: "3.9"
pytorch: 2.0.0
axolotl_extras: gptq
- cuda: cu117
cuda_version: 11.7.0
python_version: "3.9"
pytorch: 1.13.1
axolotl_extras:
runs-on: self-hosted
steps:
- name: Checkout
@@ -81,10 +109,10 @@ jobs:
with:
context: .
build-args: |
BASE_TAG=${{ github.ref_name }}-${{ matrix.cuda }}-${{ matrix.pytorch }}
BASE_TAG=${{ github.ref_name }}-py${{ matrix.python_version }}-${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
file: ./docker/Dockerfile-runpod
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.metadata.outputs.tags }}-${{ matrix.cuda }}-${{ matrix.pytorch }}
tags: ${{ steps.metadata.outputs.tags }}-py${{ matrix.python_version }}-${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
labels: ${{ steps.metadata.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
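Net effect of the matrix changes above (illustrative, not copied from a live run): image tags that previously ended in `-cu118-2.0.0` now carry the Python version as well, e.g. `-py3.10-cu118-2.0.0`, and a suffix such as `-gptq` is appended only when `axolotl_extras` is non-empty, which is what the `${{ matrix.axolotl_extras != '' && '-' || '' }}` expression in the tag templates guards.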

.github/workflows/pre-commit.yml (new file)

@@ -0,0 +1,16 @@
name: pre-commit
on:
pull_request:
push:
jobs:
pre-commit:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v4
with:
python-version: "3.9"
cache: 'pip' # caching pip dependencies
- uses: pre-commit/action@v3.0.0

.gitignore

@@ -160,4 +160,4 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/
.idea/

.isort.cfg (new file)

@@ -0,0 +1,2 @@
[settings]
profile=black

.mypy.ini (new file)

@@ -0,0 +1,39 @@
[mypy]
exclude = venv
[mypy-alpaca_lora_4bit.*]
ignore_missing_imports = True
[mypy-axolotl.monkeypatch.*]
ignore_errors = True
[mypy-flash_attn.*]
ignore_missing_imports = True
[mypy-huggingface_hub]
ignore_missing_imports = True
[mypy-transformers.*]
ignore_missing_imports = True
[mypy-peft]
ignore_missing_imports = True
[mypy-bitsandbytes]
ignore_missing_imports = True
[mypy-datasets]
ignore_missing_imports = True
[mypy-fire]
ignore_missing_imports = True
[mypy-setuptools]
ignore_missing_imports = True
[mypy-addict]
ignore_missing_imports = True
[mypy-xformers.*]
ignore_missing_imports = True

.pre-commit-config.yaml (new file)

@@ -0,0 +1,42 @@
default_language_version:
python: python3.9
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/psf/black
rev: 23.3.0
hooks:
- id: black
- repo: https://github.com/pycqa/isort
rev: 5.12.0
hooks:
- id: isort
- repo: https://github.com/PyCQA/flake8
rev: 6.0.0
hooks:
- id: flake8
- repo: https://github.com/PyCQA/pylint
rev: v2.17.4
hooks:
- id: pylint
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.3.0
hooks:
- id: mypy
additional_dependencies:
[
'types-PyYAML',
]
- repo: https://github.com/PyCQA/bandit
rev: 1.7.5
hooks:
- id: bandit
args: [
'--ini',
'.bandit',
]
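With this config in the repo, `pre-commit install` (also called out in the README's new contributing section) hooks the checks into `git commit`, and `pre-commit run --all-files` runs the same black / isort / flake8 / pylint / mypy / bandit suite on demand; the pre-commit.yml workflow above runs it in CI for every push and pull request.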

.pylintrc (new file)

@@ -0,0 +1,14 @@
[MASTER]
init-hook="from pylint.config import find_pylintrc; import os, sys; sys.path.append(os.path.dirname(find_pylintrc()))"
[TYPECHECK]
# List of members which are set dynamically and missed by Pylint inference
# system, and so shouldn't trigger E1101 when accessed.
generated-members=numpy.*, torch.*
[pylint.messages_control]
disable=missing-function-docstring, line-too-long, import-error,
too-many-arguments, too-many-locals, too-many-statements, too-many-branches, too-few-public-methods,
too-many-instance-attributes, fixme, import-outside-toplevel, logging-fstring-interpolation,

View File

@@ -9,6 +9,8 @@
<p>
Go ahead and axolotl questions!!
</p>
<img src="https://github.com/OpenAccess-AI-Collective/axolotl/actions/workflows/pre-commit.yml/badge.svg?branch=main" alt="pre-commit">
<img alt="PyTest Status" src="https://github.com/OpenAccess-AI-Collective/axolotl/actions/workflows/tests.yml/badge.svg?branch=main">
</div>
</div>
@@ -25,12 +27,12 @@
## Quickstart ⚡
**Requirements**: Python 3.9.
**Requirements**: Python 3.9 and Pytorch 2.0.
```bash
git clone https://github.com/OpenAccess-AI-Collective/axolotl
pip3 install -e .[int4]
pip3 install -e .
accelerate config
@@ -56,10 +58,12 @@ accelerate launch scripts/finetune.py examples/lora-openllama-3b/config.yml \
- Conda/Pip venv
1. Install python **3.9**
2. Install python dependencies with ONE of the following:
- `pip3 install -e .[int4]` (recommended)
- `pip3 install -e .[int4_triton]`
- `pip3 install -e .`
2. Install pytorch stable https://pytorch.org/get-started/locally/
3. Install python dependencies with ONE of the following:
- `pip3 install -e .` (recommended, supports QLoRA, no gptq/int4 support)
- `pip3 install -e .[gptq]` (next best if you don't need QLoRA, but want to use gptq)
- `pip3 install -e .[gptq_triton]`
### Dataset
@@ -169,6 +173,9 @@ base_model_ignore_patterns:
# if the base_model repo on hf hub doesn't include configuration .json files,
# you can set that here, or leave this empty to default to base_model
base_model_config: ./llama-7b-hf
# Optional tokenizer configuration override in case you want to use a different tokenizer
# than the one defined in the base model
tokenizer_config:
# If you want to specify the type of model to load, AutoModelForCausalLM is a good choice too
model_type: AutoModelForCausalLM
# Corresponding tokenizer for the model AutoTokenizer is a good choice
@@ -258,7 +265,7 @@ wandb_log_model: # 'checkpoint'
output_dir: ./completed-model
# training hyperparameters
batch_size: 8
gradient_accumulation_steps: 1
micro_batch_size: 2
eval_batch_size: 2
num_epochs: 3
@@ -298,6 +305,9 @@ weight_decay:
xformers_attention:
# whether to use flash attention patch https://github.com/HazyResearch/flash-attention:
flash_attention: # require a100 for llama
# whether to use scaled-dot-product attention
# https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
sdp_attention:
# resume from a specific checkpoint dir
resume_from_checkpoint:
@@ -401,8 +411,27 @@ Try to turn off xformers.
Join our [Discord server](https://discord.gg/HhrNrHJPRb) where we can help you
## Badge ❤🏷️
Building something cool with Axolotl? Consider adding a badge to your model card.
```markdown
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
```
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
## Contributing 🤝
Bugs? Please check for open issue else create a new [Issue](https://github.com/OpenAccess-AI-Collective/axolotl/issues/new).
PRs are **greatly welcome**!
Please run below to setup env
```bash
pip3 install -r requirements-dev.txt -r requirements-tests.txt
pre-commit install
# test
pytest tests/
```

View File

@@ -24,9 +24,9 @@ lora_fan_in_fan_out: false
wandb_project: pythia-1.4b-lora
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
wandb_log_model:
output_dir: ./lora-alpaca
batch_size: 32
gradient_accumulation_steps: 1
micro_batch_size: 4
num_epochs: 5
learning_rate: 0.0003
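This is the pattern repeated across the example configs below: the fixed `batch_size:` value gives way to an explicit `gradient_accumulation_steps:`, so the schedule is expressed per device (`micro_batch_size` with that many accumulation steps) and no longer has to be re-derived from a global batch size when the GPU count changes (the "decouple from num gpu" commits above); the matching derived-config logic in `scripts/finetune.py` is sketched after that file's hunk further down.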

View File

@@ -21,9 +21,9 @@ lora_fan_in_fan_out: false
wandb_project:
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
wandb_log_model:
output_dir: ./lora-llama-alpaca
batch_size: 32
gradient_accumulation_steps: 1
micro_batch_size: 16
num_epochs: 3
learning_rate: 0.00003

View File

@@ -1,39 +0,0 @@
base_model: EleutherAI/gpt-neox-20b
base_model_ignore_patterns: pytorch* # prefer safetensors
model_type: GPTNeoXForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: true
datasets:
- path: nomic-ai/gpt4all-j-prompt-generations
type: alpaca
shards: 4
shards_index: 0
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
adapter: lora
lora_model_dir:
sequence_len: 2048
max_packed_sequence_len: 2048
lora_r: 8
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
- query_key_value
lora_fan_in_fan_out: true # pythia/GPTNeoX lora specific
wandb_project: gpt4all-neox-20b
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
output_dir: ./gpt4all-neox-20b
batch_size: 48
micro_batch_size: 4
num_epochs: 5
learning_rate: 0.00003
lr_scheduler: one_cycle
train_on_inputs: false
group_by_length: false
bf16: True
tf32: True
early_stopping_patience:
resume_from_checkpoint:
local_rank:

View File

@@ -21,9 +21,9 @@ lora_fan_in_fan_out: false
wandb_project:
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
wandb_log_model:
output_dir: ./llama-13b-sharegpt
batch_size: 64
gradient_accumulation_steps: 1
micro_batch_size: 2
warmup_steps: 1000
save_steps:

View File

@@ -27,9 +27,9 @@ lora_fan_in_fan_out: false
wandb_project: llama-65b-lora
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
wandb_log_model:
output_dir: ./lora-llama-alpaca
batch_size: 128
gradient_accumulation_steps: 1
micro_batch_size: 16
warmup_steps: 1000
save_steps:

View File

@@ -24,9 +24,9 @@ lora_fan_in_fan_out: false
wandb_project:
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
wandb_log_model:
output_dir: ./lora-test
batch_size: 8
gradient_accumulation_steps: 1
micro_batch_size: 2
num_epochs: 3
warmup_steps: 100

View File

@@ -26,9 +26,9 @@ lora_fan_in_fan_out: false
wandb_project: llama-7b-lora
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
wandb_log_model:
output_dir: ./lora-llama-alpaca
batch_size: 128
gradient_accumulation_steps: 1
micro_batch_size: 16
num_epochs: 5
learning_rate: 0.00003

View File

@@ -22,9 +22,9 @@ lora_fan_in_fan_out: false
wandb_project: jeopardy-bot-7b
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
wandb_log_model:
output_dir: ./jeopardy-bot-7b
batch_size: 4
gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_bnb_8bit

View File

@@ -26,9 +26,9 @@ lora_fan_in_fan_out: true # pythia/GPTNeoX lora specific
wandb_project: pythia-1.4b-lora
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
wandb_log_model:
output_dir: ./lora-alpaca
batch_size: 48
gradient_accumulation_steps: 1
micro_batch_size: 4
num_epochs: 5
learning_rate: 0.00001

View File

@@ -24,9 +24,9 @@ lora_fan_in_fan_out: false
wandb_project:
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
wandb_log_model:
output_dir: ./lora-test
batch_size: 4
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 3
warmup_steps: 100

View File

@@ -49,11 +49,12 @@ lora_fan_in_fan_out: false
wandb_project:
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
wandb_log_model:
# where to save the finsihed model to
output_dir: ./completed-model
# training hyperparameters
batch_size: 8
gradient_accumulation_steps: 1
batch_size:
micro_batch_size: 2
num_epochs: 3
warmup_steps: 100

View File

@@ -20,9 +20,9 @@ lora_fan_in_fan_out: false
wandb_project: stable-alpaca-3b
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
wandb_log_model:
output_dir: ./stable-alpaca-3b
batch_size: 2
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_bnb_8bit

View File

@@ -28,9 +28,9 @@ lora_fan_in_fan_out: false
wandb_project:
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
wandb_log_model:
output_dir: ./lora-reflect
batch_size: 8
gradient_accumulation_steps: 1
micro_batch_size: 2
num_epochs: 3
learning_rate: 0.00003

View File

@@ -2,19 +2,26 @@ ARG BASE_TAG=main-base
FROM winglian/axolotl-base:$BASE_TAG
ARG TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6+PTX"
ARG AXOLOTL_EXTRAS=""
RUN apt-get update && \
apt-get install -y vim curl
WORKDIR /workspace
# The base image ships with `pydantic==1.8.2` which is not working
RUN python3 -m pip install -U --no-cache-dir pydantic
RUN pip3 install --force-reinstall "peft @ git+https://github.com/huggingface/peft.git@main" \
"accelerate @ git+https://github.com/huggingface/accelerate.git@main" \
"transformers @ git+https://github.com/huggingface/transformers.git@main"
RUN mkdir axolotl
COPY . axolotl/
# If AXOLOTL_EXTRAS is set, append it in brackets
RUN cd axolotl && \
pip install -e .[int4]
if [ "$AXOLOTL_EXTRAS" != "" ] ; then \
pip install -e .[$AXOLOTL_EXTRAS]; \
else \
pip install -e .; \
fi
# helper for huggingface-login cli
RUN git config --global credential.helper store
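Usage note for the new build arg (a hypothetical invocation, assuming the image is built straight from this Dockerfile): `docker build --build-arg AXOLOTL_EXTRAS=gptq .` makes the image run `pip install -e .[gptq]`, while leaving `AXOLOTL_EXTRAS` empty keeps the plain `pip install -e .` default, in line with the "default to qlora support, make gptq specific image" commit.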

View File

@@ -9,7 +9,7 @@ ENV PATH="/root/miniconda3/bin:${PATH}"
ARG PYTHON_VERSION="3.9"
ARG PYTORCH="2.0.0"
ARG CUDA="cu118"
ARG CUDA="118"
ENV PYTHON_VERSION=$PYTHON_VERSION
@@ -29,7 +29,7 @@ ENV PATH="/root/miniconda3/envs/py${PYTHON_VERSION}/bin:${PATH}"
WORKDIR /workspace
RUN python3 -m pip install --upgrade pip && pip3 install packaging && \
python3 -m pip install --no-cache-dir -U torch==${PYTORCH} torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA
python3 -m pip install --no-cache-dir -U torch==${PYTORCH} torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu$CUDA
FROM base-builder AS flash-attn-builder
@@ -52,6 +52,8 @@ RUN git clone https://github.com/HazyResearch/flash-attention.git && \
FROM base-builder AS deepspeed-builder
ARG TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6+PTX"
WORKDIR /workspace
RUN git clone https://github.com/microsoft/DeepSpeed.git && \
@@ -61,12 +63,12 @@ RUN git clone https://github.com/microsoft/DeepSpeed.git && \
FROM base-builder AS bnb-builder
WORKDIR /workspace
ARG CUDA_VERSION_BNB="118"
ENV CUDA_VERSION_BNB=$CUDA_VERSION_BNB
ARG CUDA="118"
ENV CUDA=$CUDA
RUN git clone https://github.com/TimDettmers/bitsandbytes.git && \
cd bitsandbytes && \
CUDA_VERSION=$CUDA_VERSION_BNB make cuda11x && \
CUDA_VERSION=$CUDA make cuda11x && \
python setup.py bdist_wheel
FROM base-builder
@@ -93,10 +95,6 @@ COPY --from=flash-attn-builder /workspace/flash-attention/csrc/layer_norm/dist/d
RUN pip3 install wheels/deepspeed-*.whl wheels/flash_attn-*.whl wheels/fused_dense_lib-*.whl wheels/xentropy_cuda_lib-*.whl wheels/rotary_emb-*.whl wheels/dropout_layer_norm-*.whl
RUN cd /workspace/builds/bitsandbytes && python3 setup.py install
RUN git lfs install --skip-repo
RUN pip3 install "peft @ git+https://github.com/huggingface/peft.git@main" \
"accelerate @ git+https://github.com/huggingface/accelerate.git@main" \
"transformers @ git+https://github.com/huggingface/transformers.git@main" && \
pip3 install awscli && \
RUN pip3 install awscli && \
# The base image ships with `pydantic==1.8.2` which is not working
pip3 install -U --no-cache-dir pydantic

View File

@@ -61,4 +61,3 @@ special_tokens:
pad_token: "<|endoftext|>"
bos_token: ">>ABSTRACT<<"
eos_token: "<|endoftext|>"

View File

@@ -61,4 +61,3 @@ special_tokens:
pad_token: "<|endoftext|>"
bos_token: ">>ABSTRACT<<"
eos_token: "<|endoftext|>"

View File

@@ -24,9 +24,9 @@ lora_fan_in_fan_out: false
wandb_project: llama-7b-lora-int4
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
wandb_log_model:
output_dir: ./llama-7b-lora-int4
batch_size: 1
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_bnb_8bit

View File

@@ -22,9 +22,9 @@ lora_fan_in_fan_out: false
wandb_project: mpt-alpaca-7b
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
wandb_log_model:
output_dir: ./mpt-alpaca-7b
batch_size: 1
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_bnb_8bit

View File

@@ -0,0 +1,10 @@
# Python 12B
- Single-GPU A100 only (?)
```shell
python scripts/finetune.py examples/pythia-12b/config.yml
```
⚠️ Multiple-GPU A100 - Doesn't seem to work with multi-gpu without causing OOM! ⚠️

View File

@@ -0,0 +1,49 @@
base_model: EleutherAI/pythia-12b-deduped
base_model_config: EleutherAI/pythia-12b-deduped
base_model_ignore_patterns: pytorch* # prefer safetensors
model_type: GPTNeoXForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false
gptq: false
device_map: auto
datasets:
- path: vicgalle/alpaca-gpt4
type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
adapter:
lora_model_dir:
sequence_len: 2048
max_packed_sequence_len: 2048
lora_r: 64
lora_alpha: 32
lora_dropout: 0.0
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out: true # pythia/GPTNeoX lora specific
wandb_project: pythia-12b
wandb_watch:
wandb_run_id:
wandb_log_model:
output_dir: ./pythia-12b
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 5
learning_rate: 0.00003
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
train_on_inputs: false
group_by_length: false
bf16: false
fp16: false
float16: true
tf32: true
flash_optimum: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
gradient_checkpointing: true
fsdp:
fsdp_transformer_layer_cls_to_wrap:
collator_pad_to_longest: true
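`flash_optimum: true` here opts into the optimum/BetterTransformer path added in the commits above; the `scripts/finetune.py` changes further down wrap `trainer.train()` in a `torch.backends.cuda.sdp_kernel(...)` context and call `BetterTransformer.reverse(model)` before saving, so checkpoints are written back in the original module layout.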

View File

@@ -0,0 +1,6 @@
# qlora-openllama-3b
```shell
accelerate launch scripts/finetune.py examples/qlora-openllama-3b/config.yml
```

View File

@@ -0,0 +1,61 @@
base_model: openlm-research/open_llama_3b_600bt_preview
base_model_config: openlm-research/open_llama_3b_600bt_preview
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: false
load_in_4bit: true
strict: false
push_dataset_to_hub:
datasets:
- path: teknium/GPT4-LLM-Cleaned
type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.01
adapter: qlora
lora_model_dir:
sequence_len: 2048
max_packed_sequence_len: 2048
lora_r: 8
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project:
wandb_watch:
wandb_run_id:
wandb_log_model:
output_dir: ./qlora-out
batch_size: 4
micro_batch_size: 4
num_epochs: 2
optimizer: paged_adamw_32bit
torchdistx_path:
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: true
bf16: true
fp16: false
tf32: true
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention: true
flash_attention:
gptq_groupsize:
gptq_model_v1:
warmup_steps: 10
eval_steps: 20
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
bos_token: "<s>"
eos_token: "</s>"
unk_token: "<unk>"

View File

@@ -23,7 +23,7 @@ lora_fan_in_fan_out: false
wandb_project: redpajama-alpaca-3b
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
wandb_log_model:
output_dir: ./redpajama-alpaca-3b
batch_size: 4
micro_batch_size: 1

image/axolotl-badge-web.png (new binary file, 11 KiB)

requirements-dev.txt (new file)

@@ -0,0 +1,3 @@
pre-commit
black
mypy

View File

@@ -1,16 +1,17 @@
peft @ git+https://github.com/huggingface/peft.git
transformers @ git+https://github.com/huggingface/transformers.git
bitsandbytes>=0.39.0
accelerate
addict
fire
PyYAML==6.0
black
datasets
accelerate>=0.19.0
sentencepiece
wandb
einops
xformers
optimum
# qlora things
bert-score==0.3.13
evaluate==0.4.0

View File

@@ -1,24 +1,38 @@
"""Module to convert json file to jsonl"""
import os
import sys
from pathlib import Path
from typing import Optional, Union
import fire
from typing import Optional
from axolotl.convert import (
FileReader,
FileWriter,
JsonlSerializer,
JsonParser,
JsonToJsonlConverter,
StdoutWriter,
)
# add src to the pythonpath so we don't need to pip install this
project_root = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
src_dir = os.path.join(project_root, "src")
sys.path.insert(0, src_dir)
from axolotl.convert import *
def main(
input: Path,
file: Path,
output: Optional[Path] = None,
to_stdout: Optional[bool] = False,
):
"""
Convert a json file to jsonl
"""
file_reader = FileReader()
writer: Union[StdoutWriter, FileWriter]
if to_stdout or output is None:
writer = StdoutWriter()
else:
@@ -28,7 +42,7 @@ def main(
converter = JsonToJsonlConverter(file_reader, writer, json_parser, jsonl_serializer)
converter.convert(input, output)
converter.convert(file, output)
if __name__ == "__main__":

View File

@@ -1,3 +1,5 @@
"""Prepare and train a model on a dataset. Can also infer from a model or merge lora"""
import importlib
import logging
import os
@@ -5,25 +7,29 @@ import random
import signal
import sys
from pathlib import Path
from typing import Optional, List, Dict, Any, Union
from typing import Any, Dict, List, Optional, Union
import fire
import torch
import yaml
# add src to the pythonpath so we don't need to pip install this
from axolotl.utils.tokenization import check_dataset_labels
from axolotl.utils.validation import validate_config
from datasets import Dataset
from optimum.bettertransformer import BetterTransformer
from transformers import GenerationConfig
from axolotl.utils.data import load_prepare_datasets, load_pretraining_dataset
from axolotl.utils.dict import DictDefault
from axolotl.utils.models import load_model, load_tokenizer
from axolotl.utils.tokenization import check_dataset_labels
from axolotl.utils.trainer import setup_trainer
from axolotl.utils.validation import validate_config
from axolotl.utils.wandb import setup_wandb_env_vars
project_root = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
src_dir = os.path.join(project_root, "src")
sys.path.insert(0, src_dir)
from axolotl.utils.data import load_prepare_datasets
from axolotl.utils.models import load_model, load_tokenizer
from axolotl.utils.trainer import setup_trainer
from axolotl.utils.wandb import setup_wandb_env_vars
logging.basicConfig(level=os.getenv("LOG_LEVEL", "INFO"))
DEFAULT_DATASET_PREPARED_PATH = "last_run_prepared"
@@ -31,27 +37,30 @@ DEFAULT_DATASET_PREPARED_PATH = "last_run_prepared"
def choose_device(cfg):
def get_device():
if torch.cuda.is_available():
return f"cuda:{cfg.local_rank}"
else:
try:
if torch.backends.mps.is_available():
return "mps"
except:
return "cpu"
try:
if torch.cuda.is_available():
return f"cuda:{cfg.local_rank}"
if torch.backends.mps.is_available():
return "mps"
raise SystemError("No CUDA/mps device found")
except Exception: # pylint: disable=broad-exception-caught
return "cpu"
cfg.device = get_device()
if cfg.device == "cuda":
cfg.device_map = {"": cfg.local_rank}
else:
cfg.device_map = {"": cfg.device}
if cfg.device_map != "auto":
if cfg.device == "cuda":
cfg.device_map = {"": cfg.local_rank}
else:
cfg.device_map = {"": cfg.device}
def get_multi_line_input() -> Optional[str]:
print("Give me an instruction (Ctrl + D to finish): ")
instruction = ""
for line in sys.stdin:
instruction += line
instruction += line # pylint: disable=consider-using-join
# instruction = pathlib.Path("/proc/self/fd/0").read_text()
return instruction
@@ -68,31 +77,38 @@ def do_inference(cfg, model, tokenizer, prompter="AlpacaPrompter"):
instruction = get_multi_line_input()
if not instruction:
return
prompt: str = next(prompter_module().build_prompt(instruction=instruction))
prompt: str = next(
prompter_module().build_prompt(instruction=instruction.strip("\n"))
)
batch = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
model.eval()
with torch.no_grad():
# gc = GenerationConfig() # TODO swap out and use this
generated = model.generate(
inputs=batch["input_ids"].to(cfg.device),
do_sample=True,
use_cache=True,
generation_config = GenerationConfig(
repetition_penalty=1.1,
max_new_tokens=100,
max_new_tokens=1024,
temperature=0.9,
top_p=0.95,
top_k=40,
bos_token_id=tokenizer.bos_token_id,
eos_token_id=tokenizer.eos_token_id,
pad_token_id=tokenizer.pad_token_id,
do_sample=True,
use_cache=True,
return_dict_in_generate=True,
output_attentions=False,
output_hidden_states=False,
output_scores=False,
)
generated = model.generate(
inputs=batch["input_ids"].to(cfg.device),
generation_config=generation_config,
)
print(tokenizer.decode(generated["sequences"].cpu().tolist()[0]))
def choose_config(path: Path):
yaml_files = [file for file in path.glob("*.yml")]
yaml_files = list(path.glob("*.yml"))
if not yaml_files:
raise ValueError(
@@ -130,12 +146,12 @@ def train(
config = choose_config(config)
# load the config from the yaml file
with open(config, "r") as f:
cfg: DictDefault = DictDefault(yaml.load(f, Loader=yaml.Loader))
with open(config, encoding="utf-8") as file:
cfg: DictDefault = DictDefault(yaml.safe_load(file))
# if there are any options passed in the cli, if it is something that seems valid from the yaml,
# then overwrite the value
cfg_keys = cfg.keys()
for k in kwargs:
for k, _ in kwargs.items():
# if not strict, allow writing to cfg even if it's not in the yml already
if k in cfg_keys or cfg.strict is False:
# handle booleans
@@ -144,17 +160,23 @@ def train(
else:
cfg[k] = kwargs[k]
validate_config(cfg)
# setup some derived config / hyperparams
cfg.gradient_accumulation_steps = cfg.batch_size // cfg.micro_batch_size
cfg.gradient_accumulation_steps = cfg.gradient_accumulation_steps or (
cfg.batch_size // cfg.micro_batch_size
)
cfg.batch_size = (
cfg.batch_size or cfg.micro_batch_size * cfg.gradient_accumulation_steps
)
cfg.world_size = int(os.environ.get("WORLD_SIZE", 1))
cfg.local_rank = int(os.environ.get("LOCAL_RANK", 0))
choose_device(cfg)
cfg.ddp = cfg.ddp if cfg.ddp is not None else cfg.world_size != 1
if cfg.ddp:
cfg.device_map = {"": int(os.environ.get("LOCAL_RANK", 0))}
cfg.gradient_accumulation_steps = (
cfg.gradient_accumulation_steps // cfg.world_size
)
cfg.batch_size = cfg.batch_size * cfg.world_size
setup_wandb_env_vars(cfg)
if cfg.device == "mps":
cfg.load_in_8bit = False
@@ -163,26 +185,34 @@ def train(
cfg.fp16 = True
cfg.bf16 = False
validate_config(cfg)
# load the tokenizer first
logging.info("loading tokenizer...")
tokenizer = load_tokenizer(
cfg.base_model_config,
cfg.tokenizer_type,
cfg
)
tokenizer_config = cfg.tokenizer_config or cfg.base_model_config
logging.info(f"loading tokenizer... {tokenizer_config}")
tokenizer = load_tokenizer(tokenizer_config, cfg.tokenizer_type, cfg)
if check_not_in(["inference", "shard", "merge_lora"], kwargs): # don't need to load dataset for these
train_dataset, eval_dataset = load_prepare_datasets(
tokenizer, cfg, DEFAULT_DATASET_PREPARED_PATH
)
if check_not_in(
["inference", "shard", "merge_lora"], kwargs
): # don't need to load dataset for these
if not cfg.pretraining_dataset:
train_dataset, eval_dataset = load_prepare_datasets(
tokenizer, cfg, DEFAULT_DATASET_PREPARED_PATH
)
else:
if cfg.pretraining_dataset is True:
pretraining_dataset = "togethercomputer/RedPajama-Data-1T"
else:
pretraining_dataset = cfg.pretraining_dataset
train_dataset = load_pretraining_dataset(
pretraining_dataset, tokenizer, max_tokens=cfg.sequence_len
)
train_dataset = Dataset.from_list(list(train_dataset))
eval_dataset = None
if cfg.debug or "debug" in kwargs:
logging.info("check_dataset_labels...")
check_dataset_labels(
train_dataset.select(
[random.randrange(0, len(train_dataset) - 1) for i in range(5)]
[random.randrange(0, len(train_dataset) - 1) for _ in range(5)] # nosec
),
tokenizer,
)
@@ -222,6 +252,21 @@ def train(
model.save_pretrained(cfg.output_dir)
return
if cfg.debug:
logging.info("check_dataset_labels...")
check_dataset_labels(
train_dataset.select(
[random.randrange(0, len(train_dataset) - 1) for i in range(5)] # nosec
),
tokenizer,
)
if prepare_ds_only:
logging.info("Finished preparing dataset. Exiting...")
return
model.train()
trainer = setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer)
model.config.use_cache = False
@@ -237,9 +282,15 @@ def train(
# In case we want to stop early with ctrl+c, this is a nice to have to save the pretrained model
if cfg.local_rank == 0:
def terminate_handler(_, __, model):
if cfg.flash_optimum:
model = BetterTransformer.reverse(model)
model.save_pretrained(cfg.output_dir)
sys.exit(0)
signal.signal(
signal.SIGINT,
lambda signal, frame: (model.save_pretrained(cfg.output_dir), exit(0)),
signal.SIGINT, lambda signum, frame: terminate_handler(signum, frame, model)
)
logging.info("Starting trainer...")
@@ -252,20 +303,31 @@ def train(
]
if len(possible_checkpoints) > 0:
sorted_paths = sorted(
possible_checkpoints, key=lambda path: int(path.split("-")[-1])
possible_checkpoints,
key=lambda path: int(path.split("-")[-1]),
)
resume_from_checkpoint = sorted_paths[-1]
logging.info(
f"Using Auto-resume functionality to start with checkpoint at {resume_from_checkpoint}"
)
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
if cfg.flash_optimum:
with torch.backends.cuda.sdp_kernel(
enable_flash=True, enable_math=True, enable_mem_efficient=True
):
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
else:
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
logging.info(f"Training Completed!!! Saving pre-trained model to {cfg.output_dir}")
# TODO do we need this fix? https://huggingface.co/docs/accelerate/usage_guides/fsdp#saving-and-loading
# only save on rank 0, otherwise it corrupts output on multi-GPU when multiple processes attempt to write the same file
if cfg.local_rank == 0:
if cfg.flash_optimum:
model = BetterTransformer.reverse(model)
model.save_pretrained(cfg.output_dir)
# trainer.save_model(cfg.output_dir) # TODO this may be needed for deepspeed to work? need to review another time
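A minimal sketch of the derived batch-size logic in the hunk above (plain Python, simplified; not a drop-in copy of `finetune.py`, and the DDP world-size adjustment is left out):

```python
def derive_batch_config(micro_batch_size, gradient_accumulation_steps=None, batch_size=None):
    """Either knob may be supplied; the other is derived, mirroring the fallback above."""
    if gradient_accumulation_steps is None:
        # legacy configs: derive accumulation steps from the old global batch_size
        gradient_accumulation_steps = batch_size // micro_batch_size
    if batch_size is None:
        # new-style configs: batch_size becomes a derived, informational value
        batch_size = micro_batch_size * gradient_accumulation_steps
    return gradient_accumulation_steps, batch_size


# new-style config: micro_batch_size 2, gradient_accumulation_steps 4 -> batch_size 8
assert derive_batch_config(2, gradient_accumulation_steps=4) == (4, 8)
# old-style config: batch_size 8, micro_batch_size 2 -> 4 accumulation steps
assert derive_batch_config(2, batch_size=8) == (4, 8)
```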

View File

@@ -1,43 +0,0 @@
#!/bin/bash
export WANDB_MODE=offline
export WANDB_CACHE_DIR=/workspace/data/wandb-cache
mkdir -p $WANDB_CACHE_DIR
mkdir -p /workspace/data/huggingface-cache/{hub,datasets}
export HF_DATASETS_CACHE="/workspace/data/huggingface-cache/datasets"
export HUGGINGFACE_HUB_CACHE="/workspace/data/huggingface-cache/hub"
export TRANSFORMERS_CACHE="/workspace/data/huggingface-cache/hub"
export NCCL_P2P_DISABLE=1
nvidia-smi
num_gpus=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
gpu_indices=$(seq 0 $((num_gpus - 1)) | paste -sd "," -)
export CUDA_VISIBLE_DEVICES=$gpu_indices
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
apt-get update
apt-get install -y build-essential ninja-build vim git-lfs
git lfs install
pip3 install --force-reinstall https://download.pytorch.org/whl/nightly/cu117/torch-2.0.0.dev20230301%2Bcu117-cp38-cp38-linux_x86_64.whl --index-url https://download.pytorch.org/whl/nightly/cu117
if [ -z "${TORCH_CUDA_ARCH_LIST}" ]; then # only set this if not set yet
# this covers most common GPUs that the installed version of pytorch supports
# python -c "import torch; print(torch.cuda.get_arch_list())"
export TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6+PTX"
fi
# install flash-attn and deepspeed from pre-built wheels for this specific container b/c these take forever to install
mkdir -p /workspace/wheels
cd /workspace/wheels
curl -L -O https://github.com/OpenAccess-AI-Collective/axolotl/raw/wheels/wheels/deepspeed-0.9.2%2B7ddc3b01-cp38-cp38-linux_x86_64.whl
curl -L -O https://github.com/OpenAccess-AI-Collective/axolotl/raw/wheels/wheels/flash_attn-1.0.4-cp38-cp38-linux_x86_64.whl
pip install deepspeed-0.9.2%2B7ddc3b01-cp38-cp38-linux_x86_64.whl
pip install flash_attn-1.0.4-cp38-cp38-linux_x86_64.whl
pip install "peft @ git+https://github.com/huggingface/peft.git@main" --force-reinstall --no-dependencies
cd /workspace/
git clone https://github.com/OpenAccess-AI-Collective/axolotl.git
cd axolotl
pip install -e .[int4]
mkdir -p ~/.cache/huggingface/accelerate/
cp configs/accelerate/default_config.yaml ~/.cache/huggingface/accelerate/default_config.yaml

View File

@@ -1,7 +1,9 @@
from setuptools import setup, find_packages
"""setup.py for axolotl"""
from setuptools import find_packages, setup
install_requires = []
with open("./requirements.txt", "r") as requirements_file:
with open("./requirements.txt", encoding="utf-8") as requirements_file:
# don't include peft yet until we check the int4
# need to manually install peft for now...
reqs = [r.strip() for r in requirements_file.readlines() if "peft" not in r]
@@ -17,10 +19,10 @@ setup(
packages=find_packages(),
install_requires=install_requires,
extras_require={
"int4": [
"gptq": [
"alpaca_lora_4bit @ git+https://github.com/winglian/alpaca_lora_4bit.git@setup_pip",
],
"int4_triton": [
"gptq_triton": [
"alpaca_lora_4bit[triton] @ git+https://github.com/winglian/alpaca_lora_4bit.git@setup_pip",
],
"extras": [

View File

@@ -1,47 +1,76 @@
"""Module containing File Reader, File Writer, Json Parser, and Jsonl Serializer classes"""
import json
import sys
class FileReader:
"""
Reads a file and returns its contents as a string
"""
def read(self, file_path):
with open(file_path, "r") as file:
with open(file_path, encoding="utf-8") as file:
return file.read()
class FileWriter:
"""
Writes a string to a file
"""
def __init__(self, file_path):
self.file_path = file_path
def write(self, content):
with open(self.file_path, "w") as file:
with open(self.file_path, "w", encoding="utf-8") as file:
file.write(content)
class StdoutWriter:
"""
Writes a string to stdout
"""
def write(self, content):
sys.stdout.write(content)
sys.stdout.write("\n")
class JsonParser:
"""
Parses a string as JSON and returns the result
"""
def parse(self, content):
return json.loads(content)
class JsonlSerializer:
"""
Serializes a list of JSON objects into a JSONL string
"""
def serialize(self, data):
lines = [json.dumps(item) for item in data]
return "\n".join(lines)
class JsonToJsonlConverter:
"""
Converts a JSON file to JSONL
"""
def __init__(self, file_reader, file_writer, json_parser, jsonl_serializer):
self.file_reader = file_reader
self.file_writer = file_writer
self.json_parser = json_parser
self.jsonl_serializer = jsonl_serializer
def convert(self, input_file_path, output_file_path):
def convert(
self, input_file_path, output_file_path
): # pylint: disable=unused-argument
content = self.file_reader.read(input_file_path)
data = self.json_parser.parse(content)
# data = [r for r in data if r["conversations"]] # vicuna cleaned has rows with empty conversations

View File

@@ -1,10 +1,12 @@
"""Module containing Dataset functionality"""
import logging
from typing import List
import torch
from datasets import IterableDataset
from .prompt_tokenizers import PromptTokenizingStrategy, InvalidDataException
from .prompt_tokenizers import InvalidDataException, PromptTokenizingStrategy
# We want this to be a wrapper for an existing dataset that we have loaded
# lets use the concept of middlewares to wrap each dataset, for example
@@ -14,7 +16,14 @@ from .prompt_tokenizers import PromptTokenizingStrategy, InvalidDataException
class TokenizedPromptDataset(IterableDataset):
def __init__(
"""
Iterable dataset that returns tokenized prompts from a stream of text files.
Args:
prompt_tokenizer (PromptTokenizingStrategy): The prompt tokenizing method for proccessing the data.
dataset (dataset.Dataset): Dataset with text files.
"""
def __init__( # pylint: disable=super-init-not-called
self,
prompt_tokenizer: PromptTokenizingStrategy,
dataset: IterableDataset,
@@ -42,7 +51,7 @@ class ConstantLengthDataset(IterableDataset):
seq_length (int): Length of token sequences to return.
"""
def __init__(
def __init__( # pylint: disable=super-init-not-called
self,
tokenizer,
datasets,
@@ -82,10 +91,8 @@ class ConstantLengthDataset(IterableDataset):
else:
example_len = 0
if (
not example_len
or buffer_len + int(add_concat_token) + example_len
> self.seq_length
if not example_len or (
buffer_len + int(add_concat_token) + example_len > self.seq_length
):
if buffer["input_ids"]:
input_ids = torch.cat(buffer["input_ids"], dim=-1)[
@@ -95,9 +102,8 @@ class ConstantLengthDataset(IterableDataset):
: self.seq_length
]
labels = torch.cat(buffer["labels"], dim=-1)[: self.seq_length]
if (
labels.size() == input_ids.size()
and attention_mask.size() == input_ids.size()
if labels.size() == input_ids.size() and (
attention_mask.size() == input_ids.size()
):
yield {
"input_ids": input_ids,
@@ -108,7 +114,11 @@ class ConstantLengthDataset(IterableDataset):
logging.warning(
f"dropping batch due to tensor size mismatch input_ids: {input_ids.size()}, labels: {labels.size()}, attention_mask: {attention_mask.size()}"
)
buffer = {"input_ids": [], "attention_mask": [], "labels": []}
buffer = {
"input_ids": [],
"attention_mask": [],
"labels": [],
}
buffer_len = 0
if example:
@@ -117,6 +127,11 @@ class ConstantLengthDataset(IterableDataset):
input_ids = example["input_ids"]
attention_mask = example["attention_mask"]
labels = example["labels"]
if (
buffer["input_ids"]
and input_ids[0] == self.tokenizer.bos_token_id
):
attention_mask[0] = 0
if add_concat_token:
input_ids.append(self.concat_token_id)
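A small self-contained illustration of the packing fix in the hunk above (hypothetical token ids; not the library's code path, just the masking idea): when an appended example starts with the BOS token, its first attention-mask entry is zeroed so the boundary between concatenated sequences stays visible to the attention patch.

```python
import torch

BOS_ID = 1  # hypothetical BOS token id

# two already-tokenized examples about to be packed into one row
first = {"input_ids": [1, 50, 51, 2], "attention_mask": [1, 1, 1, 1]}
second = {"input_ids": [1, 60, 61, 2], "attention_mask": [1, 1, 1, 1]}

# same check as the hunk: only reset the mask when something is already buffered
# and the appended example begins with BOS
if first["input_ids"] and second["input_ids"][0] == BOS_ID:
    second["attention_mask"][0] = 0

packed_mask = torch.tensor(first["attention_mask"] + second["attention_mask"])
print(packed_mask)  # tensor([1, 1, 1, 1, 0, 1, 1, 1]) -> the 0 marks a new sequence
```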

View File

@@ -1,17 +1,15 @@
"""Flash attention monkey patch for llama model"""
# copied from https://github.com/lm-sys/FastChat/blob/main/fastchat/train/llama_flash_attn_monkey_patch.py
from typing import List, Optional, Tuple
from typing import Optional, Tuple
import torch
from torch import nn
import transformers
from transformers.models.llama.modeling_llama import apply_rotary_pos_emb
from einops import rearrange
from flash_attn.bert_padding import pad_input, unpad_input
from flash_attn.flash_attn_interface import flash_attn_unpadded_qkvpacked_func
from flash_attn.bert_padding import unpad_input, pad_input
from transformers.models.llama.modeling_llama import apply_rotary_pos_emb
def forward(
@@ -27,6 +25,7 @@ def forward(
attention_mask: [bsz, q_len]
"""
# pylint: disable=duplicate-code
bsz, q_len, _ = hidden_states.size()
query_states = (
@@ -74,7 +73,11 @@ def forward(
qkv = rearrange(qkv, "b s ... -> (b s) ...")
max_s = q_len
cu_q_lens = torch.arange(
0, (bsz + 1) * q_len, step=q_len, dtype=torch.int32, device=qkv.device
0,
(bsz + 1) * q_len,
step=q_len,
dtype=torch.int32,
device=qkv.device,
)
output = flash_attn_unpadded_qkvpacked_func(
qkv, cu_q_lens, max_s, 0.0, softmax_scale=None, causal=True
@@ -82,35 +85,56 @@ def forward(
output = rearrange(output, "(b s) ... -> b s ...", b=bsz)
else:
nheads = qkv.shape[-2]
# pylint: disable=invalid-name
x = rearrange(qkv, "b s three h d -> b s (three h d)")
x_unpad, indices, cu_q_lens, max_s = unpad_input(x, key_padding_mask)
x_unpad = rearrange(
x_unpad, "nnz (three h d) -> nnz three h d", three=3, h=nheads
x_unpad,
"nnz (three h d) -> nnz three h d",
three=3,
h=nheads,
)
output_unpad = flash_attn_unpadded_qkvpacked_func(
x_unpad, cu_q_lens, max_s, 0.0, softmax_scale=None, causal=True
x_unpad,
cu_q_lens,
max_s,
0.0,
softmax_scale=None,
causal=True,
)
output = rearrange(
pad_input(
rearrange(output_unpad, "nnz h d -> nnz (h d)"), indices, bsz, q_len
rearrange(output_unpad, "nnz h d -> nnz (h d)"),
indices,
bsz,
q_len,
),
"b s (h d) -> b s h d",
h=nheads,
)
return self.o_proj(rearrange(output, "b s h d -> b s (h d)")), None, None
return (
self.o_proj(rearrange(output, "b s h d -> b s (h d)")),
None,
None,
)
# Disable the transformation of the attention mask in LlamaModel as the flash attention
# requires the attention mask to be the same as the key_padding_mask
def _prepare_decoder_attention_mask(
self, attention_mask, input_shape, inputs_embeds, past_key_values_length
):
self,
attention_mask,
input_shape,
inputs_embeds,
past_key_values_length,
): # pylint: disable=unused-argument
# [bsz, seq_len]
return attention_mask
def replace_llama_attn_with_flash_attn():
transformers.models.llama.modeling_llama.LlamaModel._prepare_decoder_attention_mask = (
transformers.models.llama.modeling_llama.LlamaModel._prepare_decoder_attention_mask = ( # pylint: disable=protected-access
_prepare_decoder_attention_mask
)
transformers.models.llama.modeling_llama.LlamaAttention.forward = forward

View File

@@ -0,0 +1,233 @@
"""
Directly copied the code from https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/modules/llama_attn_hijack.py and made some adjustments
"""
import logging
import math
from typing import Optional, Tuple
import torch
import transformers.models.llama.modeling_llama
from torch import nn
try:
import xformers.ops
except ImportError:
logging.error("xformers not found! Please install it before trying to use it.")
def hijack_llama_attention():
transformers.models.llama.modeling_llama.LlamaAttention.forward = xformers_forward
def hijack_llama_sdp_attention():
transformers.models.llama.modeling_llama.LlamaAttention.forward = (
sdp_attention_forward
)
def xformers_forward(
self,
hidden_states: torch.Tensor,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_value: Optional[Tuple[torch.Tensor]] = None,
output_attentions: bool = False,
use_cache: bool = False,
) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
# pylint: disable=duplicate-code
bsz, q_len, _ = hidden_states.size()
query_states = (
self.q_proj(hidden_states)
.view(bsz, q_len, self.num_heads, self.head_dim)
.transpose(1, 2)
)
key_states = (
self.k_proj(hidden_states)
.view(bsz, q_len, self.num_heads, self.head_dim)
.transpose(1, 2)
)
value_states = (
self.v_proj(hidden_states)
.view(bsz, q_len, self.num_heads, self.head_dim)
.transpose(1, 2)
)
kv_seq_len = key_states.shape[-2]
if past_key_value is not None:
kv_seq_len += past_key_value[0].shape[-2]
cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
(
query_states,
key_states,
) = transformers.models.llama.modeling_llama.apply_rotary_pos_emb(
query_states, key_states, cos, sin, position_ids
)
# [bsz, nh, t, hd]
if past_key_value is not None:
# reuse k, v, self_attention
key_states = torch.cat([past_key_value[0], key_states], dim=2)
value_states = torch.cat([past_key_value[1], value_states], dim=2)
past_key_value = (key_states, value_states) if use_cache else None
# We only apply xformers optimizations if we don't need to output the whole attention matrix
if not output_attentions:
query_states = query_states.transpose(1, 2)
key_states = key_states.transpose(1, 2)
value_states = value_states.transpose(1, 2)
# This is a nasty hack. We know attention_mask in transformers is either LowerTriangular or all Zeros.
# We therefore check if one element in the upper triangular portion is zero. If it is, then the mask is all zeros.
if attention_mask is None or attention_mask[0, 0, 0, 1] == 0:
# input and output should be of form (bsz, q_len, num_heads, head_dim)
attn_output = xformers.ops.memory_efficient_attention(
query_states, key_states, value_states, attn_bias=None
)
else:
# input and output should be of form (bsz, q_len, num_heads, head_dim)
attn_output = xformers.ops.memory_efficient_attention(
query_states,
key_states,
value_states,
attn_bias=xformers.ops.LowerTriangularMask(),
)
attn_weights = None
else:
attn_weights = torch.matmul(
query_states, key_states.transpose(2, 3)
) / math.sqrt(self.head_dim)
if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
raise ValueError(
f"Attention weights should be of size {(bsz * self.num_heads, q_len, kv_seq_len)}, but is"
f" {attn_weights.size()}"
)
if attention_mask is not None:
if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
raise ValueError(
f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.size()}"
)
attn_weights = attn_weights + attention_mask
attn_weights = torch.max(
attn_weights, torch.tensor(torch.finfo(attn_weights.dtype).min)
)
# upcast attention to fp32
attn_weights = nn.functional.softmax(
attn_weights, dim=-1, dtype=torch.float32
).to(query_states.dtype)
attn_output = torch.matmul(attn_weights, value_states)
if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
raise ValueError(
f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
f" {attn_output.size()}"
)
attn_output = attn_output.transpose(1, 2)
attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
attn_output = self.o_proj(attn_output)
return attn_output, attn_weights, past_key_value
def sdp_attention_forward(
self,
hidden_states: torch.Tensor,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_value: Optional[Tuple[torch.Tensor]] = None,
output_attentions: bool = False,
use_cache: bool = False,
) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
# pylint: disable=duplicate-code
bsz, q_len, _ = hidden_states.size()
query_states = (
self.q_proj(hidden_states)
.view(bsz, q_len, self.num_heads, self.head_dim)
.transpose(1, 2)
)
key_states = (
self.k_proj(hidden_states)
.view(bsz, q_len, self.num_heads, self.head_dim)
.transpose(1, 2)
)
value_states = (
self.v_proj(hidden_states)
.view(bsz, q_len, self.num_heads, self.head_dim)
.transpose(1, 2)
)
kv_seq_len = key_states.shape[-2]
if past_key_value is not None:
kv_seq_len += past_key_value[0].shape[-2]
cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
(
query_states,
key_states,
) = transformers.models.llama.modeling_llama.apply_rotary_pos_emb(
query_states, key_states, cos, sin, position_ids
)
# [bsz, nh, t, hd]
if past_key_value is not None:
# reuse k, v, self_attention
key_states = torch.cat([past_key_value[0], key_states], dim=2)
value_states = torch.cat([past_key_value[1], value_states], dim=2)
past_key_value = (key_states, value_states) if use_cache else None
# We only apply sdp attention if we don't need to output the whole attention matrix
if not output_attentions:
attn_output = torch.nn.functional.scaled_dot_product_attention(
query_states,
key_states,
value_states,
attn_mask=attention_mask,
is_causal=False,
)
attn_weights = None
else:
attn_weights = torch.matmul(
query_states, key_states.transpose(2, 3)
) / math.sqrt(self.head_dim)
if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
raise ValueError(
f"Attention weights should be of size {(bsz * self.num_heads, q_len, kv_seq_len)}, but is"
f" {attn_weights.size()}"
)
if attention_mask is not None:
if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
raise ValueError(
f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.size()}"
)
attn_weights = attn_weights + attention_mask
attn_weights = torch.max(
attn_weights, torch.tensor(torch.finfo(attn_weights.dtype).min)
)
# upcast attention to fp32
attn_weights = nn.functional.softmax(
attn_weights, dim=-1, dtype=torch.float32
).to(query_states.dtype)
attn_output = torch.matmul(attn_weights, value_states)
if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
raise ValueError(
f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
f" {attn_output.size()}"
)
attn_output = attn_output.transpose(1, 2)
attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
attn_output = self.o_proj(attn_output)
return attn_output, attn_weights, past_key_value
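
Either hijack above is meant to run before the model is created; which one is applied is driven by the cfg.xformers_attention / cfg.sdp_attention flags checked in load_model() later in this diff. A rough sketch with a placeholder checkpoint:

# Hedged sketch: pick one of the two attention hijacks before loading the model.
from transformers import LlamaForCausalLM

from axolotl.monkeypatch.llama_attn_hijack_xformers import (
    hijack_llama_attention,      # xformers.ops.memory_efficient_attention path
    hijack_llama_sdp_attention,  # torch scaled_dot_product_attention path
)

hijack_llama_sdp_attention()  # or hijack_llama_attention() for the xformers kernel

model = LlamaForCausalLM.from_pretrained("huggyllama/llama-7b")  # placeholder checkpoint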

View File

@@ -1,3 +1,5 @@
"""Module to load prompt strategies."""
import importlib
@@ -7,8 +9,8 @@ def load(strategy, tokenizer, cfg):
if strategy.split(".")[-1].startswith("load_"):
load_fn = strategy.split(".")[-1]
strategy = ".".join(strategy.split(".")[:-1])
m = importlib.import_module(f".{strategy}", "axolotl.prompt_strategies")
fn = getattr(m, load_fn)
return fn(tokenizer, cfg)
except:
pass
mod = importlib.import_module(f".{strategy}", "axolotl.prompt_strategies")
func = getattr(mod, load_fn)
return func(tokenizer, cfg)
except Exception: # pylint: disable=broad-exception-caught
return None
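
The resolution rule is easier to see with concrete strategy strings. A hedged illustration follows; tokenizer and cfg come from the usual training setup, and the fallback for names without a load_ suffix sits outside this hunk, so it is assumed here.

# "alpaca_chat.load_qa" -> imports axolotl.prompt_strategies.alpaca_chat
#                          and calls its load_qa(tokenizer, cfg)
strategy = load("alpaca_chat.load_qa", tokenizer, cfg)

# "pygmalion" (no load_ suffix) -> presumably imports axolotl.prompt_strategies.pygmalion
#                                  and calls its module-level load(tokenizer, cfg)
strategy = load("pygmalion", tokenizer, cfg)

# any import or attribute failure now returns None instead of being swallowed by a bare except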

View File

@@ -1,3 +1,7 @@
"""Module containing the AlpacaQAPromptTokenizingStrategy class"""
from typing import Tuple
from axolotl.prompt_tokenizers import (
AlpacaPromptTokenizingStrategy,
InstructionPromptTokenizingStrategy,
@@ -7,7 +11,7 @@ from axolotl.prompters import AlpacaPrompter, PromptStyle
def load(tokenizer, cfg):
return AlpacaPromptTokenizingStrategy(
AlpacaPrompter(PromptStyle.chat.value),
AlpacaPrompter(PromptStyle.CHAT.value),
tokenizer,
cfg.train_on_inputs,
cfg.sequence_len,
@@ -15,7 +19,11 @@ def load(tokenizer, cfg):
class AlpacaQAPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
def parse_instruction_fields(self, prompt) -> (str, str, str):
"""
Tokenizing strategy for AlpacaQA
"""
def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
return (
prompt["question"],
"",
@@ -25,7 +33,7 @@ class AlpacaQAPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
def load_qa(tokenizer, cfg):
return AlpacaQAPromptTokenizingStrategy(
AlpacaPrompter(PromptStyle.chat.value),
AlpacaPrompter(PromptStyle.CHAT.value),
tokenizer,
cfg.train_on_inputs,
cfg.sequence_len,
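
For reference, the QA strategy consumes rows keyed by question; only prompt["question"] is visible in this hunk, so the answer key below is an assumption.

# Hedged sample row for AlpacaQAPromptTokenizingStrategy ("answer" key assumed).
sample = {"question": "What is the capital of France?", "answer": "Paris."}

strat = load_qa(tokenizer, cfg)
tokenized = strat.tokenize_prompt(sample)  # -> dict with input_ids / attention_mask / labels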

View File

@@ -1,10 +1,12 @@
"""Module loading the AlpacaInstructPromptTokenizingStrategy class"""
from axolotl.prompt_tokenizers import AlpacaPromptTokenizingStrategy
from axolotl.prompters import AlpacaPrompter, PromptStyle
def load(tokenizer, cfg):
return AlpacaPromptTokenizingStrategy(
AlpacaPrompter(PromptStyle.instruct),
AlpacaPrompter(PromptStyle.INSTRUCT.value),
tokenizer,
cfg.train_on_inputs,
cfg.sequence_len,

View File

@@ -1,11 +1,18 @@
from typing import Union, Generator
"""Module loading the CreativePromptTokenizingStrategy and similar classes"""
from typing import Generator, Tuple, Union
import yaml
from axolotl.prompt_tokenizers import InstructionPromptTokenizingStrategy
class CreativeAnsweringPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
def parse_instruction_fields(self, prompt) -> (str, str, str):
"""
Tokenizing strategy for Creative Answering
"""
def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
question = prompt["instruction"]
answer = prompt[
"revision"
@@ -18,6 +25,10 @@ class CreativeAnsweringPromptTokenizingStrategy(InstructionPromptTokenizingStrat
class CreativeCritiquePromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
"""
Tokenizing strategy for Creative Critique
"""
user_prompt = """Given the following Question and Response, critique the Response on a scale of 1-10. You should critique the answer in the following criteria:
refusal: whether the Response refuses to answer the Question. Responses containing language similar to "As an AI language model, I am not capable of ..." is a form of refusal and should have a low refusal score. 10 is no refusal, 0 is refuses to answer the question.
prescriptive bias: whether the Response attempts to prescribe or dictate certain actions or behaviors of the user. 10 no prescriptive bias present, 0 means there is prescriptive bias.
@@ -49,12 +60,16 @@ Question: {question}
Answer: {answer}
"""
def parse_instruction_fields(self, prompt) -> (str, str, str):
def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
scores = yaml.dump(
prompt["scores"], default_flow_style=False, Dumper=yaml.Dumper
prompt["scores"],
default_flow_style=False,
Dumper=yaml.Dumper,
)
critiques = yaml.dump(
prompt["critiques"], default_flow_style=False, Dumper=yaml.Dumper
prompt["critiques"],
default_flow_style=False,
Dumper=yaml.Dumper,
)
evaluation = scores + critiques
question = prompt["instruction"]
@@ -67,6 +82,10 @@ Answer: {answer}
class CreativeRevisePromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
"""
Tokenizing strategy for Creative Revise
"""
user_prompt = """Definitions:
refusal: whether the Response refuses to answer the Question. Responses containing language similar to "As an AI language model, I am not capable of ..." is a form of refusal and should have a low refusal score. 10 is no refusal, 0 is refuses to answer the question.
prescriptive bias: whether the Response attempts to prescribe or dictate certain actions or behaviors of the user. 10 no prescriptive bias present, 0 means there is prescriptive bias.
@@ -81,12 +100,16 @@ Evaluation:
{evaluation}
"""
def parse_instruction_fields(self, prompt) -> (str, str, str):
def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
scores = yaml.dump(
prompt["scores"], default_flow_style=False, Dumper=yaml.Dumper
prompt["scores"],
default_flow_style=False,
Dumper=yaml.Dumper,
)
critiques = yaml.dump(
prompt["critiques"], default_flow_style=False, Dumper=yaml.Dumper
prompt["critiques"],
default_flow_style=False,
Dumper=yaml.Dumper,
)
evaluation = scores + critiques
question = prompt["instruction"]
@@ -101,13 +124,19 @@ Evaluation:
class CreativePrompterBase:
"""
Base class for Creative Prompters
"""
system_prompt = ""
prompt_input = "{system_prompt}\nUSER: {instruction}\nASSISTANT:"
def build_prompt(
self,
instruction: str,
input: Union[None, str] = None,
input: Union[ # pylint: disable=redefined-builtin, unused-argument
None, str
] = None,
output: Union[None, str] = None,
) -> Generator[str, None, None]:
if self.system_prompt:
@@ -120,30 +149,51 @@ class CreativePrompterBase:
class CreativeAnswerPrompter(CreativePrompterBase):
"""
Prompter for Creative Answering
"""
system_prompt = "Answer the following question in a comprehensive, in-depth, and creative way. Additionally your response should be relevant, accurate, and free of any ambiguity."
class CreativeCritiquePrompter(CreativePrompterBase):
"""
Prompter for Creative Critique
"""
system_prompt = ""
class CreativeRevisePrompter(CreativePrompterBase):
"""
Prompter for Creative Revise
"""
system_prompt = ""
def load_answer(tokenizer, cfg):
return CreativeAnsweringPromptTokenizingStrategy(
CreativeAnswerPrompter(), tokenizer, cfg.train_on_inputs, cfg.sequence_len
CreativeAnswerPrompter(),
tokenizer,
cfg.train_on_inputs,
cfg.sequence_len,
)
def load_critique(tokenizer, cfg):
return CreativeCritiquePromptTokenizingStrategy(
CreativeCritiquePrompter(), tokenizer, cfg.train_on_inputs, cfg.sequence_len
CreativeCritiquePrompter(),
tokenizer,
cfg.train_on_inputs,
cfg.sequence_len,
)
def load_revise(tokenizer, cfg):
return CreativeRevisePromptTokenizingStrategy(
CreativeRevisePrompter(), tokenizer, cfg.train_on_inputs, cfg.sequence_len
CreativeRevisePrompter(),
tokenizer,
cfg.train_on_inputs,
cfg.sequence_len,
)
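
The critique and revise strategies fold the numeric scores and critiques back into the user prompt as YAML. A hedged sketch of the row shape they expect; the keys mirror the lookups above, the values are invented.

# Hedged sample row for the critique/revise strategies.
sample = {
    "instruction": "Explain photosynthesis.",
    "revision": "Photosynthesis converts light energy into chemical energy ...",
    "scores": {"refusal": 10, "prescriptive bias": 9},
    "critiques": {"refusal": "No refusal present.", "prescriptive bias": "Minor hedging only."},
}
# scores and critiques are re-serialized with yaml.dump and concatenated into the evaluation block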

View File

@@ -1,29 +1,34 @@
"""Module containing the PygmalionPromptTokenizingStrategy and PygmalionPrompter class"""
import copy
import logging
from collections import defaultdict
from typing import Generator
from typing import Generator, List, Tuple
from axolotl.prompt_tokenizers import PromptTokenizingStrategy
from axolotl.prompt_tokenizers import (
PromptTokenizingStrategy,
parse_tokenized_to_result,
tokenize_prompt_default,
)
IGNORE_TOKEN_ID = -100
class PygmalionPromptTokenizingStrategy(PromptTokenizingStrategy):
bot_prefix_token_ids = []
"""
Tokenizing strategy for Pygmalion.
"""
bot_prefix_token_ids: List[int] = []
def __init__(self, prompter, tokenizer, *args, **kwargs):
super().__init__(prompter, tokenizer)
super().__init__(prompter, tokenizer, *args, **kwargs)
res = self._tokenize("<|model|>", add_eos_token=False, strip_bos_token=True)
self.bot_prefix_token_ids = res["input_ids"]
def tokenize_prompt(self, prompt):
result = {
"input_ids": [],
"attention_mask": [],
"labels": [],
}
current_len = 0
for i, part in enumerate(self.prompter.build_prompt(prompt["conversations"])):
result, current_len = tokenize_prompt_default()
for _, part in enumerate(self.prompter.build_prompt(prompt["conversations"])):
role, message = part
if role == "system":
prefix = "<|system|>"
@@ -61,45 +66,29 @@ class PygmalionPromptTokenizingStrategy(PromptTokenizingStrategy):
else:
logging.warning(f"unknown role in conversation: {role}")
res = defaultdict(lambda: [])
input_ids = res["input_ids"]
input_len = len(input_ids)
result["input_ids"][current_len : current_len + input_len] = input_ids
result["attention_mask"][current_len : current_len + input_len] = [
1 if x != self.tokenizer.pad_token_id else 0 for x in input_ids
]
result["labels"][current_len : current_len + input_len] = labels
current_len += input_len
return result
def _tokenize(self, prompt, add_eos_token=True, strip_bos_token=False):
result = self.tokenizer(
prompt,
truncation=True,
max_length=self.sequence_len,
padding=False,
return_tensors=None,
)
if (
result["input_ids"][-1] != self.tokenizer.eos_token_id
and len(result["input_ids"]) < self.sequence_len
and add_eos_token
):
result["input_ids"].append(self.tokenizer.eos_token_id)
result["attention_mask"].append(1)
if result["input_ids"][0] == self.tokenizer.bos_token_id and strip_bos_token:
result["input_ids"] = result["input_ids"][1:]
result["attention_mask"] = result["attention_mask"][1:]
result["labels"] = result["input_ids"].copy()
# pylint: disable=duplicate-code
result, current_len = parse_tokenized_to_result(
result,
current_len,
res,
labels,
pad_token_id=self.tokenizer.pad_token_id,
)
return result
class PygmalionPrompter:
"""
Prompter for Pygmalion.
"""
def __init__(self, *args, **kwargs):
pass
def build_prompt(self, source, *args, **kwargs) -> Generator[str, None, None]:
def build_prompt(
self, source, *args, **kwargs # pylint: disable=unused-argument
) -> Generator[Tuple[str, str], None, None]:
for msg in source:
yield msg["role"], msg["value"]

View File

@@ -1,24 +1,33 @@
"""Module containing PromptTokenizingStrategy and Prompter classes"""
import abc
import copy
import functools
import logging
from typing import Dict, List, Tuple, Union
from transformers import PreTrainedTokenizer
from axolotl.prompters import IGNORE_TOKEN_ID
IGNORE_INDEX = -100
LLAMA_DEFAULT_PAD_TOKEN = "[PAD]"
LLAMA_DEFAULT_EOS_TOKEN = "</s>"
LLAMA_DEFAULT_BOS_TOKEN = "<s>"
LLAMA_DEFAULT_UNK_TOKEN = "<unk>"
LLAMA_DEFAULT_PAD_TOKEN = "[PAD]" # nosec
LLAMA_DEFAULT_EOS_TOKEN = "</s>" # nosec
LLAMA_DEFAULT_BOS_TOKEN = "<s>" # nosec
LLAMA_DEFAULT_UNK_TOKEN = "<unk>" # nosec
class InvalidDataException(Exception):
pass
"""
Exception raised when the data is invalid
"""
class PromptTokenizingStrategy(abc.ABC):
"""
Abstract class for tokenizing strategies
"""
def __init__(
self,
prompter,
@@ -35,59 +44,21 @@ class PromptTokenizingStrategy(abc.ABC):
def tokenize_prompt(self, prompt):
pass
@functools.cache
@functools.lru_cache(maxsize=128)
def _get_user_token(self):
id_or_ids = self.tokenizer.convert_tokens_to_ids("<|USER|>")
if isinstance(id_or_ids, (int,)):
return id_or_ids
return False
@functools.cache
@functools.lru_cache(maxsize=128)
def _get_assistant_token(self):
id_or_ids = self.tokenizer.convert_tokens_to_ids("<|ASSISTANT|>")
if isinstance(id_or_ids, (int,)):
return id_or_ids
return False
class InstructionPromptTokenizingStrategy(PromptTokenizingStrategy):
def parse_instruction_fields(self, prompt) -> (str, str, str):
raise NotImplementedError
def tokenize_prompt(self, prompt):
instruction, input, response = self.parse_instruction_fields(prompt)
full_prompt = self._build_full_prompt(instruction, input, response)
tokenized_full_prompt = self._tokenize(full_prompt)
if not self.train_on_inputs:
user_prompt = next(
iter(
self.prompter.build_prompt(
instruction,
input,
)
)
)
tokenized_user_prompt = self._tokenize(user_prompt, add_eos_token=False)
user_prompt_len = len(tokenized_user_prompt["input_ids"])
# TODO this could be sped up using numpy array slicing
tokenized_full_prompt["labels"] = [
-100
] * user_prompt_len + tokenized_full_prompt["labels"][user_prompt_len:]
return tokenized_full_prompt
def _build_full_prompt(self, instruction, input, response):
return next(
iter(
self.prompter.build_prompt(
instruction,
input,
response,
)
)
)
def _tokenize(self, prompt, add_eos_token=True, strip_bos_token=False):
def _tokenize(self, prompt: str, add_eos_token=True, strip_bos_token=False):
result = self.tokenizer(
prompt,
truncation=True,
@@ -111,8 +82,60 @@ class InstructionPromptTokenizingStrategy(PromptTokenizingStrategy):
return result
class InstructionPromptTokenizingStrategy(PromptTokenizingStrategy):
"""
Tokenizing strategy for instruction-based prompts.
"""
def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
raise NotImplementedError
def tokenize_prompt(self, prompt):
(
instruction,
input, # pylint: disable=redefined-builtin
response,
) = self.parse_instruction_fields(prompt)
full_prompt = self._build_full_prompt(instruction, input, response)
tokenized_full_prompt = self._tokenize(full_prompt)
if not self.train_on_inputs:
user_prompt = next(
iter(
self.prompter.build_prompt(
instruction,
input,
)
)
)
tokenized_user_prompt = self._tokenize(user_prompt, add_eos_token=False)
user_prompt_len = len(tokenized_user_prompt["input_ids"])
# TODO this could be sped up using numpy array slicing
tokenized_full_prompt["labels"] = [
-100
] * user_prompt_len + tokenized_full_prompt["labels"][user_prompt_len:]
return tokenized_full_prompt
def _build_full_prompt(
self, instruction, input, response # pylint: disable=redefined-builtin
):
return next(
iter(
self.prompter.build_prompt(
instruction,
input,
response,
)
)
)
class AlpacaPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
def parse_instruction_fields(self, prompt) -> (str, str, str):
"""
Tokenizing strategy for Alpaca prompts.
"""
def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
return (
prompt["instruction"],
prompt["input"] if "input" in prompt else "",
@@ -121,7 +144,11 @@ class AlpacaPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
class AlpacaMultipleChoicePromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
def parse_instruction_fields(self, prompt) -> (str, str, str):
"""
Tokenizing strategy for Alpaca Multiple Choice prompts.
"""
def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
return (
prompt["question"],
"\n".join(f'- "{choice}"' for choice in prompt["choices"]),
@@ -130,7 +157,11 @@ class AlpacaMultipleChoicePromptTokenizingStrategy(InstructionPromptTokenizingSt
class JeopardyPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
def parse_instruction_fields(self, prompt) -> (str, str, str):
"""
Tokenizing strategy for Jeopardy prompts.
"""
def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
return (
prompt["question"],
prompt["category"],
@@ -139,7 +170,11 @@ class JeopardyPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
class OpenAssistantPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
def parse_instruction_fields(self, prompt) -> (str, str, str):
"""
Tokenizing strategy for OpenAssistant prompts.
"""
def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
return (
prompt["INSTRUCTION"],
"",
@@ -148,7 +183,11 @@ class OpenAssistantPromptTokenizingStrategy(InstructionPromptTokenizingStrategy)
class SummarizeTLDRPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
def parse_instruction_fields(self, prompt) -> (str, str, str):
"""
Tokenizing strategy for SummarizeTLDR prompts.
"""
def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
return (
prompt["article"],
"",
@@ -157,7 +196,11 @@ class SummarizeTLDRPromptTokenizingStrategy(InstructionPromptTokenizingStrategy)
class GPTeacherPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
def parse_instruction_fields(self, prompt) -> (str, str, str):
"""
Tokenizing strategy for GPTeacher prompts.
"""
def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
return (
prompt["instruction"],
prompt["input"] if "input" in prompt else "",
@@ -166,7 +209,11 @@ class GPTeacherPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
class NomicGPT4AllPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
def parse_instruction_fields(self, prompt) -> (str, str, str):
"""
Tokenizing strategy for NomicGPT4All prompts.
"""
def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
return (
prompt["prompt"],
"",
@@ -175,28 +222,34 @@ class NomicGPT4AllPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
class CompletionPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
def parse_instruction_fields(self, prompt) -> str:
return prompt["text"]
"""
Tokenizing strategy for Completion prompts.
"""
def tokenize_prompt(self, prompt):
instruction = self.parse_instruction_fields(prompt)
full_prompt = self._build_full_prompt(instruction, None, None)
full_prompt = self._build_full_prompt(prompt["text"], None, None)
tokenized_full_prompt = self._tokenize(full_prompt)
return tokenized_full_prompt
def _build_full_prompt(self, instruction, input, response):
return next(iter(self.prompter.build_prompt(instruction)))
def _build_full_prompt(
self, instruction, input, response
): # pylint: disable=redefined-builtin
return next(iter(self.prompter.build_prompt(instruction, input, response)))
class ReflectionPromptTokenizingStrategy(PromptTokenizingStrategy):
def parse_instruction_fields(self, prompt) -> (str, str, str, str, str):
"""
Tokenizing strategy for Reflection prompts.
"""
def parse_instruction_fields(self, prompt) -> Tuple[str, str, str, str, str]:
raise NotImplementedError
def tokenize_prompt(self, prompt):
(
instruction,
input,
input, # pylint: disable=redefined-builtin
output,
reflection,
corrected,
@@ -223,7 +276,9 @@ class ReflectionPromptTokenizingStrategy(PromptTokenizingStrategy):
return tokenized_full_prompt
def _build_full_prompt(self, instruction, input, output, reflection, corrected):
def _build_full_prompt(
self, instruction, input, output, reflection, corrected
): # pylint: disable=redefined-builtin
return next(
iter(
self.prompter.build_prompt(
@@ -236,7 +291,7 @@ class ReflectionPromptTokenizingStrategy(PromptTokenizingStrategy):
)
)
def _tokenize(self, prompt, add_eos_token=True):
def _tokenize(self, prompt, add_eos_token=True, strip_bos_token=False):
result = self.tokenizer(
prompt,
truncation=True,
@@ -257,7 +312,11 @@ class ReflectionPromptTokenizingStrategy(PromptTokenizingStrategy):
class AlpacaReflectionPTStrategy(ReflectionPromptTokenizingStrategy):
def parse_instruction_fields(self, prompt) -> (str, str, str, str, str):
"""
Tokenizing strategy for Alpaca Reflection prompts.
"""
def parse_instruction_fields(self, prompt) -> Tuple[str, str, str, str, str]:
return (
prompt["instruction"],
prompt["input"] if "input" in prompt else "",
@@ -268,20 +327,19 @@ class AlpacaReflectionPTStrategy(ReflectionPromptTokenizingStrategy):
class ShareGPTPromptTokenizingStrategy(PromptTokenizingStrategy):
"""
Tokenizing strategy for ShareGPT prompts.
"""
def get_conversation_thread(self, prompt):
return prompt["conversations"]
def tokenize_prompt(self, prompt):
result = {
"input_ids": [],
"attention_mask": [],
"labels": [],
}
current_len = 0
result, current_len = tokenize_prompt_default()
user_token = self._get_user_token()
assistant_token = self._get_assistant_token()
try:
for i, part in enumerate(
for _, part in enumerate(
self.prompter.build_prompt(self.get_conversation_thread(prompt))
):
if isinstance(part, tuple):
@@ -289,7 +347,9 @@ class ShareGPTPromptTokenizingStrategy(PromptTokenizingStrategy):
part = part[0] + part[1] if not user_token else part[1]
# this is still the user query, we should
res = self._tokenize(
part.strip(), add_eos_token=False, strip_bos_token=True
part.strip(),
add_eos_token=False,
strip_bos_token=True,
)
if user_token:
res["input_ids"] = [user_token, *res["input_ids"]]
@@ -300,32 +360,39 @@ class ShareGPTPromptTokenizingStrategy(PromptTokenizingStrategy):
part = part[0] + part[1] if not assistant_token else part[1]
# this should be the assistant response, should end with an eos token
res = self._tokenize(
part.strip(), add_eos_token=True, strip_bos_token=True
part.strip(),
add_eos_token=True,
strip_bos_token=True,
)
if assistant_token:
res["input_ids"] = [assistant_token, *res["input_ids"]]
res["input_ids"] = [
assistant_token,
*res["input_ids"],
]
# not masked out from labels
labels = copy.deepcopy(res["input_ids"])
elif part[0] == "SYSTEM:":
part = part[1] # Ignore the system role from preamble
# this is only ever the first part, should include the bos token and the user query
res = self._tokenize(
part.strip(), add_eos_token=False, strip_bos_token=False
)
# everything from this is masked out from the labels
labels = [IGNORE_TOKEN_ID] * len(res["input_ids"])
else:
logging.warning("unhandled role: " + part[0])
else:
# this is only ever the first part, should include the bos token and the user query
res = self._tokenize(
part.strip(), add_eos_token=False, strip_bos_token=False
)
# everything from this is masked out from the labels
labels = [IGNORE_TOKEN_ID] * len(res["input_ids"])
input_ids = res["input_ids"]
input_len = len(input_ids)
result["input_ids"][current_len : current_len + input_len] = input_ids
result["attention_mask"][current_len : current_len + input_len] = [
1 if x != self.tokenizer.pad_token_id else 0 for x in input_ids
]
result["labels"][current_len : current_len + input_len] = labels
current_len += input_len
logging.warning(f"unhandled role: {part[0]}")
# pylint: disable=duplicate-code
result, current_len = parse_tokenized_to_result(
result,
current_len,
res,
labels,
pad_token_id=self.tokenizer.pad_token_id,
)
return result
except (KeyError, AssertionError, IndexError) as e:
raise InvalidDataException(str(e))
except (KeyError, AssertionError, IndexError) as err:
raise InvalidDataException(str(err)) from err
def _tokenize(self, prompt, add_eos_token=True, strip_bos_token=False):
result = self.tokenizer(
@@ -349,3 +416,40 @@ class ShareGPTPromptTokenizingStrategy(PromptTokenizingStrategy):
result["labels"] = result["input_ids"].copy()
return result
def tokenize_prompt_default() -> Tuple[Dict[str, List[int]], int]:
"""
Returns the default values for the tokenize prompt function
"""
result: Dict[str, List[int]] = {
"input_ids": [],
"attention_mask": [],
"labels": [],
}
current_len = 0
return result, current_len
def parse_tokenized_to_result(
result: Dict[str, List[int]],
current_len: int,
res: Dict[str, List[int]],
labels: list[int],
pad_token_id: Union[int, None] = None,
) -> Tuple[Dict[str, List[int]], int]:
"""
Parses the tokenized prompt and appends the tokenized input_ids, attention_mask and labels to the result
"""
input_ids = res["input_ids"]
input_len = len(input_ids)
result["input_ids"][current_len : current_len + input_len] = input_ids
result["attention_mask"][current_len : current_len + input_len] = [
1 if x != pad_token_id else 0 for x in input_ids
]
result["labels"][current_len : current_len + input_len] = labels
current_len += input_len
return result, current_len
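
A small worked example of the two helpers above, with invented token ids:

from axolotl.prompt_tokenizers import parse_tokenized_to_result, tokenize_prompt_default

result, current_len = tokenize_prompt_default()  # ({"input_ids": [], "attention_mask": [], "labels": []}, 0)

res = {"input_ids": [1, 306, 626], "attention_mask": [1, 1, 1]}
labels = [-100, -100, -100]  # user turn: masked out of the loss

result, current_len = parse_tokenized_to_result(result, current_len, res, labels, pad_token_id=0)
# result["input_ids"] == [1, 306, 626], result["labels"] == [-100, -100, -100], current_len == 3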

View File

@@ -1,28 +1,37 @@
import copy
"""Module containing prompters"""
import dataclasses
import logging
from enum import auto, Enum
from typing import List, Tuple, Any, Union, Generator
from enum import Enum, auto
from typing import Generator, List, Optional, Tuple, Union
IGNORE_TOKEN_ID = -100
class PromptStyle(Enum):
instruct = "instruct"
chat = "chat"
"""
Enum for prompt styles
"""
INSTRUCT = "instruct"
CHAT = "chat"
class AlpacaPrompter:
"""
Base class for alpaca prompters
"""
system_prompt = "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n"
system_no_input_prompt = "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n"
prompt_style = None
prompt_style: Optional[PromptStyle] = None
def __init__(self, prompt_style=PromptStyle.instruct.value):
self.prompt_style = prompt_style if prompt_style else PromptStyle.instruct.value
def __init__(self, prompt_style=PromptStyle.INSTRUCT.value):
self.prompt_style = prompt_style if prompt_style else PromptStyle.INSTRUCT.value
self.match_prompt_style()
def match_prompt_style(self):
if self.prompt_style == PromptStyle.instruct.value:
if self.prompt_style == PromptStyle.INSTRUCT.value:
self.prompt_input = (
self.system_prompt
+ "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
@@ -32,7 +41,7 @@ class AlpacaPrompter:
+ "### Instruction:\n{instruction}\n\n### Response:\n"
)
self.response_split = "### Response:"
if self.prompt_style == PromptStyle.chat.value:
if self.prompt_style == PromptStyle.CHAT.value:
self.prompt_input = (
self.system_prompt + "USER: {instruction}\n{input}\nASSISTANT:"
)
@@ -44,7 +53,7 @@ class AlpacaPrompter:
def build_prompt(
self,
instruction: str,
input: Union[None, str] = None,
input: Union[None, str] = None, # pylint: disable=redefined-builtin
output: Union[None, str] = None,
) -> Generator[str, None, None]:
# returns the full prompt from instruction and optional input
@@ -62,33 +71,60 @@ class AlpacaPrompter:
class UnpromptedPrompter(AlpacaPrompter):
"""
Prompter for alpaca no system prompt
"""
system_prompt = ""
system_no_input_prompt = ""
class JeopardyPrompter(AlpacaPrompter):
"""
Prompter for Jeopardy
"""
prompt_input = "Below is a Jeopardy clue paired with input providing the category of the clue. Write a concise response that best answers tbe clue given the category.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
class MultipleChoiceExplainPrompter(AlpacaPrompter):
"""
Prompter for multiple choice explain
"""
system_prompt = (
"Choose the answer that best answers the question. Explain your reasoning."
)
class MultipleChoiceConcisePrompter(AlpacaPrompter):
"""
Prompter for multiple choice concise
"""
prompt_input = "Choose the answer that best answers the question. Be concise in your response.\n\nUSER: {instruction}\n{input}\nASSISTANT:\n"
class SummarizeTLDRPrompter(AlpacaPrompter):
"""
Prompter for summarize TLDR
"""
prompt_no_input = (
"USER: Summarize the following article as a TL;DR.\n{instruction}\nASSISTANT:"
)
class CompletionPrompter:
"""
Prompter for completion
"""
def build_prompt(
self, instruction: str, input=None, output=None
self,
instruction: str,
input=None, # pylint: disable=redefined-builtin, unused-argument
output=None, # pylint: disable=unused-argument
) -> Generator[str, None, None]:
yield instruction
@@ -97,14 +133,22 @@ class CompletionPrompter:
class GPTeacherPrompter(AlpacaPrompter):
...
"""
Prompter for GPTeacher
"""
class NomicGPT4AllPrompter(AlpacaPrompter):
...
"""
Prompter for NomicGPT4All
"""
class ReflectAlpacaPrompter:
"""
Prompter for ReflectAlpaca
"""
system_prompt = "Below is an instruction that describes a task, paired with an input that provides further context. You, the Assistant, should generate a response as if it were an abstract for an academic or technical paper on the query along with a methodology. Then generate an Agent Reflection where you create a long form response as if from subject matter expert, be verbose, diligent, and creative in your application of knowledge, apply it through the lens of the response generated by the assistant. Look for flawed reasoning, faulty logic, or other mistakes in the method. Finally, generate a final response and method for the user with the Assistant abstract and Reflection analysis as augmentations to the generation\n\n"
system_no_input_prompt = "Below is an instruction that describes a task. You, the Assistant, should generate a response as if it were an abstract for an academic or technical paper on the query along with a methodology. Then generate an Agent Reflection where you create a long form response as if from subject matter expert, be verbose, diligent, and creative in your application of knowledge, apply it through the lens of the response generated by the assistant. Look for flawed reasoning, faulty logic, or other mistakes in the method. Finally, generate a final response and method for the user with the Assistant abstract and Reflection analysis as augmentations to the generation\n\n"
@@ -120,7 +164,7 @@ class ReflectAlpacaPrompter:
self.match_prompt_style()
def match_prompt_style(self):
if self.prompt_style == PromptStyle.instruct.value:
if self.prompt_style == PromptStyle.INSTRUCT.value:
self.prompt_input = (
self.system_prompt
+ "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
@@ -131,7 +175,7 @@ class ReflectAlpacaPrompter:
)
self.agent_label = "### Thought:\n{output}\n\n### Agent Reflection:\n{reflection}\n\n### Final Response:\n{corrected}"
self.response_split = "### Final Response:"
if self.prompt_style == PromptStyle.chat.value:
if self.prompt_style == PromptStyle.CHAT.value:
self.prompt_input = (
self.system_prompt + "USER: {instruction}\n{input}\nASSISTANT:"
)
@@ -146,7 +190,7 @@ class ReflectAlpacaPrompter:
def build_prompt(
self,
instruction: str,
input: Union[None, str] = None,
input: Union[None, str] = None, # pylint: disable=redefined-builtin
output: Union[None, str] = None,
reflection: Union[None, str] = None,
corrected: Union[None, str] = None,
@@ -159,7 +203,9 @@ class ReflectAlpacaPrompter:
res = self.prompt_no_input.format(instruction=instruction)
if output and reflection and corrected:
label = self.agent_label.format(
output=output, reflection=reflection, corrected=corrected
output=output,
reflection=reflection,
corrected=corrected,
)
res = f"{res}{label}"
yield res
@@ -187,18 +233,18 @@ class Conversation:
offset: int
sep_style: SeparatorStyle = SeparatorStyle.SINGLE
sep: str = "###"
sep2: str = None
sep2: Optional[str] = None
def get_prompt(self) -> Generator[str, None, None]:
seps = [self.sep, self.sep2]
preamble = self.system + seps[0]
yield preamble
for i, (role, message) in enumerate(self.messages):
def get_prompt(self) -> Generator[Tuple[str, str], None, None]:
# seps = [self.sep, self.sep2]
preamble = self.system + self.sep
yield ("SYSTEM:", preamble)
for _, (role, message) in enumerate(self.messages):
if message:
yield (role + ":", " " + message)
else:
logging.warning("role with empty message: " + role)
yield (role + ":",)
logging.warning(f"role with empty message: {role}")
yield (role + ":", "")
def copy(self):
return Conversation(
@@ -227,10 +273,14 @@ conv_vicuna_v1_1 = Conversation(
)
class ShareGPTPrompter:
class ShareGPTPrompter: # pylint: disable=too-few-public-methods
"""
A prompter that generates prompts for the ShareGPT conversation format
"""
def __init__(self, prompt_style=None):
if prompt_style != PromptStyle.chat.value:
raise Exception(
if prompt_style != PromptStyle.CHAT.value:
raise ValueError(
f"unsupported prompt_style for ShareGPTPrompter({prompt_style})"
)
@@ -240,7 +290,7 @@ class ShareGPTPrompter:
# self.prompt_no_input = self.system_no_input_prompt + "USER: {instruction}\nASSISTANT:"
# self.response_split = "ASSISTANT:"
def build_prompt(self, source, *args, **kwargs) -> Generator[str, None, None]:
def build_prompt(self, source) -> Generator[str, None, None]:
# ignore the system prompt if provided
if source[0]["from"] == "system":
source.pop(0)
@@ -261,9 +311,9 @@ class ShareGPTPrompter:
):
# Skip the first one if it is not from human
source = source[1:]
except IndexError as e:
except IndexError as err:
# sometimes there is a Bing or system chat
raise e
raise err
conv.messages = []
for j, sentence in enumerate(source):
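
To make the two prompt styles concrete, here is a hedged sketch of what AlpacaPrompter emits for each style; only the templates are visible in these hunks, so the rendered strings are indicative rather than exact.

from axolotl.prompters import AlpacaPrompter, PromptStyle

chat = AlpacaPrompter(PromptStyle.CHAT.value)
instruct = AlpacaPrompter(PromptStyle.INSTRUCT.value)

# CHAT     -> "<system prompt>USER: {instruction}\n{input}\nASSISTANT:"
# INSTRUCT -> "<system prompt>### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
prompt = next(chat.build_prompt("Summarize the article.", input="Some text ..."))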

View File

@@ -1,16 +1,20 @@
"""Callbacks for Trainer class"""
import os
from optimum.bettertransformer import BetterTransformer
from transformers import (
Seq2SeqTrainer,
TrainerCallback,
TrainingArguments,
TrainerState,
TrainerControl,
TrainerState,
TrainingArguments,
)
from transformers.trainer_utils import PREFIX_CHECKPOINT_DIR
from transformers.trainer_utils import PREFIX_CHECKPOINT_DIR, IntervalStrategy
class SavePeftModelCallback(TrainerCallback):
class SavePeftModelCallback(TrainerCallback): # pylint: disable=too-few-public-methods
"""Callback to save the PEFT adapter"""
def on_save(
self,
args: TrainingArguments,
@@ -19,10 +23,47 @@ class SavePeftModelCallback(TrainerCallback):
**kwargs,
):
checkpoint_folder = os.path.join(
args.output_dir, f"{PREFIX_CHECKPOINT_DIR}-{state.global_step}"
args.output_dir,
f"{PREFIX_CHECKPOINT_DIR}-{state.global_step}",
)
peft_model_path = os.path.join(checkpoint_folder, "adapter_model")
kwargs["model"].save_pretrained(peft_model_path)
return control
class SaveBetterTransformerModelCallback(
TrainerCallback
): # pylint: disable=too-few-public-methods
"""Callback to save the BetterTransformer wrapped model"""
def on_step_end(
self,
args: TrainingArguments,
state: TrainerState,
control: TrainerControl,
**kwargs,
):
# Save
if (
args.save_strategy == IntervalStrategy.STEPS
and args.save_steps > 0
and state.global_step % args.save_steps == 0
):
control.should_save = True
if control.should_save:
checkpoint_folder = os.path.join(
args.output_dir,
f"{PREFIX_CHECKPOINT_DIR}-{state.global_step}",
)
model = BetterTransformer.reverse(kwargs["model"])
model.save_pretrained(checkpoint_folder)
# FIXME - need to cleanup old checkpoints
# since we're saving here, we don't need the trainer loop to attempt to save too b/c
# the trainer will raise an exception since it can't save a BetterTransformer wrapped model
control.should_save = False
return control
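
A hedged sketch of wiring the new callback into a Trainer; the surrounding arguments are placeholders, and the real hookup presumably happens in setup_trainer, which imports this callback later in the diff.

from transformers import Trainer, TrainingArguments

trainer = Trainer(
    model=model,  # a BetterTransformer-wrapped model from load_model()
    args=TrainingArguments(
        output_dir="./out",  # placeholder values
        save_strategy="steps",
        save_steps=100,
    ),
    train_dataset=train_dataset,
    callbacks=[SaveBetterTransformerModelCallback()],
)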

View File

@@ -1,42 +1,38 @@
"""Module containing data utilities"""
import logging
from hashlib import md5
from pathlib import Path
from typing import Union
from typing import List, Tuple, Union
from datasets import (
load_from_disk,
load_dataset,
IterableDataset,
Dataset,
concatenate_datasets,
DatasetDict,
)
import torch
from datasets import Dataset, DatasetDict, IterableDataset, load_dataset, load_from_disk
from huggingface_hub import hf_hub_download
from transformers import PreTrainedTokenizerBase
from axolotl.datasets import TokenizedPromptDataset, ConstantLengthDataset
from axolotl.datasets import ConstantLengthDataset, TokenizedPromptDataset
from axolotl.prompt_strategies import load
from axolotl.prompt_tokenizers import (
AlpacaPromptTokenizingStrategy,
GPTeacherPromptTokenizingStrategy,
OpenAssistantPromptTokenizingStrategy,
AlpacaReflectionPTStrategy,
ShareGPTPromptTokenizingStrategy,
JeopardyPromptTokenizingStrategy,
CompletionPromptTokenizingStrategy,
AlpacaMultipleChoicePromptTokenizingStrategy,
AlpacaPromptTokenizingStrategy,
AlpacaReflectionPTStrategy,
CompletionPromptTokenizingStrategy,
GPTeacherPromptTokenizingStrategy,
JeopardyPromptTokenizingStrategy,
OpenAssistantPromptTokenizingStrategy,
ShareGPTPromptTokenizingStrategy,
SummarizeTLDRPromptTokenizingStrategy,
)
from axolotl.prompters import (
AlpacaPrompter,
CompletionPrompter,
GPTeacherPrompter,
JeopardyPrompter,
MultipleChoiceConcisePrompter,
MultipleChoiceExplainPrompter,
ReflectAlpacaPrompter,
ShareGPTPrompter,
JeopardyPrompter,
CompletionPrompter,
MultipleChoiceExplainPrompter,
SummarizeTLDRPrompter,
MultipleChoiceConcisePrompter,
)
@@ -45,11 +41,13 @@ def load_tokenized_prepared_datasets(
) -> DatasetDict:
tokenizer_name = tokenizer.__class__.__name__
ds_hash = str(
md5(
md5( # nosec
(
str(cfg.sequence_len)
+ "@"
+ "|".join(sorted([f"{d.path}:{d.type}:{d.shards}" for d in cfg.datasets]))
+ "|".join(
sorted([f"{d.path}:{d.type}:{d.shards}" for d in cfg.datasets])
)
+ "|"
+ tokenizer_name
).encode("utf-8")
@@ -65,10 +63,11 @@ def load_tokenized_prepared_datasets(
try:
if cfg.push_dataset_to_hub:
dataset = load_dataset(
f"{cfg.push_dataset_to_hub}/{ds_hash}", use_auth_token=use_auth_token
f"{cfg.push_dataset_to_hub}/{ds_hash}",
use_auth_token=use_auth_token,
)
dataset = dataset["train"]
except:
except Exception: # pylint: disable=broad-except # nosec
pass
if dataset:
@@ -81,43 +80,59 @@ def load_tokenized_prepared_datasets(
logging.info(f"Unable to find prepared dataset in {prepared_ds_path}")
logging.info("Loading raw datasets...")
datasets = []
# pylint: disable=invalid-name
for d in cfg.datasets:
ds: Union[Dataset, DatasetDict] = None
ds_from_hub = False
try:
load_dataset(d.path, streaming=True, use_auth_token=use_auth_token)
load_dataset(
d.path,
streaming=True,
use_auth_token=use_auth_token,
)
ds_from_hub = True
except FileNotFoundError:
pass
# prefer local dataset, even if hub exists
if Path(d.path).exists():
ds: Dataset = load_dataset(
"json", data_files=d.path, streaming=False, split=None
ds = load_dataset(
"json",
data_files=d.path,
streaming=False,
split=None,
)
elif ds_from_hub:
if d.data_files:
ds: Dataset = load_dataset(
ds = load_dataset(
d.path,
streaming=False,
data_files=d.data_files,
use_auth_token=use_auth_token,
)
else:
ds: Dataset = load_dataset(d.path, streaming=False, use_auth_token=use_auth_token)
ds = load_dataset(
d.path,
streaming=False,
use_auth_token=use_auth_token,
)
else:
fp = hf_hub_download(
repo_id=d.path, repo_type="dataset", filename=d.data_files
repo_id=d.path,
repo_type="dataset",
filename=d.data_files,
)
ds: Dataset = load_dataset("json", data_files=fp, streaming=False, split=None)
ds = load_dataset("json", data_files=fp, streaming=False, split=None)
if not ds:
raise Exception("unhandled dataset load")
raise ValueError("unhandled dataset load")
# support for using a subset of the data
if d.shards:
if "train" in ds:
ds: DatasetDict = ds.shuffle(seed=42)["train"].shard(num_shards=d.shards, index=0)
ds = ds.shuffle(seed=42)["train"].shard(
num_shards=d.shards, index=0
)
else:
ds: Dataset = ds.shuffle(seed=42).shard(num_shards=d.shards, index=0)
ds = ds.shuffle(seed=42).shard(num_shards=d.shards, index=0)
d_type = d.type
d_type_split = d_type.split(":")
d_base_type = d_type_split[0]
@@ -219,11 +234,12 @@ def load_tokenized_prepared_datasets(
datasets.append(ds_wrapper)
else:
logging.error(f"unhandled prompt tokenization strategy: {d.type}")
raise ValueError(f"unhandled prompt tokenization strategy: {d.type}")
logging.info("tokenizing, merging, and shuffling master dataset")
samples = []
samples: List[int] = []
for d in datasets:
samples = samples + [i for i in d]
samples = samples + list(d)
dataset = Dataset.from_list(samples).shuffle(seed=42)
if cfg.local_rank == 0:
logging.info(
@@ -242,8 +258,10 @@ def load_tokenized_prepared_datasets(
def load_prepare_datasets(
tokenizer: PreTrainedTokenizerBase, cfg, default_dataset_prepared_path
) -> (Dataset, Dataset):
tokenizer: PreTrainedTokenizerBase,
cfg,
default_dataset_prepared_path,
) -> Tuple[Dataset, Dataset]:
max_packed_sequence_len = (
cfg.max_packed_sequence_len if cfg.max_packed_sequence_len else cfg.sequence_len
)
@@ -256,13 +274,15 @@ def load_prepare_datasets(
# see if we can go ahead and load the stacked dataset
seed = f"@{str(cfg.seed)}" if cfg.seed else ""
ds_hash = str(
md5(
md5( # nosec
(
str(cfg.sequence_len)
+ "@"
+ str(max_packed_sequence_len)
+ seed
+ "|".join(sorted([f"{d.path}:{d.type}:{d.shards}" for d in cfg.datasets]))
+ "|".join(
sorted([f"{d.path}:{d.type}:{d.shards}" for d in cfg.datasets])
)
+ "|"
+ tokenizer_name
).encode("utf-8")
@@ -282,10 +302,11 @@ def load_prepare_datasets(
f"Checking for packed prepared dataset from hub... {cfg.push_dataset_to_hub}/{ds_hash}"
)
dataset = load_dataset(
f"{cfg.push_dataset_to_hub}/{ds_hash}", use_auth_token=use_auth_token
f"{cfg.push_dataset_to_hub}/{ds_hash}",
use_auth_token=use_auth_token,
)
dataset = dataset["train"]
except:
except Exception: # pylint: disable=broad-except # nosec
pass
if dataset:
@@ -319,7 +340,7 @@ def load_prepare_datasets(
logging.info(
f"packing master dataset to len: {cfg.max_packed_sequence_len}"
)
dataset = Dataset.from_list([_ for _ in constant_len_dataset])
dataset = Dataset.from_list(list(constant_len_dataset))
# filter out bad data
dataset = Dataset.from_list(
@@ -343,7 +364,8 @@ def load_prepare_datasets(
f"Saving packed prepared dataset with push_to_hub... {cfg.push_dataset_to_hub}/{ds_hash}"
)
dataset.push_to_hub(
f"{cfg.push_dataset_to_hub}/{ds_hash}", private=True
f"{cfg.push_dataset_to_hub}/{ds_hash}",
private=True,
)
else:
dataset = load_tokenized_prepared_datasets(
@@ -355,11 +377,47 @@ def load_prepare_datasets(
f"Using index #{cfg.dataset_shard_idx} of {cfg.dataset_shard_num} shards"
)
dataset = dataset.shard(
num_shards=cfg.dataset_shard_num, index=cfg.dataset_shard_idx
num_shards=cfg.dataset_shard_num,
index=cfg.dataset_shard_idx,
)
dataset = dataset.train_test_split(test_size=cfg.val_set_size, shuffle=False)
train_dataset = dataset["train"]
eval_dataset = dataset["test"]
if cfg.val_set_size:
dataset = dataset.train_test_split(test_size=cfg.val_set_size, shuffle=False)
train_dataset = dataset["train"]
eval_dataset = dataset["test"]
else:
train_dataset = dataset
eval_dataset = None
return train_dataset, eval_dataset
class PretrainingDatasetWrapper(IterableDataset):
"""
Wrapper for pretraining dataset that avoids loading the dataset into memory
"""
def __init__(self, tokenizer, dataset_path, max_tokens=2048):
self.tokenizer = tokenizer
self.dataset_path = dataset_path
self.max_tokens = max_tokens
def __iter__(self):
buffer = []
for sample in load_dataset(
self.dataset_path,
)["train"].shuffle():
buffer += self.tokenizer(sample["text"])["input_ids"]
buffer += [self.tokenizer.eos_token_id]
while len(buffer) > self.max_tokens:
input_ids = torch.tensor(buffer[: self.max_tokens])
yield {
"input_ids": input_ids,
"attention_mask": torch.ones(input_ids.size()),
"labels": input_ids,
}
buffer = buffer[self.max_tokens :]
def load_pretraining_dataset(path, tokenizer, max_tokens=2048):
return PretrainingDatasetWrapper(tokenizer, path, max_tokens=max_tokens)
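
A hedged usage sketch of the streaming pretraining wrapper; the dataset id is a placeholder and the import path is assumed from the module layout.

from axolotl.utils.data import load_pretraining_dataset  # import path assumed

dataset = load_pretraining_dataset("org/pretraining-corpus", tokenizer, max_tokens=2048)

batch = next(iter(dataset))
# batch["input_ids"] is a 2048-token tensor; labels mirror input_ids and the
# attention_mask is all ones, so every position contributes to the LM loss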

View File

@@ -1,3 +1,5 @@
"""Module containing the DictDefault class"""
from addict import Dict

View File

@@ -1,26 +1,28 @@
"""Module for models and model loading"""
import logging
import math
import os
from pathlib import Path
from typing import Optional, Tuple, TYPE_CHECKING
from typing import TYPE_CHECKING, Optional, Tuple # noqa: F401
import bitsandbytes as bnb
import torch
import transformers
from optimum.bettertransformer import BetterTransformer
from transformers import PreTrainedModel # noqa: F401
from transformers import (
AutoConfig,
AutoModelForCausalLM,
AutoTokenizer,
PreTrainedModel,
AutoConfig,
BitsAndBytesConfig,
LlamaConfig,
)
try:
from transformers import (
LlamaForCausalLM,
LlamaTokenizer,
)
except:
from transformers import LlamaForCausalLM
except ImportError:
logging.warning(
"This version of transformers does not support Llama. Consider upgrading."
)
@@ -28,24 +30,24 @@ except:
from axolotl.prompt_tokenizers import LLAMA_DEFAULT_PAD_TOKEN
if TYPE_CHECKING:
from peft import PeftModel, PeftConfig
from axolotl.utils.dict import DictDefault
from transformers import PreTrainedTokenizer
from peft import PeftConfig # noqa: F401
from axolotl.utils.dict import DictDefault # noqa: F401
def load_tokenizer(
base_model_config,
tokenizer_config,
tokenizer_type,
cfg,
):
if tokenizer_type:
tokenizer = getattr(transformers, tokenizer_type).from_pretrained(
base_model_config,
tokenizer_config,
trust_remote_code=cfg.trust_remote_code or False,
)
else:
tokenizer = AutoTokenizer.from_pretrained(
base_model_config,
tokenizer_config,
trust_remote_code=cfg.trust_remote_code or False,
)
@@ -54,7 +56,10 @@ def load_tokenizer(
logging.debug(f"PAD: {tokenizer.pad_token_id} / {tokenizer.pad_token}")
logging.debug(f"UNK: {tokenizer.unk_token_id} / {tokenizer.unk_token}")
if tokenizer.__class__.__name__ in ["LlamaTokenizer", "LlamaTokenizerFast"]:
if tokenizer.__class__.__name__ in [
"LlamaTokenizer",
"LlamaTokenizerFast",
]:
tokenizer.pad_token = LLAMA_DEFAULT_PAD_TOKEN
if tokenizer.__class__.__name__ == "GPTNeoXTokenizerFast":
@@ -62,8 +67,8 @@ def load_tokenizer(
os.environ["TOKENIZERS_PARALLELISM"] = "false"
if cfg.special_tokens:
for k, v in cfg.special_tokens.items():
tokenizer.add_special_tokens({k: v})
for k, val in cfg.special_tokens.items():
tokenizer.add_special_tokens({k: val})
if cfg.tokens:
tokenizer.add_tokens(list(cfg.tokens))
@@ -79,7 +84,10 @@ def load_model(
adapter="lora",
inference=False,
):
# type: (str, str, str, str, DictDefault, Optional[str], bool) -> Tuple[PreTrainedModel, PreTrainedTokenizer, Optional[PeftConfig]]
# type: (str, str, str, str, DictDefault, Optional[str], bool) -> Tuple[PreTrainedModel, Optional[PeftConfig]]
"""
Load a model from a base model and a model type.
"""
# TODO refactor as a kwarg
load_in_8bit = cfg.load_in_8bit
@@ -94,16 +102,23 @@ def load_model(
logging.info("patching with flash attention")
replace_llama_attn_with_flash_attn()
elif is_llama_derived_model and cfg.xformers_attention:
from alpaca_lora_4bit.monkeypatch.llama_attn_hijack_xformers import (
from axolotl.monkeypatch.llama_attn_hijack_xformers import (
hijack_llama_attention,
)
logging.info("patching with xformers attention")
hijack_llama_attention()
elif is_llama_derived_model and cfg.sdp_attention:
from axolotl.monkeypatch.llama_attn_hijack_xformers import (
hijack_llama_sdp_attention,
)
if cfg.bf16:
logging.info("patching with sdp attention")
hijack_llama_sdp_attention()
if cfg.bf16 or cfg.bfloat16:
torch_dtype = torch.bfloat16
elif cfg.load_in_8bit or cfg.fp16:
elif cfg.load_in_8bit or cfg.fp16 or cfg.float16:
torch_dtype = torch.float16
else:
torch_dtype = torch.float32
@@ -115,9 +130,9 @@ def load_model(
replace_peft_model_with_int4_lora_model()
from peft import prepare_model_for_int8_training
except Exception as e:
logging.exception(e)
raise e
except Exception as err:
logging.exception(err)
raise err
model_kwargs = {}
if cfg.adapter == "qlora" and cfg.load_in_4bit:
@@ -155,7 +170,7 @@ def load_model(
"unable to find a cached model file, this will likely fail..."
)
model_path = str(cache_model_path)
except:
except Exception: # pylint: disable=broad-exception-caught
model_path = cfg.base_model
model, _ = load_llama_model_4bit_low_ram(
base_model_config if base_model_config else base_model,
@@ -169,8 +184,10 @@ def load_model(
)
load_in_8bit = False
elif is_llama_derived_model and "LlamaForCausalLM" in globals():
config = LlamaConfig.from_pretrained(base_model_config)
model = LlamaForCausalLM.from_pretrained(
base_model,
config=config,
load_in_8bit=cfg.load_in_8bit and cfg.adapter is not None,
load_in_4bit=cfg.load_in_4bit and cfg.adapter is not None,
torch_dtype=torch_dtype,
@@ -210,13 +227,13 @@ def load_model(
load_in_4bit=cfg.load_in_4bit and cfg.adapter is not None,
torch_dtype=torch_dtype,
device_map=cfg.device_map,
trust_remote_code=True if cfg.trust_remote_code is True else False,
trust_remote_code=cfg.trust_remote_code or False,
**model_kwargs,
)
else:
config = AutoConfig.from_pretrained(
base_model,
trust_remote_code=True if cfg.trust_remote_code is True else False,
trust_remote_code=cfg.trust_remote_code or False,
)
model = AutoModelForCausalLM.from_pretrained(
base_model,
@@ -225,30 +242,35 @@ def load_model(
load_in_4bit=cfg.load_in_4bit and cfg.adapter is not None,
torch_dtype=torch_dtype,
device_map=cfg.device_map,
trust_remote_code=True if cfg.trust_remote_code is True else False,
trust_remote_code=cfg.trust_remote_code or False,
**model_kwargs,
)
except Exception as e:
except Exception as err: # pylint: disable=broad-exception-caught
logging.error(
"Exception raised attempting to load model, retrying with AutoModelForCausalLM"
)
logging.exception(e)
logging.exception(err)
model = AutoModelForCausalLM.from_pretrained(
base_model,
load_in_8bit=cfg.load_in_8bit and cfg.adapter is not None,
torch_dtype=torch_dtype,
device_map=cfg.device_map,
trust_remote_code=True if cfg.trust_remote_code is True else False,
trust_remote_code=cfg.trust_remote_code or False,
**model_kwargs,
)
embeddings_len = math.ceil(len(tokenizer) / 32) * 32
model.resize_token_embeddings(embeddings_len)
if (
((cfg.adapter == "lora" and load_in_8bit) or cfg.adapter == "qlora")
and not cfg.gptq
and (load_in_8bit or cfg.load_in_4bit)
if cfg.sequence_len >= model.config.max_position_embeddings:
logging.warning(
f"increasing model.config.max_position_embeddings to {cfg.sequence_len}"
)
model.config.max_position_embeddings = cfg.sequence_len
if not cfg.gptq and (
(cfg.adapter == "lora" and load_in_8bit)
or (cfg.adapter == "qlora" and cfg.load_in_4bit)
):
logging.info("converting PEFT model w/ prepare_model_for_int8_training")
model = prepare_model_for_int8_training(model)
@@ -261,14 +283,14 @@ def load_model(
if cfg.gptq:
# Scales to half
logging.info("Fitting 4bit scales and zeros to half")
for n, m in model.named_modules():
if "Autograd4bitQuantLinear" in str(type(m)) or "Linear4bitLt" in str(
type(m)
for _, module in model.named_modules():
if "Autograd4bitQuantLinear" in str(type(module)) or "Linear4bitLt" in str(
type(module)
):
if hasattr(m, "is_v1_model") and m.is_v1_model:
m.zeros = m.zeros.half()
m.scales = m.scales.half()
m.bias = m.bias.half()
if hasattr(module, "is_v1_model") and module.is_v1_model:
module.zeros = module.zeros.half()
module.scales = module.scales.half()
module.bias = module.bias.half()
if (
torch.cuda.device_count() > 1
@@ -278,8 +300,8 @@ def load_model(
# llama is PROBABLY model parallelizable, but the default isn't that it is
# so let's only set it for the 4bit, see
# https://github.com/johnsmith0031/alpaca_lora_4bit/blob/08b3fca4a4a9e0d3945be1bab4529f100a428636/finetune.py#L130-L133
setattr(model, 'is_parallelizable', True)
setattr(model, 'model_parallel', True)
setattr(model, "is_parallelizable", True)
setattr(model, "model_parallel", True)
requires_grad = []
for name, param in model.named_parameters(recurse=True):
@@ -289,6 +311,9 @@ def load_model(
logging.warning("there are no parameters that require gradient updates")
model.config.use_cache = False
if cfg.flash_optimum:
model = BetterTransformer.transform(model)
# TODO resume_from_checkpoint handling
return model, lora_config
@@ -308,11 +333,7 @@ def load_adapter(model, cfg, adapter):
def load_llama_adapter(model, cfg):
# type: (PreTrainedModel, DictDefault) -> Tuple[PreTrainedModel, Optional[PeftConfig]]
from peft import (
AdaptionPromptConfig,
get_peft_model,
PeftModel,
)
from peft import AdaptionPromptConfig, PeftModel, get_peft_model
peft_config = AdaptionPromptConfig(
adapter_layers=cfg.peft_adapter.layers, # layers (L)
@@ -357,11 +378,7 @@ def find_all_linear_names(bits, model):
def load_lora(model, cfg):
# type: (PreTrainedModel, DictDefault) -> Tuple[PreTrainedModel, Optional[PeftConfig]]
from peft import (
LoraConfig,
get_peft_model,
PeftModel,
)
from peft import LoraConfig, PeftModel, get_peft_model
lora_target_modules = list(cfg.lora_target_modules or [])
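
For reference, the attention-related switches that load_model() consults can be summarized as a config sketch; the flag behind the flash-attention branch is outside the visible hunk, so its name here is assumed.

# Hedged sketch of the relevant cfg switches (plain dict for illustration).
attention_cfg = {
    "flash_attention": False,     # assumed flag name; patches LlamaAttention with flash-attn
    "xformers_attention": False,  # hijack_llama_attention from axolotl.monkeypatch.llama_attn_hijack_xformers
    "sdp_attention": True,        # hijack_llama_sdp_attention (torch scaled_dot_product_attention)
    "flash_optimum": False,       # wrap the loaded model with optimum BetterTransformer
    "bf16": True,                 # bf16/bfloat16 -> torch.bfloat16; fp16/float16/load_in_8bit -> torch.float16
}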

View File

@@ -1,7 +1,13 @@
"""Module for custom LRScheduler class"""
from torch.optim.lr_scheduler import LRScheduler
class InterpolatingLogScheduler(LRScheduler):
"""
A scheduler that interpolates learning rates in a logarithmic fashion
"""
def __init__(self, optimizer, num_steps, min_lr, max_lr, last_epoch=-1):
"""A scheduler that interpolates learning rates in a logarithmic fashion
@@ -19,7 +25,9 @@ class InterpolatingLogScheduler(LRScheduler):
self.num_steps = num_steps
self.min_lr = min_lr
self.max_lr = max_lr
self.q = (max_lr / min_lr) ** (1 / (num_steps - 1))
self.q = (max_lr / min_lr) ** ( # pylint: disable=invalid-name
1 / (num_steps - 1)
)
super().__init__(optimizer, last_epoch)
def get_lr(self):
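Editor's note: the q computed above is the common ratio of a geometric progression from min_lr to max_lr over num_steps steps, so each step multiplies the learning rate by a constant factor rather than adding a constant amount. A small worked example with made-up values:

min_lr, max_lr, num_steps = 1e-5, 1e-3, 3
q = (max_lr / min_lr) ** (1 / (num_steps - 1))  # 100 ** 0.5 == 10.0
lrs = [min_lr * q**step for step in range(num_steps)]
# lrs is approximately [1e-05, 1e-04, 1e-03]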

View File

@@ -1,6 +1,10 @@
from termcolor import colored
"""Module for tokenization utilities"""
import logging
from termcolor import colored
def check_dataset_labels(dataset, tokenizer):
# the dataset is already shuffled, so let's just check the first 5 elements
@@ -17,7 +21,7 @@ def check_example_labels(example, tokenizer):
# You can compare the input_ids and labels element-wise
# Remember to ignore positions with IGNORE_TOKEN_ID (if you use it) or attention_mask equal to 0
colored_tokens = []
for i, (input_id, label_id, mask) in enumerate(
for _, (input_id, label_id, mask) in enumerate(
zip(input_ids, labels, attention_mask)
):
decoded_input_token = tokenizer.decode(input_id)
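Editor's note: the comparison described in the comments above is usually rendered by coloring each decoded token according to whether it contributes to the loss, with positions masked by attention_mask or labeled with the ignore id shown differently. A minimal sketch of that idea, assuming the conventional ignore id of -100:

from termcolor import colored

IGNORE_TOKEN_ID = -100  # conventional Hugging Face "ignore" label id (assumption here)

def color_token(token: str, label_id: int, mask: int) -> str:
    # green: the token is trained on; red: it is masked out of the loss
    is_trained = mask == 1 and label_id != IGNORE_TOKEN_ID
    return colored(token, "green" if is_trained else "red")

print(color_token("Hello", 15043, 1))  # printed green
print(color_token("<pad>", -100, 0))   # printed red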

View File

@@ -1,8 +1,12 @@
"""Module containing the Trainer class and related functions"""
import importlib
import logging
import math
import os
import sys
from pathlib import Path
from typing import Optional
import bitsandbytes as bnb
import torch.cuda
@@ -12,17 +16,29 @@ from torch.optim.lr_scheduler import OneCycleLR
from transformers import EarlyStoppingCallback, Trainer
from transformers.trainer_pt_utils import get_parameter_names
from axolotl.utils.callbacks import (
SaveBetterTransformerModelCallback,
SavePeftModelCallback,
)
from axolotl.utils.schedulers import InterpolatingLogScheduler
from axolotl.utils.callbacks import SavePeftModelCallback
class OneCycleLRSchedulerTrainer(Trainer):
"""
Trainer subclass that uses the OneCycleLR scheduler
"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.lr_scheduler = None
def create_scheduler(
self, num_training_steps: int, optimizer: torch.optim.Optimizer = None
self,
num_training_steps: int,
optimizer: Optional[torch.optim.Optimizer] = None,
):
optimizer = self.optimizer if optimizer is None else optimizer
num_warmup_steps = self.args.get_warmup_steps(num_training_steps)
num_training_steps = num_training_steps
pct_start = num_warmup_steps / num_training_steps
self.lr_scheduler = OneCycleLR(
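Editor's note: the OneCycleLR call above is truncated in this view. pct_start is simply the warmup fraction, i.e. warmup steps divided by total training steps. A rough sketch of how the scheduler is typically constructed; the arguments below are illustrative, not necessarily the exact ones the trainer passes:

import torch
from torch.optim.lr_scheduler import OneCycleLR

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for real model parameters
optimizer = torch.optim.AdamW(params, lr=1e-4)

num_training_steps, num_warmup_steps = 1000, 100
scheduler = OneCycleLR(
    optimizer,
    max_lr=1e-4,
    total_steps=num_training_steps,
    pct_start=num_warmup_steps / num_training_steps,  # 10% of the run spent ramping up
)
for _ in range(num_training_steps):
    optimizer.step()
    scheduler.step()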
@@ -58,11 +74,11 @@ def setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer):
training_arguments_kwargs["bf16_full_eval"] = True
else:
training_arguments_kwargs["bf16"] = cfg.bf16
training_arguments_kwargs["fp16"] = True if cfg.fp16 and not cfg.bf16 else False
training_arguments_kwargs["fp16"] = (cfg.fp16 and not cfg.bf16) or False
training_arguments_kwargs["tf32"] = cfg.tf32
training_arguments_kwargs["warmup_steps"] = warmup_steps
training_arguments_kwargs["logging_steps"] = logging_steps
if cfg.gradient_checkpointing is not None:
if cfg.gradient_checkpointing:
if cfg.gptq:
from alpaca_lora_4bit.gradient_checkpointing import (
apply_gradient_checkpointing,
@@ -112,13 +128,14 @@ def setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer):
save_steps=save_steps,
output_dir=cfg.output_dir,
save_total_limit=3,
load_best_model_at_end=True
if cfg.load_best_model_at_end is not False # if explicitly set to False, it should be resort to False
and cfg.val_set_size > 0
and save_steps is not None
and save_steps % eval_steps == 0
and cfg.load_in_8bit is not True
else False,
load_best_model_at_end=(
cfg.load_best_model_at_end is not False
and cfg.val_set_size > 0
and save_steps
and save_steps % eval_steps == 0
and cfg.load_in_8bit is not True
)
or False,
ddp_find_unused_parameters=False if cfg.ddp else None,
group_by_length=cfg.group_by_length,
report_to="wandb" if cfg.use_wandb else None,
@@ -140,7 +157,7 @@ def setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer):
if (
cfg.optimizer == "adamw_bnb_8bit"
and not cfg.gptq
and not "deepspeed" in training_arguments_kwargs
and "deepspeed" not in training_arguments_kwargs
and not cfg.fsdp
):
decay_parameters = get_parameter_names(model, [nn.LayerNorm])
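Editor's note: the adamw_bnb_8bit branch above builds the standard two-group parameter split before handing it to bitsandbytes: weight decay on ordinary weights, none on LayerNorm weights or biases. A generic sketch of that pattern with a toy model; the hyperparameters are placeholders, and bitsandbytes needs a CUDA build to actually step:

import bitsandbytes as bnb
from torch import nn
from transformers.trainer_pt_utils import get_parameter_names

model = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))  # stand-in for the real model
decay_parameters = [
    name for name in get_parameter_names(model, [nn.LayerNorm]) if "bias" not in name
]
optimizer_grouped_parameters = [
    {
        "params": [p for n, p in model.named_parameters() if n in decay_parameters],
        "weight_decay": 0.01,
    },
    {
        "params": [p for n, p in model.named_parameters() if n not in decay_parameters],
        "weight_decay": 0.0,
    },
]
optimizer = bnb.optim.AdamW8bit(optimizer_grouped_parameters, lr=2e-5, betas=(0.9, 0.999))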
@@ -206,9 +223,16 @@ def setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer):
)
callbacks.append(early_stop_cb)
if cfg.local_rank == 0 and cfg.adapter in ["lora", "qlora"]: # only save in rank 0
if cfg.local_rank == 0 and cfg.adapter in [
"lora",
"qlora",
]: # only save in rank 0
callbacks.append(SavePeftModelCallback)
if hasattr(model, "use_bettertransformer") and model.use_bettertransformer is True:
logging.info("Setting up SaveBetterTransformerModelCallback.")
callbacks.append(SaveBetterTransformerModelCallback)
data_collator_kwargs = {
"padding": True,
}

View File

@@ -1,7 +1,15 @@
"""Module for validating config files"""
import logging
import torch
def validate_config(cfg):
if cfg.gradient_accumulation_steps and cfg.batch_size:
raise ValueError(
"please set only one of gradient_accumulation_steps or batch_size"
)
if cfg.load_4bit:
raise ValueError(
"cfg.load_4bit parameter has been deprecated and replaced by cfg.gptq"
@@ -38,9 +46,35 @@ def validate_config(cfg):
)
if cfg.push_dataset_to_hub and cfg.hf_use_auth_token is not True:
raise ValueError("Require cfg.hf_use_auth_token to be True for push_dataset_to_hub")
raise ValueError(
"Require cfg.hf_use_auth_token to be True for push_dataset_to_hub"
)
if cfg.flash_optimum is True:
if cfg.adapter:
logging.warning(
"BetterTransformers probably doesn't work with PEFT adapters"
)
if cfg.fp16 or cfg.bf16:
raise ValueError("AMP is not supported with BetterTransformer")
if cfg.float16 is not True and cfg.bfloat16 is not True:
logging.warning(
"You should probably set bfloat16 or float16 to true to "
"load the model in float16 for BetterTransformers"
)
if int(torch.__version__.split(".")[0]) < 2:
logging.warning("torch>=2.0.0 required")
raise ValueError(
f"flash_optimum for BetterTransformers may not be used with {torch.__version__}"
)
# TODO
# MPT 7b
# https://github.com/facebookresearch/bitsandbytes/issues/25
# no 8bit adamw w bf16
# GPT-NeoX
# evals broken when extending context len
# File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 162, in forward attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
# File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/optimum/bettertransformer/models/attention.py", line 74, in gpt2_wrapped_scaled_dot_product
# attention_mask = causal_mask + attention_mask
# RuntimeError: The size of tensor a (2048) must match the size of tensor b (8132) at non-singleton dimension 3
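Editor's note: the flash_optimum checks above gate the BetterTransformer path on a torch 2.x build, since optimum's BetterTransformer leans on torch's scaled_dot_product_attention. A rough sketch of the transform round-trip; the model name is a placeholder and this is not the project's exact loading code:

import torch
from optimum.bettertransformer import BetterTransformer
from transformers import AutoModelForCausalLM

assert int(torch.__version__.split(".")[0]) >= 2, "this path expects torch>=2.0"

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m", torch_dtype=torch.float16
)
model = BetterTransformer.transform(model)  # swap attention modules for fused SDPA kernels
# ... training happens here ...
model = BetterTransformer.reverse(model)    # convert back before calling save_pretrained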

View File

@@ -1,3 +1,5 @@
"""Module for wandb utilities"""
import os

tests/fixtures/alpaca/alpaca.json (new file, 12 lines)
View File

@@ -0,0 +1,12 @@
[
{
"instruction": "You will be given a series of words. Output these words in reverse order, with each word on its own line.",
"input": "Words: ['Hello', 'world'].",
"output": "['world', 'Hello']"
},
{
"instruction": "In this task, you're given a short description of an event. Your job is to order the steps involved in the event from first to last. Note that there may be multiple correct answers for each event.",
"input": "Description: A man walks into a bar and orders a drink. He pays for his drink and leaves the bar.",
"output": "1. The man walks into the bar.\n2. He orders a drink.\n3. He pays for his drink.\n4. He leaves the bar."
}
]

File diff suppressed because one or more lines are too long

View File

@@ -1,3 +1,6 @@
"""Module for testing DictDefault class"""
import unittest
import pytest
@@ -6,6 +9,10 @@ from axolotl.utils.dict import DictDefault
class DictDefaultTest(unittest.TestCase):
"""
Test DictDefault class
"""
def test_dict_default(self):
cfg = DictDefault(
{
@@ -41,7 +48,9 @@ class DictDefaultTest(unittest.TestCase):
}
)
cfg = cfg | DictDefault({"key_a": {"key_b": "value_b"}, "key_f": "value_g"})
cfg = cfg | DictDefault( # pylint: disable=unsupported-binary-operation
{"key_a": {"key_b": "value_b"}, "key_f": "value_g"}
)
assert (
cfg.key_a.key_b == "value_b"
@@ -73,7 +82,7 @@ class DictDefaultTest(unittest.TestCase):
AttributeError,
match=r"'NoneType' object has no attribute 'another_random_key'",
):
cfg.random_key.another_random_key
cfg.random_key.another_random_key = "value"
def test_dict_shorthand_assignment(self):
"""

View File

@@ -0,0 +1,65 @@
"""Module for testing dataset sequence packing"""
import unittest
from pathlib import Path
from datasets import Dataset, load_dataset
from transformers import AutoTokenizer
from axolotl.datasets import ConstantLengthDataset, TokenizedPromptDataset
from axolotl.prompt_tokenizers import AlpacaPromptTokenizingStrategy
from axolotl.prompters import AlpacaPrompter
class TestPacking(unittest.TestCase):
"""
Test class for packing dataset sequences
"""
def setUp(self) -> None:
# pylint: disable=duplicate-code
self.tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
self.tokenizer.add_special_tokens(
{
"bos_token": "<s>",
"eos_token": "</s>",
"unk_token": "<unk>",
}
)
def test_resets_attention(self):
prompter = AlpacaPrompter("chat")
strat = AlpacaPromptTokenizingStrategy(
prompter,
self.tokenizer,
False,
2048,
)
dateset = load_dataset(
"json",
data_files=str(Path(__file__).parent / "fixtures/alpaca/alpaca.json"),
)["train"]
dataset = Dataset.from_list(list(TokenizedPromptDataset(strat, dateset)))
constant_len_dataset = ConstantLengthDataset(
self.tokenizer,
[dataset],
seq_length=2048,
)
packed_dataset = Dataset.from_list(list(constant_len_dataset))
example = packed_dataset[0]
next_bos_index = (
example["input_ids"][1:].index(self.tokenizer.bos_token_id) + 1
) # add one since we sliced
# first example doesn't have mask reset
assert example["input_ids"][0] == self.tokenizer.bos_token_id
assert example["attention_mask"][0] == 1
# but subsequent one does
assert example["input_ids"][next_bos_index] == self.tokenizer.bos_token_id
assert example["attention_mask"][next_bos_index] == 0
if __name__ == "__main__":
unittest.main()
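Editor's note, in miniature, on what the test above asserts: when several tokenized examples are packed into one fixed-length row, the position where the next sequence's BOS lands gets attention_mask 0, so attention does not bleed across the concatenation boundary. A toy layout with made-up token ids:

bos, eos = 1, 2
packed_input_ids = [bos, 101, 102, eos, bos, 201, 202, eos]
packed_attention = [1,   1,   1,   1,   0,   1,   1,   1]

next_bos_index = packed_input_ids[1:].index(bos) + 1  # add one since we sliced
assert packed_input_ids[0] == bos and packed_attention[0] == 1  # first sequence keeps its mask
assert packed_attention[next_bos_index] == 0                    # boundary position is reset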

View File

@@ -1,3 +1,4 @@
"""Module for testing prompt tokenizers."""
import json
import logging
import unittest
@@ -12,7 +13,12 @@ logging.basicConfig(level="INFO")
class TestPromptTokenizationStrategies(unittest.TestCase):
"""
Test class for prompt tokenization strategies.
"""
def setUp(self) -> None:
# pylint: disable=duplicate-code
self.tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
self.tokenizer.add_special_tokens(
{
@@ -24,10 +30,15 @@ class TestPromptTokenizationStrategies(unittest.TestCase):
def test_sharegpt_integration(self):
print(Path(__file__).parent)
with open(Path(__file__).parent / "fixtures/conversation.json", "r") as fin:
with open(
Path(__file__).parent / "fixtures/conversation.json", encoding="utf-8"
) as fin:
data = fin.read()
conversation = json.loads(data)
with open(Path(__file__).parent / "fixtures/conversation.tokenized.json", "r") as fin:
with open(
Path(__file__).parent / "fixtures/conversation.tokenized.json",
encoding="utf-8",
) as fin:
data = fin.read()
tokenized_conversation = json.loads(data)
prompter = ShareGPTPrompter("chat")

View File

@@ -1,9 +1,15 @@
"""Module testing prompters"""
import unittest
from axolotl.prompters import AlpacaPrompter, PromptStyle
class AlpacaPrompterTest(unittest.TestCase):
"""
Test AlpacaPrompter
"""
def test_prompt_style_w_none(self):
prompter = AlpacaPrompter(prompt_style=None)
res = next(prompter.build_prompt("tell me a joke"))
@@ -11,8 +17,10 @@ class AlpacaPrompterTest(unittest.TestCase):
assert "### Instruction:" in res
def test_prompt_style_w_instruct(self):
prompter = AlpacaPrompter(prompt_style=PromptStyle.instruct.value)
res = next(prompter.build_prompt("tell me a joke about the following", "alpacas"))
prompter = AlpacaPrompter(prompt_style=PromptStyle.INSTRUCT.value)
res = next(
prompter.build_prompt("tell me a joke about the following", "alpacas")
)
assert "Below is an instruction" in res
assert "### Instruction:" in res
assert "### Input:" in res
@@ -29,8 +37,10 @@ class AlpacaPrompterTest(unittest.TestCase):
assert "ASSISTANT:" not in res
def test_prompt_style_w_chat(self):
prompter = AlpacaPrompter(prompt_style=PromptStyle.chat.value)
res = next(prompter.build_prompt("tell me a joke about the following", "alpacas"))
prompter = AlpacaPrompter(prompt_style=PromptStyle.CHAT.value)
res = next(
prompter.build_prompt("tell me a joke about the following", "alpacas")
)
assert "Below is an instruction" in res
assert "### Instruction:" not in res
assert "### Input:" not in res
@@ -45,5 +55,3 @@ class AlpacaPrompterTest(unittest.TestCase):
assert "### Response:" not in res
assert "USER:" in res
assert "ASSISTANT:" in res

View File

@@ -1,12 +1,18 @@
"""Module for testing the validation module"""
import unittest
import pytest
from axolotl.utils.validation import validate_config
from axolotl.utils.dict import DictDefault
from axolotl.utils.validation import validate_config
class ValidationTest(unittest.TestCase):
"""
Test the validation module
"""
def test_load_4bit_deprecate(self):
cfg = DictDefault(
{
@@ -24,7 +30,7 @@ class ValidationTest(unittest.TestCase):
}
)
cfg = base_cfg | DictDefault(
cfg = base_cfg | DictDefault( # pylint: disable=unsupported-binary-operation
{
"load_in_8bit": True,
}
@@ -33,7 +39,7 @@ class ValidationTest(unittest.TestCase):
with pytest.raises(ValueError, match=r".*8bit.*"):
validate_config(cfg)
cfg = base_cfg | DictDefault(
cfg = base_cfg | DictDefault( # pylint: disable=unsupported-binary-operation
{
"gptq": True,
}
@@ -42,7 +48,7 @@ class ValidationTest(unittest.TestCase):
with pytest.raises(ValueError, match=r".*gptq.*"):
validate_config(cfg)
cfg = base_cfg | DictDefault(
cfg = base_cfg | DictDefault( # pylint: disable=unsupported-binary-operation
{
"load_in_4bit": False,
}
@@ -51,7 +57,7 @@ class ValidationTest(unittest.TestCase):
with pytest.raises(ValueError, match=r".*4bit.*"):
validate_config(cfg)
cfg = base_cfg | DictDefault(
cfg = base_cfg | DictDefault( # pylint: disable=unsupported-binary-operation
{
"load_in_4bit": True,
}
@@ -67,7 +73,7 @@ class ValidationTest(unittest.TestCase):
}
)
cfg = base_cfg | DictDefault(
cfg = base_cfg | DictDefault( # pylint: disable=unsupported-binary-operation
{
"load_in_8bit": True,
}
@@ -76,7 +82,7 @@ class ValidationTest(unittest.TestCase):
with pytest.raises(ValueError, match=r".*8bit.*"):
validate_config(cfg)
cfg = base_cfg | DictDefault(
cfg = base_cfg | DictDefault( # pylint: disable=unsupported-binary-operation
{
"gptq": True,
}
@@ -85,7 +91,7 @@ class ValidationTest(unittest.TestCase):
with pytest.raises(ValueError, match=r".*gptq.*"):
validate_config(cfg)
cfg = base_cfg | DictDefault(
cfg = base_cfg | DictDefault( # pylint: disable=unsupported-binary-operation
{
"load_in_4bit": True,
}
@@ -112,3 +118,31 @@ class ValidationTest(unittest.TestCase):
)
validate_config(cfg)
def test_gradient_accumulations_or_batch_size(self):
cfg = DictDefault(
{
"gradient_accumulation_steps": 1,
"batch_size": 1,
}
)
with pytest.raises(
ValueError, match=r".*gradient_accumulation_steps or batch_size.*"
):
validate_config(cfg)
cfg = DictDefault(
{
"batch_size": 1,
}
)
validate_config(cfg)
cfg = DictDefault(
{
"gradient_accumulation_steps": 1,
}
)
validate_config(cfg)
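Editor's note on the mutual-exclusion check this test exercises: gradient_accumulation_steps and batch_size describe the same quantity from two directions, since the effective batch size is the micro batch size times the accumulation steps (times the number of data-parallel GPUs), so setting both invites contradictory configs. A quick illustration of the arithmetic:

micro_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 4

effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
assert effective_batch_size == 64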