Compare commits

...

33 Commits

Author SHA1 Message Date
Mads Henrichsen
272bced137 cpu offloading 2023-12-31 22:17:43 +01:00
Mads Henrichsen
c371d6b546 cpu offloading 2023-12-31 12:02:29 +01:00
Mads Henrichsen
d6273188f0 fft 2023-12-31 07:42:46 +01:00
Mads Henrichsen
72797b04a5 fix modules 2023-12-31 07:40:33 +01:00
Mads Henrichsen
de47bb5eb0 better lr 2023-12-30 22:36:50 +01:00
Mads Henrichsen
c04df54b4b new lr 2023-12-30 21:36:01 +01:00
Mads Henrichsen
e3716db386 small batch size 2023-12-30 13:20:45 +01:00
Mads Henrichsen
97943d8fc4 model revision 2023-12-30 12:55:17 +01:00
Mads Henrichsen
9d3f80cd40 disable packing 2023-12-30 12:51:03 +01:00
Mads Henrichsen
bfae79a634 trust 2023-12-30 12:47:50 +01:00
Mads Henrichsen
5a85ee16eb yayi2 2023-12-30 12:43:46 +01:00
Tazik Shahjahan
3678a6c41d Fix: bf16 support for inference (#981)
* Fix: bf16 torch dtype

* simplify casting to device and dtype

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-29 16:15:53 -06:00
mhenrichsen
f8ae59b0a8 Adds chat templates (#1022) 2023-12-29 15:44:23 -06:00
Hamel Husain
4f4d638b84 [WandB] Push axolotl config to top level wandb files (#1014) 2023-12-29 10:52:12 -08:00
Wing Lian
ba043a361e add ultrachat prompt strategies (#996) 2023-12-29 12:23:29 -06:00
NanoCode012
41353d2ea0 feat: expose bnb kwargs (#1018)
* feat: expose bnb kwargs

* chore: added examples and link per suggestion

* Uncomment defaults per suggestion for readability

Co-authored-by: Hamel Husain <hamel.husain@gmail.com>

---------

Co-authored-by: Hamel Husain <hamel.husain@gmail.com>
2023-12-29 18:16:26 +09:00
NanoCode012
f6ecf14dd4 feat: remove need to add load_in* during merge (#1017) 2023-12-29 18:15:30 +09:00
Hamel Husain
dec66d7c53 [Docs] Nit: Remind people to auth to wandb if they are going to use it (#1013) 2023-12-28 18:00:16 -08:00
Hamel Husain
76357dc5da Update README.md (#1012) 2023-12-28 18:00:02 -08:00
Wing Lian
70b46ca4f4 remove landmark attn and xpos rope implementations (#1010) 2023-12-27 21:07:27 -08:00
Hamel Husain
85dd4d525b add config to model card (#1005)
* add config to model card

* rm space

* apply black formatting

* apply black formatting

* fix formatting

* check for cfg attribute

* add version

* add version

* put the config in a collapsible element

* put the config in a collapsible element
2023-12-27 21:25:33 -06:00
Kevin Sydney
384b817dc0 Set eval_sample_packing to false in mistral config.yaml (#1003)
Without eval_sampling_packing set to false, ValueError occurs with eval dataset split is too small for sample_packing.
2023-12-27 16:11:55 -08:00
Younes Belkada
db9094df0f FEAT: add tagging support to axolotl (#1004)
* add tagging support to axolotl

* chore: lint

* fix method w self

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-27 16:25:20 -06:00
Evan Griffiths
6ef46f8dca Add an example config for finetuning a 34B model on a 24GB GPU (#1000)
* Add an example config for finetuning a 34B model on a 24GB GPU

* Remore wandb project
2023-12-25 10:29:55 -08:00
Wing Lian
628b754824 set output_router_logits for mixtral config: (#995) 2023-12-22 12:57:02 -05:00
Wing Lian
37820f6540 support for cuda 12.1 (#989) 2023-12-22 11:08:22 -05:00
NanoCode012
7d4185ffcb chore: Update transformers to latest (#986) 2023-12-23 00:29:36 +09:00
mhenrichsen
93ebec1ac5 change val size (#992) 2023-12-22 16:18:16 +01:00
Hamel Husain
2e61dc3180 Add tests to Docker (#993) 2023-12-22 06:37:20 -08:00
NanoCode012
1ffa3866f2 Feat: Warns to add to modules_to_save when adding tokens or switching special_tokens (#787)
* Feat: Auto add to modules_to_save when adding tokens

* fix: swap to error instead of warning

* feat: add check when special_tokens differ and add test
2023-12-22 21:49:07 +09:00
Hamel Husain
62ba1609b6 bump actions versions 2023-12-21 08:54:08 -08:00
Hamel Husain
7bbaac98f7 fix mistral prompt assembly (#982)
* fix mistral prompts

* fix spacing

* remove elif
2023-12-21 08:00:55 -08:00
Wing Lian
161bcb6517 Dockerfile torch fix (#987)
* add torch to requirements.txt at build time to force version to stick

* fix xformers check

* better handling of xformers based on installed torch version

* fix for ci w/o torch
2023-12-21 09:38:20 -05:00
29 changed files with 705 additions and 1491 deletions

View File

@@ -28,7 +28,12 @@ jobs:
- cuda: "118"
cuda_version: 11.8.0
python_version: "3.10"
pytorch: 2.1.0
pytorch: 2.1.1
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 9.0+PTX"
- cuda: "121"
cuda_version: 12.1.0
python_version: "3.10"
pytorch: 2.1.1
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 9.0+PTX"
steps:
- name: Checkout

View File

@@ -27,38 +27,56 @@ jobs:
- cuda: 118
cuda_version: 11.8.0
python_version: "3.10"
pytorch: 2.1.0
pytorch: 2.1.1
axolotl_extras:
- cuda: 121
cuda_version: 12.1.0
python_version: "3.10"
pytorch: 2.1.1
axolotl_extras:
runs-on: [self-hosted, gpu, docker]
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Docker metadata
id: metadata
uses: docker/metadata-action@v3
uses: docker/metadata-action@v5
with:
images: winglian/axolotl
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to Docker Hub
uses: docker/login-action@v2
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Build
uses: docker/build-push-action@v4
# guidance for testing before pushing: https://docs.docker.com/build/ci/github-actions/test-before-push/
- name: Build and export to Docker
uses: docker/build-push-action@v5
with:
context: .
load: true
build-args: |
BASE_TAG=${{ github.ref_name }}-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}
CUDA=${{ matrix.cuda }}
PYTORCH_VERSION=${{ matrix.pytorch }}
file: ./docker/Dockerfile
push: ${{ github.event_name != 'pull_request' }}
tags: |
${{ steps.metadata.outputs.tags }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
${{ (matrix.is_latest) && format('{0}-latest', steps.metadata.outputs.tags) || '' }}
labels: ${{ steps.metadata.outputs.labels }}
- name: Unit Tests
run: |
docker run --rm ${{ steps.metadata.outputs.tags }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }} pytest --ignore=tests/e2e/ /workspace/axolotl/tests/
- name: Push to Docker Hub
if: github.event_name != 'pull_request'
run: |
docker push ${{ steps.metadata.outputs.tags }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
latest_tag=${{ (matrix.is_latest) && format('{0}-latest', steps.metadata.outputs.tags) || '' }}
if [ -n "$latest_tag" ]; then
docker push "$latest_tag"
fi
build-axolotl-runpod:
needs: build-axolotl
if: github.repository_owner == 'OpenAccess-AI-Collective'
@@ -80,26 +98,31 @@ jobs:
- cuda: 118
cuda_version: 11.8.0
python_version: "3.10"
pytorch: 2.1.0
pytorch: 2.1.1
axolotl_extras:
- cuda: 121
cuda_version: 12.1.0
python_version: "3.10"
pytorch: 2.1.1
axolotl_extras:
runs-on: [self-hosted, gpu, docker]
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Docker metadata
id: metadata
uses: docker/metadata-action@v3
uses: docker/metadata-action@v5
with:
images: winglian/axolotl-runpod
- name: Login to Docker Hub
uses: docker/login-action@v2
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Build
uses: docker/build-push-action@v4
uses: docker/build-push-action@v5
with:
context: .
build-args: |

View File

@@ -520,6 +520,14 @@ model_config:
type: # linear | dynamic
factor: # float
# optional overrides to the bnb 4bit quantization configuration
# https://huggingface.co/docs/transformers/main/main_classes/quantization#transformers.BitsAndBytesConfig
bnb_config_kwargs:
# These are default values
llm_int8_has_fp16_weight: false
bnb_4bit_quant_type: nf4
bnb_4bit_use_double_quant: true
# Whether you are training a 4-bit GPTQ quantized model
gptq: true
@@ -581,6 +589,9 @@ datasets:
# For `completion` datsets only, uses the provided field instead of `text` column
field:
# Saves the desired chat template to the tokenizer_config.json for easier inferencing
# Currently supports chatml and inst (mistral/mixtral)
chat_template: chatml
# Axolotl attempts to save the dataset as an arrow after packing the data together so
# subsequent training attempts load faster, relative path
dataset_prepared_path: data/last_run_prepared
@@ -672,6 +683,7 @@ relora_warmup_steps: # Number of per-restart warmup steps
relora_cpu_offload: # True to perform lora weight merges on cpu during restarts, for modest gpu memory savings
# wandb configuration if you're using it
# Make sure your `WANDB_API_KEY` environment variable is set (recommended) or you login to wandb with `wandb login`.
wandb_mode: # "offline" to save run metadata locally and not sync to the server, "disabled" to turn off wandb
wandb_project: # Your wandb project name
wandb_entity: # A wandb Team name if using a Team
@@ -798,11 +810,6 @@ flash_attn_fuse_mlp: # Whether to fuse part of the MLP into a single operation
# Whether to use scaled-dot-product attention
# https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
sdp_attention:
# Landmark attention (only llama)
landmark_attention:
# xpos RoPE see https://github.com/kaiokendev/cutoff-len-is-context-len/blob/main/util/xpos_rope_llama_monkey_patch.py
# LLaMA only
xpos_rope:
# Resume from a specific checkpoint dir
resume_from_checkpoint:
@@ -969,6 +976,8 @@ fsdp_config:
##### Weights & Biases Logging
Make sure your `WANDB_API_KEY` environment variable is set (recommended) or you login to wandb with `wandb login`.
- wandb options
```yaml
wandb_mode:
@@ -995,9 +1004,12 @@ tokens: # these are delimiters
When you include these tokens in your axolotl config, axolotl adds these tokens to the tokenizer's vocabulary.
### Inference
### Inference Playground
Pass the appropriate flag to the train command:
Axolotl allows you to load your model in an interactive terminal playground for quick experimentation.
The config file is the same config file used for training.
Pass the appropriate flag to the inference command, depending upon what kind of model was trained:
- Pretrained LORA:
```bash
@@ -1026,7 +1038,7 @@ Please use `--sample_packing False` if you have it on and receive the error simi
Add below flag to train command above
```bash
python3 -m axolotl.cli.merge_lora examples/your_config.yml --lora_model_dir="./completed-model" --load_in_8bit=False --load_in_4bit=False
python3 -m axolotl.cli.merge_lora examples/your_config.yml --lora_model_dir="./completed-model"
```
If you run out of CUDA memory, you can try to merge in system RAM with

47
deepspeed/zero3_cpu.json Normal file
View File

@@ -0,0 +1,47 @@
{
"zero_optimization": {
"stage": 3,
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
},
"offload_param": {
"device": "cpu",
"pin_memory": true
},
"overlap_comm": true,
"contiguous_gradients": true,
"sub_group_size": 0,
"reduce_bucket_size": "auto",
"stage3_prefetch_bucket_size": "auto",
"stage3_param_persistence_threshold": "auto",
"stage3_max_live_parameters": 0,
"stage3_max_reuse_distance": 0,
"stage3_gather_16bit_weights_on_model_save": true
},
"bf16": {
"enabled": "auto"
},
"fp16": {
"enabled": "auto",
"auto_cast": false,
"loss_scale": 0,
"initial_scale_power": 32,
"loss_scale_window": 1000,
"hysteresis": 2,
"min_loss_scale": 1
},
"optimizer": {
"type": "AdamW",
"params": {
"lr": "auto",
"betas": "auto",
"eps": "auto",
"weight_decay": "auto"
}
},
"gradient_accumulation_steps": "auto",
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto",
"wall_clock_breakdown": false
}

View File

@@ -19,13 +19,15 @@ RUN git clone --depth=1 https://github.com/OpenAccess-AI-Collective/axolotl.git
WORKDIR /workspace/axolotl
# If AXOLOTL_EXTRAS is set, append it in brackets
RUN sed -i "s/torch==.*/torch==$PYTORCH_VERSION/" requirements.txt
RUN if [ "$AXOLOTL_EXTRAS" != "" ] ; then \
pip install -e .[deepspeed,flash-attn,$AXOLOTL_EXTRAS]; \
else \
pip install -e .[deepspeed,flash-attn]; \
fi
# So we can test the Docker image
RUN pip install pytest
# fix so that git fetch/pull from remote works
RUN git config remote.origin.fetch "+refs/heads/*:refs/remotes/origin/*" && \
git config --get remote.origin.fetch

View File

@@ -17,6 +17,7 @@ output_dir: ./out
sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false
wandb_project:
wandb_entity:

View File

@@ -23,6 +23,9 @@ unfrozen_parameters:
# - model.layers.3[0-9]+.block_sparse_moe.gate.*
# - model.layers.3[0-9]+.block_sparse_moe.experts.*
model_config:
output_router_logits: true
adapter: qlora
lora_model_dir:

View File

@@ -11,7 +11,7 @@ datasets:
- path: mhenrichsen/alpaca_2k_test
type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
val_set_size: 0.1
output_dir: ./qlora-out
adapter: qlora

View File

@@ -0,0 +1,64 @@
base_model: models/yayi2-30b
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
is_mistral_derived_model: false
trust_remote_code: true
model_revision: refs/pr/5
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
- path: mhenrichsen/alpaca_2k_test
type: alpaca
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./out
sequence_len: 2048
sample_packing: false
pad_to_sequence_len: false
eval_sample_packing: false
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.000005
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_table_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed: deepspeed/zero3_cpu.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
bos_token: "<s>"
eos_token: "</s>"
unk_token: "<unk>"

View File

@@ -0,0 +1,76 @@
base_model: wenge-research/yayi2-30b
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
is_mistral_derived_model: false
trust_remote_code: true
model_revision: refs/pr/5
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
- path: mhenrichsen/alpaca_2k_test
type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.1
output_dir: ./qlora-out
adapter: qlora
lora_model_dir:
sequence_len: 2048 # Fits in 40gb VRAM. Can easily do 4096 in A100 80 or a A6000
sample_packing: false
pad_to_sequence_len: false
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
wandb_project: yayi2
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0005
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false
loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_table_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
bos_token: "<s>"
eos_token: "</s>"
unk_token: "<unk>"

View File

@@ -0,0 +1,5 @@
# Overview
This is an example of a Yi-34B-Chat configuration. It demonstrates that it is possible to finetune a 34B model on a GPU with 24GB of VRAM.
Tested on an RTX 4090 with `python -m axolotl.cli.train examples/mistral/qlora.yml`, a single epoch of finetuning on the alpaca dataset using qlora runs in 47 mins, using 97% of available memory.

View File

@@ -0,0 +1,76 @@
base_model: 01-ai/Yi-34B-Chat
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: false
is_llama_derived_model: true
load_in_8bit: false
load_in_4bit: true
strict: false
sequence_len: 1024
bf16: true
fp16: false
tf32: false
flash_attention: true
special_tokens:
bos_token: "<|startoftext|>"
eos_token: "<|endoftext|>"
unk_token: "<unk>"
# Data
datasets:
- path: mhenrichsen/alpaca_2k_test
type: alpaca
warmup_steps: 10
# Iterations
num_epochs: 1
# Evaluation
val_set_size: 0.1
evals_per_epoch: 5
eval_table_size:
eval_table_max_new_tokens: 128
eval_sample_packing: false
eval_batch_size: 1
# LoRA
output_dir: ./qlora-out
adapter: qlora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
# Sampling
sample_packing: false
pad_to_sequence_len: false
# Batching
gradient_accumulation_steps: 4
micro_batch_size: 1
gradient_checkpointing: true
# wandb
wandb_project:
# Optimizer
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002
# Misc
train_on_inputs: false
group_by_length: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
debug:
deepspeed:
weight_decay: 0
fsdp:
fsdp_config:

View File

@@ -2,7 +2,7 @@
auto-gptq==0.5.1
packaging
peft==0.6.0
transformers @ git+https://github.com/huggingface/transformers.git@ebfdb9ca62205279d5019ef1403877461b3b2da4
transformers==4.36.2
tokenizers==0.15.0
bitsandbytes>=0.41.1
accelerate==0.24.1

View File

@@ -1,5 +1,7 @@
"""setup.py for axolotl"""
from importlib.metadata import PackageNotFoundError, version
from setuptools import find_packages, setup
@@ -22,12 +24,13 @@ def parse_requirements():
# Handle standard packages
_install_requires.append(line)
# TODO(wing) remove once xformers release supports torch 2.1.0
if "torch==2.1.0" in _install_requires:
_install_requires.pop(_install_requires.index("xformers>=0.0.22"))
_install_requires.append(
"xformers @ git+https://github.com/facebookresearch/xformers.git@main"
)
try:
torch_version = version("torch")
if torch_version.startswith("2.1.1"):
_install_requires.pop(_install_requires.index("xformers==0.0.22"))
_install_requires.append("xformers==0.0.23")
except PackageNotFoundError:
pass
return _install_requires, _dependency_links

View File

@@ -103,15 +103,7 @@ def do_inference(
importlib.import_module("axolotl.prompters"), prompter
)
if cfg.landmark_attention:
from axolotl.monkeypatch.llama_landmark_attn import set_model_mem_id
set_model_mem_id(model, tokenizer)
model.set_mem_cache_args(
max_seq_len=255, mem_freq=50, top_k=5, max_cache_size=None
)
model = model.to(cfg.device)
model = model.to(cfg.device, dtype=cfg.torch_dtype)
while True:
print("=" * 80)
@@ -176,15 +168,7 @@ def do_inference_gradio(
importlib.import_module("axolotl.prompters"), prompter
)
if cfg.landmark_attention:
from axolotl.monkeypatch.llama_landmark_attn import set_model_mem_id
set_model_mem_id(model, tokenizer)
model.set_mem_cache_args(
max_seq_len=255, mem_freq=50, top_k=5, max_cache_size=None
)
model = model.to(cfg.device)
model = model.to(cfg.device, dtype=cfg.torch_dtype)
def generate(instruction):
if not instruction:

View File

@@ -18,7 +18,15 @@ def do_cli(config: Path = Path("examples/"), **kwargs):
return_remaining_strings=True
)
parsed_cli_args.merge_lora = True
parsed_cfg = load_cfg(config, merge_lora=True, **kwargs)
parsed_cfg = load_cfg(
config,
merge_lora=True,
load_in_8bit=False,
load_in_4bit=False,
flash_attention=False,
**kwargs
)
do_merge_lora(cfg=parsed_cfg, cli_args=parsed_cli_args)

View File

@@ -9,7 +9,7 @@ import math
import sys
from abc import abstractmethod
from dataclasses import dataclass, field
from functools import partial
from functools import wraps
from pathlib import Path
from typing import Optional
@@ -120,6 +120,7 @@ class AxolotlTrainer(Trainer):
"""
args = None # type: AxolotlTrainingArguments
tag_names = ["axolotl"]
def __init__(self, *args, num_epochs=1, bench_data_collator=None, **kwargs):
self.num_epochs = num_epochs
@@ -290,12 +291,41 @@ class AxolotlTrainer(Trainer):
# return (loss, outputs) if return_outputs else loss
return super().compute_loss(model, inputs, return_outputs=return_outputs)
def _sanitize_kwargs_for_tagging(self, tag_names, kwargs=None):
if isinstance(tag_names, str):
tag_names = [tag_names]
if kwargs is not None:
if "tags" not in kwargs:
kwargs["tags"] = tag_names
elif "tags" in kwargs and isinstance(kwargs["tags"], list):
kwargs["tags"].extend(tag_names)
elif "tags" in kwargs and isinstance(kwargs["tags"], str):
tag_names.append(kwargs["tags"])
kwargs["tags"] = tag_names
return kwargs
@wraps(Trainer.push_to_hub)
def push_to_hub(self, *args, **kwargs) -> str:
"""
Overwrite the `push_to_hub` method in order to force-add the tags when pushing the
model on the Hub. Please refer to `~transformers.Trainer.push_to_hub` for more details.
"""
kwargs = self._sanitize_kwargs_for_tagging(
tag_names=self.tag_names, kwargs=kwargs
)
return super().push_to_hub(*args, **kwargs)
class AxolotlMambaTrainer(AxolotlTrainer):
"""
Mamba specific trainer to handle loss calculation
"""
tag_names = ["axolotl", "mamba"]
def compute_loss(
self,
model,
@@ -322,6 +352,8 @@ class OneCycleLRSchedulerTrainer(AxolotlTrainer):
Trainer subclass that uses the OneCycleLR scheduler
"""
tag_names = ["axolotl", "onecycle"]
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.lr_scheduler = None
@@ -351,6 +383,8 @@ class ReLoRATrainer(AxolotlTrainer):
Trainer subclass that uses the OneCycleLR scheduler
"""
tag_names = ["axolotl", "relora"]
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.lr_scheduler = None
@@ -746,26 +780,6 @@ class HFCausalTrainerBuilder(TrainerBuilderBase):
# https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html
data_collator_kwargs["pad_to_multiple_of"] = 64
if self.cfg.is_llama_derived_model and self.cfg.landmark_attention:
from axolotl.monkeypatch.llama_landmark_attn import (
add_mem_tokens,
get_mem_id,
set_model_mem_id,
)
set_model_mem_id(self.model, self.tokenizer)
LOG.info("Adding landmark attention tokens to dataset")
for dataset in [self.train_dataset, self.eval_dataset]:
dataset = dataset.map(
partial(
add_mem_tokens, mem_freq=50, mem_id=get_mem_id(self.tokenizer)
),
batched=False,
num_proc=32,
)
trainer_cls = self._get_trainer_cls()
trainer_kwargs, trainer_cls = self.hook_pre_create_trainer(
trainer_kwargs, trainer_cls

View File

@@ -82,7 +82,7 @@ def get_turns( # pylint: disable=too-many-return-statements
else:
yield role + ":", ""
return
if self.sep_style == SeparatorStyle.LLAMA2:
if self.sep_style == SeparatorStyle.LLAMA2 and self.name != "mistral":
if self.system_message:
if self.messages:
# For llama, the system message is incorporated into the first human instruction
@@ -101,6 +101,28 @@ def get_turns( # pylint: disable=too-many-return-statements
else:
yield role, ""
return
if self.sep_style == SeparatorStyle.LLAMA2 and self.name == "mistral":
contains_sys_msg = False
if self.system_message:
contains_sys_msg = True
if self.messages:
# There is no clear guidance on how to handle system messages in Mistral so we just prepend it to the first human instruction seperated by a newline
first_role, first_msg = self.messages[0]
if first_role == self.roles[0]:
system_prompt = self.system_template.format(
system_message=" " + self.system_message
)
system_prompt += first_msg
self.messages.pop(0)
yield "", system_prompt
for i, (role, message) in enumerate(self.messages):
if message and i == 0 and not contains_sys_msg:
yield "", system_prompt.strip() + " " + message # if there is no system message, we need to make sure there is the a `<s> [INST]` at the beginning of the first instruction.
elif message:
yield role + " ", message
else:
yield role, ""
return
if self.sep_style == SeparatorStyle.CHATGLM:
# source: https://huggingface.co/THUDM/chatglm-6b/blob/1d240ba371910e9282298d4592532d7f0f3e9f3e/modeling_chatglm.py#L1302-L1308
# source2: https://huggingface.co/THUDM/chatglm2-6b/blob/e186c891cf64310ac66ef10a87e6635fa6c2a579/modeling_chatglm.py#L926

File diff suppressed because it is too large Load Diff

View File

@@ -1,94 +0,0 @@
# pylint: skip-file
"""
Copied from https://github.com/kaiokendev/cutoff-len-is-context-len/blob/main/util/xpos_rope_llama_monkey_patch.py
"""
import torch
import transformers
import transformers.models.llama.modeling_llama
from einops import rearrange
class XposRotaryEmbedding(torch.nn.Module):
def __init__(
self,
dim,
max_position_embeddings=2048,
base=10000,
device=None,
scale_base=2048,
use_xpos=True,
):
super().__init__()
self.max_seq_len_cached = max_position_embeddings
self.scale_base = scale_base
inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
t = torch.arange(self.max_seq_len_cached, device=device).type_as(inv_freq)
freqs = torch.einsum("i , j -> i j", t, inv_freq)
freqs = torch.cat((freqs, freqs), dim=-1)
self.register_buffer("inv_freq", inv_freq, persistent=False)
self.register_buffer("freqs_cached", freqs, persistent=False)
if not use_xpos:
self.register_buffer("scale", None)
self.register_buffer("scale_cached", torch.ones(1))
return
scale = (torch.arange(0, dim, 2) + 0.4 * dim) / (1.4 * dim)
power = (t - (self.max_seq_len_cached // 2)) / self.scale_base
scale_cached = scale ** rearrange(power, "n -> n 1")
scale_cached = torch.cat((scale_cached, scale_cached), dim=-1)
self.register_buffer("scale", scale, persistent=False)
self.register_buffer("scale_cached", scale_cached, persistent=False)
def forward(
self,
x,
seq_len,
):
if seq_len > self.max_seq_len_cached:
self.max_seq_len_cached = seq_len
t = torch.arange(self.max_seq_len_cached, device=x.device).type_as(
self.inv_freq
)
freqs = torch.einsum("i , j -> i j", t, self.inv_freq)
freqs = torch.cat((freqs, freqs), dim=-1).to(dtype=x.dtype)
self.register_buffer("freqs_cached", freqs)
if self.scale is None:
self.register_buffer(
"scale_cached", torch.ones(1, device=x.device).to(dtype=x.dtype)
)
return self.freqs_cached.to(dtype=x.dtype), self.scale_cached
power = (t - (seq_len // 2)) / self.scale_base
scale = self.scale ** rearrange(power, "n -> n 1")
scale = torch.cat((scale, scale), dim=-1).to(dtype=x.dtype)
self.register_buffer("scale_cached", scale)
return self.freqs_cached.to(dtype=x.dtype), self.scale_cached.to(dtype=x.dtype)
def rotate_half(x):
x1, x2 = x.chunk(2, dim=-1)
return torch.cat((-x2, x1), dim=-1)
def apply_rotary_pos_emb(q, k, freqs, scale=1, position_ids=None):
freqs = freqs[position_ids, :]
if scale.shape[-1] != 1:
scale = scale[position_ids, :]
q_embed = (q * freqs.cos() * scale) + (rotate_half(q) * freqs.sin() * scale)
k_embed = (k * freqs.cos() * 1 / scale) + (rotate_half(k) * freqs.sin() * 1 / scale)
return q_embed, k_embed
def replace_llama_rope_with_xpos_rope():
transformers.models.llama.modeling_llama.LlamaRotaryEmbedding = XposRotaryEmbedding
transformers.models.llama.modeling_llama.apply_rotary_pos_emb = apply_rotary_pos_emb

View File

@@ -39,6 +39,23 @@ def load(tokenizer, cfg, ds_cfg: Optional[Dict[str, Any]] = None):
return strategy
def load_ultrachat(tokenizer, cfg, ds_cfg: Optional[Dict[str, Any]] = None):
conversation = (
ds_cfg["conversation"] if ds_cfg and "conversation" in ds_cfg else None
)
strategy = UltrachatShareGPTPromptTokenizingStrategy(
ShareGPTPrompterV2(
conversation=conversation,
),
tokenizer,
cfg.train_on_inputs,
cfg.sequence_len,
)
if ds_cfg and "strict" in ds_cfg:
strategy.strict = ds_cfg["strict"]
return strategy
def load_role(tokenizer, cfg):
return SimpleRoleShareGPTPromptTokenizingStrategy(
ShareGPTPrompterV2(),
@@ -109,3 +126,17 @@ class GuanacoShareGPTPromptTokenizingStrategy(ShareGPTPromptTokenizingStrategy):
{"from": role_map[t["role"]], "value": t["text"]} for t in conversations
]
return turns
class UltrachatShareGPTPromptTokenizingStrategy(SimpleShareGPTPromptTokenizingStrategy):
"""
sharegpt strategy that remaps ultrachat data to sharegpt format
"""
def get_conversation_thread(self, prompt):
conversations = prompt["messages"]
role_map = {"user": "human", "assistant": "gpt"}
turns = [
{"from": role_map[t["role"]], "value": t["content"]} for t in conversations
]
return turns

View File

@@ -12,6 +12,7 @@ import transformers.modelcard
from accelerate.logging import get_logger
from datasets import Dataset
from optimum.bettertransformer import BetterTransformer
from pkg_resources import get_distribution # type: ignore
from transformers.deepspeed import is_deepspeed_zero3_enabled
from axolotl.common.cli import TrainerCliArgs
@@ -115,6 +116,12 @@ def train(
badge_markdown = """[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)"""
transformers.modelcard.AUTOGENERATED_TRAINER_COMMENT += f"\n{badge_markdown}"
if getattr(cfg, "axolotl_config_path"):
raw_axolotl_cfg = Path(cfg.axolotl_config_path)
version = get_distribution("axolotl").version
if raw_axolotl_cfg.is_file():
transformers.modelcard.AUTOGENERATED_TRAINER_COMMENT += f"\n<details><summary>See axolotl config</summary>\n\naxolotl version: `{version}`\n```yaml\n{raw_axolotl_cfg.read_text(encoding='utf-8')}\n```\n\n</details><br>\n"
LOG.info("Starting trainer...")
if cfg.group_by_length:
LOG.info("hang tight... sorting dataset for group_by_length")

View File

@@ -4,6 +4,8 @@ from __future__ import annotations
import logging
import os
from shutil import copyfile
from tempfile import NamedTemporaryFile
from typing import TYPE_CHECKING, Dict, List
import evaluate
@@ -561,10 +563,15 @@ class SaveAxolotlConfigtoWandBCallback(TrainerCallback):
):
if is_main_process():
try:
artifact = wandb.Artifact(name="axolotl-config", type="config")
artifact.add_file(local_path=self.axolotl_config_path)
wandb.run.log_artifact(artifact)
LOG.info("Axolotl config has been saved to WandB as an artifact.")
# sync config to top level in run, cannot delete file right away because wandb schedules it to be synced even w/policy = 'now', so let OS delete it later.
with NamedTemporaryFile(
mode="w", delete=False, suffix=".yml", prefix="axolotl_config_"
) as temp_file:
copyfile(self.axolotl_config_path, temp_file.name)
wandb.save(temp_file.name)
LOG.info(
"The Axolotl config has been saved to the WandB run under files."
)
except (FileNotFoundError, ConnectionError) as err:
LOG.warning(f"Error while saving Axolotl config to WandB: {err}")
return control

View File

@@ -0,0 +1,29 @@
"""
This module provides functionality for selecting chat templates based on user choices.
These templates are used for formatting messages in a conversation.
"""
def chat_templates(user_choice: str):
"""
Finds the correct chat_template for the tokenizer_config.
Args:
user_choice (str): The user's choice of template.
Returns:
str: The chosen template string.
Raises:
ValueError: If the user_choice is not found in the templates.
"""
templates = {
"inst": "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}", # I don't know what this one is called. Used by Mistral/Mixtral.
"chatml": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
}
if user_choice in templates:
return templates[user_choice]
raise ValueError(f"Template '{user_choice}' not found.")

View File

@@ -448,6 +448,20 @@ def validate_config(cfg):
if cfg.neftune_noise_alpha is not None and cfg.neftune_noise_alpha <= 0.0:
raise ValueError("neftune_noise_alpha must be > 0.0")
if (
cfg.adapter
and cfg.tokens
and (
not cfg.lora_modules_to_save
or not all(
x in cfg.lora_modules_to_save for x in ["embed_tokens", "lm_head"]
)
)
):
raise ValueError(
"lora_modules_to_save not properly set yet adding new tokens. Please add `embed_tokens` and `lm_head` to `lora_modules_to_save`."
)
# TODO
# MPT 7b
# https://github.com/facebookresearch/bitsandbytes/issues/25

View File

@@ -26,6 +26,7 @@ from transformers.deepspeed import is_deepspeed_zero3_enabled
from axolotl.models.mamba import fix_mamba_attn_for_loss
from axolotl.prompt_tokenizers import LLAMA_DEFAULT_EOS_TOKEN
from axolotl.utils.bench import log_gpu_memory_usage
from axolotl.utils.chat_templates import chat_templates
from axolotl.utils.dict import DictDefault
LOG = logging.getLogger("axolotl")
@@ -136,6 +137,23 @@ def load_tokenizer(cfg):
if cfg.special_tokens:
for k, val in cfg.special_tokens.items():
# check if new special token is not already in tokenizer and
# is adapter training to make sure lora_modules_to_save is set
if (
(getattr(tokenizer, k) is None or getattr(tokenizer, k) != val)
and cfg.adapter
and (
not cfg.lora_modules_to_save
or not all(
x in cfg.lora_modules_to_save
for x in ["embed_tokens", "lm_head"]
)
)
):
raise ValueError(
"Please set lora_modules_to_save to ['embed_tokens', 'lm_head'] when using an adapter and changing the special tokens."
)
tokenizer.add_special_tokens(
{k: AddedToken(val, rstrip=False, lstrip=False, normalized=False)}
)
@@ -169,6 +187,12 @@ def load_tokenizer(cfg):
LOG.debug(f"PAD: {tokenizer.pad_token_id} / {tokenizer.pad_token}")
LOG.debug(f"UNK: {tokenizer.unk_token_id} / {tokenizer.unk_token}")
if cfg.chat_template:
tokenizer.chat_template = chat_templates(cfg.chat_template)
else:
LOG.info(
"No Chat template selected. Consider adding a chat template for easier inference."
)
return tokenizer
@@ -230,17 +254,6 @@ def load_model(
LOG.info("patching with sdp attention")
hijack_llama_sdp_attention()
elif cfg.is_llama_derived_model and cfg.landmark_attention:
from axolotl.monkeypatch.llama_landmark_attn import (
MEM_TOKEN,
patch_llama_with_landmark_attn,
)
LOG.info("patching with landmark attention")
patch_llama_with_landmark_attn()
# Note: This might overwrite previous additional_special_tokens
tokenizer.add_special_tokens({"additional_special_tokens": [MEM_TOKEN]})
if cfg.is_mistral_derived_model and cfg.flash_attention and cfg.sample_packing:
from axolotl.monkeypatch.mistral_attn_hijack_flash import (
@@ -262,14 +275,6 @@ def load_model(
LOG.info("patching with flash attention")
replace_mixtral_attn_with_multipack_flash_attn()
if cfg.is_llama_derived_model and cfg.xpos_rope:
from axolotl.monkeypatch.xpos_rope_llama_monkey_patch import (
replace_llama_rope_with_xpos_rope,
)
LOG.info("patching with xpos rope")
replace_llama_rope_with_xpos_rope()
if (
cfg.is_llama_derived_model
and (cfg.max_packed_sequence_len or cfg.sample_packing)
@@ -303,13 +308,20 @@ def load_model(
**model_config.quantization_config
)
if cfg.adapter == "qlora" and cfg.load_in_4bit:
bnb_config = {
"load_in_4bit": True,
"llm_int8_threshold": 6.0,
"llm_int8_has_fp16_weight": False,
"bnb_4bit_compute_dtype": cfg.torch_dtype,
"bnb_4bit_use_double_quant": True,
"bnb_4bit_quant_type": "nf4",
}
if cfg.bnb_config_kwargs:
bnb_config.update(cfg.bnb_config_kwargs)
model_kwargs["quantization_config"] = BitsAndBytesConfig(
load_in_4bit=True,
llm_int8_threshold=6.0,
llm_int8_has_fp16_weight=False,
bnb_4bit_compute_dtype=cfg.torch_dtype,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
**bnb_config,
)
# sample packing uses custom FA2 patch
if cfg.flash_attention:

View File

@@ -2,6 +2,7 @@
import json
import logging
import unittest
from copy import deepcopy
from pathlib import Path
from typing import Optional
@@ -25,6 +26,50 @@ from axolotl.prompters import AlpacaPrompter, PromptStyle, ShareGPTPrompterV2
LOG = logging.getLogger("axolotl")
test_data = {
"multi_turn_sys": {
"conversations": [
{"from": "system", "value": "lorem"},
{"from": "human", "value": "abc"},
{"from": "gpt", "value": "ipsum"},
{"from": "human", "value": "123"},
{"from": "gpt", "value": "sit"},
]
},
"single_turn_sys": {
"conversations": [
{"from": "system", "value": "lorem"},
{"from": "human", "value": "abc"},
{"from": "gpt", "value": "ipsum"},
]
},
"single_turn_no_sys": {
"conversations": [
{"from": "human", "value": "abc"},
{"from": "gpt", "value": "ipsum"},
]
},
"multi_turn_no_sys": {
"conversations": [
{"from": "human", "value": "abc"},
{"from": "gpt", "value": "ipsum"},
{"from": "human", "value": "123"},
{"from": "gpt", "value": "sit"},
]
},
}
def prompt_strat(conversation, tokenizer):
"Helper function to create a prompt strategy for testing."
prompter = ShareGPTPrompterV2(conversation=conversation)
return ShareGPTPromptTokenizingStrategy(
prompter,
tokenizer,
False,
2048,
)
class TestPromptTokenizationStrategies(unittest.TestCase):
"""
@@ -116,74 +161,68 @@ class TestPromptTokenizationStrategies(unittest.TestCase):
def test_sharegpt_llama(self):
"Make sure the sharegpt/llama is tokenized and formatted correctly."
prompter = ShareGPTPrompterV2(conversation="llama-2")
strat = ShareGPTPromptTokenizingStrategy(
prompter,
self.tokenizer,
False,
2048,
)
strat = prompt_strat("llama-2", self.tokenizer)
def tokenize(conv):
return strat.tokenize_prompt(conv)["input_ids"]
return strat.tokenize_prompt(deepcopy(conv))["input_ids"]
def decode(ids):
return strat.tokenizer.decode(ids)
# Multi-turn conversations
multi_turn_conv = {
"conversations": [
{"from": "system", "value": "lorem"},
{"from": "human", "value": "abc"},
{"from": "gpt", "value": "ipsum"},
{"from": "human", "value": "123"},
{"from": "gpt", "value": "sit"},
]
}
# fmt: off
mt_ids = tokenize(multi_turn_conv)
# System message, multi-turn conversations
mt_ids = tokenize(test_data['multi_turn_sys'])
assert decode(mt_ids) == '<s> [INST] <<SYS>>\nlorem\n<</SYS>>\n\nabc [/INST] ipsum</s><s> [INST] 123 [/INST] sit</s>'
assert mt_ids == [1, 518, 25580, 29962, 3532, 14816, 29903, 6778, 13, 29880, 3668, 13, 29966, 829, 14816, 29903, 6778, 13, 13, 10736, 518, 29914, 25580, 29962, 23421, 2, 1, 518, 25580, 29962, 29871, 29896, 29906, 29941, 518, 29914, 25580, 29962, 7845, 2]
# Single-turn conversations
single_turn_conv = {
"conversations": [
{"from": "system", "value": "lorem"},
{"from": "human", "value": "abc"},
{"from": "gpt", "value": "ipsum"},
]
}
st_ids = tokenize(single_turn_conv)
# System message, single-turn conversations
st_ids = tokenize(test_data['single_turn_sys'])
assert decode(st_ids) == '<s> [INST] <<SYS>>\nlorem\n<</SYS>>\n\nabc [/INST] ipsum</s>'
assert st_ids == [1, 518, 25580, 29962, 3532, 14816, 29903, 6778, 13, 29880, 3668, 13, 29966, 829, 14816, 29903, 6778, 13, 13, 10736, 518, 29914, 25580, 29962, 23421, 2]
# No system message, single-turn
no_sys_conv = {
"conversations": [
{"from": "human", "value": "abc"},
{"from": "gpt", "value": "ipsum"},
]
}
ns_ids = tokenize(no_sys_conv)
ns_ids = tokenize(test_data['single_turn_no_sys'])
assert decode(ns_ids) == '<s> [INST] abc [/INST] ipsum</s>'
assert ns_ids == [1, 518, 25580, 29962, 25638, 518, 29914, 25580, 29962, 23421, 2]
# No system message, multi-turn
no_sys_mt_conv = {
"conversations": [
{"from": "human", "value": "abc"},
{"from": "gpt", "value": "ipsum"},
{"from": "human", "value": "123"},
{"from": "gpt", "value": "sit"},
]
}
ns_mt_ids = tokenize(no_sys_mt_conv)
ns_mt_ids = tokenize(test_data['multi_turn_no_sys'])
assert decode(ns_mt_ids) == '<s> [INST] abc [/INST] ipsum</s><s> [INST] 123 [/INST] sit</s>'
assert ns_mt_ids == [1, 518, 25580, 29962, 25638, 518, 29914, 25580, 29962, 23421, 2, 1, 518, 25580, 29962, 29871, 29896, 29906, 29941, 518, 29914, 25580, 29962, 7845, 2]
# fmt: on
def test_sharegpt_mistral(self):
"Make sure the sharegpt/mistral is tokenized and formatted correctly."
strat = prompt_strat("mistral", self.tokenizer)
def tokenize(conv):
return strat.tokenize_prompt(deepcopy(conv))["input_ids"]
def decode(ids):
return strat.tokenizer.decode(ids)
# fmt: off
# System message, multi-turn conversations
mt_ids = tokenize(test_data['multi_turn_sys'])
assert decode(mt_ids) == '<s> [INST] lorem\nabc [/INST] ipsum</s> [INST] 123 [/INST] sit</s>'
assert mt_ids == [1, 518, 25580, 29962, 301, 3668, 13, 10736, 518, 29914, 25580, 29962, 23421, 2, 518, 25580, 29962, 29871, 29896, 29906, 29941, 518, 29914, 25580, 29962, 7845, 2]
# System message, single-turn conversations
st_ids = tokenize(test_data['single_turn_sys'])
assert decode(st_ids) == '<s> [INST] lorem\nabc [/INST] ipsum</s>'
assert st_ids == [1, 518, 25580, 29962, 301, 3668, 13, 10736, 518, 29914, 25580, 29962, 23421, 2]
# No system message, single-turn
ns_ids = tokenize(test_data['single_turn_no_sys'])
assert decode(ns_ids) == '<s> [INST] abc [/INST] ipsum</s>'
assert ns_ids == [1, 518, 25580, 29962, 25638, 518, 29914, 25580, 29962, 23421, 2]
# No system message, multi-turn
ns_mt_ids = tokenize(test_data['multi_turn_no_sys'])
assert decode(ns_mt_ids) == '<s> [INST] abc [/INST] ipsum</s> [INST] 123 [/INST] sit</s>'
assert ns_mt_ids == [1, 518, 25580, 29962, 25638, 518, 29914, 25580, 29962, 23421, 2, 518, 25580, 29962, 29871, 29896, 29906, 29941, 518, 29914, 25580, 29962, 7845, 2]
# fmt: on
def test_sharegpt_changes_roles(self):
conversation = {
"roles": ["USER", "CHARACTER"],

View File

@@ -3,6 +3,8 @@ Test cases for the tokenizer loading
"""
import unittest
import pytest
from axolotl.utils.dict import DictDefault
from axolotl.utils.models import load_tokenizer
@@ -31,6 +33,40 @@ class TestTokenizers(unittest.TestCase):
tokenizer = load_tokenizer(cfg)
assert "Fast" not in tokenizer.__class__.__name__
def test_special_tokens_modules_to_save(self):
# setting special_tokens to new token
cfg = DictDefault(
{
"tokenizer_config": "huggyllama/llama-7b",
"adapter": "lora",
"special_tokens": {"bos_token": "[INST]"},
}
)
with pytest.raises(
ValueError,
match=r".*Please set lora_modules_to_save*",
):
load_tokenizer(cfg)
# setting special_tokens but not changing from default
cfg = DictDefault(
{
"tokenizer_config": "huggyllama/llama-7b",
"adapter": "lora",
"special_tokens": {"bos_token": "<s>"},
}
)
load_tokenizer(cfg)
# non-adapter setting special_tokens
cfg = DictDefault(
{
"tokenizer_config": "huggyllama/llama-7b",
"special_tokens": {"bos_token": "[INST]"},
}
)
load_tokenizer(cfg)
if __name__ == "__main__":
unittest.main()

View File

@@ -682,6 +682,43 @@ class ValidationTest(unittest.TestCase):
validate_config(cfg)
def test_add_tokens_adapter(self):
cfg = DictDefault(
{"adapter": "qlora", "load_in_4bit": True, "tokens": ["<|imstart|>"]}
)
with pytest.raises(
ValueError,
match=r".*lora_modules_to_save not properly set yet adding new tokens*",
):
validate_config(cfg)
cfg = DictDefault(
{
"adapter": "qlora",
"load_in_4bit": True,
"tokens": ["<|imstart|>"],
"lora_modules_to_save": ["embed_tokens"],
}
)
with pytest.raises(
ValueError,
match=r".*lora_modules_to_save not properly set yet adding new tokens*",
):
validate_config(cfg)
cfg = DictDefault(
{
"adapter": "qlora",
"load_in_4bit": True,
"tokens": ["<|imstart|>"],
"lora_modules_to_save": ["embed_tokens", "lm_head"],
}
)
validate_config(cfg)
class ValidationWandbTest(ValidationTest):
"""