Compare commits


27 Commits

Author SHA1 Message Date
Wing Lian
e9c3a2aec0 add missing dunder-init for monkeypatches and add tests for install from sdist (#2085)
* add missing dunder-init for monkeypatches and add tests for install from sdist

* fix gha name

* reduce matrix for sdist test
2024-11-19 12:43:30 -05:00
Wing Lian
02ca3f93b0 set manifest and fix for source dist (#2084)
2024-11-19 11:31:56 -05:00
Wing Lian
5f6f9186e4 make sure action has permission to create release (#2083) [skip ci] 2024-11-19 10:43:02 -05:00
Wing Lian
6679e20f47 release version 0.5.1 (#2082) 2024-11-19 10:35:59 -05:00
Wing Lian
ec59d4cb83 remove deprecated extra metadata kwarg from pydantic Field (#2081) [skip ci] 2024-11-19 10:30:10 -05:00
Wing Lian
a77c8a71cf fix brackets on docker ci builds, add option to skip e2e builds [skip e2e] (#2080) [skip ci] 2024-11-19 10:29:31 -05:00
Wing Lian
775311f98f add optimizer step to prevent warning in tests (#1502) [skip ci]
* add optimizer step to prevent warning in tests

* add optimizer step to warmup as well
2024-11-19 10:19:03 -05:00
NanoCode012
f007c38e49 Feat: Drop long samples and shuffle rl samples (#2040) [skip ci]
* feat: LOG warn if samples are dropped due to seq length

* feat: add drop long samples for RL

* feat: add ipo

* fix: remove num_proc for map as subprocesses are prone to die

* feat: shuffle rl dataset

* fix: support preprocess for kto

* chore: use set instead of list

* feat: add simpo
2024-11-19 10:18:24 -05:00
Wing Lian
d9b71edf84 bump transformers for fsdp-grad-accum fix, remove patch (#2079) 2024-11-19 02:23:09 -05:00
Wing Lian
c07bd2fa65 Readme updates v2 (#2078)
* update readme logos

* use full logo

* Fix svgs

* add srcset

* resize svgs to match

* Rename file

* align badges center
2024-11-18 14:58:03 -05:00
Wing Lian
ed079d434a static assets, readme, and badges update v1 (#2077) 2024-11-18 13:59:32 -05:00
Wing Lian
8403c67156 don't build bdist (#2076) [skip ci] 2024-11-18 12:36:03 -05:00
Wing Lian
9871fa060b optim e2e tests to run a bit faster (#2069) [skip ci]
* optim e2e tests to run a bit faster

* run prequant w/o lora_modules_to_save

* use smollm2
2024-11-18 12:35:31 -05:00
Wing Lian
70cf79ef52 upgrade autoawq==0.2.7.post2 for transformers fix (#2070)
* point to upstream autoawq for transformers fix

* use autoawq 0.2.7 release

* test wheel for awq

* try different format for wheel def

* autoawq re-release

* Add intel_extension_for_pytorch dep

* ipex gte version

* forcefully remove intel-extension-for-pytorch

* add -y option to pip uninstall for ipex

* use post2 release for autoawq and remove uninstall of ipex
2024-11-18 11:53:37 -05:00
Wing Lian
c06b8f0243 increase worker count to 8 for basic pytests (#2075) [skip ci] 2024-11-18 11:52:35 -05:00
Chirag Jain
0c8b1d824a Update get_unpad_data patching for multipack (#2013)
* Update `get_unpad_data` patching for multipack

* Update src/axolotl/utils/models.py

* Update src/axolotl/utils/models.py

* Add test case

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-11-15 20:35:50 -05:00
NanoCode012
fd70eec577 fix: loading locally downloaded dataset (#2056) [skip ci] 2024-11-15 20:35:26 -05:00
Wing Lian
d42f202046 Fsdp grad accum monkeypatch (#2064) 2024-11-15 19:11:04 -05:00
Wing Lian
0dabde1962 support for schedule free and e2e ci smoke test (#2066) [skip ci]
* support for schedule free and e2e ci smoke test

* set default lr scheduler to constant in test

* ignore duplicate code

* fix quotes for config/dict
2024-11-15 19:10:14 -05:00
Wing Lian
15f1462ccd support passing trust_remote_code to dataset loading (#2050) [skip ci]
* support passing trust_remote_code to dataset loading

* add doc for trust_remote_code in dataset config
2024-11-15 19:09:48 -05:00
Wing Lian
521e62daf1 remove the bos token from dpo outputs (#1733) [skip ci]
* remove the bos token from dpo outputs

* don't forget to fix prompt_input_ids too

* use processing_class instead of tokenizer

* fix for processing class
2024-11-15 19:09:20 -05:00
Wing Lian
c16ec398d7 update to be deprecated evaluation_strategy (#1682) [skip ci]
* update to be deprecated evaluation_strategy and c4 dataset

* chore: lint

* remap eval strategy to new config and add tests
2024-11-15 19:09:00 -05:00
Wing Lian
2f20cb7ebf upgrade datasets==3.1.0 and add upstream check (#2067) [skip ci] 2024-11-15 19:08:38 -05:00
Wing Lian
71d4030b79 gradient accumulation tests, embeddings w pad_token fix, smaller models (#2059)
* add more test cases for gradient accumulation and fix zero3

* swap out for smaller model

* fix missing return

* fix missing pad_token in config

* support concurrency for multigpu testing

* cast empty deepspeed to empty string for zero3 check

* fix temp_dir as fixture so parametrize works properly

* fix test file for multigpu evals

* don't use default

* don't use default for fsdp_state_dict_type

* don't use llama tokenizer w smollm

* also automatically cancel multigpu for concurrency
2024-11-14 12:59:00 -05:00
Wing Lian
f3a5d119af fix env var extraction (#2043) [skip ci] 2024-11-14 12:58:06 -05:00
Wing Lian
ba219b51a5 fix duplicate base build (#2061) [skip ci] 2024-11-14 10:31:19 -05:00
Wing Lian
5be8e13d35 make sure to add tags for versioned tag on cloud docker images (#2060) 2024-11-14 10:24:49 -05:00
50 changed files with 961 additions and 666 deletions


@@ -27,7 +27,7 @@ jobs:
       - cuda: "124"
         cuda_version: 12.4.1
         cudnn_version: ""
-        python_version: "3.11"
+        python_version: "3.10"
         pytorch: 2.4.1
         torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
       - cuda: "124"


@@ -10,7 +10,7 @@ on:
 jobs:
   build-axolotl:
-    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]]') && github.repository_owner == 'axolotl-ai-cloud' }}
+    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
     strategy:
       fail-fast: false
       matrix:
@@ -77,7 +77,7 @@ jobs:
   build-axolotl-cloud:
     needs: build-axolotl
-    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]]') && github.repository_owner == 'axolotl-ai-cloud' }}
+    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
     # this job needs to be run on self-hosted GPU runners...
     strategy:
       matrix:
@@ -114,6 +114,9 @@ jobs:
           images: |
             winglian/axolotl-cloud
             axolotlai/axolotl-cloud
+          tags: |
+            type=ref,event=branch
+            type=semver,pattern={{version}}
       - name: Login to Docker Hub
         uses: docker/login-action@v3
         with:
@@ -137,7 +140,7 @@ jobs:
   build-axolotl-cloud-no-tmux:
     needs: build-axolotl
-    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]]') && github.repository_owner == 'axolotl-ai-cloud' }}
+    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
     # this job needs to be run on self-hosted GPU runners...
     strategy:
       matrix:


@@ -8,9 +8,14 @@ on:
   schedule:
     - cron: '0 0 * * 1,4' # Runs at 00:00 UTC every monday & thursday
 
+# Cancel jobs on the same ref if a new one is triggered
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
 jobs:
   test-axolotl-multigpu:
-    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]]') && github.repository_owner == 'axolotl-ai-cloud' }}
+    if: ${{ ! contains(github.event.commits[0].message, '[skip e2e]') && github.repository_owner == 'axolotl-ai-cloud' }}
     strategy:
       fail-fast: false
       matrix:


@@ -7,7 +7,7 @@ on:
 jobs:
   build-axolotl:
-    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]]') && github.repository_owner == 'axolotl-ai-cloud' }}
+    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
     strategy:
       fail-fast: false
       matrix:
@@ -71,7 +71,7 @@ jobs:
   build-axolotl-cloud:
     needs: build-axolotl
-    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]]') && github.repository_owner == 'axolotl-ai-cloud' }}
+    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
     # this job needs to be run on self-hosted GPU runners...
     strategy:
       matrix:


@@ -10,6 +10,8 @@ jobs:
   setup_release:
     name: Create Release
     runs-on: ubuntu-latest
+    permissions:
+      contents: write
     steps:
       - name: Get the tag version
         id: extract_branch
@@ -56,9 +58,9 @@ jobs:
         run: |
           sed -i -E 's/version="([0-9.]+)",/version="${{ steps.tag.outputs.TAG_NAME }}",/g' setup.py
-      - name: Build a binary wheel
+      - name: Build a source dist
         run: |
-          python setup.py sdist bdist_wheel
+          python setup.py sdist
       - name: Publish package distributions to PyPI
         uses: pypa/gh-action-pypi-publish@release/v1


@@ -48,6 +48,7 @@ jobs:
           sed -i 's#^peft.*#peft @ git+https://github.com/huggingface/peft.git@main#' requirements.txt
           sed -i 's#^accelerate.*#accelerate @ git+https://github.com/huggingface/accelerate.git@main#' requirements.txt
           sed -i 's#^trl.*#trl @ git+https://github.com/huggingface/trl.git@main#' requirements.txt
+          sed -i 's#^datasets.*#datasets @ git+https://github.com/huggingface/datasets.git@main#' requirements.txt
       - name: Install dependencies
         run: |


@@ -71,18 +71,62 @@ jobs:
       - name: Run tests
         run: |
-          pytest --ignore=tests/e2e/ tests/
+          pytest -n8 --ignore=tests/e2e/ tests/
+      - name: cleanup pip cache
+        run: |
+          find "$(pip cache dir)/http-v2" -type f -mtime +14 -exec rm {} \;
+
+  pytest-sdist:
+    name: PyTest from Source Dist
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python_version: ["3.11"]
+        pytorch_version: ["2.4.1", "2.5.1"]
+    timeout-minutes: 20
+    steps:
+      - name: Check out repository code
+        uses: actions/checkout@v4
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python_version }}
+          cache: 'pip' # caching pip dependencies
+      - name: upgrade pip
+        run: |
+          pip3 install --upgrade pip
+          pip3 install --upgrade packaging setuptools wheel
+      - name: Install PyTorch
+        run: |
+          pip3 install torch==${{ matrix.pytorch_version }}
+      - name: Install dependencies
+        run: |
+          pip3 show torch
+          python3 setup.py sdist
+          pip3 install dist/axolotl*.tar.gz
+          pip3 install -r requirements-dev.txt -r requirements-tests.txt
+      - name: Run tests
+        run: |
+          pytest -n8 --ignore=tests/e2e/ tests/
       - name: cleanup pip cache
         run: |
           find "$(pip cache dir)/http-v2" -type f -mtime +14 -exec rm {} \;
 
   docker-e2e-tests-1st:
-    if: github.repository_owner == 'axolotl-ai-cloud'
+    if: ${{ ! contains(github.event.commits[0].message, '[skip e2e]') && github.repository_owner == 'axolotl-ai-cloud' }}
     # this job needs to be run on self-hosted GPU runners...
     runs-on: [self-hosted, modal]
     timeout-minutes: 90
-    needs: [pre-commit, pytest]
+    needs: [pre-commit, pytest, pytest-sdist]
     strategy:
       fail-fast: false

MANIFEST.in (new file)

@@ -0,0 +1,4 @@
include requirements.txt
include README.md
include LICENSE
recursive-include axolotl *.py
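
The manifest exists so the source dist ships requirements.txt, which setup.py parses at install time (see the parse_requirements() context in the setup.py diff below). A minimal sanity check one might run after `python setup.py sdist`, shown as a sketch (the glob pattern is illustrative):

import glob
import tarfile

# Hypothetical post-build check: verify the freshly built sdist actually
# contains requirements.txt, which MANIFEST.in is responsible for including.
sdist_path = sorted(glob.glob("dist/axolotl-*.tar.gz"))[-1]
with tarfile.open(sdist_path, "r:gz") as sdist:
    members = sdist.getnames()
assert any(name.endswith("/requirements.txt") for name in members), members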


@@ -1,8 +1,21 @@
-# Axolotl
-
-![tests](https://github.com/axolotl-ai-cloud/axolotl/actions/workflows/tests.yml/badge.svg)
-![tests-nightly](https://github.com/axolotl-ai-cloud/axolotl/actions/workflows/tests-nightly.yml/badge.svg)
-![multigpu-semi-weekly tests](https://github.com/axolotl-ai-cloud/axolotl/actions/workflows/multi-gpu-e2e.yml/badge.svg)
+<p align="center">
+    <picture>
+        <source media="(prefers-color-scheme: dark)" srcset="image/axolotl_logo_digital_white.svg">
+        <source media="(prefers-color-scheme: light)" srcset="image/axolotl_logo_digital_black.svg">
+        <img alt="Axolotl" src="image/axolotl_logo_digital_black.svg" width="400" height="104" style="max-width: 100%;">
+    </picture>
+</p>
+
+<p align="center">
+    <img src="https://img.shields.io/github/license/axolotl-ai-cloud/axolotl.svg?color=blue" alt="GitHub License">
+    <img src="https://github.com/axolotl-ai-cloud/axolotl/actions/workflows/tests.yml/badge.svg" alt="tests">
+    <a href="https://github.com/axolotl-ai-cloud/axolotl/releases"><img src="https://img.shields.io/github/release/axolotl-ai-cloud/axolotl.svg" alt="Releases"></a>
+    <img src="https://img.shields.io/github/stars/axolotl-ai-cloud/axolotl" alt="GitHub Repo stars">
+</p>
+<p align="center">
+    <img src="https://github.com/axolotl-ai-cloud/axolotl/actions/workflows/tests-nightly.yml/badge.svg" alt="tests-nightly">
+    <img src="https://github.com/axolotl-ai-cloud/axolotl/actions/workflows/multi-gpu-e2e.yml/badge.svg" alt="multigpu-semi-weekly tests">
+</p>
 
 Axolotl is a tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures.
@@ -75,7 +88,7 @@ Features:
       <td>
         <div align="center">
-          <img src="image/axolotl.png" alt="axolotl" width="160">
+          <img src="image/axolotl_symbol_digital_white.svg" alt="axolotl" width="160">
           <div>
             <p>
               <b>Axolotl provides a unified repository for fine-tuning <br />a variety of AI models with ease</b>


@@ -28,6 +28,7 @@ RUN if [ "$NIGHTLY_BUILD" = "true" ] ; then \
         sed -i 's#^peft.*#peft @ git+https://github.com/huggingface/peft.git@main#' requirements.txt; \
         sed -i 's#^accelerate.*#accelerate @ git+https://github.com/huggingface/accelerate.git@main#' requirements.txt; \
         sed -i 's#^trl.*#trl @ git+https://github.com/huggingface/trl.git@main#' requirements.txt; \
+        sed -i 's#^datasets.*#datasets @ git+https://github.com/huggingface/datasets.git@main#' requirements.txt; \
     fi
 
 RUN if [ "$AXOLOTL_EXTRAS" != "" ] ; then \


@@ -1,6 +1,6 @@
 #!/bin/bash
 set -e
 
-pytest -n4 --ignore=tests/e2e/ /workspace/axolotl/tests/
+pytest -n8 --ignore=tests/e2e/ /workspace/axolotl/tests/
 pytest -n1 --dist loadfile -v /workspace/axolotl/tests/e2e/patched/ /workspace/axolotl/tests/e2e/integrations/
 pytest --ignore=tests/e2e/patched/ --ignore=tests/e2e/multigpu/ --ignore=tests/e2e/integrations/ /workspace/axolotl/tests/e2e/


@@ -2,4 +2,4 @@
 set -e
 
 # only run one test at a time so as not to OOM the GPU
-pytest -n1 /workspace/axolotl/tests/e2e/multigpu/
+pytest -v -n2 /workspace/axolotl/tests/e2e/multigpu/


@@ -91,6 +91,7 @@ datasets:
     name: # Optional[str] name of dataset configuration to load
     train_on_split: train # Optional[str] name of dataset split to load from
     revision: # Optional[str] The specific revision of the dataset to use when loading from the Hugging Face Hub. This can be a commit hash, tag, or branch name. If not specified, the latest version will be used. This parameter is ignored for local datasets.
+    trust_remote_code: # Optional[bool] Trust remote code for untrusted source
 
   # Custom user instruction prompt
   - path: repo
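
The new `trust_remote_code` dataset option presumably passes straight through to `datasets.load_dataset`; a minimal sketch of the equivalent direct call (the dataset path below is illustrative):

from datasets import load_dataset

# Equivalent of `trust_remote_code: true` on a dataset entry: allow the
# dataset repo's own loading script to execute. Only enable this for
# sources you have reviewed.
ds = load_dataset("some-org/custom-scripted-dataset", trust_remote_code=True)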

examples/qwen2/dpo.yaml (new file)

@@ -0,0 +1,67 @@
base_model: Qwen/Qwen2.5-0.5B
strict: false
chat_template: qwen_25
rl: dpo
datasets:
  - path: fozziethebeat/alpaca_messages_2k_dpo_test
    type: chat_template.default
    field_messages: conversation
    field_chosen: chosen
    field_rejected: rejected
    message_field_role: role
    message_field_content: content
    roles:
      system:
        - system
      user:
        - user
      assistant:
        - assistant
dataset_prepared_path:
val_set_size: 0.0
output_dir: ./outputs/dpo-out
sequence_len: 2048
sample_packing: false
pad_to_sequence_len: true
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:

Binary file not shown (image added, 11 KiB).

Binary file not shown (image modified: 11 KiB before, 24 KiB after).


@@ -0,0 +1,19 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" viewBox="0 0 1113 283.5">
<path fill="#141310" d="M435,234.3l-12.1-48.8h-54.4l-12.1,48.8h-24.7l48.2-185.1h31.6l47.9,185.1h-24.5ZM417.7,164.9l-13.8-55.6c-2.7-10.7-4.8-19.7-6.3-26.9-.9-4.2-1.5-7.5-2-9.9-.5,2.5-1.2,5.8-2,9.9-1.5,7.1-3.6,16.1-6.3,26.7l-13.8,55.9h44.3Z"/>
<path fill="#141310" d="M568.2,234.3l-29.9-45.6c-1.2-1.9-2.4-4.1-3.5-6.5-.8-1.7-1.5-3.3-2.1-4.5-.6,1.3-1.4,2.8-2.3,4.5-1.3,2.4-2.6,4.6-4,6.5l-29.9,45.6h-28.5l49.6-71.9-46.5-67.9h28.5l27.6,43.1c1.2,1.9,2.3,3.9,3.4,6.1.7,1.4,1.4,2.7,1.9,3.8.5-1.1,1.1-2.4,1.8-3.8,1.1-2.2,2.2-4.2,3.4-6.1l27.6-43.1h28.5l-46.5,68.2,49.3,71.7h-28.5Z"/>
<path fill="#141310" d="M658.6,236.3c-16.7,0-30.2-5-40.1-14.8-9.9-9.8-14.9-23.7-14.9-41.3v-31.7c0-17.7,5-31.7,14.8-41.4,9.8-9.7,23.4-14.7,40.3-14.7s30.4,4.9,40.3,14.7c9.8,9.7,14.8,23.7,14.8,41.4v31.7c0,17.6-5,31.5-14.9,41.3-9.9,9.8-23.4,14.8-40.1,14.8ZM658.6,114.1c-9.5,0-17.1,2.7-22.6,8.1-5.5,5.4-8.3,13.4-8.3,23.8v36.7c0,10.5,2.8,18.5,8.3,23.8,5.5,5.4,13.1,8.1,22.6,8.1s17.3-2.7,22.7-8.1c5.4-5.4,8.2-13.4,8.2-23.9v-36.7c0-10.5-2.8-18.5-8.2-23.9-5.4-5.4-13.1-8.1-22.7-8.1Z"/>
<path fill="#141310" d="M860.6,236.3c-16.7,0-30.2-5-40.1-14.8-9.9-9.8-14.9-23.7-14.9-41.3v-31.7c0-17.7,5-31.7,14.8-41.4,9.8-9.7,23.4-14.7,40.3-14.7s30.4,4.9,40.3,14.7c9.8,9.7,14.8,23.7,14.8,41.4v31.7c0,17.6-5,31.5-14.9,41.3-9.9,9.8-23.4,14.8-40.1,14.8ZM860.6,114.1c-9.5,0-17.1,2.7-22.6,8.1-5.5,5.4-8.3,13.4-8.3,23.8v36.7c0,10.5,2.8,18.5,8.3,23.8,5.5,5.4,13.1,8.1,22.6,8.1s17.3-2.7,22.7-8.1c5.4-5.4,8.2-13.4,8.2-23.9v-36.7c0-10.5-2.8-18.5-8.2-23.9-5.4-5.4-13.1-8.1-22.7-8.1Z"/>
<path fill="#141310" d="M773.9,234c-18,0-32.6-14.6-32.6-32.6V48.8h24.1v152.6c0,4.7,3.8,8.5,8.5,8.5h16.8v24.1h-16.8Z"/>
<path fill="#141310" d="M1036.2,234.3V81.4c0-4.7-3.8-8.5-8.5-8.5h-16.8v-24.1h16.8c18,0,32.6,14.6,32.6,32.6v152.9h-24.1Z"/>
<path fill="#141310" d="M978.6,234.3c-18,0-32.6-14.6-32.6-32.6v-85.1h-20.3v-22.1h20.3v-45.3h24.1v45.3h30.2v22.1h-30.2v85.1c0,4.7,3.8,8.5,8.5,8.5h21.7v24.1h-21.7Z"/>
<path fill="#141310" d="M51.5,49h12.2v-20.6h-12.2c-16,0-29,13-29,29v32.8h20.6v-32.8c0-4.7,3.8-8.4,8.4-8.4Z"/>
<path fill="#141310" d="M92.8,49h12.2v-20.6h-12.2c-16,0-29,13-29,29v12.2h20.6v-12.2c0-4.7,3.8-8.4,8.4-8.4Z"/>
<path fill="#141310" d="M249.3,57.4c0-16-13-29-29-29h-12.2v20.6h12.2c4.7,0,8.4,3.8,8.4,8.4v32.8h20.6v-32.8Z"/>
<path fill="#141310" d="M187.4,90.2v-20.6h-103.1v20.6h-41.2v20.6h-20.6v41.2c0,11.4,9.2,20.6,20.6,20.6h185.5c11.4,0,20.6-9.2,20.6-20.6v-41.2h-20.6v-20.6h-41.2ZM166.8,141.7c0-5.7-4.6-10.3-10.3-10.3s-10.3,4.6-10.3,10.3v10.3h-20.6v-20.6c0-11.4,9.2-20.6,20.6-20.6s20.6,9.2,20.6,20.6v10.3ZM228.7,141.7c0-5.7-4.6-10.3-10.3-10.3s-10.3,4.6-10.3,10.3v10.3h-20.6v-20.6c0-11.4,9.2-20.6,20.6-20.6s20.6,9.2,20.6,20.6v10.3Z"/>
<path fill="#141310" d="M208,57.4c0-16-13-29-29-29h-12.2v20.6h12.2c4.7,0,8.4,3.8,8.4,8.4v12.2h20.6v-12.2Z"/>
<rect fill="#141310" x="22.5" y="234.5" width="41.2" height="20.6"/>
<rect fill="#141310" x="84.3" y="234.5" width="164.9" height="20.6"/>
<rect fill="#141310" x="208" y="193.3" width="41.2" height="20.6"/>
<rect fill="#141310" x="22.5" y="193.3" width="164.9" height="20.6"/>
</svg>



@@ -0,0 +1,11 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" viewBox="0 0 1113 283.5">
<path fill="#fff" d="M462.9,234.2l-12.1-48.8h-54.4l-12.1,48.8h-24.7l48.2-185h31.6l47.9,185h-24.4ZM445.7,164.8l-13.8-55.6c-2.7-10.7-4.8-19.7-6.3-26.9-.9-4.2-1.5-7.5-2-9.9-.5,2.5-1.2,5.8-2,9.9-1.5,7.1-3.6,16.1-6.3,26.7l-13.8,55.9h44.3Z"/>
<path fill="#fff" d="M596.1,234.2l-29.9-45.6c-1.2-1.9-2.4-4.1-3.5-6.5-.8-1.7-1.5-3.3-2.1-4.5-.6,1.3-1.4,2.8-2.3,4.5-1.3,2.4-2.6,4.6-4,6.5l-29.9,45.6h-28.5l49.5-71.9-46.5-67.9h28.5l27.6,43.1c1.2,1.9,2.3,3.9,3.4,6.1.7,1.4,1.3,2.7,1.9,3.8.5-1.1,1.1-2.4,1.8-3.8,1.1-2.2,2.2-4.2,3.4-6.1l27.6-43.1h28.5l-46.5,68.1,49.3,71.6h-28.5Z"/>
<path fill="#fff" d="M686.4,236.2c-16.7,0-30.2-5-40.1-14.8-9.9-9.8-14.9-23.7-14.9-41.3v-31.7c0-17.7,5-31.6,14.8-41.4,9.8-9.7,23.4-14.7,40.2-14.7s30.4,4.9,40.2,14.7c9.8,9.7,14.8,23.7,14.8,41.4v31.7c0,17.6-5,31.4-14.9,41.3-9.9,9.8-23.4,14.8-40.1,14.8ZM686.4,114.1c-9.5,0-17.1,2.7-22.6,8.1-5.5,5.4-8.3,13.4-8.3,23.8v36.7c0,10.5,2.8,18.5,8.3,23.8,5.5,5.4,13.1,8.1,22.6,8.1s17.3-2.7,22.7-8.1c5.4-5.4,8.2-13.4,8.2-23.8v-36.7c0-10.5-2.8-18.5-8.2-23.8-5.4-5.4-13.1-8.1-22.7-8.1Z"/>
<path fill="#fff" d="M888.3,236.2c-16.7,0-30.2-5-40.1-14.8-9.9-9.8-14.9-23.7-14.9-41.3v-31.7c0-17.7,5-31.6,14.8-41.4,9.8-9.7,23.4-14.7,40.2-14.7s30.4,4.9,40.2,14.7c9.8,9.7,14.8,23.7,14.8,41.4v31.7c0,17.6-5,31.4-14.9,41.3-9.9,9.8-23.4,14.8-40.1,14.8ZM888.3,114.1c-9.5,0-17.1,2.7-22.6,8.1-5.5,5.4-8.3,13.4-8.3,23.8v36.7c0,10.5,2.8,18.5,8.3,23.8,5.5,5.4,13.1,8.1,22.6,8.1s17.3-2.7,22.7-8.1c5.4-5.4,8.2-13.4,8.2-23.8v-36.7c0-10.5-2.8-18.5-8.2-23.8-5.4-5.4-13.1-8.1-22.7-8.1Z"/>
<path fill="#fff" d="M801.7,234c-18,0-32.6-14.6-32.6-32.6V48.8h24.1v152.5c0,4.7,3.8,8.5,8.5,8.5h16.7v24.1h-16.7Z"/>
<path fill="#fff" d="M1063.8,234.2V81.4c0-4.7-3.8-8.5-8.5-8.5h-16.7v-24.1h16.7c18,0,32.6,14.6,32.6,32.6v152.8h-24.1Z"/>
<path fill="#fff" d="M1006.2,234.2c-18,0-32.6-14.6-32.6-32.6v-85h-20.3v-22.1h20.3v-45.2h24.1v45.2h30.2v22.1h-30.2v85c0,4.7,3.8,8.5,8.5,8.5h21.7v24.1h-21.7Z"/>
<path fill="#fff" d="M160.2,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM222,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM222,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM160.2,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM160.2,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM222,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM222,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM160.2,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM160.2,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM222,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM222,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM160.2,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM277.3,57.4c0-23.8-19.3-43.1-43.1-43.1h-12.2c-3.9,0-7.6,1.6-10.2,4.4-5.9-2.9-12.3-4.4-18.9-4.4h-12.2c-7.7,0-14.1,6.3-14.1,14.1v20.6c0,2.4.6,4.6,1.6,6.6h-37c1-2,1.6-4.2,1.6-6.6v-20.6c0-7.7-6.3-14.1-14.1-14.1h-12.2c-6.5,0-13,1.5-18.9,4.4-2.6-2.8-6.3-4.4-10.2-4.4h-12.2c-23.8,0-43.1,19.3-43.1,43.1v32.8c0,4.1,1.7,7.7,4.5,10.3-2.8,2.6-4.5,6.2-4.5,10.3v41.2c0,11,5.2,20.8,13.2,27.2-7.3.4-13.2,6.6-13.2,14v20.6c0,4.1,1.7,7.7,4.5,10.3-2.8,2.6-4.5,6.2-4.5,10.3v20.6c0,7.7,6.3,14.1,14.1,14.1h41.2c4.1,0,7.7-1.7,10.3-4.5,2.6,2.8,6.2,4.5,10.3,4.5h164.9c7.7,0,14.1-6.3,14.1-14.1v-20.6c0-4.1-1.7-7.7-4.5-10.3,2.8-2.6,4.5-6.2,4.5-10.3v-20.6c0-7.5-5.8-13.6-13.2-14,8-6.4,13.2-16.2,13.2-27.2v-41.2c0-4.1-1.7-7.7-4.5-10.3,2.8-2.6,4.5-6.2,4.5-10.3v-32.8ZM77.8,255.1h-41.2v-20.6h41.2v20.6ZM36.5,213.9v-20.6h164.9v20.6H36.5ZM263.3,255.1H98.4v-20.6h164.9v20.6ZM263.3,213.9h-41.2v-20.6h41.2v20.6ZM263.3,90.2h-20.6v20.6h20.6v41.2c0,11.4-9.2,20.6-20.6,20.6H57.2c-11.4,0-20.6-9.2-20.6-20.6v-41.2h20.6v-20.6h-20.6v-32.8c0-16,13-29,29-29h12.2v20.6h-12.2c-4.7,0-8.4,3.8-8.4,8.4v32.8h41.2v-20.6h-20.6v-12.2c0-16,13-29,29-29h12.2v20.6h-12.2c-4.7,0-8.4,3.8-8.4,8.4v12.2h103.1v-12.2c0-4.7-3.8-8.4-8.4-8.4h-12.2v-20.6h12.2c16,0,29,13,29,29v12.2h-20.6v20.6h41.2v-32.8c0-4.7-3.8-8.4-8.4-8.4h-12.2v-20.6h12.2c16,0,29,13,29,29v32.8ZM201.4,152h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6s-20.6,9.2-20.6,20.6v20.6ZM160.2,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM160.2,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM222,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM222,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM160.2,110.8c-11.4,0
-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM160.2,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM222,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM222,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM160.2,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM160.2,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM222,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM222,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM160.2,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6Z"/>
</svg>



@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" viewBox="0 0 283.5 283.5">
<defs>
<style>
.cls-1 {
fill: #141310;
}
</style>
</defs>
<!-- Generator: Adobe Illustrator 28.7.1, SVG Export Plug-In . SVG Version: 1.2.0 Build 142) -->
<g>
<g id="Layer_1">
<g>
<path class="cls-1" d="M46.9,37.4h13.7V14.2h-13.7c-18,0-32.7,14.6-32.7,32.7v36.9h23.2v-36.9c0-5.2,4.2-9.5,9.5-9.5Z"/>
<path class="cls-1" d="M93.2,37.4h13.7V14.2h-13.7c-18,0-32.7,14.6-32.7,32.7v13.7h23.2v-13.7c0-5.2,4.2-9.5,9.5-9.5Z"/>
<path class="cls-1" d="M269.3,46.9c0-18-14.6-32.7-32.7-32.7h-13.7v23.2h13.7c5.2,0,9.5,4.2,9.5,9.5v36.9h23.2v-36.9Z"/>
<path class="cls-1" d="M199.7,83.8v-23.2h-116v23.2h-46.4v23.2H14.2v46.4c0,12.8,10.4,23.2,23.2,23.2h208.7c12.8,0,23.2-10.4,23.2-23.2v-46.4h-23.2v-23.2h-46.4ZM176.5,141.7c0-6.4-5.2-11.6-11.6-11.6s-11.6,5.2-11.6,11.6v11.6h-23.2v-23.2c0-12.8,10.4-23.2,23.2-23.2s23.2,10.4,23.2,23.2v11.6ZM246.1,141.7c0-6.4-5.2-11.6-11.6-11.6s-11.6,5.2-11.6,11.6v11.6h-23.2v-23.2c0-12.8,10.4-23.2,23.2-23.2s23.2,10.4,23.2,23.2v11.6Z"/>
<path class="cls-1" d="M222.9,46.9c0-18-14.6-32.7-32.7-32.7h-13.7v23.2h13.7c5.2,0,9.5,4.2,9.5,9.5v13.7h23.2v-13.7Z"/>
<rect class="cls-1" x="14.2" y="246.1" width="46.4" height="23.2"/>
<rect class="cls-1" x="83.8" y="246.1" width="185.5" height="23.2"/>
<rect class="cls-1" x="222.9" y="199.7" width="46.4" height="23.2"/>
<rect class="cls-1" x="14.2" y="199.7" width="185.5" height="23.2"/>
</g>
</g>
</g>
</svg>



@@ -0,0 +1,16 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" viewBox="0 0 283.5 283.5">
<defs>
<style>
.cls-1 {
fill: #fff;
}
</style>
</defs>
<!-- Generator: Adobe Illustrator 28.7.1, SVG Export Plug-In . SVG Version: 1.2.0 Build 142) -->
<g>
<g id="Layer_1">
<path class="cls-1" d="M152,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM214,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM214,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM152,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM152,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM214,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM214,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM152,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM152,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM214,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM214,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM152,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM269.3,57.3c0-23.8-19.4-43.1-43.1-43.1h-12.2c-3.9,0-7.6,1.6-10.2,4.4-5.9-2.9-12.3-4.4-18.9-4.4h-12.2c-7.8,0-14.1,6.3-14.1,14.1v20.6c0,2.4.6,4.6,1.6,6.6h-37c1-2,1.6-4.2,1.6-6.6v-20.6c0-7.8-6.3-14.1-14.1-14.1h-12.2c-6.6,0-13,1.5-18.9,4.4-2.6-2.8-6.3-4.4-10.2-4.4h-12.2c-23.8,0-43.1,19.4-43.1,43.1v32.8c0,4.1,1.7,7.7,4.5,10.3-2.8,2.6-4.5,6.2-4.5,10.3v41.3c0,11,5.2,20.9,13.2,27.2-7.4.4-13.2,6.6-13.2,14v20.6c0,4.1,1.7,7.7,4.5,10.3-2.8,2.6-4.5,6.2-4.5,10.3v20.6c0,7.8,6.3,14.1,14.1,14.1h41.3c4.1,0,7.7-1.7,10.3-4.5,2.6,2.8,6.2,4.5,10.3,4.5h165.1c7.8,0,14.1-6.3,14.1-14.1v-20.6c0-4.1-1.7-7.7-4.5-10.3,2.8-2.6,4.5-6.2,4.5-10.3v-20.6c0-7.5-5.9-13.6-13.2-14,8-6.4,13.2-16.2,13.2-27.2v-41.3c0-4.1-1.7-7.7-4.5-10.3,2.8-2.6,4.5-6.2,4.5-10.3v-32.8ZM69.5,255.2H28.2v-20.6h41.3v20.6ZM28.2,214v-20.6h165.1v20.6H28.2ZM255.2,255.2H90.1v-20.6h165.1v20.6ZM255.2,214h-41.3v-20.6h41.3v20.6ZM255.2,90.1h-20.6v20.6h20.6v41.3c0,11.4-9.2,20.6-20.6,20.6H48.9c-11.4,0-20.6-9.2-20.6-20.6v-41.3h20.6v-20.6h-20.6v-32.8c0-16.1,13-29.1,29.1-29.1h12.2v20.6h-12.2c-4.7,0-8.4,3.8-8.4,8.4v32.8h41.3v-20.6h-20.6v-12.2c0-16.1,13-29.1,29.1-29.1h12.2v20.6h-12.2c-4.7,0-8.4,3.8-8.4,8.4v12.2h103.2v-12.2c0-4.7-3.8-8.4-8.4-8.4h-12.2v-20.6h12.2c16.1,0,29.1,13,29.1,29.1v12.2h-20.6v20.6h41.3v-32.8c0-4.7-3.8-8.4-8.4-8.4h-12.2v-20.6h12.2c16.1,0,29.1,13,29.1,29.1v32.8ZM193.3,152h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6s-20.6,9.2-20.6,20.6v20.6ZM152,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM152,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM214,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM214,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM152,11
0.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM152,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM214,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM214,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM152,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM152,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM214,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM214,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6ZM152,110.8c-11.4,0-20.6,9.2-20.6,20.6v20.6h20.6v-10.3c0-5.7,4.6-10.3,10.3-10.3s10.3,4.6,10.3,10.3v-10.3c0-11.4-9.2-20.6-20.6-20.6Z"/>
</g>
</g>
</svg>



@@ -0,0 +1,17 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" viewBox="0 0 765.4 212.6">
<!-- Generator: Adobe Illustrator 28.7.1, SVG Export Plug-In . SVG Version: 1.2.0 Build 142) -->
<g>
<g id="Layer_1">
<g>
<path d="M121.6,198.1l-12.1-48.8h-54.4l-12.1,48.8h-24.7L66.6,12.9h31.6l47.9,185.1h-24.5ZM104.4,128.6l-13.8-55.6c-2.7-10.7-4.8-19.7-6.3-26.9-.9-4.2-1.5-7.5-2-9.9-.5,2.5-1.2,5.8-2,9.9-1.5,7.1-3.6,16.1-6.3,26.7l-13.8,55.9h44.3Z"/>
<path d="M254.9,198.1l-29.9-45.6c-1.2-1.9-2.4-4.1-3.5-6.5-.8-1.7-1.5-3.3-2.1-4.5-.6,1.3-1.4,2.8-2.3,4.5-1.3,2.4-2.6,4.6-4,6.5l-29.9,45.6h-28.5l49.6-71.9-46.5-67.9h28.5l27.6,43.1c1.2,1.9,2.3,3.9,3.4,6.1.7,1.4,1.4,2.7,1.9,3.8.5-1.1,1.1-2.4,1.8-3.8,1.1-2.2,2.2-4.2,3.4-6.1l27.6-43.1h28.5l-46.5,68.2,49.3,71.7h-28.5Z"/>
<path d="M345.2,200.1c-16.7,0-30.2-5-40.1-14.8-9.9-9.8-14.9-23.7-14.9-41.3v-31.7c0-17.7,5-31.7,14.8-41.4,9.8-9.7,23.4-14.7,40.3-14.7s30.4,4.9,40.3,14.7c9.8,9.7,14.8,23.7,14.8,41.4v31.7c0,17.6-5,31.5-14.9,41.3-9.9,9.8-23.4,14.8-40.1,14.8ZM345.2,77.8c-9.5,0-17.1,2.7-22.6,8.1-5.5,5.4-8.3,13.4-8.3,23.8v36.7c0,10.5,2.8,18.5,8.3,23.8,5.5,5.4,13.1,8.1,22.6,8.1s17.3-2.7,22.7-8.1c5.4-5.4,8.2-13.4,8.2-23.9v-36.7c0-10.5-2.8-18.5-8.2-23.9-5.4-5.4-13.1-8.1-22.7-8.1Z"/>
<path d="M547.3,200.1c-16.7,0-30.2-5-40.1-14.8-9.9-9.8-14.9-23.7-14.9-41.3v-31.7c0-17.7,5-31.7,14.8-41.4,9.8-9.7,23.4-14.7,40.3-14.7s30.4,4.9,40.3,14.7c9.8,9.7,14.8,23.7,14.8,41.4v31.7c0,17.6-5,31.5-14.9,41.3-9.9,9.8-23.4,14.8-40.1,14.8ZM547.3,77.8c-9.5,0-17.1,2.7-22.6,8.1-5.5,5.4-8.3,13.4-8.3,23.8v36.7c0,10.5,2.8,18.5,8.3,23.8,5.5,5.4,13.1,8.1,22.6,8.1s17.3-2.7,22.7-8.1c5.4-5.4,8.2-13.4,8.2-23.9v-36.7c0-10.5-2.8-18.5-8.2-23.9-5.4-5.4-13.1-8.1-22.7-8.1Z"/>
<path d="M460.6,197.8c-18,0-32.6-14.6-32.6-32.6V12.5h24.1v152.6c0,4.7,3.8,8.5,8.5,8.5h16.8v24.1h-16.8Z"/>
<path d="M722.8,198.1V45.2c0-4.7-3.8-8.5-8.5-8.5h-16.8V12.5h16.8c18,0,32.6,14.6,32.6,32.6v152.9h-24.1Z"/>
<path d="M665.2,198.1c-18,0-32.6-14.6-32.6-32.6v-85.1h-20.3v-22.1h20.3V12.9h24.1v45.3h30.2v22.1h-30.2v85.1c0,4.7,3.8,8.5,8.5,8.5h21.7v24.1h-21.7Z"/>
</g>
</g>
</g>
</svg>



@@ -0,0 +1,24 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" viewBox="0 0 765.4 212.6">
<defs>
<style>
.cls-1 {
fill: #fff;
}
</style>
</defs>
<!-- Generator: Adobe Illustrator 28.7.1, SVG Export Plug-In . SVG Version: 1.2.0 Build 142) -->
<g>
<g id="Layer_1">
<g>
<path class="cls-1" d="M121.6,198.1l-12.1-48.8h-54.4l-12.1,48.8h-24.7L66.6,12.9h31.6l47.9,185.1h-24.5ZM104.4,128.6l-13.8-55.6c-2.7-10.7-4.8-19.7-6.3-26.9-.9-4.2-1.5-7.5-2-9.9-.5,2.5-1.2,5.8-2,9.9-1.5,7.1-3.6,16.1-6.3,26.7l-13.8,55.9h44.3Z"/>
<path class="cls-1" d="M254.9,198.1l-29.9-45.6c-1.2-1.9-2.4-4.1-3.5-6.5-.8-1.7-1.5-3.3-2.1-4.5-.6,1.3-1.4,2.8-2.3,4.5-1.3,2.4-2.6,4.6-4,6.5l-29.9,45.6h-28.5l49.6-71.9-46.5-67.9h28.5l27.6,43.1c1.2,1.9,2.3,3.9,3.4,6.1.7,1.4,1.4,2.7,1.9,3.8.5-1.1,1.1-2.4,1.8-3.8,1.1-2.2,2.2-4.2,3.4-6.1l27.6-43.1h28.5l-46.5,68.2,49.3,71.7h-28.5Z"/>
<path class="cls-1" d="M345.2,200.1c-16.7,0-30.2-5-40.1-14.8-9.9-9.8-14.9-23.7-14.9-41.3v-31.7c0-17.7,5-31.7,14.8-41.4,9.8-9.7,23.4-14.7,40.3-14.7s30.4,4.9,40.3,14.7c9.8,9.7,14.8,23.7,14.8,41.4v31.7c0,17.6-5,31.5-14.9,41.3-9.9,9.8-23.4,14.8-40.1,14.8ZM345.2,77.8c-9.5,0-17.1,2.7-22.6,8.1-5.5,5.4-8.3,13.4-8.3,23.8v36.7c0,10.5,2.8,18.5,8.3,23.8,5.5,5.4,13.1,8.1,22.6,8.1s17.3-2.7,22.7-8.1c5.4-5.4,8.2-13.4,8.2-23.9v-36.7c0-10.5-2.8-18.5-8.2-23.9-5.4-5.4-13.1-8.1-22.7-8.1Z"/>
<path class="cls-1" d="M547.3,200.1c-16.7,0-30.2-5-40.1-14.8-9.9-9.8-14.9-23.7-14.9-41.3v-31.7c0-17.7,5-31.7,14.8-41.4,9.8-9.7,23.4-14.7,40.3-14.7s30.4,4.9,40.3,14.7c9.8,9.7,14.8,23.7,14.8,41.4v31.7c0,17.6-5,31.5-14.9,41.3-9.9,9.8-23.4,14.8-40.1,14.8ZM547.3,77.8c-9.5,0-17.1,2.7-22.6,8.1-5.5,5.4-8.3,13.4-8.3,23.8v36.7c0,10.5,2.8,18.5,8.3,23.8,5.5,5.4,13.1,8.1,22.6,8.1s17.3-2.7,22.7-8.1c5.4-5.4,8.2-13.4,8.2-23.9v-36.7c0-10.5-2.8-18.5-8.2-23.9-5.4-5.4-13.1-8.1-22.7-8.1Z"/>
<path class="cls-1" d="M460.6,197.8c-18,0-32.6-14.6-32.6-32.6V12.5h24.1v152.6c0,4.7,3.8,8.5,8.5,8.5h16.8v24.1h-16.8Z"/>
<path class="cls-1" d="M722.8,198.1V45.2c0-4.7-3.8-8.5-8.5-8.5h-16.8V12.5h16.8c18,0,32.6,14.6,32.6,32.6v152.9h-24.1Z"/>
<path class="cls-1" d="M665.2,198.1c-18,0-32.6-14.6-32.6-32.6v-85.1h-20.3v-22.1h20.3V12.9h24.1v45.3h30.2v22.1h-30.2v85.1c0,4.7,3.8,8.5,8.5,8.5h21.7v24.1h-21.7Z"/>
</g>
</g>
</g>
</svg>



@@ -1,12 +1,12 @@
 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
 packaging==23.2
 peft==0.13.2
-transformers==4.46.1
+transformers==4.46.3
 tokenizers>=0.20.1
 bitsandbytes==0.44.1
 accelerate==1.1.0
-datasets==3.0.1
+datasets==3.1.0
-deepspeed==0.15.3
+deepspeed==0.15.4
 pydantic==2.6.3
 addict
 fire
@@ -31,7 +31,7 @@ art
 gradio==3.50.2
 tensorboard
 python-dotenv==1.0.1
-autoawq>=0.2.5
+autoawq==0.2.7.post2
 triton>=2.3.0
 liger-kernel==0.4.1
@@ -53,3 +53,4 @@ immutabledict==4.2.0
 antlr4-python3-runtime==4.13.2
 torchao==0.5.0
+schedulefree==1.3.0


@@ -2,7 +2,7 @@
 # Export specific ENV variables to /etc/rp_environment
 echo "Exporting environment variables..."
-printenv | grep -E '^RUNPOD_|^PATH=|^_=' | sed 's/^\(.*\)=\(.*\)$/export \1="\2"/' >> /etc/rp_environment
+printenv | grep -E '^HF_|^BNB_|^CUDA_|^NCCL_|^NV|^RUNPOD_|^PATH=|^_=' | sed 's/^\([^=]*\)=\(.*\)$/export \1="\2"/' | grep -v 'printenv' >> /etc/rp_environment
 echo 'source /etc/rp_environment' >> ~/.bashrc
 
 add_keys_to_authorized() {


@@ -96,11 +96,11 @@ install_requires, dependency_links = parse_requirements()
 
 setup(
     name="axolotl",
-    version="0.5.0",
+    version="0.5.1",
     description="LLM Trainer",
     long_description="Axolotl is a tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures.",
     package_dir={"": "src"},
-    packages=find_packages(),
+    packages=find_packages("src"),
     install_requires=install_requires,
     dependency_links=dependency_links,
     extras_require={
@@ -108,7 +108,7 @@ setup(
             "flash-attn==2.7.0.post2",
         ],
         "deepspeed": [
-            "deepspeed==0.14.4",
+            "deepspeed==0.15.4",
             "deepspeed-kernels",
         ],
         "mamba-ssm": [


@@ -1038,24 +1038,37 @@ class AxolotlDPOTrainer(SchedulerMixin, DPOTrainer):
         return super().push_to_hub(*args, **kwargs)
 
+    @staticmethod
     def tokenize_row(
-        self,
         features,
         processing_class,
         max_prompt_length,
         max_completion_length,
         add_special_tokens,
     ) -> Dict:
-        res = super().tokenize_row(
+        res = DPOTrainer.tokenize_row(
             features,
             processing_class,
             max_prompt_length,
             max_completion_length,
             add_special_tokens,
         )
-        if processing_class.bos_token_id is None and res["prompt_input_ids"][0] is None:
+        # fix when the tokenizer doesn't have a bos_token_id, e.g. Qwen
+        if processing_class.bos_token is None and res["prompt_input_ids"][0] is None:
             for key in res.keys():
                 res[key] = res[key][1:]
+        if processing_class.bos_token and processing_class.bos_token_id is not None:
+            # dpo trainer may incorrectly prepend the bos_token_id to the dpo outputs
+            if res["chosen_input_ids"][0] == processing_class.bos_token_id:
+                res["chosen_input_ids"] = res["chosen_input_ids"][1:]
+                res["chosen_labels"] = res["chosen_labels"][1:]
+                res["chosen_attention_mask"] = res["chosen_attention_mask"][1:]
+            if res["rejected_input_ids"][0] == processing_class.bos_token_id:
+                res["rejected_input_ids"] = res["rejected_input_ids"][1:]
+                res["rejected_labels"] = res["rejected_labels"][1:]
+                res["rejected_attention_mask"] = res["rejected_attention_mask"][1:]
         return res
 
     def training_step(
@@ -1416,17 +1429,15 @@ class HFCausalTrainerBuilder(TrainerBuilderBase):
         if not self.cfg.test_datasets and self.cfg.val_set_size == 0:
             # no eval set, so don't eval
-            training_arguments_kwargs["evaluation_strategy"] = "no"
+            training_arguments_kwargs["eval_strategy"] = "no"
         elif self.cfg.eval_steps:
-            training_arguments_kwargs["evaluation_strategy"] = "steps"
+            training_arguments_kwargs["eval_strategy"] = "steps"
             training_arguments_kwargs["eval_steps"] = self.cfg.eval_steps
-        elif self.cfg.evaluation_strategy:
-            training_arguments_kwargs[
-                "evaluation_strategy"
-            ] = self.cfg.evaluation_strategy
+        elif self.cfg.eval_strategy:
+            training_arguments_kwargs["eval_strategy"] = self.cfg.eval_strategy
         else:
             # we have an eval set, but no steps defined, default to use epoch
-            training_arguments_kwargs["evaluation_strategy"] = "epoch"
+            training_arguments_kwargs["eval_strategy"] = "epoch"
 
         if self.cfg.save_steps:
             training_arguments_kwargs["save_strategy"] = "steps"
@@ -1860,10 +1871,10 @@ class HFRLTrainerBuilder(TrainerBuilderBase):
             training_args_kwargs["save_safetensors"] = self.cfg.save_safetensors
 
         if self.eval_dataset:
-            training_args_kwargs["evaluation_strategy"] = "steps"
+            training_args_kwargs["eval_strategy"] = "steps"
             training_args_kwargs["eval_steps"] = self.cfg.eval_steps
         else:
-            training_args_kwargs["evaluation_strategy"] = "no"
+            training_args_kwargs["eval_strategy"] = "no"
 
         if self.cfg.bf16 or self.cfg.bfloat16:
             training_args_kwargs["bf16"] = True


@@ -1,4 +1,5 @@
 """multipack patching for v2 of sample packing"""
+
 import importlib
 
 import transformers
@@ -27,71 +28,28 @@ SUPPORTED_MULTIPACK_MODEL_TYPES = [
 ]
 
-def patch_for_multipack(model_type, model_name=None, is_remote_code=False):
-    if model_type == "gemmoe":
-        patch_remote(model_name, ".configuration_gemmoe", ".modeling_gemmoe")
-    elif model_type == "deepseek_v2":
-        patch_remote(model_name, ".configuration_deepseek", ".modeling_deepseek")
-    elif hasattr(transformers, "modeling_flash_attention_utils") and not is_remote_code:
+def patch_for_multipack(model_type, model_name=None, has_remote_code=False):
+    if has_remote_code:
+        patch_remote(model_name)
+    elif hasattr(transformers, "modeling_flash_attention_utils"):
         transformers.modeling_flash_attention_utils._get_unpad_data = (  # pylint: disable=protected-access
             get_unpad_data
         )
-        if model_type == "mixtral" and is_deepspeed_zero3_enabled():
-            patch_mixtral_moe_forward_zero3()
-        return
 
-    # retain for legacy
-    if model_type == "mixtral":
-        transformers.models.mixtral.modeling_mixtral._get_unpad_data = (  # pylint: disable=protected-access
-            get_unpad_data
-        )
-        if is_deepspeed_zero3_enabled():
-            patch_mixtral_moe_forward_zero3()
+    if model_type == "mixtral" and is_deepspeed_zero3_enabled():
+        patch_mixtral_moe_forward_zero3()
-    elif model_type == "llama":
-        if hasattr(transformers.models.llama.modeling_llama, "_get_unpad_data"):
-            transformers.models.llama.modeling_llama._get_unpad_data = (  # pylint: disable=protected-access
-                get_unpad_data
-            )
-    elif model_type == "mistral":
-        if hasattr(transformers.models.mistral.modeling_mistral, "_get_unpad_data"):
-            transformers.models.llama.modeling_llama._get_unpad_data = (  # pylint: disable=protected-access
-                get_unpad_data
-            )
-    elif model_type == "qwen2":
-        transformers.models.qwen2.modeling_qwen2._get_unpad_data = (  # pylint: disable=protected-access
-            get_unpad_data
-        )
-    elif model_type == "qwen2_moe":
-        transformers.models.qwen2_moe.modeling_qwen2_moe._get_unpad_data = (  # pylint: disable=protected-access
-            get_unpad_data
-        )
-    elif model_type == "falcon":
-        transformers.models.falcon.modeling_falcon._get_unpad_data = (  # pylint: disable=protected-access
-            get_unpad_data
-        )
-    elif model_type == "phi":
-        transformers.models.phi.modeling_phi._get_unpad_data = (  # pylint: disable=protected-access
-            get_unpad_data
-        )
-    elif model_type == "gemma":
-        transformers.models.gemma.modeling_gemma._get_unpad_data = (  # pylint: disable=protected-access
-            get_unpad_data
-        )
-    elif model_type == "gemma2":
-        transformers.models.gemma2.modeling_gemma2._get_unpad_data = (  # pylint: disable=protected-access
-            get_unpad_data
-        )
-    elif model_type == "starcoder2":
-        transformers.models.starcoder2.modeling_starcoder2._get_unpad_data = (  # pylint: disable=protected-access
-            get_unpad_data
-        )
 
-def patch_remote(model_name, config_name, modeling_name):
+def patch_remote(model_name):
     model_config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
     # we need to load the model here in order for modeling_* to be available
     with init_empty_weights():
         AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
-    module_name = model_config.__class__.__module__.replace(config_name, modeling_name)
+    parts = model_config.__class__.__module__.split(".")
+    parts[-1] = parts[-1].replace("configuration_", "modeling_", 1)
+    module_name = ".".join(parts)
     modeling_arch = importlib.import_module(module_name)
-    modeling_arch._get_unpad_data = get_unpad_data  # pylint: disable=protected-access
+    if hasattr(modeling_arch, "_get_unpad_data"):
+        modeling_arch._get_unpad_data = (  # pylint: disable=protected-access
+            get_unpad_data
+        )
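
The rewritten patch_remote no longer needs explicit module-name arguments: it derives the modeling module from the config class's module path. A standalone sketch of that name mapping (the module path below is an illustrative trust-remote-code style path, not a real one):

# Illustrative input: the module a remote-code config class is defined in.
config_module = "transformers_modules.some_org.some_model.configuration_some_model"
parts = config_module.split(".")
# Only the final path component is renamed, configuration_* -> modeling_*.
parts[-1] = parts[-1].replace("configuration_", "modeling_", 1)
module_name = ".".join(parts)
print(module_name)  # transformers_modules.some_org.some_model.modeling_some_model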


@@ -64,10 +64,7 @@ class EvalFirstStepCallback(
         control: TrainerControl,
         **kwargs,
     ):
-        if (
-            args.evaluation_strategy == IntervalStrategy.STEPS
-            and state.global_step == 1
-        ):
+        if args.eval_strategy == IntervalStrategy.STEPS and state.global_step == 1:
             control.should_evaluate = True
         return control


@@ -1,8 +1,6 @@
 """Module for working with config dicts"""
-import json
 import logging
 import os
-from pathlib import Path
 from typing import Optional
 
 import torch
@@ -10,7 +8,6 @@ from transformers.utils import is_torch_bf16_gpu_available
 from axolotl.integrations.config import merge_input_args
 from axolotl.utils.bench import log_gpu_memory_usage
-from axolotl.utils.config.models.input.v0_4_1 import SUPPORTED_METRICS
 from axolotl.utils.config.models.input.v0_4_1 import (
     AxolotlConfigWCapabilities as AxolotlConfigWCapabilitiesBase,
 )
@@ -247,370 +244,3 @@ def validate_config(cfg: DictDefault, capabilities: Optional[dict] = None):
     return DictDefault(
         dict(AxolotlInputConfig(**cfg.to_dict()).model_dump(exclude_none=True))
     )
-
-
-def legacy_validate_config(cfg):
-    """
-    This is a "pre-validation" step that handles the yaml configuration before we have any
-    information about the model architecture
-    """
-    if is_torch_bf16_gpu_available():
-        if not cfg.bf16 and not cfg.bfloat16:
-            LOG.info("bf16 support detected, but not enabled for this configuration.")
-    else:
-        if (
-            not cfg.merge_lora
-            and not cfg.is_preprocess
-            and (cfg.bf16 is True or cfg.bfloat16 is True)
-        ):
-            raise ValueError(
-                "bf16 requested, but AMP is not supported on this GPU. Requires Ampere series or above."
-            )
-    if (
-        # pylint: disable=too-many-boolean-expressions
-        not (cfg.bf16 or cfg.bfloat16)
-        and (cfg.fp16 or cfg.float16)
-        and not cfg.adapter
-        and not cfg.flash_attention
-        and cfg.sample_packing
-    ):
-        LOG.warning(
-            "Full fine tune w/o FA2 w/ sample packing and fp16/float16 is likely to raise errors. Try LoRA."
-        )
-        # ValueError: Attempting to unscale FP16 gradients.
-        # OR
-        # RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half
-    if cfg.max_packed_sequence_len:
-        raise DeprecationWarning("`max_packed_sequence_len` is no longer supported")
-
-    if cfg.sample_packing and cfg.rl:
-        raise ValueError("`sample_packing: true` does not work with RLHF training")
-
-    if cfg.sample_packing and not cfg.pad_to_sequence_len:
-        LOG.warning(
-            "`pad_to_sequence_len: true` is recommended when using sample_packing"
-        )
-
-    if cfg.gradient_accumulation_steps and cfg.batch_size:
-        raise ValueError(
-            "please set only one of gradient_accumulation_steps or batch_size"
-        )
-    if cfg.batch_size:
-        LOG.warning(
-            "%s\n%s",
-            "batch_size is not recommended. Please use gradient_accumulation_steps instead.",
-            "To calculate the equivalent gradient_accumulation_steps, divide batch_size / micro_batch_size / number of gpus.",
-        )
-    if (
-        cfg.eval_batch_size
-        and cfg.micro_batch_size
-        and cfg.eval_batch_size != cfg.micro_batch_size
-    ):
-        LOG.warning(
-            "eval_batch_size != micro_batch_size. This can lead to VRAM instability."
-        )
-
-    if cfg.adapter == "qlora":
-        if cfg.merge_lora:
-            # can't merge qlora if loaded in 8bit or 4bit
-            if cfg.load_in_8bit:
-                raise ValueError("Can't merge qlora if loaded in 8bit")
-            if cfg.gptq:
-                raise ValueError("Can't merge qlora if gptq")
-            if cfg.load_in_4bit:
-                raise ValueError("Can't merge qlora if loaded in 4bit")
-        else:
-            if cfg.load_in_8bit:
-                raise ValueError("Can't load qlora in 8bit")
-            if cfg.gptq:
-                raise ValueError("Can't load qlora if gptq")
-            if not cfg.load_in_4bit:
-                raise ValueError("Require cfg.load_in_4bit to be True for qlora")
-        if cfg.flash_attn_fuse_qkv or cfg.flash_attn_fuse_mlp:
-            raise ValueError("Fused modules are not supported with QLoRA")
-
-    loftq = cfg.peft and cfg.peft.loftq_config and cfg.peft.loftq_config.loftq_bits
-    if not cfg.load_in_8bit and cfg.adapter == "lora" and not loftq:
-        LOG.warning("We recommend setting `load_in_8bit: true` for LORA finetuning")
-
-    if cfg.adapter == "lora" and (cfg.flash_attn_fuse_qkv or cfg.flash_attn_fuse_mlp):
-        raise ValueError("Fused modules are not supported with LoRA")
-
-    if cfg.adapter and cfg.peft_layers_to_transform and cfg.unfrozen_parameters:
-        raise ValueError(
-            "`unfrozen_parameters` used with `peft_layers_to_transform` can have unexpected behavior."
-        )
-
-    if cfg.relora_steps:
-        if cfg.adapter not in ("lora", "qlora"):
-            raise ValueError("cfg.adapter must be lora or qlora to use ReLoRA")
-        if cfg.fsdp:
-            raise ValueError("fsdp not supported with ReLoRA")
-        if cfg.deepspeed:
-            raise ValueError("deepspeed not supported with ReLoRA")
-        if cfg.lr_scheduler == "one_cycle":
-            raise ValueError("ReLoRA is not compatible with the one_cycle scheduler")
-        if cfg.flash_attn_fuse_qkv or cfg.flash_attn_fuse_mlp:
-            raise ValueError("Fused modules are not supported with ReLoRA")
-
-    if cfg.trust_remote_code:
-        LOG.warning(
-            "`trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model."
-        )
-
-    if cfg.push_dataset_to_hub and cfg.hf_use_auth_token is not True:
-        raise ValueError(
-            "Require cfg.hf_use_auth_token to be True for push_dataset_to_hub"
-        )
-
-    if (cfg.base_model and "falcon" in cfg.base_model.lower()) and cfg.fsdp:
-        raise ValueError("FSDP is not supported for falcon models")
-
-    if (
-        cfg.base_model and "mpt" in cfg.base_model.lower()
-    ) and cfg.gradient_checkpointing:
-        raise ValueError("gradient_checkpointing is not supported for MPT models")
-
-    if cfg.flash_optimum is True:
-        if cfg.adapter:
-            LOG.warning("BetterTransformers probably doesn't work with PEFT adapters")
-        if cfg.fp16 or cfg.bf16:
-            raise ValueError("AMP is not supported with BetterTransformer")
-        if cfg.float16 is not True and cfg.bfloat16 is not True:
-            LOG.warning(
-                "You should probably set bfloat16 or float16 to true to "
-                "load the model in float16 for BetterTransformers"
-            )
-        if int(torch.__version__.split(".", maxsplit=1)[0]) < 2:
-            LOG.warning("torch>=2.0.0 required")
-            raise ValueError(
-                f"flash_optimum for BetterTransformers may not be used with {torch.__version__}"
-            )
-
-    if cfg.pretraining_dataset and cfg.group_by_length:
-        LOG.warning(
-            "You probably want to disable group_by_length as it will force a streamed dataset to download completely."
-        )
-    if cfg.pretraining_dataset and not cfg.max_steps:
-        raise ValueError(
-            "max_steps must be set when using iterable pretraining_dataset, Trainer can't infer length and schedule optimizer/learning rate without it!"
-        )
-
-    if any([cfg.adam_beta1, cfg.adam_beta2, cfg.adam_epsilon]) and (
-        not cfg.optimizer or "adamw" not in cfg.optimizer
-    ):
-        LOG.warning("adamw hyperparameters found, but no adamw optimizer set")
-
-    if cfg.push_to_hub_model_id:
-        raise ValueError(
-            "push_to_hub_model_id is deprecated. Please use hub_model_id instead."
-        )
-
-    if cfg.hub_model_id and cfg.save_strategy not in ["steps", "epoch", None]:
-        LOG.warning(
-            "hub_model_id is set without any models being saved. To save a model, set save_strategy to steps, epochs or leave empty."
-        )
-
-    if cfg.gptq and cfg.revision_of_model:
-        raise ValueError(
-            "revision_of_model is not supported for GPTQ models. "
-            + "Please download the model from HuggingFace Hub manually for correct branch, "
-            + "point to its path, and remove revision_of_model from the config."
-        )
-
-    # if cfg.sample_packing and cfg.sdp_attention:
-    #     # incompatible due to bug w/ accelerate causing 0.0 loss when using llama2
-    #     raise ValueError(
-    #         "sample_packing not compatible with sdp_attention. Use flash_attention"
-    #     )
-
-    if cfg.sample_packing and cfg.xformers_attention:
-        raise ValueError(
-            "sample_packing not compatible with xformers_attention. Use flash_attention"
-        )
-
-    if cfg.sample_packing and cfg.sdp_attention and (cfg.bfloat16 or cfg.bf16):
-        # https://github.com/pytorch/pytorch/blob/1b03423526536b5f3d35bdfa95ccc6197556cf9b/test/test_transformers.py#L2440-L2450
LOG.warning(
"sample_packing & torch sdpa with bf16 is unsupported may results in 0.0 loss. "
"This may work on H100s."
)
if cfg.early_stopping_patience:
if not cfg.save_steps or not cfg.eval_steps:
raise ValueError(
"`early_stopping_patience` requires save_steps and eval_steps to be set. eval_steps should evenly divide save_steps."
)
if cfg.save_steps % cfg.eval_steps != 0:
raise ValueError(
"`early_stopping_patience` requires that eval_steps should evenly divide save_steps."
)
if cfg.saves_per_epoch and cfg.save_steps:
raise ValueError(
"save_steps and saves_per_epoch are mutually exclusive and cannot be used together."
)
if cfg.save_strategy and cfg.saves_per_epoch and cfg.save_strategy != "steps":
raise ValueError(
"save_strategy must be empty or set to `steps` when used with saves_per_epoch."
)
if cfg.save_strategy and cfg.save_steps and cfg.save_strategy != "steps":
raise ValueError(
"save_strategy and save_steps mismatch. Please set save_strategy to 'steps' or remove save_steps."
)
if cfg.evals_per_epoch and cfg.eval_steps:
raise ValueError(
"eval_steps and evals_per_epoch are mutually exclusive and cannot be used together."
)
if (
cfg.evals_per_epoch
and cfg.evaluation_strategy
and cfg.evaluation_strategy != "steps"
):
raise ValueError(
"evaluation_strategy must be empty or set to `steps` when used with evals_per_epoch."
)
if (
cfg.evaluation_strategy
and cfg.eval_steps
and cfg.evaluation_strategy != "steps"
):
raise ValueError(
"evaluation_strategy and eval_steps mismatch. Please set evaluation_strategy to 'steps' or remove eval_steps."
)
if (
cfg.val_set_size == 0
and (cfg.eval_steps or cfg.evaluation_strategy)
and not cfg.test_datasets
):
raise ValueError(
"eval_steps and evaluation_strategy are not supported with val_set_size == 0"
)
if (
cfg.sample_packing
and cfg.eval_table_size
and cfg.eval_sample_packing is not False
):
raise ValueError(
"eval_table_size and eval_sample_packing are not supported together with sample_packing. Please set 'eval_sample_packing' to false."
)
if not cfg.adapter and (cfg.load_in_8bit or cfg.load_in_4bit):
raise ValueError(
"load_in_8bit and load_in_4bit are not supported without setting an adapter."
"If you want to full finetune, please turn off load_in_8bit and load_in_4bit."
)
if cfg.rope_scaling:
LOG.warning("`rope_scaling` should now be a key under `model_config`")
if cfg.wandb_run_id and not cfg.wandb_name:
cfg.wandb_name = cfg.wandb_run_id
LOG.warning(
"wandb_run_id sets the ID of the run. If you would like to set the name, please use wandb_name instead."
)
if cfg.noisy_embedding_alpha is not None:
# Deprecated, use neftune_noise_alpha
LOG.warning("noisy_embedding_alpha is deprecated, use neftune_noise_alpha")
if cfg.neftune_noise_alpha is None:
cfg.neftune_noise_alpha = cfg.noisy_embedding_alpha
else:
# User is providing both; bail and have them sort out their settings
raise ValueError(
"noisy_embedding_alpha is deprecated, use neftune_noise_alpha; both are set, please remove the deprecated noisy_embedding_alpha setting"
)
if cfg.neftune_noise_alpha is not None and cfg.neftune_noise_alpha <= 0.0:
raise ValueError("neftune_noise_alpha must be > 0.0")
if cfg.max_memory is not None and cfg.gpu_memory_limit is not None:
raise ValueError(
"max_memory and gpu_memory_limit are mutually exclusive and cannot be used together."
)
if (
cfg.unfrozen_parameters
and cfg.gradient_checkpointing_kwargs
and cfg.gradient_checkpointing_kwargs.use_reentrant is True
):
# https://github.com/huggingface/transformers/issues/21381
raise ValueError(
"`use_reentrant` must be false when used with partially frozen model."
)
if cfg.deepspeed and Path(cfg.deepspeed).is_file():
with open(cfg.deepspeed, encoding="utf-8") as file:
contents = file.read()
deepspeed_cfg: DictDefault = DictDefault(json.loads(contents))
if cfg.flash_attention:
if (
deepspeed_cfg.zero_optimization
and deepspeed_cfg.zero_optimization.stage == 3
):
if not (
(
deepspeed_cfg.bf16
and deepspeed_cfg.bf16.enabled # pylint: disable=no-member
is True
)
or (
deepspeed_cfg.fp16
and deepspeed_cfg.fp16.enabled # pylint: disable=no-member
is True
)
):
raise ValueError(
"bf16.enabled or fp16.enabled must be set to true when using ZeRO-3 with flash-attention"
)
if "8bit" in cfg.optimizer and deepspeed_cfg.optimizer:
LOG.warning(
f"conflicting optimizer: {cfg.optimizer} used alongside deepspeed optimizer."
)
if cfg.test_datasets and cfg.val_set_size:
raise ValueError(
"non-zero val_set_size should not be used with test_datasets configuration"
)
if cfg.fsdp and "bnb" in cfg.optimizer:
raise ValueError(f"FSDP not compatible with {cfg.optimizer}")
if cfg.do_causal_lm_eval and cfg.eval_sample_packing:
raise ValueError(
"do_causal_lm_eval is enabled, eval_sample_packing must be set to False"
)
if cfg.eval_causal_lm_metrics:
if not isinstance(cfg.eval_causal_lm_metrics, list):
raise ValueError("eval_causal_lm_metrics must be a list")
# only ["sacrebleu", "comet", "ter", "chrf"] supported
if set(cfg.eval_causal_lm_metrics) - SUPPORTED_METRICS:
raise ValueError(
f"eval_causal_lm_metrics must be one of {SUPPORTED_METRICS}"
)
# TODO
# MPT 7b
# https://github.com/facebookresearch/bitsandbytes/issues/25
# no 8-bit AdamW w/ bf16
# GPT-NeoX
# evals broken when extending context len
# File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 162, in forward attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
# File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/optimum/bettertransformer/models/attention.py", line 74, in gpt2_wrapped_scaled_dot_product
# attention_mask = causal_mask + attention_mask
# RuntimeError: The size of tensor a (2048) must match the size of tensor b (8132) at non-singleton dimension 3
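For reference, a minimal sketch of how one of the removed checks fired; legacy_validate_config is the function above, and the config values are illustrative:

# illustrative only: a config that would have tripped the removed qlora check
from axolotl.utils.dict import DictDefault

cfg = DictDefault({"adapter": "qlora", "load_in_8bit": True})
try:
    legacy_validate_config(cfg)  # the function removed in this diff
except ValueError as err:
    print(err)  # "Can't load qlora in 8bit"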

View File

@@ -68,6 +68,7 @@ class DeprecatedParameters(BaseModel):
    rope_scaling: Optional[Any] = None
    noisy_embedding_alpha: Optional[float] = None
    dpo_beta: Optional[float] = None
    evaluation_strategy: Optional[str] = None

    @field_validator("max_packed_sequence_len")
    @classmethod
@@ -99,6 +100,13 @@ class DeprecatedParameters(BaseModel):
        LOG.warning("dpo_beta is deprecated, use rl_beta instead")
        return dpo_beta

    @field_validator("evaluation_strategy")
    @classmethod
    def validate_evaluation_strategy(cls, evaluation_strategy):
        if evaluation_strategy is not None:
            LOG.warning("evaluation_strategy is deprecated, use eval_strategy instead")
        return evaluation_strategy
class RemappedParameters(BaseModel):
    """parameters that have been remapped to other names"""
@@ -242,8 +250,10 @@ class KTODataset(BaseModel):
class LoftQConfig(BaseModel):
    """LoftQ configuration subset"""

-   loftq_bits: int = Field(default=4, metadata={"help": "Quantization bits for LoftQ"})
-   # loftq_iter: int = Field(default=1, metadata={"help": "Alternating iterations for LoftQ"})
+   loftq_bits: int = Field(
+       default=4, json_schema_extra={"description": "Quantization bits for LoftQ"}
+   )
+   # loftq_iter: int = Field(default=1, json_schema_extra={"description": "Alternating iterations for LoftQ"})
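Context for this change: arbitrary extra kwargs such as metadata= on pydantic's Field are deprecated in pydantic v2; json_schema_extra is the supported way to attach documentation. A minimal standalone sketch of the new pattern (not axolotl code):

from pydantic import BaseModel, Field

class ExampleConfig(BaseModel):
    loftq_bits: int = Field(
        default=4, json_schema_extra={"description": "Quantization bits for LoftQ"}
    )

# the extra metadata surfaces in the generated JSON schema
schema = ExampleConfig.model_json_schema()
print(schema["properties"]["loftq_bits"]["description"])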
class PeftConfig(BaseModel):
@@ -286,8 +296,8 @@ class LoraConfig(BaseModel):
    qlora_sharded_model_loading: Optional[bool] = Field(
        default=False,
-       metadata={
-           "help": "load qlora model in sharded format for FSDP using answer.ai technique."
-       },
+       json_schema_extra={
+           "description": "load qlora model in sharded format for FSDP using answer.ai technique."
+       },
    )
    lora_on_cpu: Optional[bool] = None
@@ -296,13 +306,15 @@ class LoraConfig(BaseModel):
    loraplus_lr_ratio: Optional[float] = Field(
        default=None,
-       metadata={
-           "help": "loraplus learning rate ratio lr_B / lr_A. Recommended value is 2^4."
-       },
+       json_schema_extra={
+           "description": "loraplus learning rate ratio lr_B / lr_A. Recommended value is 2^4."
+       },
    )
    loraplus_lr_embedding: Optional[float] = Field(
        default=1e-6,
-       metadata={"help": "loraplus learning rate for lora embedding layers."},
+       json_schema_extra={
+           "description": "loraplus learning rate for lora embedding layers."
+       },
    )
    merge_lora: Optional[bool] = None
@@ -372,10 +384,10 @@ class ModelInputConfig(BaseModel):
    tokenizer_use_fast: Optional[bool] = None
    tokenizer_legacy: Optional[bool] = None
    tokenizer_type: Optional[str] = Field(
-       default=None, metadata={"help": "transformers tokenizer class"}
+       default=None, json_schema_extra={"description": "transformers tokenizer class"}
    )
    processor_type: Optional[str] = Field(
-       default=None, metadata={"help": "transformers processor class"}
+       default=None, json_schema_extra={"description": "transformers processor class"}
    )
    trust_remote_code: Optional[bool] = None
@@ -397,18 +409,18 @@ class HyperparametersConfig(BaseModel):
    gradient_accumulation_steps: Optional[int] = Field(default=1)
    micro_batch_size: Optional[int] = Field(
        default=1,
-       metadata={"help": "per gpu micro batch size for training"},
+       json_schema_extra={"description": "per gpu micro batch size for training"},
    )
    batch_size: Optional[int] = Field(
        default=None,
-       metadata={
-           "help": "Total batch size, we do not recommend setting this manually"
-       },
+       json_schema_extra={
+           "description": "Total batch size, we do not recommend setting this manually"
+       },
    )
    eval_batch_size: Optional[int] = Field(
        default=None,
-       metadata={
-           "help": "per gpu micro batch size for evals, defaults to value of micro_batch_size"
-       },
+       json_schema_extra={
+           "description": "per gpu micro batch size for evals, defaults to value of micro_batch_size"
+       },
    )
@@ -433,12 +445,13 @@ class HyperparametersConfig(BaseModel):
        ]
    ] = OptimizerNames.ADAMW_HF.value
    optim_args: Optional[Union[str, Dict[str, Any]]] = Field(
-       default=None, metadata={"help": "Optional arguments to supply to optimizer."}
+       default=None,
+       json_schema_extra={"description": "Optional arguments to supply to optimizer."},
    )
    optim_target_modules: Optional[Union[List[str], Literal["all_linear"]]] = Field(
        default=None,
-       metadata={
-           "help": "The target modules to optimize, i.e. the module names that you would like to train."
-       },
+       json_schema_extra={
+           "description": "The target modules to optimize, i.e. the module names that you would like to train."
+       },
    )
    torchdistx_path: Optional[str] = None
@@ -498,15 +511,15 @@ class LISAConfig(BaseModel):
    lisa_n_layers: Optional[int] = Field(
        default=None,
-       metadata={"help": "the number of activate layers in LISA"},
+       json_schema_extra={"description": "the number of activate layers in LISA"},
    )
    lisa_step_interval: Optional[int] = Field(
        default=None,
-       metadata={"help": "how often to switch layers in LISA"},
+       json_schema_extra={"description": "how often to switch layers in LISA"},
    )
    lisa_layers_attribute: Optional[str] = Field(
        default="model.layers",
-       metadata={"help": "path under the model to access the layers"},
+       json_schema_extra={"description": "path under the model to access the layers"},
    )
@@ -605,7 +618,8 @@ class AxolotlInputConfig(
    pretraining_dataset: Optional[  # type: ignore
        conlist(Union[PretrainingDataset, SFTDataset], min_length=1)
    ] = Field(
-       default=None, metadata={"help": {"streaming dataset to use for pretraining"}}
+       default=None,
+       json_schema_extra={"description": "streaming dataset to use for pretraining"},
    )
    dataset_processes: Optional[int] = Field(default=os.cpu_count())
    dataset_keep_in_memory: Optional[bool] = None
@@ -665,7 +679,8 @@ class AxolotlInputConfig(
    sequence_len: int = Field(default=512)
    min_sample_len: Optional[int] = None
    max_prompt_len: int = Field(
-       default=512, metadata={"help": "maximum prompt length for RL training"}
+       default=512,
+       json_schema_extra={"description": "maximum prompt length for RL training"},
    )
    sample_packing: Optional[bool] = None
    sample_packing_group_size: Optional[int] = 100_000
@@ -684,8 +699,8 @@ class AxolotlInputConfig(
    pretrain_multipack_buffer_size: Optional[int] = 10_000
    pretrain_multipack_attn: Optional[bool] = Field(
        default=True,
-       metadata={
-           "help": "whether to prevent cross attention for packed sequences during pretraining",
-       },
+       json_schema_extra={
+           "description": "whether to prevent cross attention for packed sequences during pretraining",
+       },
    )
@@ -731,7 +746,7 @@ class AxolotlInputConfig(
    warmup_ratio: Optional[float] = None
    eval_steps: Optional[Union[int, float]] = None
    evals_per_epoch: Optional[Union[int]] = None
-   evaluation_strategy: Optional[str] = None
+   eval_strategy: Optional[str] = None
    save_steps: Optional[Union[int, float]] = None
    saves_per_epoch: Optional[int] = None
    save_strategy: Optional[str] = None
@@ -1033,21 +1048,21 @@ class AxolotlInputConfig(
    @classmethod
    def check_evals(cls, data):
        if (
-           data.get("evaluation_strategy")
+           data.get("eval_strategy")
            and data.get("eval_steps")
-           and data.get("evaluation_strategy") != "steps"
+           and data.get("eval_strategy") != "steps"
        ):
            raise ValueError(
-               "evaluation_strategy and eval_steps mismatch. Please set evaluation_strategy to 'steps' or remove eval_steps."
+               "eval_strategy and eval_steps mismatch. Please set eval_strategy to 'steps' or remove eval_steps."
            )
        if (
            data.get("val_set_size") == 0
-           and (data.get("eval_steps") or data.get("evaluation_strategy"))
+           and (data.get("eval_steps") or data.get("eval_strategy"))
            and not data.get("test_datasets")
        ):
            raise ValueError(
-               "eval_steps and evaluation_strategy are not supported with val_set_size == 0"
+               "eval_steps and eval_strategy are not supported with val_set_size == 0"
            )
        if data.get("evals_per_epoch") and data.get("eval_steps"):
            raise ValueError(
@@ -1055,11 +1070,11 @@ class AxolotlInputConfig(
            )
        if (
            data.get("evals_per_epoch")
-           and data.get("evaluation_strategy")
-           and data.get("evaluation_strategy") != "steps"
+           and data.get("eval_strategy")
+           and data.get("eval_strategy") != "steps"
        ):
            raise ValueError(
-               "evaluation_strategy must be empty or set to `steps` when used with evals_per_epoch."
+               "eval_strategy must be empty or set to `steps` when used with evals_per_epoch."
            )
        if data.get("do_bench_eval") and not (
@@ -1291,6 +1306,25 @@ class AxolotlInputConfig(
        )
        return data
    @model_validator(mode="before")
    @classmethod
    def warn_qlora_zero3_w_use_reentrant(cls, data):
        if (
            data.get("adapter") == "qlora"
            and data.get("gradient_checkpointing_kwargs", {})
            and data.get("gradient_checkpointing_kwargs", {}).get("use_reentrant")
            is False
            and "zero3" in data.get("deepspeed", "")
        ):
            # may result in:
            # torch.utils.checkpoint.CheckpointError: torch.utils.checkpoint:
            # Recomputed values for the following tensors have different metadata
            # than during the forward pass.
            LOG.warning(
                "qlora + zero3 with use_reentrant: false may result in a CheckpointError about recomputed values"
            )
        return data
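A config shape that would trigger this warning, sketched with plain dicts (the deepspeed path is illustrative; any path containing "zero3" matches):

data = {
    "adapter": "qlora",
    "gradient_checkpointing_kwargs": {"use_reentrant": False},
    "deepspeed": "deepspeed_configs/zero3_bf16.json",  # hypothetical path
}
# mirrors the validator's condition
assert (
    data.get("adapter") == "qlora"
    and data.get("gradient_checkpointing_kwargs", {}).get("use_reentrant") is False
    and "zero3" in data.get("deepspeed", "")
)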
    @model_validator(mode="before")
    @classmethod
    def check_val_w_test_datasets(cls, data):
@@ -1300,6 +1334,19 @@ class AxolotlInputConfig(
        )
        return data
    @model_validator(mode="before")
    @classmethod
    def check_eval_strategy(cls, data):
        if (
            data.get("evaluation_strategy") is not None
            and data.get("eval_strategy") is None
        ):
            LOG.info(
                "explicitly setting `eval_strategy` from the `evaluation_strategy`"
            )
            data["eval_strategy"] = data.get("evaluation_strategy")
        return data
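In effect, a config that still uses the deprecated key is remapped before validation; a minimal sketch of the behavior:

data = {"evaluation_strategy": "steps", "eval_steps": 100}
if data.get("evaluation_strategy") is not None and data.get("eval_strategy") is None:
    data["eval_strategy"] = data.get("evaluation_strategy")
assert data["eval_strategy"] == "steps"
# the deprecated key itself is left in place; DeprecatedParameters logs the warning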
    @model_validator(mode="before")
    @classmethod
    def check_fsdp_offload_w_8bit_optimizer(cls, data):
@@ -1402,17 +1449,6 @@ class AxolotlInputConfig(
        )
        return data
    @model_validator(mode="before")
    @classmethod
    def check_fsdp_grad_accum_4_46_2(cls, data):
        if data.get("fsdp") and data.get("gradient_accumulation_steps") > 1:
            if version("transformers") == "4.46.2":
                raise ValueError(
                    "FSDP w/ gradient_accumulation_steps > 1 is broken with transformers==4.46.2. "
                    "Please use a lower value or switch to an older version of transformers."
                )
        return data
class AxolotlConfigWCapabilities(AxolotlInputConfig):
    """wrapper to validate gpu capabilities with the configured options"""

View File

@@ -64,15 +64,57 @@ def map_dataset(cfg, data_set, ds_transform_fn, tokenizer):
    tokenizer = load_tokenizer(cfg)
    ds_transform_fn = partial(ds_transform_fn, tokenizer=tokenizer)

+   if isinstance(data_set, DatasetDict):
+       data_set = data_set["train"]
+
    data_set = data_set.map(
        ds_transform_fn,
        desc="Mapping RL Dataset",
    )
-   if isinstance(data_set, DatasetDict):
-       data_set = data_set["train"]
    return data_set
def drop_long_rl_seq(
    sample, rl, tokenizer, sequence_len  # pylint: disable=invalid-name
):
    if rl in ("dpo", "ipo", "orpo", "simpo"):
        if not (
            sample.get("prompt") and sample.get("chosen") and sample.get("rejected")
        ):
            raise ValueError(
                "Prompt, chosen and rejected keys are required for DPO/ORPO datasets"
            )

        prompt = sample["prompt"]
        chosen = sample["chosen"]
        rejected = sample["rejected"]

        len_prompt = len(tokenizer(prompt, add_special_tokens=False)["input_ids"])
        len_chosen = len(tokenizer(chosen, add_special_tokens=False)["input_ids"])
        len_rejected = len(tokenizer(rejected, add_special_tokens=False)["input_ids"])

        return (len_prompt + len_chosen) <= sequence_len and (
            len_prompt + len_rejected
        ) <= sequence_len

    if rl == "kto":
        if not (sample.get("prompt") and sample.get("completion")):
            raise ValueError("Prompt and completion keys are required for KTO datasets")

        prompt = sample["prompt"]
        completion = sample["completion"]

        len_prompt = len(tokenizer(prompt, add_special_tokens=False)["input_ids"])
        len_completion = len(
            tokenizer(completion, add_special_tokens=False)["input_ids"]
        )

        return (len_prompt + len_completion) <= sequence_len

    raise ValueError("Unknown RL type")
def load_prepare_dpo_datasets(cfg):
    def load_split(dataset_cfgs, _cfg):
        split_datasets: List[Any] = []
@@ -94,7 +136,7 @@ def load_prepare_dpo_datasets(cfg):
            )
            split_datasets.insert(i, ds)

-       tokenizer = None
+       tokenizer = load_tokenizer(cfg)

        for i, data_set in enumerate(split_datasets):
            _type = dataset_cfgs[i]["type"]
@@ -121,7 +163,28 @@ def load_prepare_dpo_datasets(cfg):
# "prompt", "chosen" and "rejected" already preprocessed # "prompt", "chosen" and "rejected" already preprocessed
split_datasets[i] = data_set split_datasets[i] = data_set
return concatenate_datasets(split_datasets) drop_long = partial(
drop_long_rl_seq,
rl=_cfg.rl,
tokenizer=tokenizer,
sequence_len=cfg.sequence_len,
)
prior_len = len(split_datasets[i])
split_datasets[i] = split_datasets[i].filter(
drop_long,
num_proc=cfg.dataset_processes,
load_from_cache_file=not cfg.is_preprocess,
desc="Dropping Long Sequences",
)
dropped = prior_len - len(split_datasets[i])
if dropped:
LOG.warning(f"Dropped {dropped} long samples from dataset index {i}")
combined_datasets = concatenate_datasets(split_datasets)
combined_datasets = combined_datasets.shuffle(seed=cfg.seed)
return combined_datasets
with zero_first(is_main_process()): with zero_first(is_main_process()):
train_is_preprocessed = False train_is_preprocessed = False

View File

@@ -260,6 +260,7 @@ def load_tokenized_prepared_datasets(
    for config_dataset in for_d_in_datasets(cfg_datasets):
        ds: Optional[Union[Dataset, DatasetDict]] = None
        ds_from_hub = False
+       ds_trust_remote_code = config_dataset.trust_remote_code
        try:
            # this is just a basic check to see if the path is a
            # valid HF dataset that's loadable
@@ -269,6 +270,7 @@ def load_tokenized_prepared_datasets(
                streaming=True,
                token=use_auth_token,
                revision=config_dataset.revision,
+               trust_remote_code=ds_trust_remote_code,
            )
            ds_from_hub = True
        except (FileNotFoundError, ConnectionError, HFValidationError, ValueError):
@@ -348,7 +350,15 @@ def load_tokenized_prepared_datasets(
                    split=None,
                )
            else:
-               ds = load_from_disk(config_dataset.path)
+               try:
+                   ds = load_from_disk(config_dataset.path)
+               except FileNotFoundError:
+                   ds = load_dataset(
+                       config_dataset.path,
+                       name=config_dataset.name,
+                       streaming=False,
+                       split=None,
+                   )
        elif local_path.is_file():
            ds_type = get_ds_type(config_dataset)
@@ -366,7 +376,7 @@ def load_tokenized_prepared_datasets(
        elif ds_from_hub:
            load_ds_kwargs = {}
            if config_dataset.split:
-               load_ds_kwargs = {"split": config_dataset.split}
+               load_ds_kwargs["split"] = config_dataset.split
            ds = load_dataset(
                config_dataset.path,
                name=config_dataset.name,
@@ -374,6 +384,7 @@ def load_tokenized_prepared_datasets(
                data_files=config_dataset.data_files,
                token=use_auth_token,
                revision=config_dataset.revision,
+               trust_remote_code=config_dataset.trust_remote_code,
                **load_ds_kwargs,
            )
        elif ds_from_cloud and remote_file_system:
@@ -391,6 +402,7 @@ def load_tokenized_prepared_datasets(
                streaming=False,
                split=None,
                storage_options=storage_options,
+               trust_remote_code=config_dataset.trust_remote_code,
            )
        elif config_dataset.path.startswith("https://"):
            ds_type = get_ds_type(config_dataset)
@@ -401,6 +413,7 @@ def load_tokenized_prepared_datasets(
                streaming=False,
                split=None,
                storage_options=storage_options,
+               trust_remote_code=config_dataset.trust_remote_code,
            )
        else:
            if isinstance(config_dataset.data_files, str):
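With trust_remote_code now threaded through each load_dataset call, it can be set per dataset entry in the config; an illustrative fragment (the dataset path is a placeholder):

cfg_fragment = {
    "datasets": [
        {
            "path": "some-org/dataset-with-loading-script",  # hypothetical
            "type": "alpaca",
            "trust_remote_code": True,
        },
    ],
}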

View File

@@ -238,6 +238,7 @@ def load_tokenizer(cfg):
                    x in cfg.lora_modules_to_save for x in lora_modules_to_save
                )
            )
+           and k != "pad_token"
        ):
            lora_modules_to_save = ", ".join(
                [f"`{x}`" for x in lora_modules_to_save]
@@ -394,10 +395,17 @@ class ModelLoader:
            and self.cfg.flash_attention
            and self.cfg.sample_packing
        ):
+           has_remote_code = (
+               "auto_map" in self.model_config
+               and "AutoModelForCausalLM" in self.model_config["auto_map"]
+           )
+           if has_remote_code and self.cfg.trust_remote_code is False:
+               # if explicitly set in the YAML, we should prefer that, for example if explicitly disabled
+               has_remote_code = self.cfg.trust_remote_code
            patch_for_multipack(
                self.cfg.model_config_type,
                model_name=self.cfg.base_model,
-               is_remote_code=self.cfg.trust_remote_code,
+               has_remote_code=has_remote_code,
            )

        if self.cfg.is_llama_derived_model:
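Restated as a standalone sketch: auto_map in the model config signals remote code, but an explicit trust_remote_code: false in the YAML takes precedence (model_config contents here are illustrative):

model_config = {"auto_map": {"AutoModelForCausalLM": "modeling_foo.FooForCausalLM"}}
trust_remote_code = False  # explicitly disabled in the YAML

has_remote_code = (
    "auto_map" in model_config and "AutoModelForCausalLM" in model_config["auto_map"]
)
if has_remote_code and trust_remote_code is False:
    has_remote_code = trust_remote_code  # stays False; the remote-code patch path is skipped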

View File

@@ -66,28 +66,47 @@ def process_tokens_for_rl_debug(tokens, color, tokenizer, text_only):
def check_rl_example_labels(example, tokenizer, text_only=False):
-   field_prompt, field_chosen, field_rejected = "prompt", "chosen", "rejected"
+   field_prompt, field_chosen, field_rejected, field_completion = (
+       "prompt",
+       "chosen",
+       "rejected",
+       "completion",
+   )

    input_tokens = example[field_prompt]
-   labels_chosen, labels_rejected = example[field_chosen], example[field_rejected]
+   labels_chosen = example.get(field_chosen)
+   labels_rejected = example.get(field_rejected)
+   labels_completion = example.get(field_completion)
+
+   # Create a delimiter based on text_only flag
+   delimiter = "" if text_only else " "

    # Process and color each type of token
    colored_tokens = process_tokens_for_rl_debug(
        input_tokens, "yellow", tokenizer, text_only
    )
-   colored_chosens = process_tokens_for_rl_debug(
-       labels_chosen, "green", tokenizer, text_only
-   )
-   colored_rejecteds = process_tokens_for_rl_debug(
-       labels_rejected, "red", tokenizer, text_only
-   )
-
-   # Create a delimiter based on text_only flag
-   delimiter = "" if text_only else " "
+   # Process tokens
+   if labels_completion is None:
+       colored_chosens = process_tokens_for_rl_debug(
+           labels_chosen, "green", tokenizer, text_only
+       )
+       colored_rejecteds = process_tokens_for_rl_debug(
+           labels_rejected, "red", tokenizer, text_only
+       )
+   else:
+       colored_completion = process_tokens_for_rl_debug(
+           labels_completion, "green", tokenizer, text_only
+       )

    # Logging information
    LOG.info(f"INPUT PROMPT: {delimiter.join(colored_tokens)}\n\n")
-   LOG.info(f"CHOSEN RESPONSE: {delimiter.join(colored_chosens)}\n\n")
-   LOG.info(f"REJECTED RESPONSE: {delimiter.join(colored_rejecteds)}\n\n\n")
+   if labels_completion is None:
+       LOG.info(f"CHOSEN RESPONSE: {delimiter.join(colored_chosens)}\n\n")
+       LOG.info(f"REJECTED RESPONSE: {delimiter.join(colored_rejecteds)}\n\n\n")
+   else:
+       LOG.info(f"COMPLETION RESPONSE: {delimiter.join(colored_completion)}\n\n\n")

    return delimiter.join(colored_tokens)
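With the completion branch, a KTO-style example (prompt/completion, no chosen/rejected) now logs a COMPLETION RESPONSE; a sketch with illustrative token ids, assuming a tokenizer is in scope:

example = {"prompt": [1, 2, 3], "completion": [4, 5]}
check_rl_example_labels(example, tokenizer, text_only=True)
# logs INPUT PROMPT and COMPLETION RESPONSE, skipping the chosen/rejected pair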

View File

@@ -203,37 +203,59 @@ def process_datasets_for_packing(cfg, train_dataset, eval_dataset):
    if eval_dataset and "token_type_ids" in eval_dataset.column_names:
        eval_dataset = eval_dataset.remove_columns("token_type_ids")

+   prior_len = len(train_dataset)
    train_dataset = train_dataset.filter(
        drop_long,
        num_proc=cfg.dataset_processes,
        load_from_cache_file=not cfg.is_preprocess,
        desc="Dropping Long Sequences",
    )
+   dropped = prior_len - len(train_dataset)
+   if dropped:
+       LOG.warning(f"Dropped {dropped} long samples from train dataset")

    if eval_dataset:
+       prior_len = len(eval_dataset)
        eval_dataset = eval_dataset.filter(
            drop_long,
            num_proc=cfg.dataset_processes,
            load_from_cache_file=not cfg.is_preprocess,
            desc="Dropping Long Sequences",
        )
+       dropped = prior_len - len(eval_dataset)
+       if dropped:
+           LOG.warning(f"Dropped {dropped} long samples from eval dataset")

    # drop samples where the number of elements with labels not equal to -100 is zero
    def drop_no_trainable_tokens(sample):
        return np.sum(np.array(sample["labels"]) != -100) > 0

+   prior_len = len(train_dataset)
    train_dataset = train_dataset.filter(
        drop_no_trainable_tokens,
        num_proc=cfg.dataset_processes,
        load_from_cache_file=not cfg.is_preprocess,
        desc="Drop Samples with Zero Trainable Tokens",
    )
+   dropped = prior_len - len(train_dataset)
+   if dropped:
+       LOG.warning(
+           f"Dropped {dropped} samples with no trainable tokens from train dataset"
+       )

    if eval_dataset:
+       prior_len = len(eval_dataset)
        eval_dataset = eval_dataset.filter(
            drop_no_trainable_tokens,
            num_proc=cfg.dataset_processes,
            load_from_cache_file=not cfg.is_preprocess,
            desc="Drop Samples with Zero Trainable Tokens",
        )
+       dropped = prior_len - len(eval_dataset)
+       if dropped:
+           LOG.warning(
+               f"Dropped {dropped} samples with no trainable tokens from eval dataset"
+           )

    if cfg.group_by_length:
        train_dataset = train_dataset.map(
@@ -493,7 +515,7 @@ def prepare_opinionated_env(cfg):
def setup_trainer(
    cfg, train_dataset, eval_dataset, model, tokenizer, processor, total_num_steps
):
-   if cfg.rl in ["dpo", "ipo", "orpo", "kto", "simpo"]:
+   if cfg.rl in ("dpo", "ipo", "orpo", "kto", "simpo"):
        trainer_builder = HFRLTrainerBuilder(cfg, model[0], tokenizer, processor)
        trainer_builder.model_ref = model[1]
        trainer_builder.peft_config = model[2]

tests/e2e/conftest.py Normal file
View File

@@ -0,0 +1,35 @@
"""
shared pytest fixtures
"""
import shutil
import tempfile
import pytest
from huggingface_hub import snapshot_download
@pytest.fixture(scope="session", autouse=True)
def download_smollm2_135m_model():
# download the model
snapshot_download("HuggingFaceTB/SmolLM2-135M")
@pytest.fixture(scope="session", autouse=True)
def download_tatsu_lab_alpaca_dataset():
# download the model
snapshot_download("tatsu-lab/alpaca", repo_type="dataset")
@pytest.fixture(scope="session", autouse=True)
def download_mhenrichsen_alpaca_2k_dataset():
# download the model
snapshot_download("mhenrichsen/alpaca_2k_test", repo_type="dataset")
@pytest.fixture
def temp_dir():
# Create a temporary directory
_temp_dir = tempfile.mkdtemp()
yield _temp_dir
# Clean up the directory after the test
shutil.rmtree(_temp_dir)
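The session-scoped autouse fixtures run the downloads once per test session; temp_dir is requested explicitly by name. A minimal consumer (the test body is illustrative):

from pathlib import Path

def test_writes_output(temp_dir):
    out = Path(temp_dir) / "marker.txt"
    out.write_text("ok", encoding="utf-8")
    assert out.exists()  # the directory is removed automatically after the test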

View File

@@ -3,28 +3,25 @@ E2E tests for multigpu eval
""" """
import logging import logging
import os import os
import unittest
from pathlib import Path from pathlib import Path
import yaml import yaml
from accelerate.test_utils import execute_subprocess_async from accelerate.test_utils import execute_subprocess_async
from transformers.testing_utils import get_torch_dist_unique_port
from axolotl.utils.dict import DictDefault from axolotl.utils.dict import DictDefault
from ..utils import with_temp_dir
LOG = logging.getLogger("axolotl.tests.e2e.multigpu") LOG = logging.getLogger("axolotl.tests.e2e.multigpu")
os.environ["WANDB_DISABLED"] = "true" os.environ["WANDB_DISABLED"] = "true"
AXOLOTL_ROOT = Path(__file__).parent.parent.parent.parent AXOLOTL_ROOT = Path(__file__).parent.parent.parent.parent
class TestMultiGPUEval(unittest.TestCase): class TestMultiGPUEval:
""" """
Test case for MultiGPU Eval Sample Packing Test case for MultiGPU Eval Sample Packing
""" """
@with_temp_dir
def test_eval_sample_packing(self, temp_dir): def test_eval_sample_packing(self, temp_dir):
# pylint: disable=duplicate-code # pylint: disable=duplicate-code
cfg = DictDefault( cfg = DictDefault(
@@ -83,13 +80,14 @@ class TestMultiGPUEval(unittest.TestCase):
"launch", "launch",
"--num-processes", "--num-processes",
"2", "2",
"--main_process_port",
f"{get_torch_dist_unique_port()}",
"-m", "-m",
"axolotl.cli.train", "axolotl.cli.train",
str(Path(temp_dir) / "config.yaml"), str(Path(temp_dir) / "config.yaml"),
] ]
) )
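The new flag exists because each test spins up its own torch.distributed group; giving accelerate a unique rendezvous port avoids collisions between back-to-back test runs. A sketch of the pattern:

from transformers.testing_utils import get_torch_dist_unique_port

port = get_torch_dist_unique_port()  # a port derived for this test worker
cmd = ["accelerate", "launch", "--main_process_port", f"{port}", "-m", "axolotl.cli.train"]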
-   @with_temp_dir
    def test_eval(self, temp_dir):
        # pylint: disable=duplicate-code
        cfg = DictDefault(
@@ -148,6 +146,8 @@ class TestMultiGPUEval(unittest.TestCase):
"launch", "launch",
"--num-processes", "--num-processes",
"2", "2",
"--main_process_port",
f"{get_torch_dist_unique_port()}",
"-m", "-m",
"axolotl.cli.train", "axolotl.cli.train",
str(Path(temp_dir) / "config.yaml"), str(Path(temp_dir) / "config.yaml"),

View File

@@ -4,17 +4,17 @@ E2E tests for multigpu lora tinyllama
import logging
import os
-import unittest
from pathlib import Path

import pytest
import yaml
from accelerate.test_utils import execute_subprocess_async
from huggingface_hub import snapshot_download
+from transformers.testing_utils import get_torch_dist_unique_port

from axolotl.utils.dict import DictDefault

-from ..utils import is_hopper, with_temp_dir
+from ..utils import is_hopper

LOG = logging.getLogger("axolotl.tests.e2e.multigpu")
os.environ["WANDB_DISABLED"] = "true"
@@ -25,21 +25,19 @@ AXOLOTL_ROOT = Path(__file__).parent.parent.parent.parent
@pytest.fixture(scope="session", autouse=True)
def download_model():
    # download the model
-   snapshot_download("TinyLlama/TinyLlama_v1.1")
+   snapshot_download("HuggingFaceTB/SmolLM2-135M")


-class TestMultiGPULlama(unittest.TestCase):
+class TestMultiGPULlama:
    """
    Test case for Llama models using LoRA
    """

-   @with_temp_dir
    def test_lora_ddp(self, temp_dir):
        # pylint: disable=duplicate-code
        cfg = DictDefault(
            {
-               "base_model": "TinyLlama/TinyLlama_v1.1",
-               "tokenizer_type": "LlamaTokenizer",
+               "base_model": "HuggingFaceTB/SmolLM2-135M",
                "sequence_len": 2048,
                "adapter": "lora",
                "lora_r": 8,
@@ -48,9 +46,7 @@ class TestMultiGPULlama(unittest.TestCase):
"lora_target_linear": True, "lora_target_linear": True,
"val_set_size": 0.05, "val_set_size": 0.05,
"special_tokens": { "special_tokens": {
"unk_token": "<unk>", "pad_token": "<|endoftext|>",
"bos_token": "<s>",
"eos_token": "</s>",
}, },
"datasets": [ "datasets": [
{ {
@@ -81,19 +77,23 @@ class TestMultiGPULlama(unittest.TestCase):
"launch", "launch",
"--num-processes", "--num-processes",
"2", "2",
"--main_process_port",
f"{get_torch_dist_unique_port()}",
"-m", "-m",
"axolotl.cli.train", "axolotl.cli.train",
str(Path(temp_dir) / "config.yaml"), str(Path(temp_dir) / "config.yaml"),
] ]
) )
@with_temp_dir @pytest.mark.parametrize(
def test_lora_ddp_packed(self, temp_dir): "gradient_accumulation_steps",
[1, 4],
)
def test_lora_ddp_packed(self, temp_dir, gradient_accumulation_steps):
# pylint: disable=duplicate-code # pylint: disable=duplicate-code
cfg = DictDefault( cfg = DictDefault(
{ {
"base_model": "TinyLlama/TinyLlama_v1.1", "base_model": "HuggingFaceTB/SmolLM2-135M",
"tokenizer_type": "LlamaTokenizer",
"sequence_len": 2048, "sequence_len": 2048,
"sample_packing": True, "sample_packing": True,
"eval_sample_packing": False, "eval_sample_packing": False,
@@ -105,9 +105,7 @@ class TestMultiGPULlama(unittest.TestCase):
"lora_target_linear": True, "lora_target_linear": True,
"val_set_size": 0.05, "val_set_size": 0.05,
"special_tokens": { "special_tokens": {
"unk_token": "<unk>", "pad_token": "<|endoftext|>",
"bos_token": "<s>",
"eos_token": "</s>",
}, },
"datasets": [ "datasets": [
{ {
@@ -118,7 +116,7 @@ class TestMultiGPULlama(unittest.TestCase):
"num_epochs": 1, "num_epochs": 1,
"max_steps": 15, "max_steps": 15,
"micro_batch_size": 4, "micro_batch_size": 4,
"gradient_accumulation_steps": 4, "gradient_accumulation_steps": gradient_accumulation_steps,
"output_dir": temp_dir, "output_dir": temp_dir,
"learning_rate": 0.00001, "learning_rate": 0.00001,
"optimizer": "adamw_8bit", "optimizer": "adamw_8bit",
@@ -138,6 +136,8 @@ class TestMultiGPULlama(unittest.TestCase):
"launch", "launch",
"--num-processes", "--num-processes",
"2", "2",
"--main_process_port",
f"{get_torch_dist_unique_port()}",
"-m", "-m",
"axolotl.cli.train", "axolotl.cli.train",
str(Path(temp_dir) / "config.yaml"), str(Path(temp_dir) / "config.yaml"),
@@ -145,13 +145,11 @@ class TestMultiGPULlama(unittest.TestCase):
) )
@pytest.mark.skipif(is_hopper(), reason="h100 doesn't support 8-bit lora") @pytest.mark.skipif(is_hopper(), reason="h100 doesn't support 8-bit lora")
@with_temp_dir
def test_dpo_lora_ddp(self, temp_dir): def test_dpo_lora_ddp(self, temp_dir):
# pylint: disable=duplicate-code # pylint: disable=duplicate-code
cfg = DictDefault( cfg = DictDefault(
{ {
"base_model": "TinyLlama/TinyLlama_v1.1", "base_model": "HuggingFaceTB/SmolLM2-135M",
"tokenizer_type": "LlamaTokenizer",
"sequence_len": 2048, "sequence_len": 2048,
"sample_packing": False, "sample_packing": False,
"eval_sample_packing": False, "eval_sample_packing": False,
@@ -164,12 +162,10 @@ class TestMultiGPULlama(unittest.TestCase):
"lora_target_linear": True, "lora_target_linear": True,
"val_set_size": 0.05, "val_set_size": 0.05,
"special_tokens": { "special_tokens": {
"unk_token": "<unk>", "pad_token": "<|endoftext|>",
"bos_token": "<s>",
"eos_token": "</s>",
}, },
"rl": "dpo", "rl": "dpo",
"chat_template": "llama3", "chat_template": "chatml",
"datasets": [ "datasets": [
{ {
"path": "fozziethebeat/alpaca_messages_2k_dpo_test", "path": "fozziethebeat/alpaca_messages_2k_dpo_test",
@@ -210,18 +206,19 @@ class TestMultiGPULlama(unittest.TestCase):
"launch", "launch",
"--num-processes", "--num-processes",
"2", "2",
"--main_process_port",
f"{get_torch_dist_unique_port()}",
"-m", "-m",
"axolotl.cli.train", "axolotl.cli.train",
str(Path(temp_dir) / "config.yaml"), str(Path(temp_dir) / "config.yaml"),
] ]
) )
@with_temp_dir
def test_dpo_qlora_ddp(self, temp_dir): def test_dpo_qlora_ddp(self, temp_dir):
# pylint: disable=duplicate-code # pylint: disable=duplicate-code
cfg = DictDefault( cfg = DictDefault(
{ {
"base_model": "HuggingFaceTB/SmolLM-135M", "base_model": "HuggingFaceTB/SmolLM2-135M",
"sequence_len": 2048, "sequence_len": 2048,
"sample_packing": False, "sample_packing": False,
"eval_sample_packing": False, "eval_sample_packing": False,
@@ -278,25 +275,27 @@ class TestMultiGPULlama(unittest.TestCase):
"launch", "launch",
"--num-processes", "--num-processes",
"2", "2",
"--main_process_port",
f"{get_torch_dist_unique_port()}",
"-m", "-m",
"axolotl.cli.train", "axolotl.cli.train",
str(Path(temp_dir) / "config.yaml"), str(Path(temp_dir) / "config.yaml"),
] ]
) )
@with_temp_dir @pytest.mark.parametrize(
def test_fsdp(self, temp_dir): "gradient_accumulation_steps",
[1, 4],
)
def test_fsdp(self, temp_dir, gradient_accumulation_steps):
# pylint: disable=duplicate-code # pylint: disable=duplicate-code
cfg = DictDefault( cfg = DictDefault(
{ {
"base_model": "TinyLlama/TinyLlama_v1.1", "base_model": "HuggingFaceTB/SmolLM2-135M",
"tokenizer_type": "LlamaTokenizer",
"sequence_len": 2048, "sequence_len": 2048,
"val_set_size": 0.05, "val_set_size": 0.01,
"special_tokens": { "special_tokens": {
"unk_token": "<unk>", "pad_token": "<|endoftext|>",
"bos_token": "<s>",
"eos_token": "</s>",
}, },
"datasets": [ "datasets": [
{ {
@@ -305,9 +304,9 @@ class TestMultiGPULlama(unittest.TestCase):
                    },
                ],
                "num_epochs": 1,
-               "max_steps": 15,
+               "max_steps": 10,
                "micro_batch_size": 4,
-               "gradient_accumulation_steps": 4,
+               "gradient_accumulation_steps": gradient_accumulation_steps,
                "output_dir": temp_dir,
                "learning_rate": 0.00001,
                "optimizer": "adamw_torch",
@@ -324,7 +323,7 @@ class TestMultiGPULlama(unittest.TestCase):
"fsdp_use_orig_params": False, "fsdp_use_orig_params": False,
"fsdp_cpu_ram_efficient_loading": False, "fsdp_cpu_ram_efficient_loading": False,
"fsdp_transformer_layer_cls_to_wrap": "LlamaDecoderLayer", "fsdp_transformer_layer_cls_to_wrap": "LlamaDecoderLayer",
"fsdp_state_dict_type": "SHARDED_STATE_DICT", "fsdp_state_dict_type": "FULL_STATE_DICT",
"fsdp_auto_wrap_policy": "TRANSFORMER_BASED_WRAP", "fsdp_auto_wrap_policy": "TRANSFORMER_BASED_WRAP",
}, },
} }
@@ -341,28 +340,29 @@ class TestMultiGPULlama(unittest.TestCase):
"launch", "launch",
"--num-processes", "--num-processes",
"2", "2",
"--main_process_port",
f"{get_torch_dist_unique_port()}",
"-m", "-m",
"axolotl.cli.train", "axolotl.cli.train",
str(Path(temp_dir) / "config.yaml"), str(Path(temp_dir) / "config.yaml"),
] ]
) )
@with_temp_dir @pytest.mark.parametrize(
def test_fsdp_packed(self, temp_dir): "fsdp_state_dict_type",
["FULL_STATE_DICT", "SHARDED_STATE_DICT"],
)
def test_fsdp_packed(self, temp_dir, fsdp_state_dict_type):
# pylint: disable=duplicate-code # pylint: disable=duplicate-code
cfg = DictDefault( cfg = DictDefault(
{ {
"base_model": "TinyLlama/TinyLlama_v1.1", "base_model": "HuggingFaceTB/SmolLM2-135M",
"tokenizer_type": "LlamaTokenizer",
"sample_packing": True, "sample_packing": True,
"eval_sample_packing": False,
"pad_to_sequence_len": True, "pad_to_sequence_len": True,
"sequence_len": 2048, "sequence_len": 2048,
"val_set_size": 0.05, "val_set_size": 0.05,
"special_tokens": { "special_tokens": {
"unk_token": "<unk>", "pad_token": "<|endoftext|>",
"bos_token": "<s>",
"eos_token": "</s>",
}, },
"datasets": [ "datasets": [
{ {
@@ -390,7 +390,7 @@ class TestMultiGPULlama(unittest.TestCase):
"fsdp_use_orig_params": False, "fsdp_use_orig_params": False,
"fsdp_cpu_ram_efficient_loading": False, "fsdp_cpu_ram_efficient_loading": False,
"fsdp_transformer_layer_cls_to_wrap": "LlamaDecoderLayer", "fsdp_transformer_layer_cls_to_wrap": "LlamaDecoderLayer",
"fsdp_state_dict_type": "SHARDED_STATE_DICT", "fsdp_state_dict_type": fsdp_state_dict_type,
"fsdp_auto_wrap_policy": "TRANSFORMER_BASED_WRAP", "fsdp_auto_wrap_policy": "TRANSFORMER_BASED_WRAP",
}, },
} }
@@ -407,19 +407,19 @@ class TestMultiGPULlama(unittest.TestCase):
"launch", "launch",
"--num-processes", "--num-processes",
"2", "2",
"--main_process_port",
f"{get_torch_dist_unique_port()}",
"-m", "-m",
"axolotl.cli.train", "axolotl.cli.train",
str(Path(temp_dir) / "config.yaml"), str(Path(temp_dir) / "config.yaml"),
] ]
) )
@with_temp_dir
def test_fsdp_qlora_prequant_packed(self, temp_dir): def test_fsdp_qlora_prequant_packed(self, temp_dir):
# pylint: disable=duplicate-code # pylint: disable=duplicate-code
cfg = DictDefault( cfg = DictDefault(
{ {
"base_model": "axolotl-ai-co/TinyLlama_v1.1-bnb-nf4-bf16", "base_model": "axolotl-ai-co/SmolLM2-135M-bnb-nf4-bf16",
"tokenizer_type": "AutoTokenizer",
"adapter": "qlora", "adapter": "qlora",
"mean_resizing_embeddings": True, "mean_resizing_embeddings": True,
"load_in_4bit": True, "load_in_4bit": True,
@@ -427,17 +427,17 @@ class TestMultiGPULlama(unittest.TestCase):
"lora_alpha": 16, "lora_alpha": 16,
"lora_dropout": 0.05, "lora_dropout": 0.05,
"lora_target_linear": True, "lora_target_linear": True,
"lora_modules_to_save": [ # "lora_modules_to_save": [
"embed_tokens", # "embed_tokens",
"lm_head", # "lm_head",
], # ],
"sample_packing": True, "sample_packing": True,
"eval_sample_packing": False, "eval_sample_packing": False,
"pad_to_sequence_len": True, "pad_to_sequence_len": True,
"sequence_len": 2048, "sequence_len": 2048,
"val_set_size": 0.05, "val_set_size": 0.05,
"special_tokens": { "special_tokens": {
"pad_token": "</s>", "pad_token": "<|endoftext|>",
}, },
"datasets": [ "datasets": [
{ {
@@ -483,28 +483,29 @@ class TestMultiGPULlama(unittest.TestCase):
"launch", "launch",
"--num-processes", "--num-processes",
"2", "2",
"--main_process_port",
f"{get_torch_dist_unique_port()}",
"-m", "-m",
"axolotl.cli.train", "axolotl.cli.train",
str(Path(temp_dir) / "config.yaml"), str(Path(temp_dir) / "config.yaml"),
] ]
) )
@with_temp_dir @pytest.mark.parametrize(
def test_ds_zero3_packed(self, temp_dir): "gradient_accumulation_steps",
[1, 4],
)
def test_ds_zero3_packed(self, temp_dir, gradient_accumulation_steps):
# pylint: disable=duplicate-code # pylint: disable=duplicate-code
cfg = DictDefault( cfg = DictDefault(
{ {
"base_model": "TinyLlama/TinyLlama_v1.1", "base_model": "HuggingFaceTB/SmolLM2-135M",
"tokenizer_type": "LlamaTokenizer",
"sample_packing": True, "sample_packing": True,
"eval_sample_packing": False,
"pad_to_sequence_len": True, "pad_to_sequence_len": True,
"sequence_len": 2048, "sequence_len": 2048,
"val_set_size": 0.05, "val_set_size": 0.05,
"special_tokens": { "special_tokens": {
"unk_token": "<unk>", "pad_token": "<|endoftext|>",
"bos_token": "<s>",
"eos_token": "</s>",
}, },
"datasets": [ "datasets": [
{ {
@@ -515,7 +516,7 @@ class TestMultiGPULlama(unittest.TestCase):
"num_epochs": 1, "num_epochs": 1,
"max_steps": 15, "max_steps": 15,
"micro_batch_size": 4, "micro_batch_size": 4,
"gradient_accumulation_steps": 4, "gradient_accumulation_steps": gradient_accumulation_steps,
"output_dir": temp_dir, "output_dir": temp_dir,
"learning_rate": 0.00001, "learning_rate": 0.00001,
"optimizer": "adamw_torch", "optimizer": "adamw_torch",
@@ -536,19 +537,19 @@ class TestMultiGPULlama(unittest.TestCase):
"launch", "launch",
"--num-processes", "--num-processes",
"2", "2",
"--main_process_port",
f"{get_torch_dist_unique_port()}",
"-m", "-m",
"axolotl.cli.train", "axolotl.cli.train",
str(Path(temp_dir) / "config.yaml"), str(Path(temp_dir) / "config.yaml"),
] ]
) )
@with_temp_dir
def test_ds_zero3_qlora_packed(self, temp_dir): def test_ds_zero3_qlora_packed(self, temp_dir):
# pylint: disable=duplicate-code # pylint: disable=duplicate-code
cfg = DictDefault( cfg = DictDefault(
{ {
"base_model": "TinyLlama/TinyLlama_v1.1", "base_model": "HuggingFaceTB/SmolLM2-135M",
"tokenizer_type": "LlamaTokenizer",
"load_in_4bit": True, "load_in_4bit": True,
"adapter": "qlora", "adapter": "qlora",
"lora_r": 8, "lora_r": 8,
@@ -561,9 +562,7 @@ class TestMultiGPULlama(unittest.TestCase):
"sequence_len": 2048, "sequence_len": 2048,
"val_set_size": 0.05, "val_set_size": 0.05,
"special_tokens": { "special_tokens": {
"unk_token": "<unk>", "pad_token": "<|endoftext|>",
"bos_token": "<s>",
"eos_token": "</s>",
}, },
"datasets": [ "datasets": [
{ {
@@ -595,6 +594,8 @@ class TestMultiGPULlama(unittest.TestCase):
"launch", "launch",
"--num-processes", "--num-processes",
"2", "2",
"--main_process_port",
f"{get_torch_dist_unique_port()}",
"-m", "-m",
"axolotl.cli.train", "axolotl.cli.train",
str(Path(temp_dir) / "config.yaml"), str(Path(temp_dir) / "config.yaml"),

View File

@@ -4,31 +4,30 @@ E2E tests for multigpu qwen2
import logging
import os
-import unittest
from pathlib import Path

+import pytest
import yaml
from accelerate.test_utils import execute_subprocess_async
+from transformers.testing_utils import get_torch_dist_unique_port

from axolotl.utils.dict import DictDefault

-from ..utils import with_temp_dir

LOG = logging.getLogger("axolotl.tests.e2e.multigpu")
os.environ["WANDB_DISABLED"] = "true"


-class TestMultiGPUQwen2(unittest.TestCase):
+class TestMultiGPUQwen2:
    """
    Test case for Qwen2 models using LoRA
    """

-   @with_temp_dir
-   def test_qlora_fsdp_dpo(self, temp_dir):
+   @pytest.mark.parametrize("base_model", ["Qwen/Qwen2-0.5B", "Qwen/Qwen2.5-0.5B"])
+   def test_qlora_fsdp_dpo(self, base_model, temp_dir):
        # pylint: disable=duplicate-code
        cfg = DictDefault(
            {
-               "base_model": "Qwen/Qwen2-1.5B",
+               "base_model": base_model,
                "load_in_4bit": True,
                "rl": "dpo",
                "chat_template": "chatml",
@@ -47,9 +46,9 @@ class TestMultiGPUQwen2(unittest.TestCase):
                },
            ],
            "num_epochs": 1,
-           "max_steps": 15,
+           "max_steps": 5,
            "warmup_steps": 20,
-           "micro_batch_size": 4,
+           "micro_batch_size": 2,
            "gradient_accumulation_steps": 2,
            "output_dir": temp_dir,
            "learning_rate": 0.00001,
@@ -91,6 +90,8 @@ class TestMultiGPUQwen2(unittest.TestCase):
"launch", "launch",
"--num-processes", "--num-processes",
"2", "2",
"--main_process_port",
f"{get_torch_dist_unique_port()}",
"-m", "-m",
"axolotl.cli.train", "axolotl.cli.train",
str(Path(temp_dir) / "config.yaml"), str(Path(temp_dir) / "config.yaml"),

View File

@@ -66,6 +66,8 @@ class TestFAXentropyLlama(unittest.TestCase):
                },
            ],
            "num_epochs": 1,
+           "max_steps": 10,
+           "save_steps": 10,
            "micro_batch_size": 8,
            "gradient_accumulation_steps": 1,
            "output_dir": temp_dir,

View File

@@ -56,6 +56,8 @@ class TestLoraLlama(unittest.TestCase):
                },
            ],
            "num_epochs": 2,
+           "max_steps": 20,
+           "save_steps": 10,
            "micro_batch_size": 8,
            "gradient_accumulation_steps": 1,
            "output_dir": temp_dir,
@@ -109,6 +111,7 @@ class TestLoraLlama(unittest.TestCase):
                },
            ],
            "num_epochs": 2,
+           "max_steps": 20,
            "save_steps": 0.5,
            "micro_batch_size": 8,
            "gradient_accumulation_steps": 1,

tests/e2e/test_llama.py Normal file
View File

@@ -0,0 +1,66 @@
"""
E2E tests for llama
"""

import logging
import os
import unittest
from pathlib import Path

from axolotl.cli import load_datasets
from axolotl.common.cli import TrainerCliArgs
from axolotl.train import train
from axolotl.utils.config import normalize_config
from axolotl.utils.dict import DictDefault

from .utils import with_temp_dir

LOG = logging.getLogger("axolotl.tests.e2e")
os.environ["WANDB_DISABLED"] = "true"


class TestLlama(unittest.TestCase):
    """
    Test case for Llama models
    """

    @with_temp_dir
    def test_fft_trust_remote_code(self, temp_dir):
        # pylint: disable=duplicate-code
        cfg = DictDefault(
            {
                "base_model": "JackFram/llama-68m",
                "tokenizer_type": "LlamaTokenizer",
                "trust_remote_code": True,
                "sequence_len": 512,
                "val_set_size": 0.1,
                "special_tokens": {
                    "unk_token": "<unk>",
                    "bos_token": "<s>",
                    "eos_token": "</s>",
                },
                "datasets": [
                    {
                        "path": "mhenrichsen/alpaca_2k_test",
                        "type": "alpaca",
                    },
                ],
                "num_epochs": 1,
                "micro_batch_size": 8,
                "gradient_accumulation_steps": 1,
                "output_dir": temp_dir,
                "learning_rate": 0.00001,
                "optimizer": "adamw_bnb_8bit",
                "lr_scheduler": "cosine",
                "flash_attention": True,
                "sample_packing": True,
                "bf16": True,
                "save_safetensors": True,
            }
        )

        normalize_config(cfg)
        cli_args = TrainerCliArgs()
        dataset_meta = load_datasets(cfg=cfg, cli_args=cli_args)

        train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
        assert (Path(temp_dir) / "model.safetensors").exists()


@@ -108,3 +108,37 @@ class TestCustomOptimizers(unittest.TestCase):
         train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
         assert (Path(temp_dir) / "adapter_model.bin").exists()
+
+    @with_temp_dir
+    def test_fft_schedule_free_adamw(self, temp_dir):
+        cfg = DictDefault(
+            {
+                "base_model": "HuggingFaceTB/SmolLM2-135M",
+                "sequence_len": 1024,
+                "val_set_size": 0.1,
+                "special_tokens": {
+                    "pad_token": "<|endoftext|>",
+                },
+                "datasets": [
+                    {
+                        "path": "mhenrichsen/alpaca_2k_test",
+                        "type": "alpaca",
+                    },
+                ],
+                "num_epochs": 1,
+                "micro_batch_size": 4,
+                "gradient_accumulation_steps": 2,
+                "output_dir": temp_dir,
+                "learning_rate": 0.00001,
+                "optimizer": "schedule_free_adamw",
+                "lr_scheduler": "constant",
+                "save_safetensors": True,
+            }
+        )
+        # pylint: disable=duplicate-code
+        normalize_config(cfg)
+        cli_args = TrainerCliArgs()
+        dataset_meta = load_datasets(cfg=cfg, cli_args=cli_args)
+
+        train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
+        assert (Path(temp_dir) / "model.safetensors").exists()


@@ -31,7 +31,7 @@ class TestPackedLlama(unittest.TestCase):
         # pylint: disable=duplicate-code
         cfg = DictDefault(
             {
-                "base_model": "HuggingFaceTB/SmolLM-135M",
+                "base_model": "HuggingFaceTB/SmolLM2-135M",
                 "sequence_len": 1024,
                 "sample_packing": True,
                 "flash_attention": True,

tests/e2e/test_qwen.py Normal file

@@ -0,0 +1,85 @@
"""
E2E tests for qwen
"""

import logging
import os
from pathlib import Path

import pytest
import yaml
from accelerate.test_utils import execute_subprocess_async
from transformers.testing_utils import get_torch_dist_unique_port

from axolotl.utils.dict import DictDefault

LOG = logging.getLogger("axolotl.tests.qwen")
os.environ["WANDB_DISABLED"] = "true"


class TestE2eQwen:
    """
    Test cases for qwen models
    """

    @pytest.mark.parametrize("base_model", ["Qwen/Qwen2-0.5B", "Qwen/Qwen2.5-0.5B"])
    def test_dpo(self, base_model, temp_dir):
        # pylint: disable=duplicate-code
        cfg = DictDefault(
            {
                "base_model": base_model,
                "rl": "dpo",
                "chat_template": "qwen_25",
                "sequence_len": 2048,
                "val_set_size": 0.0,
                "datasets": [
                    {
                        "path": "fozziethebeat/alpaca_messages_2k_dpo_test",
                        "split": "train",
                        "type": "chat_template.default",
                        "field_messages": "conversation",
                        "field_chosen": "chosen",
                        "field_rejected": "rejected",
                        "message_field_role": "role",
                        "message_field_content": "content",
                        "roles": {
                            "system": ["system"],
                            "user": ["user"],
                            "assistant": ["assistant"],
                        },
                    },
                ],
                "num_epochs": 1,
                "max_steps": 5,
                "warmup_steps": 20,
                "micro_batch_size": 2,
                "gradient_accumulation_steps": 2,
                "output_dir": temp_dir,
                "learning_rate": 0.00001,
                "optimizer": "adamw_bnb_8bit",
                "lr_scheduler": "cosine",
                "flash_attention": True,
                "bf16": "auto",
                "tf32": True,
                "gradient_checkpointing": True,
            }
        )

        # write cfg to yaml file
        Path(temp_dir).mkdir(parents=True, exist_ok=True)
        with open(Path(temp_dir) / "config.yaml", "w", encoding="utf-8") as fout:
            fout.write(yaml.dump(cfg.to_dict(), Dumper=yaml.Dumper))

        execute_subprocess_async(
            [
                "accelerate",
                "launch",
                "--num-processes",
                "2",
                "--main_process_port",
                f"{get_torch_dist_unique_port()}",
                "-m",
                "axolotl.cli.train",
                str(Path(temp_dir) / "config.yaml"),
            ]
        )
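Unlike the in-process tests above, this test has no explicit asserts: execute_subprocess_async from accelerate.test_utils raises if the child process exits non-zero, so a clean return is the pass condition. A minimal illustration:

    from accelerate.test_utils import execute_subprocess_async

    # Succeeds silently; a non-zero exit in the child would raise instead.
    execute_subprocess_async(["python", "-c", "print('ok')"])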


@@ -371,44 +371,79 @@ class TestDatasetPreparation(unittest.TestCase):
     def test_load_local_hub_with_revision(self):
         """Verify that a local copy of a hub dataset can be loaded with a specific revision"""
         with tempfile.TemporaryDirectory() as tmp_dir:
-            with tempfile.TemporaryDirectory() as tmp_dir2:
-                tmp_ds_path = Path(tmp_dir2) / "mhenrichsen/alpaca_2k_test"
-                tmp_ds_path.mkdir(parents=True, exist_ok=True)
-                snapshot_download(
-                    repo_id="mhenrichsen/alpaca_2k_test",
-                    repo_type="dataset",
-                    local_dir=tmp_ds_path,
-                    revision="d05c1cb",
-                )
+            tmp_ds_path = Path(tmp_dir) / "mhenrichsen/alpaca_2k_test"
+            tmp_ds_path.mkdir(parents=True, exist_ok=True)
+            snapshot_download(
+                repo_id="mhenrichsen/alpaca_2k_test",
+                repo_type="dataset",
+                local_dir=tmp_ds_path,
+                revision="d05c1cb",
+            )

             prepared_path = Path(tmp_dir) / "prepared"
             cfg = DictDefault(
                 {
                     "tokenizer_config": "huggyllama/llama-7b",
                     "sequence_len": 1024,
                     "datasets": [
                         {
                             "path": "mhenrichsen/alpaca_2k_test",
                             "ds_type": "parquet",
                             "type": "alpaca",
                             "data_files": [
                                 f"{tmp_ds_path}/alpaca_2000.parquet",
                             ],
                             "revision": "d05c1cb",
                         },
                     ],
                 }
             )
             dataset, _ = load_tokenized_prepared_datasets(
                 self.tokenizer, cfg, prepared_path
             )
             assert len(dataset) == 2000
             assert "input_ids" in dataset.features
             assert "attention_mask" in dataset.features
             assert "labels" in dataset.features
             shutil.rmtree(tmp_ds_path)

+    def test_loading_local_dataset_folder(self):
+        """Verify that a dataset downloaded to a local folder can be loaded"""
+        with tempfile.TemporaryDirectory() as tmp_dir:
+            tmp_ds_path = Path(tmp_dir) / "mhenrichsen/alpaca_2k_test"
+            tmp_ds_path.mkdir(parents=True, exist_ok=True)
+            snapshot_download(
+                repo_id="mhenrichsen/alpaca_2k_test",
+                repo_type="dataset",
+                local_dir=tmp_ds_path,
+            )
+
+            prepared_path = Path(tmp_dir) / "prepared"
+            cfg = DictDefault(
+                {
+                    "tokenizer_config": "huggyllama/llama-7b",
+                    "sequence_len": 1024,
+                    "datasets": [
+                        {
+                            "path": str(tmp_ds_path),
+                            "type": "alpaca",
+                        },
+                    ],
+                }
+            )
+            dataset, _ = load_tokenized_prepared_datasets(
+                self.tokenizer, cfg, prepared_path
+            )
+            assert len(dataset) == 2000
+            assert "input_ids" in dataset.features
+            assert "attention_mask" in dataset.features
+            assert "labels" in dataset.features
+            shutil.rmtree(tmp_ds_path)
+

 if __name__ == "__main__":
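The new test_loading_local_dataset_folder covers pointing a dataset's "path" at a plain local directory rather than a hub id. The standalone pattern it exercises, using huggingface_hub.snapshot_download (destination path here is illustrative):

    from pathlib import Path

    from huggingface_hub import snapshot_download

    local_dir = Path("/tmp/alpaca_2k_test")  # illustrative destination
    snapshot_download(
        repo_id="mhenrichsen/alpaca_2k_test",
        repo_type="dataset",
        local_dir=local_dir,
    )
    # The parquet shards are now ordinary local files that axolotl can load
    # via a datasets entry whose "path" is str(local_dir).
    print(sorted(p.name for p in local_dir.iterdir()))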


@@ -32,16 +32,19 @@ class TestCosineConstantLr(unittest.TestCase):
     def test_schedulers(self):
         self.assertEqual(self.lr_scheduler.get_last_lr()[0], 0)
         for _ in range(self.warmup_steps):
+            self.optimizer.step()
             self.lr_scheduler.step()
         self.assertEqual(self.lr_scheduler.get_last_lr()[0], self._lr)
         constant_step = int(self.train_steps * self.constant_lr_ratio)
         remaining_step = self.train_steps - constant_step
         for _ in range(constant_step):
+            self.optimizer.step()
             self.lr_scheduler.step()
         self.assertEqual(
             self.lr_scheduler.get_last_lr()[0], self._lr * self.min_lr_ratio
         )
         for _ in range(remaining_step):
+            self.optimizer.step()
             self.lr_scheduler.step()
         self.assertEqual(
             self.lr_scheduler.get_last_lr()[0], self._lr * self.min_lr_ratio
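The optimizer.step() calls added here silence a real PyTorch warning: since PyTorch 1.1, calling lr_scheduler.step() before any optimizer.step() emits a UserWarning about the call order. A minimal reproduction of the correct ordering (toy parameter and scheduler for illustration):

    import torch

    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = torch.optim.SGD([param], lr=0.1)
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: 1.0)

    for _ in range(3):
        optimizer.step()   # must run first ...
        scheduler.step()   # ... or PyTorch warns about the ordering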


@@ -726,7 +726,7 @@ class TestValidation(BaseValidation):
         cfg = (
             DictDefault(
                 {
-                    "evaluation_strategy": "epoch",
+                    "eval_strategy": "epoch",
                     "eval_steps": 10,
                 }
             )
@@ -734,14 +734,14 @@ class TestValidation(BaseValidation):
         )
         with pytest.raises(
-            ValueError, match=r".*evaluation_strategy and eval_steps mismatch.*"
+            ValueError, match=r".*eval_strategy and eval_steps mismatch.*"
         ):
             validate_config(cfg)
         cfg = (
             DictDefault(
                 {
-                    "evaluation_strategy": "no",
+                    "eval_strategy": "no",
                     "eval_steps": 10,
                 }
             )
@@ -749,14 +749,14 @@ class TestValidation(BaseValidation):
         )
         with pytest.raises(
-            ValueError, match=r".*evaluation_strategy and eval_steps mismatch.*"
+            ValueError, match=r".*eval_strategy and eval_steps mismatch.*"
         ):
             validate_config(cfg)
         cfg = (
             DictDefault(
                 {
-                    "evaluation_strategy": "steps",
+                    "eval_strategy": "steps",
                 }
             )
             | minimal_cfg
@@ -767,7 +767,7 @@ class TestValidation(BaseValidation):
         cfg = (
             DictDefault(
                 {
-                    "evaluation_strategy": "steps",
+                    "eval_strategy": "steps",
                     "eval_steps": 10,
                 }
             )
@@ -790,7 +790,7 @@ class TestValidation(BaseValidation):
         cfg = (
             DictDefault(
                 {
-                    "evaluation_strategy": "no",
+                    "eval_strategy": "no",
                 }
             )
             | minimal_cfg
@@ -801,7 +801,7 @@ class TestValidation(BaseValidation):
         cfg = (
             DictDefault(
                 {
-                    "evaluation_strategy": "epoch",
+                    "eval_strategy": "epoch",
                     "val_set_size": 0,
                 }
             )
@@ -810,7 +810,7 @@ class TestValidation(BaseValidation):
         with pytest.raises(
             ValueError,
-            match=r".*eval_steps and evaluation_strategy are not supported with val_set_size == 0.*",
+            match=r".*eval_steps and eval_strategy are not supported with val_set_size == 0.*",
         ):
             validate_config(cfg)
@@ -826,7 +826,7 @@ class TestValidation(BaseValidation):
         with pytest.raises(
             ValueError,
-            match=r".*eval_steps and evaluation_strategy are not supported with val_set_size == 0.*",
+            match=r".*eval_steps and eval_strategy are not supported with val_set_size == 0.*",
         ):
             validate_config(cfg)
@@ -856,7 +856,7 @@ class TestValidation(BaseValidation):
         cfg = (
             DictDefault(
                 {
-                    "evaluation_strategy": "epoch",
+                    "eval_strategy": "epoch",
                     "val_set_size": 0.01,
                 }
             )
@@ -1095,6 +1095,24 @@ class TestValidation(BaseValidation):
         assert new_cfg["dpo_beta"] is None
         assert len(self._caplog.records) == 1

+    def test_eval_strategy_remap(self, minimal_cfg):
+        cfg = (
+            DictDefault(
+                {
+                    "evaluation_strategy": "steps",
+                }
+            )
+            | minimal_cfg
+        )
+
+        with self._caplog.at_level(logging.WARNING):
+            new_cfg = validate_config(cfg)
+            assert new_cfg.eval_strategy == "steps"
+            assert (
+                "evaluation_strategy is deprecated, use eval_strategy instead"
+                in self._caplog.records[0].message
+            )
+

 class TestValidationCheckModelConfig(BaseValidation):
     """