Wing Lian b6fc46ada8 Updates for trl 0.16.0 - mostly for GRPO (#2437) [skip ci]
* add grpo scale_rewards config for trl#3135

* options to connect to vllm server directly w grpo trl#3094

* temperature support trl#3029

* sampling/generation kwargs for grpo trl#2989

* make vllm_enable_prefix_caching a config param trl#2900

* grpo multi-step optimizeations trl#2899

* remove overrides for grpo trainer

* bump trl to 0.16.0

* add cli  to start vllm-serve via trl

* call the python module directly

* update to use vllm with 2.6.0 too now and call trl vllm serve from module

* vllm 0.8.1

* use python3

* use sys.executable

* remove context and wait for start

* fixes to make it actually work

* fixes so the grpo tests pass with new vllm paradigm

* explicit host/port and check in start vllm

* make sure that vllm doesn't hang by setting quiet so outouts go to dev null

* also bump bnb to latest release

* add option for wait from cli and nccl debugging for ci

* grpo + vllm test on separate devices for now

* make sure grpo + vllm tests runs single worker since pynccl comms would conflict

* fix cli

* remove wait and add caching for argilla dataset

* refactoring configs

* chore: lint

* add vllm config

* fixup vllm grpo args

* fix one more incorrect schema/config path

* fix another vlllm reference and increase timeout

* make the tests run a bit faster

* change mbsz back so it is correct for grpo

* another change mbsz back so it is correct for grpo

* fixing cli args

* nits

* adding docs

* docs

* include tensor parallel size for vllm in pydantic schema

* moving start_vllm, more docs

* limit output len for grpo vllm

* vllm enable_prefix_caching isn't a bool cli arg

* fix env ordering in tests and also use pid check when looking for vllm

---------

Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>
2025-03-31 15:47:11 -04:00
2024-11-18 14:58:03 -05:00
2023-05-31 02:53:53 +09:00
2023-04-14 00:20:05 -04:00
2023-05-31 02:53:22 +09:00
2024-08-23 12:21:51 -04:00
2025-01-07 13:42:01 +00:00
2023-06-10 23:36:14 -07:00
2023-07-21 09:49:29 -04:00

Axolotl

GitHub License tests Releases
contributors GitHub Repo stars
discord twitter
tests-nightly multigpu-semi-weekly tests

Axolotl is a tool designed to streamline post-training for various AI models. Post-training refers to any modifications or additional training performed on pre-trained models - including full model fine-tuning, parameter-efficient tuning (like LoRA and QLoRA), supervised fine-tuning (SFT), instruction tuning, and alignment techniques. With support for multiple model architectures and training configurations, Axolotl makes it easy to get started with these techniques.

Axolotl is designed to work with YAML config files that contain everything you need to preprocess a dataset, train or fine-tune a model, run model inference or evaluation, and much more.

Features:

  • Train various Huggingface models such as llama, pythia, falcon, mpt
  • Supports fullfinetune, lora, qlora, relora, and gptq
  • Customize configurations using a simple yaml file or CLI overwrite
  • Load different dataset formats, use custom formats, or bring your own tokenized datasets
  • Integrated with xformers, flash attention, liger kernel, rope scaling, and multipacking
  • Works with single GPU or multiple GPUs via FSDP or Deepspeed
  • Easily run with Docker locally or on the cloud
  • Log results and optionally checkpoints to wandb, mlflow or Comet
  • And more!

🚀 Quick Start

Requirements:

  • NVIDIA GPU (Ampere or newer for bf16 and Flash Attention) or AMD GPU
  • Python 3.11
  • PyTorch ≥2.4.1

Installation

pip3 install -U packaging==23.2 setuptools==75.8.0 wheel ninja
pip3 install --no-build-isolation axolotl[flash-attn,deepspeed]

# Download example axolotl configs, deepspeed configs
axolotl fetch examples
axolotl fetch deepspeed_configs  # OPTIONAL

Other installation approaches are described here.

Your First Fine-tune

# Fetch axolotl examples
axolotl fetch examples

# Or, specify a custom path
axolotl fetch examples --dest path/to/folder

# Train a model using LoRA
axolotl train examples/llama-3/lora-1b.yml

That's it! Check out our Getting Started Guide for a more detailed walkthrough.

Key Features

  • Multiple Model Support: Train various models like LLaMA, Mistral, Mixtral, Pythia, and more
  • Training Methods: Full fine-tuning, LoRA, QLoRA, and more
  • Easy Configuration: Simple YAML files to control your training setup
  • Performance Optimizations: Flash Attention, xformers, multi-GPU training
  • Flexible Dataset Handling: Use various formats and custom datasets
  • Cloud Ready: Run on cloud platforms or local hardware

📚 Documentation

🤝 Getting Help

🌟 Contributing

Contributions are welcome! Please see our Contributing Guide for details.

Supported Models

fp16/fp32 lora qlora gptq gptq w/flash attn flash attn xformers attn
llama
Mistral
Mixtral-MoE
Mixtral8X22
Pythia
cerebras
btlm
mpt
falcon
gpt-j
XGen
phi
RWKV
Qwen
Gemma
Jamba

: supported : not supported : untested

❤️ Sponsors

Thank you to our sponsors who help make Axolotl possible:

  • Modal - Modal lets you run jobs in the cloud, by just writing a few lines of Python. Customers use Modal to deploy Gen AI models at large scale, fine-tune large language models, run protein folding simulations, and much more.

Interested in sponsoring? Contact us at wing@axolotl.ai

📜 License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Description
Fork of axolotl-ai-cloud/axolotl @ v0.16.1 � activeblue patches for RTX 5080 / CUDA 12.8
Readme Apache-2.0 48 MiB
Languages
Python 97.4%
Jinja 2.3%
Shell 0.2%