refactor README; hardcode links to quarto docs; add additional quarto doc pages (#2295)
* refactor README; hardcode links to quarto docs; add additional quarto doc pages * updates * review comments * update --------- Co-authored-by: Dan Saunders <dan@axolotl.ai>
This commit is contained in:
@@ -8,14 +8,12 @@ order: 3
|
||||
|
||||
IMPORTANT: ShareGPT is deprecated!. Please see `chat_template` section below.
|
||||
|
||||
|
||||
## pygmalion
|
||||
|
||||
```{.json filename="data.jsonl"}
|
||||
{"conversations": [{"role": "...", "value": "..."}]}
|
||||
```
|
||||
|
||||
|
||||
## chat_template
|
||||
|
||||
Chat Template strategy uses a jinja2 template that converts a list of messages into a prompt. Support using tokenizer's template, a supported template, or custom jinja2.
|
||||
|
||||
@@ -6,8 +6,15 @@ order: 3
|
||||
|
||||
## Stepwise Supervised
|
||||
|
||||
The stepwise supervised format is designed for chain-of-thought (COT) reasoning datasets where each example contains multiple completion steps and a preference label for each step.
|
||||
### ExampleHere's a simple example of a stepwise supervised dataset entry:```json
|
||||
The stepwise supervised format is designed for chain-of-thought (COT) reasoning
|
||||
datasets where each example contains multiple completion steps and a preference label
|
||||
for each step.
|
||||
|
||||
### Example
|
||||
|
||||
Here's a simple example of a stepwise supervised dataset entry:
|
||||
|
||||
```json
|
||||
{
|
||||
"prompt": "Which number is larger, 9.8 or 9.11?",
|
||||
"completions": [
|
||||
@@ -16,3 +23,4 @@ The stepwise supervised format is designed for chain-of-thought (COT) reasoning
|
||||
],
|
||||
"labels": [true, false]
|
||||
}
|
||||
```
|
||||
155
docs/getting-started.qmd
Normal file
155
docs/getting-started.qmd
Normal file
@@ -0,0 +1,155 @@
|
||||
---
|
||||
title: "Getting Started with Axolotl"
|
||||
format:
|
||||
html:
|
||||
toc: true
|
||||
toc-depth: 3
|
||||
number-sections: true
|
||||
execute:
|
||||
enabled: false
|
||||
---
|
||||
|
||||
This guide will walk you through your first model fine-tuning project with Axolotl.
|
||||
|
||||
## Quick Example {#sec-quick-example}
|
||||
|
||||
Let's start by fine-tuning a small language model using LoRA. This example uses a 1B parameter model to ensure it runs on most GPUs.
|
||||
Assuming `axolotl` is installed (if not, see our [Installation Guide](installation.qmd))
|
||||
|
||||
1. Download example configs:
|
||||
```shell
|
||||
axolotl fetch examples
|
||||
```
|
||||
|
||||
2. Run the training:
|
||||
```shell
|
||||
axolotl train examples/llama-3/lora-1b.yml
|
||||
```
|
||||
|
||||
That's it! Let's understand what just happened.
|
||||
|
||||
## Understanding the Process {#sec-understanding}
|
||||
|
||||
### The Configuration File {#sec-config}
|
||||
|
||||
The YAML configuration file controls everything about your training. Here's what (part of) our example config looks like:
|
||||
|
||||
```yaml
|
||||
base_model: NousResearch/Llama-3.2-1B
|
||||
# hub_model_id: username/custom_model_name
|
||||
|
||||
datasets:
|
||||
- path: teknium/GPT4-LLM-Cleaned
|
||||
type: alpaca
|
||||
dataset_prepared_path: last_run_prepared
|
||||
val_set_size: 0.1
|
||||
output_dir: ./outputs/lora-out
|
||||
|
||||
adapter: lora
|
||||
lora_model_dir:
|
||||
```
|
||||
|
||||
See our [Config options](config.qmd) for more details.
|
||||
|
||||
### Training {#sec-training}
|
||||
|
||||
When you run `axolotl train`, Axolotl:
|
||||
|
||||
1. Downloads the base model
|
||||
2. (If specified) applies LoRA adapter layers
|
||||
3. Loads and processes the dataset
|
||||
4. Runs the training loop
|
||||
5. Saves the trained model and / or LoRA weights
|
||||
|
||||
## Your First Custom Training {#sec-custom}
|
||||
|
||||
Let's modify the example for your own data:
|
||||
|
||||
1. Create a new config file `my_training.yml`:
|
||||
|
||||
```yaml
|
||||
base_model: NousResearch/Nous-Hermes-llama-1b-v1
|
||||
adapter: lora
|
||||
|
||||
# Training settings
|
||||
micro_batch_size: 2
|
||||
num_epochs: 3
|
||||
learning_rate: 0.0003
|
||||
|
||||
# Your dataset
|
||||
datasets:
|
||||
- path: my_data.jsonl # Your local data file
|
||||
type: alpaca # Or other format
|
||||
```
|
||||
|
||||
This specific config is for LoRA fine-tuning a model with instruction tuning data using
|
||||
the `alpaca` dataset format, which has the following format:
|
||||
|
||||
```json
|
||||
{
|
||||
"instruction": "Write a description of alpacas.",
|
||||
"input": "",
|
||||
"output": "Alpacas are domesticated South American camelids..."
|
||||
}
|
||||
```
|
||||
|
||||
Please see our [Dataset Formats](dataset-formats) for more dataset formats and how to
|
||||
format them.
|
||||
|
||||
2. Prepare your JSONL data in the specified format (in this case, the expected `alpaca
|
||||
format):
|
||||
|
||||
```json
|
||||
{"instruction": "Classify this text", "input": "I love this!", "output": "positive"}
|
||||
{"instruction": "Classify this text", "input": "Not good at all", "output": "negative"}
|
||||
```
|
||||
|
||||
Please consult the supported [Dataset Formats](dataset-formats/) for more details.
|
||||
|
||||
3. Run the training:
|
||||
|
||||
```shell
|
||||
axolotl train my_training.yml
|
||||
```
|
||||
|
||||
## Common Tasks {#sec-common-tasks}
|
||||
|
||||
### Testing Your Model {#sec-testing}
|
||||
|
||||
After training, test your model:
|
||||
|
||||
```shell
|
||||
axolotl inference my_training.yml --lora-model-dir="./outputs/lora-out"
|
||||
```
|
||||
|
||||
### Preprocessing Data {#sec-preprocessing}
|
||||
|
||||
For large datasets, preprocess first:
|
||||
|
||||
```shell
|
||||
axolotl preprocess my_training.yml
|
||||
```
|
||||
|
||||
### Using a UI {#sec-ui}
|
||||
|
||||
Launch a Gradio interface:
|
||||
|
||||
```shell
|
||||
axolotl inference my_training.yml --lora-model-dir="./outputs/lora-out" --gradio
|
||||
```
|
||||
|
||||
## Next Steps {#sec-next-steps}
|
||||
|
||||
Now that you have the basics, you might want to:
|
||||
|
||||
- Try different model architectures
|
||||
- Experiment with hyperparameters
|
||||
- Use more advanced training methods
|
||||
- Scale up to larger models
|
||||
|
||||
Check our other guides for details on these topics:
|
||||
|
||||
- [Configuration Guide](config.qmd) - Full configuration options
|
||||
- [Dataset Formats](dataset-formats) - Working with different data formats
|
||||
- [Multi-GPU Training](multi-gpu.qmd)
|
||||
- [Multi-Node Training](multi-node.qmd)
|
||||
148
docs/inference.qmd
Normal file
148
docs/inference.qmd
Normal file
@@ -0,0 +1,148 @@
|
||||
---
|
||||
title: "Inference Guide"
|
||||
format:
|
||||
html:
|
||||
toc: true
|
||||
toc-depth: 3
|
||||
number-sections: true
|
||||
code-tools: true
|
||||
execute:
|
||||
enabled: false
|
||||
---
|
||||
|
||||
This guide covers how to use your trained models for inference, including model loading, interactive testing, and common troubleshooting steps.
|
||||
|
||||
## Quick Start {#sec-quickstart}
|
||||
|
||||
### Basic Inference {#sec-basic}
|
||||
|
||||
::: {.panel-tabset}
|
||||
|
||||
## LoRA Models
|
||||
|
||||
```{.bash}
|
||||
axolotl inference your_config.yml --lora-model-dir="./lora-output-dir"
|
||||
```
|
||||
|
||||
## Full Fine-tuned Models
|
||||
|
||||
```{.bash}
|
||||
axolotl inference your_config.yml --base-model="./completed-model"
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
## Advanced Usage {#sec-advanced}
|
||||
|
||||
### Gradio Interface {#sec-gradio}
|
||||
|
||||
Launch an interactive web interface:
|
||||
|
||||
```{.bash}
|
||||
axolotl inference your_config.yml --gradio
|
||||
```
|
||||
|
||||
### File-based Prompts {#sec-file-prompts}
|
||||
|
||||
Process prompts from a text file:
|
||||
|
||||
```{.bash}
|
||||
cat /tmp/prompt.txt | axolotl inference your_config.yml \
|
||||
--base-model="./completed-model" --prompter=None
|
||||
```
|
||||
|
||||
### Memory Optimization {#sec-memory}
|
||||
|
||||
For large models or limited memory:
|
||||
|
||||
```{.bash}
|
||||
axolotl inference your_config.yml --load-in-8bit=True
|
||||
```
|
||||
|
||||
## Merging LoRA Weights {#sec-merging}
|
||||
|
||||
Merge LoRA adapters with the base model:
|
||||
|
||||
```{.bash}
|
||||
axolotl merge-lora your_config.yml --lora-model-dir="./completed-model"
|
||||
```
|
||||
|
||||
### Memory Management for Merging {#sec-memory-management}
|
||||
|
||||
::: {.panel-tabset}
|
||||
|
||||
## Configuration Options
|
||||
|
||||
```{.yaml}
|
||||
gpu_memory_limit: 20GiB # Adjust based on your GPU
|
||||
lora_on_cpu: true # Process on CPU if needed
|
||||
```
|
||||
|
||||
## Force CPU Merging
|
||||
|
||||
```{.bash}
|
||||
CUDA_VISIBLE_DEVICES="" axolotl merge-lora ...
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
## Tokenization {#sec-tokenization}
|
||||
|
||||
### Common Issues {#sec-tokenization-issues}
|
||||
|
||||
::: {.callout-warning}
|
||||
Tokenization mismatches between training and inference are a common source of problems.
|
||||
:::
|
||||
|
||||
To debug:
|
||||
|
||||
1. Check training tokenization:
|
||||
```{.bash}
|
||||
axolotl preprocess your_config.yml --debug
|
||||
```
|
||||
|
||||
2. Verify inference tokenization by decoding tokens before model input
|
||||
|
||||
3. Compare token IDs between training and inference
|
||||
|
||||
### Special Tokens {#sec-special-tokens}
|
||||
|
||||
Configure special tokens in your YAML:
|
||||
|
||||
```{.yaml}
|
||||
special_tokens:
|
||||
bos_token: "<s>"
|
||||
eos_token: "</s>"
|
||||
unk_token: "<unk>"
|
||||
tokens:
|
||||
- "<|im_start|>"
|
||||
- "<|im_end|>"
|
||||
```
|
||||
|
||||
## Troubleshooting {#sec-troubleshooting}
|
||||
|
||||
### Common Problems {#sec-common-problems}
|
||||
|
||||
::: {.panel-tabset}
|
||||
|
||||
## Memory Issues
|
||||
|
||||
- Use 8-bit loading
|
||||
- Reduce batch sizes
|
||||
- Try CPU offloading
|
||||
|
||||
## Token Issues
|
||||
|
||||
- Verify special tokens
|
||||
- Check tokenizer settings
|
||||
- Compare training and inference preprocessing
|
||||
|
||||
## Performance Issues
|
||||
|
||||
- Verify model loading
|
||||
- Check prompt formatting
|
||||
- Ensure temperature/sampling settings
|
||||
|
||||
:::
|
||||
|
||||
For more details, see our [debugging guide](debugging.qmd).
|
||||
119
docs/installation.qmd
Normal file
119
docs/installation.qmd
Normal file
@@ -0,0 +1,119 @@
|
||||
---
|
||||
title: "Installation Guide"
|
||||
format:
|
||||
html:
|
||||
toc: true
|
||||
toc-depth: 3
|
||||
number-sections: true
|
||||
code-tools: true
|
||||
execute:
|
||||
enabled: false
|
||||
---
|
||||
|
||||
This guide covers all the ways you can install and set up Axolotl for your environment.
|
||||
|
||||
## Requirements {#sec-requirements}
|
||||
|
||||
- NVIDIA GPU (Ampere architecture or newer for `bf16` and Flash Attention) or AMD GPU
|
||||
- Python ≥3.10
|
||||
- PyTorch ≥2.4.1
|
||||
|
||||
## Installation Methods {#sec-installation-methods}
|
||||
|
||||
### PyPI Installation (Recommended) {#sec-pypi}
|
||||
|
||||
```{.bash}
|
||||
pip3 install --no-build-isolation axolotl[flash-attn,deepspeed]
|
||||
```
|
||||
|
||||
We use `--no-build-isolation` in order to detect the installed PyTorch version (if
|
||||
installed) in order not to clobber it, and so that we set the correct version of
|
||||
dependencies that are specific to the PyTorch version or other installed
|
||||
co-dependencies.
|
||||
|
||||
### Edge/Development Build {#sec-edge-build}
|
||||
|
||||
For the latest features between releases:
|
||||
|
||||
```{.bash}
|
||||
git clone https://github.com/axolotl-ai-cloud/axolotl.git
|
||||
cd axolotl
|
||||
pip3 install packaging ninja
|
||||
pip3 install --no-build-isolation -e '.[flash-attn,deepspeed]'
|
||||
```
|
||||
|
||||
### Docker {#sec-docker}
|
||||
|
||||
```{.bash}
|
||||
docker run --gpus '"all"' --rm -it axolotlai/axolotl:main-latest
|
||||
```
|
||||
|
||||
For development with Docker:
|
||||
|
||||
```{.bash}
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
::: {.callout-tip}
|
||||
### Advanced Docker Configuration
|
||||
```{.bash}
|
||||
docker run --privileged --gpus '"all"' --shm-size 10g --rm -it \
|
||||
--name axolotl --ipc=host \
|
||||
--ulimit memlock=-1 --ulimit stack=67108864 \
|
||||
--mount type=bind,src="${PWD}",target=/workspace/axolotl \
|
||||
-v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
|
||||
axolotlai/axolotl:main-latest
|
||||
```
|
||||
:::
|
||||
|
||||
## Cloud Environments {#sec-cloud}
|
||||
|
||||
### Cloud GPU Providers {#sec-cloud-gpu}
|
||||
|
||||
For providers supporting Docker:
|
||||
|
||||
- Use `axolotlai/axolotl-cloud:main-latest`
|
||||
- Available on:
|
||||
- [Latitude.sh](https://latitude.sh/blueprint/989e0e79-3bf6-41ea-a46b-1f246e309d5c)
|
||||
- [JarvisLabs.ai](https://jarvislabs.ai/templates/axolotl)
|
||||
- [RunPod](https://runpod.io/gsc?template=v2ickqhz9s&ref=6i7fkpdz)
|
||||
|
||||
### Google Colab {#sec-colab}
|
||||
|
||||
Use our [example notebook](../examples/colab-notebooks/colab-axolotl-example.ipynb).
|
||||
|
||||
## Platform-Specific Instructions {#sec-platform-specific}
|
||||
|
||||
### macOS {#sec-macos}
|
||||
|
||||
```{.bash}
|
||||
pip3 install --no-build-isolation -e '.'
|
||||
```
|
||||
|
||||
See @sec-troubleshooting for Mac-specific issues.
|
||||
|
||||
### Windows {#sec-windows}
|
||||
|
||||
::: {.callout-important}
|
||||
We recommend using WSL2 (Windows Subsystem for Linux) or Docker.
|
||||
:::
|
||||
|
||||
## Environment Managers {#sec-env-managers}
|
||||
|
||||
### Conda/Pip venv {#sec-conda}
|
||||
|
||||
1. Install Python ≥3.10
|
||||
2. Install PyTorch: https://pytorch.org/get-started/locally/
|
||||
3. Install Axolotl:
|
||||
```{.bash}
|
||||
pip3 install packaging
|
||||
pip3 install --no-build-isolation -e '.[flash-attn,deepspeed]'
|
||||
```
|
||||
4. (Optional) Login to Hugging Face:
|
||||
```{.bash}
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
## Troubleshooting {#sec-troubleshooting}
|
||||
|
||||
If you encounter installation issues, see our [FAQ](faq.qmd) and [Debugging Guide](debugging.qmd).
|
||||
118
docs/multi-gpu.qmd
Normal file
118
docs/multi-gpu.qmd
Normal file
@@ -0,0 +1,118 @@
|
||||
---
|
||||
title: "Multi-GPU Training Guide"
|
||||
format:
|
||||
html:
|
||||
toc: true
|
||||
toc-depth: 3
|
||||
number-sections: true
|
||||
code-tools: true
|
||||
execute:
|
||||
enabled: false
|
||||
---
|
||||
|
||||
This guide covers advanced training configurations for multi-GPU setups using Axolotl.
|
||||
|
||||
## Overview {#sec-overview}
|
||||
|
||||
Axolotl supports several methods for multi-GPU training:
|
||||
|
||||
- DeepSpeed (recommended)
|
||||
- FSDP (Fully Sharded Data Parallel)
|
||||
- FSDP + QLoRA
|
||||
|
||||
## DeepSpeed {#sec-deepspeed}
|
||||
|
||||
DeepSpeed is the recommended approach for multi-GPU training due to its stability and performance. It provides various optimization levels through ZeRO stages.
|
||||
|
||||
### Configuration {#sec-deepspeed-config}
|
||||
|
||||
Add to your YAML config:
|
||||
|
||||
```{.yaml}
|
||||
deepspeed: deepspeed_configs/zero1.json
|
||||
```
|
||||
|
||||
### Usage {#sec-deepspeed-usage}
|
||||
|
||||
```{.bash}
|
||||
accelerate launch -m axolotl.cli.train examples/llama-2/config.yml --deepspeed deepspeed_configs/zero1.json
|
||||
```
|
||||
|
||||
### ZeRO Stages {#sec-zero-stages}
|
||||
|
||||
We provide default configurations for:
|
||||
|
||||
- ZeRO Stage 1 (`zero1.json`)
|
||||
- ZeRO Stage 2 (`zero2.json`)
|
||||
- ZeRO Stage 3 (`zero3.json`)
|
||||
|
||||
Choose based on your memory requirements and performance needs.
|
||||
|
||||
## FSDP {#sec-fsdp}
|
||||
|
||||
### Basic FSDP Configuration {#sec-fsdp-config}
|
||||
|
||||
```{.yaml}
|
||||
fsdp:
|
||||
- full_shard
|
||||
- auto_wrap
|
||||
fsdp_config:
|
||||
fsdp_offload_params: true
|
||||
fsdp_state_dict_type: FULL_STATE_DICT
|
||||
fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
|
||||
```
|
||||
|
||||
### FSDP + QLoRA {#sec-fsdp-qlora}
|
||||
|
||||
For combining FSDP with QLoRA, see our [dedicated guide](fsdp_qlora.qmd).
|
||||
|
||||
## Performance Optimization {#sec-performance}
|
||||
|
||||
### Liger Kernel Integration {#sec-liger}
|
||||
|
||||
::: {.callout-note}
|
||||
Liger Kernel provides efficient Triton kernels for LLM training, offering:
|
||||
|
||||
- 20% increase in multi-GPU training throughput
|
||||
- 60% reduction in memory usage
|
||||
- Compatibility with both FSDP and DeepSpeed
|
||||
:::
|
||||
|
||||
Configuration:
|
||||
|
||||
```{.yaml}
|
||||
plugins:
|
||||
- axolotl.integrations.liger.LigerPlugin
|
||||
liger_rope: true
|
||||
liger_rms_norm: true
|
||||
liger_glu_activation: true
|
||||
liger_layer_norm: true
|
||||
liger_fused_linear_cross_entropy: true
|
||||
```
|
||||
|
||||
## Troubleshooting {#sec-troubleshooting}
|
||||
|
||||
### NCCL Issues {#sec-nccl}
|
||||
|
||||
For NCCL-related problems, see our [NCCL troubleshooting guide](nccl.qmd).
|
||||
|
||||
### Common Problems {#sec-common-problems}
|
||||
|
||||
::: {.panel-tabset}
|
||||
|
||||
## Memory Issues
|
||||
|
||||
- Reduce `micro_batch_size`
|
||||
- Reduce `eval_batch_size`
|
||||
- Adjust `gradient_accumulation_steps`
|
||||
- Consider using a higher ZeRO stage
|
||||
|
||||
## Training Instability
|
||||
|
||||
- Start with DeepSpeed ZeRO-2
|
||||
- Monitor loss values
|
||||
- Check learning rates
|
||||
|
||||
:::
|
||||
|
||||
For more detailed troubleshooting, see our [debugging guide](debugging.qmd).
|
||||
Reference in New Issue
Block a user