refactor README; hardcode links to quarto docs; add additional quarto doc pages (#2295)

* refactor README; hardcode links to quarto docs; add additional quarto doc pages
* updates
* review comments
* update

Co-authored-by: Dan Saunders <dan@axolotl.ai>
2025-01-30 12:49:21 -05:00
parent 6f713226dd
commit 6f294c3d8d
10 changed files with 625 additions and 715 deletions
--- a/docs/dataset-formats/conversation.qmd
+++ b/docs/dataset-formats/conversation.qmd
@@ -8,14 +8,12 @@ order: 3

 IMPORTANT: ShareGPT is deprecated!. Please see `chat_template` section below.

-
 ## pygmalion

 ```{.json filename="data.jsonl"}
 {"conversations": [{"role": "...", "value": "..."}]}
 ```

-
 ## chat_template

 Chat Template strategy uses a jinja2 template that converts a list of messages into a prompt. Support using tokenizer's template, a supported template, or custom jinja2.
--- a/docs/dataset-formats/stepwise_supervised.qmd
+++ b/docs/dataset-formats/stepwise_supervised.qmd
@@ -6,8 +6,15 @@ order: 3

 ## Stepwise Supervised

-The stepwise supervised format is designed for chain-of-thought (COT) reasoning datasets where each example contains multiple completion steps and a preference label for each step.
-### ExampleHere's a simple example of a stepwise supervised dataset entry:```json
+The stepwise supervised format is designed for chain-of-thought (COT) reasoning
+datasets where each example contains multiple completion steps and a preference label
+for each step.
+
+### Example
+
+Here's a simple example of a stepwise supervised dataset entry:
+
+```json
 {
  "prompt": "Which number is larger, 9.8 or 9.11?",
  "completions": [
@@ -16,3 +23,4 @@ The stepwise supervised format is designed for chain-of-thought (COT) reasoning
  ],
  "labels": [true, false]
 }
+```
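
Since the format pairs each completion step with a preference label, a useful check before training is that `completions` and `labels` line up one-to-one. The helper below is a minimal sketch and not part of Axolotl: `validate_stepwise_entry` is a hypothetical name, and the checks assume only the three fields shown in the example entry.

```python
def validate_stepwise_entry(entry: dict) -> None:
    """Raise ValueError if `entry` does not match the stepwise supervised layout."""
    prompt = entry.get("prompt")
    completions = entry.get("completions")
    labels = entry.get("labels")

    if not isinstance(prompt, str):
        raise ValueError("'prompt' must be a string")
    if not isinstance(completions, list) or not all(
        isinstance(step, str) for step in completions
    ):
        raise ValueError("'completions' must be a list of strings")
    # bool is checked explicitly so that 0/1 or "true" do not slip through.
    if not isinstance(labels, list) or not all(
        isinstance(flag, bool) for flag in labels
    ):
        raise ValueError("'labels' must be a list of booleans")
    if len(completions) != len(labels):
        raise ValueError(
            f"{len(completions)} completions but {len(labels)} labels; "
            "expected one label per step"
        )
```

Calling it on each parsed line of a JSONL file (for example via `json.loads`) surfaces malformed entries before any tokenization happens.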
--- a/docs/getting-started.qmd
+++ b/docs/getting-started.qmd
@@ -0,0 +1,155 @@
+---
+title: "Getting Started with Axolotl"
+format:
+  html:
+    toc: true
+    toc-depth: 3
+    number-sections: true
+execute:
+  enabled: false
+---
+
+This guide will walk you through your first model fine-tuning project with Axolotl.
+
+## Quick Example {#sec-quick-example}
+
+Let's start by fine-tuning a small language model using LoRA. This example uses a 1B parameter model to ensure it runs on most GPUs.
+The steps below assume `axolotl` is installed (if not, see our [Installation Guide](installation.qmd)).
+
+1. Download example configs:
+```shell
+axolotl fetch examples
+```
+
+2. Run the training:
+```shell
+axolotl train examples/llama-3/lora-1b.yml
+```
+
+That's it! Let's understand what just happened.
+
+## Understanding the Process {#sec-understanding}
+
+### The Configuration File {#sec-config}
+
+The YAML configuration file controls everything about your training. Here's what (part of) our example config looks like:
+
+```yaml
+base_model: NousResearch/Llama-3.2-1B
+# hub_model_id: username/custom_model_name
+
+datasets:
+  - path: teknium/GPT4-LLM-Cleaned
+    type: alpaca
+dataset_prepared_path: last_run_prepared
+val_set_size: 0.1
+output_dir: ./outputs/lora-out
+
+adapter: lora
+lora_model_dir:
+```
+
+See our [Config options](config.qmd) for more details.
+
+### Training {#sec-training}
+
+When you run `axolotl train`, Axolotl:
+
+1. Downloads the base model
+2. (If specified) applies LoRA adapter layers
+3. Loads and processes the dataset
+4. Runs the training loop
+5. Saves the trained model and/or LoRA weights
+
+## Your First Custom Training {#sec-custom}
+
+Let's modify the example for your own data:
+
+1. Create a new config file `my_training.yml`:
+
+```yaml
+base_model: NousResearch/Llama-3.2-1B
+adapter: lora
+
+# Training settings
+micro_batch_size: 2
+num_epochs: 3
+learning_rate: 0.0003
+
+# Your dataset
+datasets:
+  - path: my_data.jsonl        # Your local data file
+    type: alpaca               # Or other format
+```
+
+This config fine-tunes a model with LoRA on instruction-tuning data in the `alpaca`
+dataset format, where each record looks like this:
+
+```json
+{
+    "instruction": "Write a description of alpacas.",
+    "input": "",
+    "output": "Alpacas are domesticated South American camelids..."
+}
+```
+
+Please see our [Dataset Formats](dataset-formats) for more dataset formats and how to
+format them.
+
+2. Prepare your JSONL data in the specified format (in this case, the expected `alpaca`
+format):
+
+```json
+{"instruction": "Classify this text", "input": "I love this!", "output": "positive"}
+{"instruction": "Classify this text", "input": "Not good at all", "output": "negative"}
+```
+
+Please consult the supported [Dataset Formats](dataset-formats/) for more details.
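
If your data starts out as Python objects rather than a file, a small script can produce the JSONL for you. The following is a minimal sketch, not an Axolotl utility: `write_alpaca_jsonl` and `ALPACA_KEYS` are hypothetical names, and the only assumption is that every record carries the three `alpaca` keys shown above.

```python
import json
from pathlib import Path

# The three keys used by the alpaca records in this guide.
ALPACA_KEYS = ("instruction", "input", "output")


def write_alpaca_jsonl(records, path):
    """Validate records against the alpaca keys, then write one JSON object per line."""
    records = list(records)
    # Validate everything first so a bad record never leaves a half-written file.
    for index, record in enumerate(records):
        missing = [key for key in ALPACA_KEYS if key not in record]
        if missing:
            raise ValueError(f"record {index} is missing keys: {missing}")

    path = Path(path)
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

Calling `write_alpaca_jsonl(records, "my_data.jsonl")` would produce the file that `path: my_data.jsonl` points to in the config from step 1.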
+
+3. Run the training:
+
+```shell
+axolotl train my_training.yml
+```
+
+## Common Tasks {#sec-common-tasks}
+
+### Testing Your Model {#sec-testing}
+
+After training, test your model:
+
+```shell
+axolotl inference my_training.yml --lora-model-dir="./outputs/lora-out"
+```
+
+### Preprocessing Data {#sec-preprocessing}
+
+For large datasets, preprocess first:
+
+```shell
+axolotl preprocess my_training.yml
+```
+
+### Using a UI {#sec-ui}
+
+Launch a Gradio interface:
+
+```shell
+axolotl inference my_training.yml --lora-model-dir="./outputs/lora-out" --gradio
+```
+
+## Next Steps {#sec-next-steps}
+
+Now that you have the basics, you might want to:
+
+- Try different model architectures
+- Experiment with hyperparameters
+- Use more advanced training methods
+- Scale up to larger models
+
+Check our other guides for details on these topics:
+
+- [Configuration Guide](config.qmd) - Full configuration options
+- [Dataset Formats](dataset-formats) - Working with different data formats
+- [Multi-GPU Training](multi-gpu.qmd)
+- [Multi-Node Training](multi-node.qmd)
--- a/docs/inference.qmd
+++ b/docs/inference.qmd
@@ -0,0 +1,148 @@
+---
+title: "Inference Guide"
+format:
+  html:
+    toc: true
+    toc-depth: 3
+    number-sections: true
+    code-tools: true
+execute:
+  enabled: false
+---
+
+This guide covers how to use your trained models for inference, including model loading, interactive testing, and common troubleshooting steps.
+
+## Quick Start {#sec-quickstart}
+
+### Basic Inference {#sec-basic}
+
+::: {.panel-tabset}
+
+## LoRA Models
+
+```{.bash}
+axolotl inference your_config.yml --lora-model-dir="./lora-output-dir"
+```
+
+## Full Fine-tuned Models
+
+```{.bash}
+axolotl inference your_config.yml --base-model="./completed-model"
+```
+
+:::
+
+## Advanced Usage {#sec-advanced}
+
+### Gradio Interface {#sec-gradio}
+
+Launch an interactive web interface:
+
+```{.bash}
+axolotl inference your_config.yml --gradio
+```
+
+### File-based Prompts {#sec-file-prompts}
+
+Process prompts from a text file:
+
+```{.bash}
+cat /tmp/prompt.txt | axolotl inference your_config.yml \
+  --base-model="./completed-model" --prompter=None
+```
+
+### Memory Optimization {#sec-memory}
+
+For large models or limited memory:
+
+```{.bash}
+axolotl inference your_config.yml --load-in-8bit=True
+```
+
+## Merging LoRA Weights {#sec-merging}
+
+Merge LoRA adapters with the base model:
+
+```{.bash}
+axolotl merge-lora your_config.yml --lora-model-dir="./completed-model"
+```
+
+### Memory Management for Merging {#sec-memory-management}
+
+::: {.panel-tabset}
+
+## Configuration Options
+
+```{.yaml}
+gpu_memory_limit: 20GiB  # Adjust based on your GPU
+lora_on_cpu: true        # Process on CPU if needed
+```
+
+## Force CPU Merging
+
+```{.bash}
+CUDA_VISIBLE_DEVICES="" axolotl merge-lora ...
+```
+
+:::
+
+## Tokenization {#sec-tokenization}
+
+### Common Issues {#sec-tokenization-issues}
+
+::: {.callout-warning}
+Tokenization mismatches between training and inference are a common source of problems.
+:::
+
+To debug:
+
+1. Check training tokenization:
+```{.bash}
+axolotl preprocess your_config.yml --debug
+```
+
+2. Verify inference tokenization by decoding tokens before model input
+
+3. Compare token IDs between training and inference
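
For the comparison in step 3, it is usually enough to find the first position where the two sequences diverge. The sketch below works on plain lists of integers, however you obtained them (for example from the `--debug` output and from your inference tokenizer); `first_token_mismatch` is a hypothetical helper, not an Axolotl function.

```python
def first_token_mismatch(train_ids, infer_ids):
    """Return the index of the first differing token ID, or None if both match."""
    for index, (train_id, infer_id) in enumerate(zip(train_ids, infer_ids)):
        if train_id != infer_id:
            return index
    # Same prefix but different lengths: the mismatch starts where the
    # shorter sequence ends (often a missing BOS/EOS or trailing token).
    if len(train_ids) != len(infer_ids):
        return min(len(train_ids), len(infer_ids))
    return None
```

A mismatch at index 0 often points at a missing or duplicated BOS token, while a mismatch at the very end tends to indicate an EOS or whitespace difference.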
+
+### Special Tokens {#sec-special-tokens}
+
+Configure special tokens in your YAML:
+
+```{.yaml}
+special_tokens:
+  bos_token: "<s>"
+  eos_token: "</s>"
+  unk_token: "<unk>"
+tokens:
+  - "<|im_start|>"
+  - "<|im_end|>"
+```
+
+## Troubleshooting {#sec-troubleshooting}
+
+### Common Problems {#sec-common-problems}
+
+::: {.panel-tabset}
+
+## Memory Issues
+
+- Use 8-bit loading
+- Reduce batch sizes
+- Try CPU offloading
+
+## Token Issues
+
+- Verify special tokens
+- Check tokenizer settings
+- Compare training and inference preprocessing
+
+## Performance Issues
+
+- Verify model loading
+- Check prompt formatting
+- Check temperature/sampling settings
+
+:::
+
+For more details, see our [debugging guide](debugging.qmd).
--- a/docs/installation.qmd
+++ b/docs/installation.qmd
@@ -0,0 +1,119 @@
+---
+title: "Installation Guide"
+format:
+  html:
+    toc: true
+    toc-depth: 3
+    number-sections: true
+    code-tools: true
+execute:
+  enabled: false
+---
+
+This guide covers all the ways you can install and set up Axolotl for your environment.
+
+## Requirements {#sec-requirements}
+
+- NVIDIA GPU (Ampere architecture or newer for `bf16` and Flash Attention) or AMD GPU
+- Python ≥3.10
+- PyTorch ≥2.4.1
+
+## Installation Methods {#sec-installation-methods}
+
+### PyPI Installation (Recommended) {#sec-pypi}
+
+```{.bash}
+pip3 install --no-build-isolation 'axolotl[flash-attn,deepspeed]'
+```
+
+We use `--no-build-isolation` so that the build can detect an already installed PyTorch
+version instead of clobbering it, and can select the versions of dependencies that are
+specific to that PyTorch version or to other installed co-dependencies.
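
To see what is already present in your environment before installing, you can query the package metadata directly. This is a minimal sketch using only the standard library; the helper names are hypothetical, and it only reports versions rather than enforcing the PyTorch minimum listed under Requirements.

```python
import sys
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def python_is_at_least(minimum=(3, 10)) -> bool:
    """Compare the running interpreter against a (major, minor) minimum."""
    return sys.version_info[:2] >= tuple(minimum)


def environment_report() -> str:
    """Summarize the Python and PyTorch versions visible to this interpreter."""
    torch_version = installed_version("torch")
    lines = [
        f"Python {sys.version_info.major}.{sys.version_info.minor}"
        f" (>= 3.10: {python_is_at_least()})",
        f"torch: {torch_version or 'not installed'}",
    ]
    return "\n".join(lines)
```

Printing `environment_report()` in the target virtual environment shows whether a PyTorch install exists for the build to pick up.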
+
+### Edge/Development Build {#sec-edge-build}
+
+For the latest features between releases:
+
+```{.bash}
+git clone https://github.com/axolotl-ai-cloud/axolotl.git
+cd axolotl
+pip3 install packaging ninja
+pip3 install --no-build-isolation -e '.[flash-attn,deepspeed]'
+```
+
+### Docker {#sec-docker}
+
+```{.bash}
+docker run --gpus '"all"' --rm -it axolotlai/axolotl:main-latest
+```
+
+For development with Docker:
+
+```{.bash}
+docker compose up -d
+```
+
+::: {.callout-tip}
+### Advanced Docker Configuration
+```{.bash}
+docker run --privileged --gpus '"all"' --shm-size 10g --rm -it \
+  --name axolotl --ipc=host \
+  --ulimit memlock=-1 --ulimit stack=67108864 \
+  --mount type=bind,src="${PWD}",target=/workspace/axolotl \
+  -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
+  axolotlai/axolotl:main-latest
+```
+:::
+
+## Cloud Environments {#sec-cloud}
+
+### Cloud GPU Providers {#sec-cloud-gpu}
+
+For providers supporting Docker:
+
+- Use `axolotlai/axolotl-cloud:main-latest`
+- Available on:
+  - [Latitude.sh](https://latitude.sh/blueprint/989e0e79-3bf6-41ea-a46b-1f246e309d5c)
+  - [JarvisLabs.ai](https://jarvislabs.ai/templates/axolotl)
+  - [RunPod](https://runpod.io/gsc?template=v2ickqhz9s&ref=6i7fkpdz)
+
+### Google Colab {#sec-colab}
+
+Use our [example notebook](../examples/colab-notebooks/colab-axolotl-example.ipynb).
+
+## Platform-Specific Instructions {#sec-platform-specific}
+
+### macOS {#sec-macos}
+
+```{.bash}
+pip3 install --no-build-isolation -e '.'
+```
+
+See @sec-troubleshooting for Mac-specific issues.
+
+### Windows {#sec-windows}
+
+::: {.callout-important}
+We recommend using WSL2 (Windows Subsystem for Linux) or Docker.
+:::
+
+## Environment Managers {#sec-env-managers}
+
+### Conda/Pip venv {#sec-conda}
+
+1. Install Python ≥3.10
+2. Install PyTorch: https://pytorch.org/get-started/locally/
+3. Install Axolotl:
+   ```{.bash}
+   pip3 install packaging
+   pip3 install --no-build-isolation -e '.[flash-attn,deepspeed]'
+   ```
+4. (Optional) Log in to Hugging Face:
+   ```{.bash}
+   huggingface-cli login
+   ```
+
+## Troubleshooting {#sec-troubleshooting}
+
+If you encounter installation issues, see our [FAQ](faq.qmd) and [Debugging Guide](debugging.qmd).
--- a/docs/multi-gpu.qmd
+++ b/docs/multi-gpu.qmd
@@ -0,0 +1,118 @@
+---
+title: "Multi-GPU Training Guide"
+format:
+  html:
+    toc: true
+    toc-depth: 3
+    number-sections: true
+    code-tools: true
+execute:
+  enabled: false
+---
+
+This guide covers advanced training configurations for multi-GPU setups using Axolotl.
+
+## Overview {#sec-overview}
+
+Axolotl supports several methods for multi-GPU training:
+
+- DeepSpeed (recommended)
+- FSDP (Fully Sharded Data Parallel)
+- FSDP + QLoRA
+
+## DeepSpeed {#sec-deepspeed}
+
+DeepSpeed is the recommended approach for multi-GPU training due to its stability and performance. It provides various optimization levels through ZeRO stages.
+
+### Configuration {#sec-deepspeed-config}
+
+Add to your YAML config:
+
+```{.yaml}
+deepspeed: deepspeed_configs/zero1.json
+```
+
+### Usage {#sec-deepspeed-usage}
+
+```{.bash}
+accelerate launch -m axolotl.cli.train examples/llama-2/config.yml --deepspeed deepspeed_configs/zero1.json
+```
+
+### ZeRO Stages {#sec-zero-stages}
+
+We provide default configurations for:
+
+- ZeRO Stage 1 (`zero1.json`)
+- ZeRO Stage 2 (`zero2.json`)
+- ZeRO Stage 3 (`zero3.json`)
+
+Choose based on your memory requirements and performance needs.
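
As a rough guide for that choice, the usual ZeRO accounting for mixed-precision Adam training assumes about 2 bytes per parameter for half-precision weights, 2 for gradients, and 12 for optimizer states (fp32 master weights, momentum, and variance). Stage 1 shards the optimizer states across GPUs, stage 2 also shards gradients, and stage 3 also shards the parameters. The sketch below applies that arithmetic; `zero_model_state_bytes` is a hypothetical helper, not a DeepSpeed or Axolotl API, and it ignores activations, buffers, and fragmentation. It also assumes full fine-tuning: with LoRA, only the adapter parameters carry gradients and optimizer states, so real usage would be lower.

```python
def zero_model_state_bytes(num_params: int, num_gpus: int, stage: int) -> float:
    """Estimate per-GPU bytes for weights, gradients, and Adam states under ZeRO."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    params = 2 * num_params   # half-precision weights
    grads = 2 * num_params    # half-precision gradients
    optim = 12 * num_params   # fp32 master weights + Adam momentum + variance

    if stage >= 1:
        optim /= num_gpus     # stage 1: shard optimizer states
    if stage >= 2:
        grads /= num_gpus     # stage 2: also shard gradients
    if stage >= 3:
        params /= num_gpus    # stage 3: also shard parameters
    return params + grads + optim


def to_gib(num_bytes: float) -> float:
    """Convert bytes to GiB."""
    return num_bytes / 2**30
```

Comparing `to_gib(zero_model_state_bytes(n, gpus, stage))` across stages for your parameter count gives a first-order sense of whether moving to a higher stage is likely to help.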
+
+## FSDP {#sec-fsdp}
+
+### Basic FSDP Configuration {#sec-fsdp-config}
+
+```{.yaml}
+fsdp:
+  - full_shard
+  - auto_wrap
+fsdp_config:
+  fsdp_offload_params: true
+  fsdp_state_dict_type: FULL_STATE_DICT
+  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
+```
+
+### FSDP + QLoRA {#sec-fsdp-qlora}
+
+For combining FSDP with QLoRA, see our [dedicated guide](fsdp_qlora.qmd).
+
+## Performance Optimization {#sec-performance}
+
+### Liger Kernel Integration {#sec-liger}
+
+::: {.callout-note}
+Liger Kernel provides efficient Triton kernels for LLM training, offering:
+
+- 20% increase in multi-GPU training throughput
+- 60% reduction in memory usage
+- Compatibility with both FSDP and DeepSpeed
+:::
+
+Configuration:
+
+```{.yaml}
+plugins:
+  - axolotl.integrations.liger.LigerPlugin
+liger_rope: true
+liger_rms_norm: true
+liger_glu_activation: true
+liger_layer_norm: true
+liger_fused_linear_cross_entropy: true
+```
+
+## Troubleshooting {#sec-troubleshooting}
+
+### NCCL Issues {#sec-nccl}
+
+For NCCL-related problems, see our [NCCL troubleshooting guide](nccl.qmd).
+
+### Common Problems {#sec-common-problems}
+
+::: {.panel-tabset}
+
+## Memory Issues
+
+- Reduce `micro_batch_size`
+- Reduce `eval_batch_size`
+- Adjust `gradient_accumulation_steps`
+- Consider using a higher ZeRO stage
+
+## Training Instability
+
+- Start with DeepSpeed ZeRO-2
+- Monitor loss values
+- Check learning rates
+
+:::
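
When you reduce `micro_batch_size` to save memory, raising `gradient_accumulation_steps` by the same factor keeps the number of samples per optimizer step unchanged. Assuming plain data parallelism, that number is the product of the micro batch size, the accumulation steps, and the GPU count. The helpers below are a hypothetical sketch of that arithmetic, not Axolotl functions.

```python
def effective_batch_size(
    micro_batch_size: int, gradient_accumulation_steps: int, num_gpus: int
) -> int:
    """Samples consumed per optimizer step under data parallelism."""
    return micro_batch_size * gradient_accumulation_steps * num_gpus


def rebalance_accumulation(
    micro_batch_size: int, gradient_accumulation_steps: int, new_micro_batch_size: int
) -> int:
    """Return the accumulation steps that preserve the per-GPU batch after a change."""
    per_gpu = micro_batch_size * gradient_accumulation_steps
    if new_micro_batch_size < 1 or per_gpu % new_micro_batch_size != 0:
        raise ValueError(
            f"cannot split {per_gpu} samples evenly into micro batches "
            f"of {new_micro_batch_size}"
        )
    return per_gpu // new_micro_batch_size
```

For instance, halving `micro_batch_size` from 4 to 2 with 2 accumulation steps calls for 4 accumulation steps to keep the same effective batch.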
+
+For more detailed troubleshooting, see our [debugging guide](debugging.qmd).