---
title: AMD GPUs on HPC Systems
description: A comprehensive guide for using Axolotl on distributed systems with AMD GPUs
---

This guide provides step-by-step instructions for installing and configuring Axolotl on a High-Performance Computing (HPC) environment equipped with AMD GPUs.

## Setup
### 1. Install Python

We recommend using Miniforge, a minimal conda-based Python distribution:

```bash
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
```
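If you accepted the installer's default prefix of `~/miniforge3`, you can sanity-check the install before continuing:

```bash
# Should print the Python version bundled with Miniforge
~/miniforge3/bin/python --version
```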
### 2. Configure Python Environment

Add Python to your PATH and ensure it's available at login:

```bash
echo 'export PATH=~/miniforge3/bin:$PATH' >> ~/.bashrc
echo 'if [ -f ~/.bashrc ]; then . ~/.bashrc; fi' >> ~/.bash_profile
```
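Open a new login shell (or source the file directly) and confirm the Miniforge interpreter now resolves first:

```bash
source ~/.bashrc
which python   # expected: ~/miniforge3/bin/python
```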
### 3. Load AMD GPU Software

Load the ROCm module:

```bash
module load rocm/5.7.1
```

Note: The specific module name and version may vary depending on your HPC system. Consult your system documentation for the correct module name.
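If you are unsure what is available, the module system can list the installed ROCm builds, and `hipcc` confirms the toolchain is on your PATH after loading (exact commands vary by site):

```bash
module avail rocm   # list ROCm versions installed on this system
hipcc --version     # after loading: confirm the HIP compiler is found
```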
### 4. Install PyTorch

Install PyTorch with ROCm support:

```bash
pip install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7 --force-reinstall
```
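To check that the ROCm build was installed (note that ROCm PyTorch reuses the `torch.cuda` namespace, so the device check below will report `False` on a GPU-less login node):

```bash
python -c "import torch; print(torch.__version__, torch.version.hip)"  # HIP version should not be None
python -c "import torch; print(torch.cuda.is_available())"             # False on a login node without GPUs
```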
### 5. Install Flash Attention

Clone and install the ROCm fork of Flash Attention:

```bash
git clone --recursive https://github.com/ROCmSoftwarePlatform/flash-attention.git
export GPU_ARCHS="gfx90a"  # target architecture, e.g. gfx90a for MI200-series GPUs
cd flash-attention
# Patch PyTorch's hipify helper with the patch shipped in the repository
export PYTHON_SITE_PACKAGES=$(python -c 'import site; print(site.getsitepackages()[0])')
patch "${PYTHON_SITE_PACKAGES}/torch/utils/hipify/hipify_python.py" hipify_patch.patch
pip install --no-build-isolation .
```
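A quick import check confirms the build succeeded:

```bash
python -c "import flash_attn; print(flash_attn.__version__)"
```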
### 6. Install Axolotl

Clone and install Axolotl:

```bash
git clone https://github.com/axolotl-ai-cloud/axolotl
cd axolotl
pip install packaging ninja
pip install --no-build-isolation -e .
```
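Verify that the editable install is importable:

```bash
python -c "import axolotl; print('axolotl imported OK')"
```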
### 7. Apply xformers Workaround

xformers appears to be incompatible with ROCm. Apply the following workarounds (see the sketch after this list):

- Edit `$HOME/packages/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py`, modifying the code so that the check for SwiGLU availability from xformers always returns `False`.
- Edit `$HOME/miniforge3/lib/python3.10/site-packages/xformers/ops/swiglu_op.py`, replacing the body of the `SwiGLU` function with a `pass` statement.
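As a rough sketch of the second edit (illustrative only; the actual code in `swiglu_op.py` depends on your xformers version, so adapt it to the definition you find there):

```python
# xformers/ops/swiglu_op.py -- stub the SwiGLU operator so nothing
# attempts to dispatch the (ROCm-incompatible) xformers kernel.
# Illustrative sketch; match the signature in your installed version.
def SwiGLU(*args, **kwargs):
    pass
```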
### 8. Prepare Job Submission Script

Create a job submission script for your HPC's scheduler (e.g. Slurm, PBS). Include the necessary environment setup and the command to run Axolotl training. If the compute nodes do not have internet access, it is recommended to include:

```bash
export TRANSFORMERS_OFFLINE=1
export HF_DATASETS_OFFLINE=1
```
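For reference, a minimal Slurm sketch is shown below, assuming Slurm is your scheduler; the job name, partition, GPU count, and config path are placeholders to adapt to your system:

```bash
#!/bin/bash
#SBATCH --job-name=axolotl-train
#SBATCH --nodes=1
#SBATCH --gpus-per-node=4        # placeholder: match your allocation
#SBATCH --time=12:00:00
#SBATCH --partition=gpu          # placeholder: your system's GPU partition

module load rocm/5.7.1
export PATH=~/miniforge3/bin:$PATH

# Compute nodes often lack internet access; run fully offline
export TRANSFORMERS_OFFLINE=1
export HF_DATASETS_OFFLINE=1

accelerate launch -m axolotl.cli.train /path/to/your/config.yaml
```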
### 9. Download Base Model

Download a base model using the Hugging Face CLI (gated models such as Llama require accepting the license on the Hugging Face Hub and logging in first, e.g. with `hf auth login`):

```bash
hf download meta-llama/Meta-Llama-3.1-8B --local-dir ~/hfdata/llama3.1-8B
```
### 10. Create Axolotl Configuration

Create an Axolotl configuration file (YAML format) tailored to your specific training requirements and dataset. Use FSDP for multi-node training.

Note: DeepSpeed did not work at the time of testing. If you manage to get it working, please let us know.
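As a starting point, here is a minimal FSDP sketch (not a tuned recipe: the dataset path and format and the FSDP wrap class are placeholders to adapt; the key names follow Axolotl's standard YAML schema):

```yaml
base_model: ~/hfdata/llama3.1-8B        # local model downloaded in step 9
datasets:
  - path: /path/to/your/dataset.jsonl   # placeholder dataset
    type: alpaca                        # adjust to your dataset's format
output_dir: ./outputs/llama3.1-8b

sequence_len: 4096
micro_batch_size: 1
gradient_accumulation_steps: 4
num_epochs: 1
learning_rate: 2e-5
optimizer: adamw_torch
lr_scheduler: cosine
bf16: true
flash_attention: true                   # uses the build from step 5

fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
```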
### 11. Preprocess Data

Run preprocessing on the login node. Setting `CUDA_VISIBLE_DEVICES=""` forces the run onto the CPU, since login nodes typically have no GPUs:

```bash
CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess /path/to/your/config.yaml
```
### 12. Train

You are now ready to submit your previously prepared job script. 🚂