add miaai environment setup guide

2026-05-13 04:16:03 +00:00
parent b7ec06b8a1
commit 396ce4a9dd
1 changed files with 77 additions and 0 deletions
--- a/SETUP_MIAAI.md
+++ b/SETUP_MIAAI.md
@@ -0,0 +1,77 @@
+# Axolotl Setup — miaai (RTX 5080, CUDA 13.2)
+
+## System Info
+- GPU: NVIDIA RTX 5080 (16GB VRAM)
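+- Driver: 580.126.09, which supports CUDA up to 13.0; the conda-installed nvcc resolves to 13.2, which runs through CUDA minor-version compatibility within the 13.x release
+- OS: Ubuntu with a system Python 3.13 that is externally managed (PEP 668); do NOT use the system Python for ML work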
+- Axolotl branch: `activeblue/main`
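+
+The driver line above depends on the driver's CUDA ceiling (13.0) sharing a major release with the 13.2 toolkit. Below is a minimal bash sketch of that check. It assumes the `nvidia-smi` banner prints `CUDA Version: X.Y`, and both function names are hypothetical helpers, not part of Axolotl or the NVIDIA tools:
+
+```bash
+# Hypothetical helper: read nvidia-smi output on stdin and print the
+# "CUDA Version: X.Y" value (the newest CUDA release the driver supports).
+driver_max_cuda() {
+  grep -o 'CUDA Version: *[0-9][0-9]*\.[0-9][0-9]*' \
+    | grep -o '[0-9][0-9]*\.[0-9][0-9]*' \
+    | head -n 1
+}
+
+# Hypothetical helper: succeed when two "X.Y" versions share a major release.
+# Under CUDA minor-version compatibility a 13.2 toolkit can run (with some
+# limits) on a driver whose ceiling is 13.0; a different major cannot.
+same_cuda_major() {
+  [ "${1%%.*}" = "${2%%.*}" ]
+}
+```
+
+Usage: `same_cuda_major 13.2 "$(nvidia-smi | driver_max_cuda)" && echo "driver OK for 13.x"`.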
+
+## One-time Setup
+
+### 1. Install Miniconda
+```bash
+wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
+bash miniconda.sh -b -p /opt/miniconda3
+/opt/miniconda3/bin/conda init bash
+source ~/.bashrc
+```
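+
+Miniconda's batch installer generally refuses to install into a prefix that already exists (its `-u` flag is meant for updating), so re-running step 1 on miaai is likely to stop with an error. The sketch below adds a small guard, written as a hypothetical `miniconda_installed` function, that makes the step safe to repeat:
+
+```bash
+# Hypothetical guard: succeed if a conda binary already exists under the prefix.
+miniconda_installed() {
+  local prefix="${1:-/opt/miniconda3}"
+  [ -x "$prefix/bin/conda" ]
+}
+```
+
+Usage: `miniconda_installed || bash miniconda.sh -b -p /opt/miniconda3`.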
+
+### 2. Create Python 3.11 environment
+```bash
+conda create -n axolotl python=3.11 -y
+conda activate axolotl
+```
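+
+Every later `pip install` has to land in this environment, not in the externally managed system Python 3.13. Below is a hedged pre-flight check. It assumes `conda activate` has set `CONDA_PREFIX`, and `using_env_python` is a hypothetical name:
+
+```bash
+# Hypothetical check: succeed only when `python` resolves to the active conda env.
+# Both arguments are optional so the check can be exercised without conda.
+using_env_python() {
+  local prefix="${1:-${CONDA_PREFIX:-}}"
+  local py="${2:-$(command -v python)}"
+  [ -n "$prefix" ] && [ "$py" = "$prefix/bin/python" ]
+}
+```
+
+Usage: `using_env_python && python -m pip --version`. Calling pip as `python -m pip` also ties it to the interpreter that passed the check.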
+
+### 3. Clone and sync repo with upstream
+```bash
+cd ~    # clone into $HOME: the per-session steps expect /home/tocmo0nlord/axolotl
+git clone https://git.activeblue.net/tocmo0nlord/axolotl.git
+cd axolotl
+git remote add upstream https://github.com/axolotl-ai-cloud/axolotl.git
+git fetch upstream
+git rebase upstream/main        # keeps activeblue patches on top
+git push origin activeblue/main --force-with-lease
+```
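+
+This sync usually goes wrong when you rebase on the wrong branch or with a dirty working tree. The pre-check below is a hypothetical helper. It assumes the fork's working branch is `activeblue/main`, as the pitfalls table notes, and it is deliberately stricter than git itself because it also rejects untracked files:
+
+```bash
+# Hypothetical pre-check: on the expected branch, with a clean working tree.
+safe_to_rebase() {
+  local expected="${1:-activeblue/main}"
+  local current
+  # symbolic-ref fails on a detached HEAD
+  if ! current="$(git symbolic-ref --short HEAD 2>/dev/null)"; then
+    echo "detached HEAD; check out $expected first" >&2
+    return 1
+  fi
+  if [ "$current" != "$expected" ]; then
+    echo "on '$current', expected '$expected'" >&2
+    return 1
+  fi
+  if [ -n "$(git status --porcelain)" ]; then
+    echo "working tree not clean; commit or stash changes first" >&2
+    return 1
+  fi
+}
+```
+
+Usage: `safe_to_rebase && git rebase upstream/main`. If the rebase stops on conflicts, resolve them and run `git rebase --continue`, or return to the pre-rebase state with `git rebase --abort`.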
+
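+### 4. Install CUDA toolkit (needed to compile flash-attn)
+> NOTE: even with the `cuda-12.8.0` label, nvcc resolved to 13.2 on miaai (see System Info). Check the installed release with `nvcc --version` and match the PyTorch wheel in step 5 to it.
+```bash
+conda install -y -c "nvidia/label/cuda-12.8.0" cuda-toolkit
+export CUDA_HOME=$CONDA_PREFIX
+export PATH=$CUDA_HOME/bin:$PATH
+nvcc --version    # confirm which CUDA release conda actually installed
+```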
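+
+The channel label and the installed nvcc can disagree, so it is safer to read the release from the binary that `CUDA_HOME` actually points at. The sketch below uses hypothetical helper names. It assumes `nvcc --version` prints a line of the form `Cuda compilation tools, release X.Y, ...`:
+
+```bash
+# Hypothetical helper: extract "X.Y" from `nvcc --version` output on stdin.
+nvcc_release() {
+  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
+}
+
+# Hypothetical helper: verify CUDA_HOME has an nvcc binary, then print its release.
+check_cuda_home() {
+  if [ -z "${CUDA_HOME:-}" ] || [ ! -x "$CUDA_HOME/bin/nvcc" ]; then
+    echo "CUDA_HOME is unset or has no bin/nvcc" >&2
+    return 1
+  fi
+  "$CUDA_HOME/bin/nvcc" --version | nvcc_release
+}
+```
+
+Usage: `check_cuda_home` should print `13.2` on miaai. A different value means step 5 needs the matching wheel.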
+
+### 5. Install PyTorch — use cu132 (matches nvcc from conda)
+> NOTE: torchaudio has no cu132 wheel — skip it, not needed for LLM training
+```bash
+pip install torch torchvision --index-url https://download.pytorch.org/whl/cu132
+python -c "import torch; print('CUDA:', torch.version.cuda); print('GPU:', torch.cuda.get_device_name(0))"
+```
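+
+The step 5 check matters because flash-attn is compiled through PyTorch's C++/CUDA extension builder. In recent PyTorch releases, that builder refuses a CUDA major-version difference between nvcc and the wheel and only warns on a minor difference. The function below is a hypothetical bash version of that comparison, useful before starting the long flash-attn build:
+
+```bash
+# Hypothetical check, roughly mirroring PyTorch's extension-build rule:
+# different major version -> fail; different minor version -> warn only.
+cuda_versions_match() {
+  local torch_cuda="$1" nvcc_cuda="$2"
+  if [ "${torch_cuda%%.*}" != "${nvcc_cuda%%.*}" ]; then
+    echo "major mismatch: torch CUDA $torch_cuda vs nvcc $nvcc_cuda" >&2
+    return 1
+  fi
+  if [ "$torch_cuda" != "$nvcc_cuda" ]; then
+    echo "warning: minor mismatch: torch CUDA $torch_cuda vs nvcc $nvcc_cuda" >&2
+  fi
+  return 0
+}
+```
+
+Usage: `cuda_versions_match "$(python -c 'import torch; print(torch.version.cuda)')" "$(nvcc --version | sed -n 's/.*release \([0-9.]*\),.*/\1/p')"`.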
+
+### 6. Install Axolotl
+```bash
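+pip install -e "."
+python -c "import torch; print('CUDA:', torch.version.cuda)"   # should still print 13.2; if not, redo step 5
+pip install flash-attn --no-build-isolation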
+```
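+
+Compiling flash-attn from source is slow and memory-hungry. The flash-attn README recommends installing `ninja` and, on machines with limited RAM, capping parallel compile jobs with the `MAX_JOBS` environment variable. The sketch below derives a job count from CPU count and RAM. The 4 GB-per-job budget is an assumption, not a measured figure:
+
+```bash
+# Sketch: pick MAX_JOBS for the flash-attn build.
+# ASSUMPTION: roughly 4 GB of RAM per parallel compile job (tune as needed).
+flash_attn_jobs() {
+  local cpus="${1:-$(nproc)}"
+  local ram_gb="${2:-$(awk '/^MemTotal:/ { print int($2 / 1048576) }' /proc/meminfo)}"
+  local gb_per_job="${3:-4}"
+  local jobs=$(( ram_gb / gb_per_job ))
+  if (( jobs > cpus )); then jobs=$cpus; fi   # never exceed the core count
+  if (( jobs < 1 )); then jobs=1; fi          # always allow at least one job
+  echo "$jobs"
+}
+```
+
+Usage: run `pip install ninja`, then `MAX_JOBS="$(flash_attn_jobs)" pip install flash-attn --no-build-isolation`.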
+
+## Every Session (after first-time setup)
+```bash
+source /opt/miniconda3/etc/profile.d/conda.sh   # enables conda activate even if ~/.bashrc was not sourced
+conda activate axolotl
+export CUDA_HOME=$CONDA_PREFIX
+export PATH=$CUDA_HOME/bin:$PATH
+cd /home/tocmo0nlord/axolotl
+```
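+
+Pasting these lines every session is error-prone, and re-running them keeps prepending `$CUDA_HOME/bin` to `PATH`. The sketch below defines a hypothetical `axolotl_env` function for `~/.bashrc` that performs the same steps and skips duplicate `PATH` entries. It assumes the paths used in this guide. Because `conda activate` normally puts `$CONDA_PREFIX/bin` on `PATH` already, the prepend mainly guards against duplicates:
+
+```bash
+# Prepend a directory to PATH only if it is not already there.
+path_prepend() {
+  case ":$PATH:" in
+    *":$1:"*) ;;                        # already present: leave PATH alone
+    *) PATH="$1${PATH:+:$PATH}" ;;
+  esac
+  export PATH
+}
+
+# Hypothetical wrapper for the per-session steps (paths taken from this guide).
+axolotl_env() {
+  source /opt/miniconda3/etc/profile.d/conda.sh || return 1
+  conda activate axolotl || return 1
+  export CUDA_HOME="$CONDA_PREFIX"
+  path_prepend "$CUDA_HOME/bin"
+  cd /home/tocmo0nlord/axolotl || return 1
+}
+```
+
+Usage: add both functions to `~/.bashrc`, then run `axolotl_env` in each new shell.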
+
+## Run Training
+```bash
+axolotl train human_chat_qlora.yml
+```
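+
+With 16 GB of VRAM, any other process holding GPU memory (a desktop session or a stray Python process, for example) can push a QLoRA run out of memory. The sketch below is a pre-flight check. It assumes the CSV shape printed by `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits`, which is one `used, total` line per GPU in MiB. The function names and threshold are hypothetical:
+
+```bash
+# Read "used, total" lines (MiB) on stdin and print free MiB for the first GPU.
+free_vram_mib() {
+  awk -F', *' 'NR == 1 { print $2 - $1 }'
+}
+
+# Succeed when at least $1 MiB is free on GPU 0 (the threshold is a hypothetical knob).
+enough_vram() {
+  local need_mib="$1" free
+  free="$(free_vram_mib)"
+  [ -n "$free" ] && [ "$free" -ge "$need_mib" ]
+}
+```
+
+Usage: `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | enough_vram 14000 && axolotl train human_chat_qlora.yml`. The 14000 here is an arbitrary example threshold.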
+
+## Common Pitfalls Encountered
+| Problem | Cause | Fix |
+|---|---|---|
+| `externally-managed-environment` | System Python 3.13 blocks pip | Use conda env, never system pip |
+| `No module named torch` (flash-attn) | pip builds in isolated env | Use `--no-build-isolation` |
+| `CUDA_HOME not set` | CUDA toolkit not installed, or `CUDA_HOME` not exported in this shell | Install `cuda-toolkit` from the nvidia channel, then `export CUDA_HOME=$CONDA_PREFIX` (step 4) |
+| `CUDA version mismatch 13.2 vs 12.8` | Conda nvcc is 13.2, torch was cu128 | Reinstall torch with `--index-url .../cu132` |
+| `torchaudio` not found for cu132 | No cu132 wheel exists | Skip torchaudio — not needed |
+| `src refspec main does not match` | Fork default branch is `activeblue/main` | `git push origin activeblue/main` |