From eadd15c96094eeb7f6a186b931d652e5fbb1181e Mon Sep 17 00:00:00 2001
From: tocmo0nlord
Date: Wed, 13 May 2026 04:45:21 +0000
Subject: [PATCH] note MAX_JOBS for flash-attn compile speed

---
 SETUP_MIAAI.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/SETUP_MIAAI.md b/SETUP_MIAAI.md
index 79e6cfd35..36d1a8db9 100644
--- a/SETUP_MIAAI.md
+++ b/SETUP_MIAAI.md
@@ -49,7 +49,12 @@ python -c "import torch; print('CUDA:', torch.version.cuda); print('GPU:', torch
 ### 6. Install Axolotl
 ```bash
 pip install -e "."
-pip install flash-attn --no-build-isolation
+```
+
+> **flash-attn compiles CUDA kernels from source — takes 15–25 min on 10 cores of an i7-14700K.**
+> Set `MAX_JOBS` to the number of available CPU cores to parallelize and speed up compilation:
+```bash
+MAX_JOBS=10 pip install flash-attn --no-build-isolation
 ```
 
 ## Every Session (after first-time setup)
@@ -75,3 +80,4 @@ axolotl train human_chat_qlora.yml
 | `CUDA version mismatch 13.2 vs 12.8` | Conda nvcc is 13.2, torch was cu128 | Reinstall torch with `--index-url .../cu132` |
 | `torchaudio` not found for cu132 | No cu132 wheel exists | Skip torchaudio — not needed |
 | `src refspec main does not match` | Fork default branch is `activeblue/main` | `git push origin activeblue/main` |
+| flash-attn compile is slow | Single-threaded by default | Set `MAX_JOBS=` before pip install |
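As a sketch of the advice this patch adds (not part of the patch itself), the core count can be detected instead of hard-coding `10`. This assumes GNU coreutils `nproc` is available; the commented `pip` line mirrors the command the patch introduces:

```shell
# Detect available CPU cores (assumption: GNU coreutils `nproc` is installed).
JOBS="$(nproc)"
echo "MAX_JOBS=${JOBS}"

# flash-attn's source build (PyTorch cpp_extension + ninja) reads MAX_JOBS to
# cap parallel compile jobs. Uncomment to run the actual install:
# MAX_JOBS="${JOBS}" pip install flash-attn --no-build-isolation
```

On memory-constrained machines a lower `MAX_JOBS` may be needed, since each parallel nvcc job consumes several GB of RAM.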