note MAX_JOBS for flash-attn compile speed
@@ -49,7 +49,12 @@ python -c "import torch; print('CUDA:', torch.version.cuda); print('GPU:', torch
### 6. Install Axolotl
```bash
pip install -e "."
pip install flash-attn --no-build-isolation
```
> **flash-attn compiles its CUDA kernels from source — expect a 15–25 min build even with 10 cores on an i7-14700K.**
> Always set `MAX_JOBS` to the number of available CPU cores to parallelize and speed up compilation:
```bash
MAX_JOBS=10 pip install flash-attn --no-build-isolation
```
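If the install is scripted across machines with different core counts, `MAX_JOBS` can be derived instead of hard-coded. A minimal sketch assuming Linux coreutils (`nproc`); it prints the command for review rather than running pip directly:

```bash
# nproc (coreutils) reports the number of usable CPU cores on this machine.
CORES="$(nproc)"
# Print the resulting install command so it can be inspected before running.
echo "MAX_JOBS=${CORES} pip install flash-attn --no-build-isolation"
```

On macOS, `sysctl -n hw.ncpu` plays the same role as `nproc`.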
## Every Session (after first-time setup)
@@ -75,3 +80,4 @@ axolotl train human_chat_qlora.yml
| `CUDA version mismatch 13.2 vs 12.8` | Conda nvcc is 13.2, torch was cu128 | Reinstall torch with `--index-url .../cu132` |
| `torchaudio` not found for cu132 | No cu132 wheel exists | Skip torchaudio — not needed |
| `src refspec main does not match` | Fork default branch is `activeblue/main` | `git push origin activeblue/main` |
| flash-attn compile is slow | Single-threaded by default | Set `MAX_JOBS=<cpu_count>` before pip install |
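After the build finishes, a quick import check confirms the wheel actually installed into the active environment (the package installs its module as `flash_attn`). A small sketch with a reusable helper:

```python
import importlib

def check(mod: str) -> str:
    """Return an OK/MISSING status line for a module name."""
    try:
        importlib.import_module(mod)
        return f"{mod}: OK"
    except ImportError as e:
        return f"{mod}: MISSING ({e.__class__.__name__})"

# Run inside the env used for training; a MISSING result here means
# the compile step above did not complete for this environment.
print(check("flash_attn"))
```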