Adding NCCL Timeout Guide (#536)

* fixes NCCL_P2P_LEVEL=NVL #429

* adding more insights into various values of NCCL_P2P_LEVEL
The Objective Dad
2023-09-08 10:57:47 -05:00
committed by GitHub
parent e30f1e3cf7
commit 5e2d8a42d9
2 changed files with 50 additions and 0 deletions

@@ -752,6 +752,10 @@ Try to turn off xformers.
It's safe to ignore it.
> NCCL Timeouts during training
See the [NCCL](docs/nccl.md) guide.
## Need help? 🙋‍♂️
Join our [Discord server](https://discord.gg/HhrNrHJPRb) where we can help you