From a21935f07af9d825d7730fe944d29cfdef3a5337 Mon Sep 17 00:00:00 2001
From: Wing Lian
Date: Thu, 19 Oct 2023 21:32:30 -0400
Subject: [PATCH] add to docs (#703)

---
 README.md   |  2 ++
 docs/faq.md | 14 ++++++++++++++
 2 files changed, 16 insertions(+)
 create mode 100644 docs/faq.md

diff --git a/README.md b/README.md
index c70abf648..a7650ca5d 100644
--- a/README.md
+++ b/README.md
@@ -901,6 +901,8 @@ CUDA_VISIBLE_DEVICES="" python3 -m axolotl.cli.merge_lora ...
 
 ## Common Errors 🧰
 
+See also the [FAQs](./docs/faq.md).
+
 > If you encounter a 'Cuda out of memory' error, it means your GPU ran out of memory during the training process. Here's how to resolve it:
 
 Please reduce any below
diff --git a/docs/faq.md b/docs/faq.md
new file mode 100644
index 000000000..e5b729e26
--- /dev/null
+++ b/docs/faq.md
@@ -0,0 +1,14 @@
+# Axolotl FAQs
+
+
+> The trainer stopped and hasn't progressed in several minutes.
+
+This is usually an issue with the GPUs communicating with each other. See the [NCCL doc](../docs/nccl.md).
+
+> Exitcode -9
+
+This usually happens when you run out of system RAM.
+
+> Exitcode -7 while using deepspeed
+
+Try upgrading deepspeed with: `pip install -U deepspeed`
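The negative exit codes in the FAQ above follow the POSIX convention of reporting a child process killed by a signal as minus the signal number. A minimal sketch (assuming a Linux system, where the kernel OOM killer delivers SIGKILL) showing what -9 and -7 correspond to:

```python
import signal

# A negative exit code means the process died from a signal;
# the absolute value is the signal number.
print(signal.Signals(9).name)  # SIGKILL: what the kernel OOM killer sends when system RAM is exhausted
print(signal.Signals(7).name)  # SIGBUS on Linux: a bus error, sometimes surfaced by deepspeed runs
```

Signal numbering varies by platform (signal 7 is SIGBUS on Linux but not on macOS), so this mapping is Linux-specific.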