diff --git a/docs/dataset-formats/index.qmd b/docs/dataset-formats/index.qmd
index 4275858f6..121341e55 100644
--- a/docs/dataset-formats/index.qmd
+++ b/docs/dataset-formats/index.qmd
@@ -129,6 +129,7 @@ You can mix and match within each approach or across approaches to train a model
 We suggest this approach when you want to bring your own tokenized dataset.
 
 Axolotl expects the dataset to have three keys:
+
 - `input_ids`: from tokenizing formatted prompt
 - `attention_mask`: for masking padding. If you don't add padding, it would be equal to `len(input_ids) * [1]`
 - `labels`: this is the same as `input_ids`, however, if you want to mask certain tokens, you would set those indices to `-100`.
diff --git a/docs/faq.qmd b/docs/faq.qmd
index 1b5037db9..7e5dd3d74 100644
--- a/docs/faq.qmd
+++ b/docs/faq.qmd
@@ -19,7 +19,9 @@ description: Frequently asked questions
 
 **Q: AttributeError: 'DummyOptim' object has no attribute 'step'**
 
-> A: You may be using deepspeed with single gpu. Please don't set `deepspeed:` in yaml or cli.
+**Q: ModuleNotFoundError: No module named 'mpi4py' when using a single GPU with DeepSpeed**
+
+> A: You may be using DeepSpeed with a single GPU. Please remove the `deepspeed:` section from the YAML file or drop the `--deepspeed` CLI flag.
 
 **Q: The codes is stuck on saving preprocessed datasets.**