Clarify pre-tokenize before multigpu (#359)

2023-08-11 11:27:42 +09:00
parent 11ddccb80f
commit 94d03c8402
1 changed file with 8 additions and 1 deletion
--- a/README.md
+++ b/README.md
@@ -524,7 +524,14 @@ Run
 accelerate launch scripts/finetune.py configs/your_config.yml
 ```

-#### Multi-GPU Config
+#### Multi-GPU
+
+It is recommended to pre-tokenize the dataset with the following command before finetuning:
+```bash
+CUDA_VISIBLE_DEVICES="" accelerate ... --prepare_ds_only
+```
+
+##### Config

 - llama FSDP
 ```yaml