feat(doc): add optimizations table of content to our improvements (#3175) [skip ci]

* chore: format
* feat: add usage for alst
* chore: wording
* feat: add optimizations doc
* Apply suggestion from @SalmanMohammadi
  Co-authored-by: salman <salman.mohammadi@outlook.com>
* Update docs/dataset-formats/index.qmd
  Co-authored-by: salman <salman.mohammadi@outlook.com>
* feat: add alst, act offloading, nd parallelism, use relative links, and fix format
* chore: comments

---------

Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-09-25 03:13:49 +07:00
parent 6bc959342b
commit 856ff12171
5 changed files with 157 additions and 1 deletions
--- a/docs/dataset-formats/index.qmd
+++ b/docs/dataset-formats/index.qmd
@@ -61,7 +61,7 @@ While we recommend `.jsonl`, you can also use the other formats (`csv`, `parquet

 ### Pre-training without streaming

-On the rare case that the dataset is small and can be loaded entirely into memory, another approach to running pre-training is to use the `completion` format. This would mean that the entire dataset is pre-tokenized instead of on-demand in streaming.
+In the case that the dataset is small and can be loaded entirely into memory, another approach to running pre-training is to use the `completion` format. This would mean that the entire dataset is pre-tokenized instead of on-demand in streaming.

 One benefit of this is that the tokenization can be performed separately on a CPU-only machine, and then transferred to a GPU machine for training to save costs.