feat(doc): add optimizations table of content to our improvements (#3175) [skip ci]
* chore: format * feat: add usage for alst * chore: wording * feat: add optimizations doc * Apply suggestion from @SalmanMohammadi Co-authored-by: salman <salman.mohammadi@outlook.com> * Update docs/dataset-formats/index.qmd Co-authored-by: salman <salman.mohammadi@outlook.com> * feat: add alst, act offloading, nd parallelism, use relative links, and fix format * chore: comments --------- Co-authored-by: salman <salman.mohammadi@outlook.com>
This commit is contained in:
@@ -7,3 +7,24 @@ techniques. It is a combination of:
|
||||
- Activation Offloading: Offload activations to CPU RAM to reduce memory usage
|
||||
|
||||
For more information, you can check out the ALST paper [here](https://www.arxiv.org/abs/2506.13996).
|
||||
|
||||
## Usage
|
||||
|
||||
```yaml
|
||||
tiled_mlp: true
|
||||
|
||||
# See Sequence Parallelism docs
|
||||
# https://docs.axolotl.ai/docs/sequence_parallelism.html
|
||||
context_parallel_size: int
|
||||
|
||||
plugins:
|
||||
# See Cut Cross Entropy docs
|
||||
# https://docs.axolotl.ai/docs/custom_integrations.html#cut-cross-entropy
|
||||
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
|
||||
|
||||
# or Liger Kernel docs
|
||||
# https://docs.axolotl.ai/docs/custom_integrations.html#liger-kernels
|
||||
- axolotl.integrations.liger.LigerPlugin
|
||||
# ...
|
||||
|
||||
```
|
||||
|
||||
Reference in New Issue
Block a user