diff --git a/docs/custom_integrations.html b/docs/custom_integrations.html
index b2583bd48..3ee0da140 100644
--- a/docs/custom_integrations.html
+++ b/docs/custom_integrations.html
@@ -619,7 +619,7 @@
-pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@8a1a0ec"
+pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@5eff953"

This guide covers advanced training configurations for multi-GPU setups using Axolotl.
-Axolotl supports several methods for multi-GPU training:
+When training on multiple GPUs, Axolotl supports 3 sharding/parallelism strategies. Additionally, you can layer specific optimization features on top of that strategy.
+You generally cannot combine these strategies; they are mutually exclusive.
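For illustration, a sketch of what this exclusivity means in a YAML config follows; deepspeed, fsdp_version, and fsdp_config are the relevant Axolotl config fields, but the specific values are placeholders rather than recommendations.

# Pick exactly one sharding/parallelism strategy per training run.

# Option A: DeepSpeed, pointing at a DeepSpeed JSON config
deepspeed: deepspeed_configs/zero1.json

# Option B: FSDP (commented out here; do not set it together with deepspeed)
# fsdp_version: 2
# fsdp_config:
#   offload_params: false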
+These features can often be combined with the strategies above:
Add to your YAML config:
deepspeed: deepspeed_configs/zero1.json

# Fetch deepspeed configs (if not already present)
axolotl fetch deepspeed_configs
@@ -590,8 +593,8 @@
# Passing arg via cli
axolotl train config.yml --deepspeed deepspeed_configs/zero1.json

We provide default configurations for:
* ZeRO Stage 1 (zero1.json)

FSDP allows you to shard model parameters, gradients, and optimizer states across data parallel workers.
For combining FSDP with QLoRA, see our dedicated guide.
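For orientation, a minimal FSDP section of the YAML config might look like the sketch below. Treat it as illustrative only; the exact fsdp_config field names, their defaults, and the layer class to wrap depend on your model and Axolotl version and should be checked against the FSDP documentation.

fsdp_version: 2                      # use FSDP2 (see the migration notes below)
fsdp_config:
  offload_params: false              # keep parameters on the GPU
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: LlamaDecoderLayer   # model-specific, illustrative
  state_dict_type: FULL_STATE_DICT
  reshard_after_forward: true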
+To migrate your config from FSDP1 to FSDP2, set the top-level fsdp_version config field to specify the FSDP version, and follow the config field mapping below to update the field names.
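As a rough before/after sketch of that migration (the renames shown here are assumptions based on dropping the fsdp_ prefix inside fsdp_config; verify them against the mapping referenced above):

# FSDP1 (before) -- illustrative subset of fields
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_offload_params: false
  fsdp_state_dict_type: FULL_STATE_DICT

# FSDP2 (after) -- add fsdp_version and rename the fsdp_config fields per the mapping
fsdp_version: 2
fsdp_config:
  offload_params: false
  state_dict_type: FULL_STATE_DICT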