From 740d5a1d31e100833974f8ad2a7891b6c4dc1f9c Mon Sep 17 00:00:00 2001
From: Dan Saunders
Date: Fri, 26 Sep 2025 09:55:15 -0400
Subject: [PATCH] doc fix (#3187)

---
 docs/lora_optims.qmd | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/lora_optims.qmd b/docs/lora_optims.qmd
index 7cdf53975..40893387b 100644
--- a/docs/lora_optims.qmd
+++ b/docs/lora_optims.qmd
@@ -5,10 +5,11 @@ description: "Custom autograd functions and Triton kernels in Axolotl for optimi
 
 Inspired by [Unsloth](https://github.com/unslothai/unsloth), we've implemented two
 optimizations for LoRA and QLoRA fine-tuning, supporting both single GPU and multi-GPU
-(in the DDP and DeepSpeed settings) training. These include (1) SwiGLU and GEGLU activation function
-Triton kernels, and (2) LoRA MLP and attention custom autograd functions. Our goal was
-to leverage operator fusion and tensor re-use in order to improve speed and reduce
-memory usage during the forward and backward passes of these calculations.
+(including the DDP, DeepSpeed, and FSDP2 settings) training. These include (1) SwiGLU
+and GEGLU activation function Triton kernels, and (2) LoRA MLP and attention custom
+autograd functions. Our goal was to leverage operator fusion and tensor re-use in order
+to improve speed and reduce memory usage during the forward and backward passes of
+these calculations.
 
 We currently support several common model architectures, including (but not limited
 to):
@@ -131,6 +132,5 @@ computation path.
 ## Future Work
 
 - Support for additional model architectures
-- Support for the FSDP setting
 - Support for dropout and bias
 - Additional operator fusions
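For context, the page this patch edits (docs/lora_optims.qmd) exposes these optimizations as config flags. Below is a minimal sketch of a config fragment that turns them on; the flag names `lora_mlp_kernel`, `lora_qkv_kernel`, and `lora_o_kernel` are assumptions taken from that page rather than part of this patch, so check the rendered doc for the authoritative list.

```yaml
# Sketch of an Axolotl config fragment enabling the fused LoRA kernels
# alongside a QLoRA adapter. Flag names are assumptions based on the
# page this patch edits (docs/lora_optims.qmd).
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.0   # the patched doc lists dropout/bias support as future work
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

# Custom autograd + Triton kernel optimizations
lora_mlp_kernel: true   # fused LoRA MLP path (SwiGLU/GEGLU Triton kernels)
lora_qkv_kernel: true   # fused LoRA attention QKV projections
lora_o_kernel: true     # fused LoRA attention output projection
```

Per the wording change in this patch, the same flags apply under DDP, DeepSpeed, and FSDP2 as well as single-GPU training.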