From cc665dadf97ebe6fb05494223dbc798a6a88793c Mon Sep 17 00:00:00 2001
From: Quarto GHA Workflow Runner
Date: Fri, 26 Sep 2025 14:01:04 +0000
Subject: [PATCH] Built site for gh-pages

---
 .nojekyll             |   2 +-
 docs/lora_optims.html |  10 +-
 search.json           |   4 +-
 sitemap.xml           | 398 +++++++++++++++++++++---------------
 4 files changed, 207 insertions(+), 207 deletions(-)

diff --git a/.nojekyll b/.nojekyll
index 2b824870b..98d17ca0a 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-6a3f1bdd
\ No newline at end of file
+818a088b
\ No newline at end of file
diff --git a/docs/lora_optims.html b/docs/lora_optims.html
index 8dda3a32a..7d07d16fc 100644
--- a/docs/lora_optims.html
+++ b/docs/lora_optims.html
@@ -551,10 +551,11 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});

 Inspired by Unsloth, we’ve implemented two optimizations for LoRA and QLoRA fine-tuning, supporting both single GPU and multi-GPU
-(in the DDP and DeepSpeed settings) training. These include (1) SwiGLU and GEGLU activation function
-Triton kernels, and (2) LoRA MLP and attention custom autograd functions. Our goal was
-to leverage operator fusion and tensor re-use in order to improve speed and reduce
-memory usage during the forward and backward passes of these calculations.

+(including the DDP, DeepSpeed, and FSDP2 settings) training. These include (1) SwiGLU
+and GEGLU activation function Triton kernels, and (2) LoRA MLP and attention custom
+autograd functions. Our goal was to leverage operator fusion and tensor re-use in order
+to improve speed and reduce memory usage during the forward and backward passes of
+these calculations.
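
For illustration, here is a minimal sketch of the two ideas the paragraph above describes: a SwiGLU forward pass fused into a single Triton kernel, wrapped in a custom `torch.autograd.Function` so the intermediate SiLU activation is never materialized between separate ops. The kernel name, block size, and the plain-PyTorch backward are illustrative assumptions, not the project's actual implementation (which also covers GEGLU and fuses the backward pass).

```python
import torch
import triton
import triton.language as tl


@triton.jit
def _swiglu_fwd_kernel(g_ptr, u_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Fused element-wise pass: SiLU(g) * u with one read of each input and one
    # write of the output, instead of materializing SiLU(g) as its own tensor.
    # (Hypothetical kernel for illustration only.)
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offs < n_elements
    g = tl.load(g_ptr + offs, mask=mask).to(tl.float32)
    u = tl.load(u_ptr + offs, mask=mask).to(tl.float32)
    out = g * tl.sigmoid(g) * u
    tl.store(out_ptr + offs, out.to(out_ptr.dtype.element_ty), mask=mask)


class _SwiGLU(torch.autograd.Function):
    """Custom autograd function that re-uses the gate/up projections in backward."""

    @staticmethod
    def forward(ctx, gate, up):
        # Assumes contiguous tensors of identical shape and dtype.
        out = torch.empty_like(gate)
        n = gate.numel()
        grid = (triton.cdiv(n, 1024),)
        _swiglu_fwd_kernel[grid](gate, up, out, n, BLOCK_SIZE=1024)
        # Save the inputs rather than the intermediate activation (tensor re-use).
        ctx.save_for_backward(gate, up)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        gate, up = ctx.saved_tensors
        sig = torch.sigmoid(gate.float())
        silu = gate.float() * sig
        # d/d(gate) [SiLU(gate) * up] = up * sig * (1 + gate * (1 - sig))
        grad_gate = grad_out * up * (sig * (1 + gate.float() * (1 - sig))).to(up.dtype)
        grad_up = grad_out * silu.to(gate.dtype)
        return grad_gate, grad_up


def swiglu(gate: torch.Tensor, up: torch.Tensor) -> torch.Tensor:
    # Usage inside an MLP block: down_proj(swiglu(gate_proj(x), up_proj(x)))
    return _SwiGLU.apply(gate, up)
```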

We currently support several common model architectures, including (but not limited to):