diff --git a/.nojekyll b/.nojekyll
index feab296d3..cf71fdb00 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-5526dc52
\ No newline at end of file
+bff1c1d8
\ No newline at end of file
diff --git a/docs/api/kernels.lora.html b/docs/api/kernels.lora.html
index 43d8d682d..514e680a1 100644
--- a/docs/api/kernels.lora.html
+++ b/docs/api/kernels.lora.html
@@ -760,16 +760,19 @@
Module defining Low-Rank Adaptation (LoRA) Triton kernels.
See “LoRA: Low-Rank Adaptation of Large Language Models” (https://arxiv.org/abs/2106.09685).
Also supports DoRA (Weight-Decomposed Low-Rank Adaptation); see “DoRA: Weight-Decomposed Low-Rank Adaptation” (https://arxiv.org/abs/2402.09353).
Credit to unsloth (https://unsloth.ai/) for inspiration for this implementation.
kernels.lora.LoRA_Embedding()
Fused LoRA embedding: F.embedding(x, W) + s * F.embedding(x, A^T) @ B^T.
Supports dropout and DoRA.
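For reference, an unfused PyTorch sketch of the formula above; the shapes follow the usual PEFT embedding-LoRA layout (A: (r, vocab_size), B: (embed_dim, r)), which is an assumption here rather than something stated on this page.

```python
import torch.nn.functional as F

def lora_embedding_reference(x, W, A, B, s):
    # Base embedding lookup plus the low-rank correction, unfused:
    #   F.embedding(x, W) + s * F.embedding(x, A^T) @ B^T
    base = F.embedding(x, W)                # (..., embed_dim)
    lora = F.embedding(x, A.t()) @ B.t()    # (..., r) @ (r, embed_dim) -> (..., embed_dim)
    return base + s * lora
```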
kernels.lora.LoRA_MLP()
Optimized LoRA MLP implementation.
| Name | Description |
|---|---|
| backward | Performs backward pass computation for LoRA MLP. |
| forward | Forward pass for LoRA MLP. |
kernels.lora.LoRA_MLP.backward(ctx, grad_output)
Performs backward pass computation for LoRA MLP.
| Name | Type | Description | Default |
|---|---|---|---|
| ctx | torch.autograd.function.FunctionCtx | Context object storing tensors saved during forward pass | required |
| grad_output | torch.Tensor | Gradient of loss with respect to layer output | required |
| Name | Type | Description |
|---|---|---|
| - | tuple[torch.Tensor \| None, ...] | Tuple containing gradients for all inputs from forward pass: the input gradient tensor (or None); None for weights, biases, and quantization states; LoRA A/B matrix gradients (or None); None for scaling factors; None for activation functions and flags |
kernels.lora.LoRA_MLP.forward(
    ctx,
    X,
    gate_weight,
    gate_bias,
    gate_quant,
    gate_A,
    gate_B,
    gate_scale,
    up_weight,
    up_bias,
    up_quant,
    up_A,
    up_B,
    up_scale,
    down_weight,
    down_bias,
    down_quant,
    down_A,
    down_B,
    down_scale,
    activation_fn,
    activation_fn_backward,
    inplace=True,
)
Forward pass for LoRA MLP.
| Name | Type | Description | Default |
|---|---|---|---|
| ctx | - | Autograd context | required |
| X | torch.Tensor | Input features | required |
| gate_weight | torch.Tensor | Gate projection weight | required |
| gate_bias | torch.Tensor \| None | Gate projection bias | required |
| gate_quant | QuantState \| None | Gate quantization state | required |
| gate_A | torch.Tensor \| None | Gate LoRA A matrix | required |
| gate_B | torch.Tensor \| None | Gate LoRA B matrix | required |
| gate_scale | float | Gate LoRA scale | required |
| up_weight | torch.Tensor | Up projection weight | required |
| up_bias | torch.Tensor \| None | Up projection bias | required |
| up_quant | QuantState \| None | Up projection quantization state | required |
| up_A | torch.Tensor \| None | Up projection LoRA A matrix | required |
| up_B | torch.Tensor \| None | Up projection LoRA B matrix | required |
| up_scale | float | Up projection LoRA scale | required |
| down_weight | torch.Tensor | Down projection weight | required |
| down_bias | torch.Tensor \| None | Down projection bias | required |
| down_quant | QuantState \| None | Down projection quantization state | required |
| down_A | torch.Tensor \| None | Down projection LoRA A matrix | required |
| down_B | torch.Tensor \| None | Down projection LoRA B matrix | required |
| down_scale | float | Down projection LoRA scale | required |
| activation_fn | Callable | Forward activation function | required |
| activation_fn_backward | Callable | Backward activation function | required |
| inplace | bool \| None | Whether to perform operations in-place | True |
| Name | Type | Description |
|---|---|---|
| - | torch.Tensor | Output transformed by multi-layer perceptron and activation function |
Supports bias, dropout, and DoRA. Dropout is applied to the input for gate/up projections. The down projection uses hidden states (post-activation) as input, so dropout is not applied there.
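A rough sketch of that data flow (not the Triton implementation), assuming each projection is passed as a (weight, A, B, scale, bias) tuple and the usual gated-MLP form activation_fn(gate(X)) * up(X):

```python
def lora_mlp_reference(X, gate, up, down, activation_fn, dropout=None):
    # Dropout is applied to the input only for the gate/up LoRA paths;
    # the down projection consumes the post-activation hidden states as-is.
    X_drop = dropout(X) if dropout is not None else X

    def lora_linear(inp, inp_drop, weight, A, B, s, bias):
        out = inp @ weight.t()
        if A is not None and B is not None:
            out = out + s * (inp_drop @ A.t()) @ B.t()
        if bias is not None:
            out = out + bias
        return out

    g = lora_linear(X, X_drop, *gate)
    u = lora_linear(X, X_drop, *up)
    h = activation_fn(g) * u            # gated activation (SwiGLU/GEGLU style)
    return lora_linear(h, h, *down)     # no dropout on the down-projection input
```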
kernels.lora.LoRA_O()
Optimized LoRA implementation for output projection.
| Name | Description |
|---|---|
| backward | Backward pass computing gradients for LoRA output projection. |
| forward | Forward pass for output projection with LoRA. |
kernels.lora.LoRA_O.backward(ctx, dY)
Backward pass computing gradients for LoRA output projection.
| Name | Type | Description | Default |
|---|---|---|---|
| ctx | torch.autograd.function.FunctionCtx | Autograd context | required |
| dY | torch.Tensor | Gradient of loss with respect to output | required |
| Name | Type | Description |
|---|---|---|
| - | tuple[torch.Tensor, None, None, None, torch.Tensor, torch.Tensor, None] | Tuple containing gradients for all forward inputs |
kernels.lora.LoRA_O.forward(ctx, X, W, b, W_quant, A, B, s)
Forward pass for output projection with LoRA.
| Name | Type | Description | Default |
|---|---|---|---|
| ctx | torch.autograd.function.FunctionCtx | Autograd context | required |
| X | torch.Tensor | Input tensor | required |
| W | torch.Tensor | Output projection weight | required |
| b | torch.Tensor | Output projection bias | required |
| W_quant | QuantState \| None | Weight quantization state | required |
| A | torch.Tensor | LoRA A matrix | required |
| B | torch.Tensor | LoRA B matrix | required |
| s | float | LoRA scaling factor | required |
| Name | Type | Description |
|---|---|---|
| - | torch.Tensor | Output projection result |
Supports bias, dropout, and DoRA.
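To make the forward/backward relationship concrete, here is a small numerical check of the gradients such a Function must produce in the plain case (no bias, quantization, dropout, or DoRA); the manual formulas below are standard LoRA algebra, not code taken from this module.

```python
import torch

torch.manual_seed(0)
n, d_in, d_out, r, s = 4, 8, 6, 2, 0.5
X = torch.randn(n, d_in, requires_grad=True)
W = torch.randn(d_out, d_in)                       # frozen base weight
A = torch.randn(r, d_in, requires_grad=True)
B = torch.randn(d_out, r, requires_grad=True)

Y = X @ W.t() + s * (X @ A.t()) @ B.t()            # Y = X W^T + s (X A^T) B^T
dY = torch.randn_like(Y)
Y.backward(dY)

dX = dY @ W + s * (dY @ B) @ A                     # gradient w.r.t. the input
dA = s * (dY @ B).t() @ X                          # gradient w.r.t. LoRA A
dB = s * dY.t() @ (X @ A.t())                      # gradient w.r.t. LoRA B

assert torch.allclose(X.grad, dX, atol=1e-5)
assert torch.allclose(A.grad, dA, atol=1e-5)
assert torch.allclose(B.grad, dB, atol=1e-5)
```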
kernels.lora.LoRA_QKV()
Optimized LoRA QKV implementation with quantization support.
Implements efficient computation of query, key, and value projections with LoRA, supporting quantization and memory optimization.
| Name | Description |
|---|---|
| backward | Backward pass computing gradients for LoRA QKV. |
| forward | Forward pass computing Q, K, V projections with LoRA. |
kernels.lora.LoRA_QKV.backward(ctx, q_grad, k_grad, v_grad)
Backward pass computing gradients for LoRA QKV.
| Name | Type | Description | Default |
|---|---|---|---|
| ctx | torch.autograd.function.FunctionCtx | Autograd context | required |
| q_grad | torch.Tensor | Gradient for query projection | required |
| k_grad | torch.Tensor | Gradient for key projection | required |
| v_grad | torch.Tensor | Gradient for value projection | required |
| Name | Type | Description |
|---|---|---|
| - | tuple[torch.Tensor, None, None, None, torch.Tensor \| None, torch.Tensor \| None, None, None, None, None, torch.Tensor \| None, torch.Tensor \| None, None, None, None, None, torch.Tensor \| None, torch.Tensor \| None, None, None] | Tuple containing gradients for all forward inputs |
kernels.lora.LoRA_QKV.forward(
    ctx,
    X,
    q_weight,
    q_bias,
    q_quant,
    q_A,
    q_B,
    q_scale,
    k_weight,
    k_bias,
    k_quant,
    k_A,
    k_B,
    k_scale,
    v_weight,
    v_bias,
    v_quant,
    v_A,
    v_B,
    v_scale,
    inplace=True,
)
Forward pass computing Q, K, V projections with LoRA.
| Name | Type | Description | Default |
|---|---|---|---|
| ctx | torch.autograd.function.FunctionCtx | Autograd context | required |
| X | torch.Tensor | Input tensor | required |
| q_weight | torch.Tensor | Query projection weight | required |
| q_bias | torch.Tensor \| None | Query projection bias | required |
| q_quant | QuantState \| None | Query quantization state | required |
| q_A | torch.Tensor \| None | Query LoRA A matrix | required |
| q_B | torch.Tensor \| None | Query LoRA B matrix | required |
| q_scale | float | Query LoRA scale | required |
| k_weight | torch.Tensor | Key projection weight | required |
| k_bias | torch.Tensor \| None | Key projection bias | required |
| k_quant | QuantState \| None | Key quantization state | required |
| k_A | torch.Tensor \| None | Key LoRA A matrix | required |
| k_B | torch.Tensor \| None | Key LoRA B matrix | required |
| k_scale | float | Key LoRA scale | required |
| v_weight | torch.Tensor | Value projection weight | required |
| v_bias | torch.Tensor \| None | Value projection bias | required |
| v_quant | QuantState \| None | Value quantization state | required |
| v_A | torch.Tensor \| None | Value LoRA A matrix | required |
| v_B | torch.Tensor \| None | Value LoRA B matrix | required |
| v_scale | float | Value LoRA scale | required |
| inplace | bool | Whether to perform operations in-place | True |
| Name | Type | Description |
|---|---|---|
| - | tuple[torch.Tensor, torch.Tensor, torch.Tensor] | Tuple of (Query, Key, Value) projection tensors |
Supports bias, dropout, and DoRA (Weight-Decomposed Low-Rank Adaptation). Dropout is applied outside this Function so autograd handles its backward.
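The toy autograd.Function below (a stand-in, not LoRA_QKV itself) illustrates that pattern: the dropped input is produced by ordinary autograd ops before the Function is called, so no dropout mask has to be saved or re-applied in backward().

```python
import torch

class _ScaledMatmul(torch.autograd.Function):
    # Toy stand-in for the pattern described above: dropout happens *before*
    # the Function is called, so its mask is tracked by regular autograd and
    # never needs to be stored or re-applied inside backward().
    @staticmethod
    def forward(ctx, X_drop, A, s):
        ctx.save_for_backward(X_drop, A)
        ctx.s = s
        return s * (X_drop @ A.t())

    @staticmethod
    def backward(ctx, dY):
        X_drop, A = ctx.saved_tensors
        return ctx.s * dY @ A, ctx.s * dY.t() @ X_drop, None

drop = torch.nn.Dropout(p=0.1)
X = torch.randn(4, 8, requires_grad=True)
A = torch.randn(2, 8, requires_grad=True)
out = _ScaledMatmul.apply(drop(X), A, 0.5)   # dropout applied outside the Function
out.sum().backward()                         # X.grad already includes the dropout mask
```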
kernels.lora.apply_lora_embedding(self, x)
Applies LoRA to embedding layer.
kernels.lora.apply_lora_mlp_geglu(self, X, inplace=True)
Applies LoRA to MLP layer with GEGLU activation.
| Name | Type | Description | Default |
|---|---|---|---|
| X | torch.Tensor | Input tensor for the MLP layer | required |
| inplace | bool | Whether to perform operations in-place to save memory | True |
| Name | Type | Description |
|---|---|---|
| - | torch.Tensor | Output tensor after applying LoRA-adapted MLP with GEGLU activation |
Supports bias, dropout, and DoRA.
kernels.lora.apply_lora_mlp_swiglu(self, X, inplace=True)
Applies LoRA to MLP layer with SwiGLU activation.
| Name | Type | Description | Default |
|---|---|---|---|
| X | torch.Tensor | Input tensor for the MLP layer | required |
| inplace | bool | Whether to perform operations in-place to save memory | True |
| Name | Type | Description |
|---|---|---|
| - | torch.Tensor | Output tensor after applying LoRA-adapted MLP with SwiGLU activation |
Supports bias, dropout, and DoRA.
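For background, the two gated activations these helpers target have the standard forms below (general definitions, independent of this module's kernels):

```python
import torch.nn.functional as F

def swiglu(gate, up):
    return F.silu(gate) * up   # SwiGLU: swish(gate) * up

def geglu(gate, up):
    return F.gelu(gate) * up   # GEGLU: gelu(gate) * up
```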
kernels.lora.apply_lora_o(self, X)
Applies LoRA to output projection layer.
| Name | Type | Description | Default |
|---|---|---|---|
| X | torch.Tensor | Input tensor | required |

| Name | Type | Description |
|---|---|---|
| - | torch.Tensor | Transformed output tensor |
Supports bias, dropout, and DoRA.
kernels.lora.apply_lora_qkv(self, X, inplace=True)
Applies LoRA to compute Query, Key, Value projections.
| Name | Type | Description | Default |
|---|---|---|---|
| X | torch.Tensor | Input tensor | required |
| inplace | bool | Whether to perform operations in-place | True |
| Name | Type | Description |
|---|---|---|
| - | tuple[torch.Tensor, torch.Tensor, torch.Tensor] | Tuple of (Query, Key, Value) projection tensors |
Supports bias, dropout, and DoRA. Dropout is applied outside the autograd Function so PyTorch handles its backward automatically. A single shared dropout mask is used across Q, K, V projections for memory efficiency.
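A hypothetical illustration of the shared-mask point (random matrices stand in for the real Q/K/V LoRA weights): the dropped input is computed once and reused for all three LoRA paths, so only one mask and one dropped copy of X are materialized instead of three.

```python
import torch

d_model, r = 64, 8
q_A, k_A, v_A = (torch.randn(r, d_model) for _ in range(3))
q_B, k_B, v_B = (torch.randn(d_model, r) for _ in range(3))

drop = torch.nn.Dropout(p=0.05)
X = torch.randn(2, 16, d_model)
X_drop = drop(X)                              # one shared dropout mask for Q/K/V
q_lora = (X_drop @ q_A.t()) @ q_B.t()
k_lora = (X_drop @ k_A.t()) @ k_B.t()
v_lora = (X_drop @ v_A.t()) @ v_B.t()
```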
kernels.lora.get_embedding_lora_parameters(embed)
Extract LoRA parameters from a PEFT Embedding module.
kernels.lora.get_lora_parameters(proj)
Gets LoRA parameters from a projection module.
| Name | Type | Description |
|---|---|---|
| - | tuple[torch.Tensor, torch.Tensor \| None, QuantState \| torch.Tensor \| None, torch.Tensor \| None, torch.Tensor \| None, float \| None, torch.Tensor \| None, nn.Module \| None, torch.Tensor \| None] | A tuple containing: W, the base weight tensor; b, the base layer bias (or None); quant_state, the quantization state (or None); A, the LoRA A weight (or None); B, the LoRA B weight (or None); s, the LoRA scaling factor (or None); lora_bias, the LoRA B bias (or None); dropout, the dropout module (or None); magnitude, the DoRA magnitude vector (or None) |
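A usage sketch; the import path kernels.lora and the PEFT projection module proj are assumptions for illustration, and the unpacking order simply follows the tuple documented above.

```python
from kernels.lora import get_lora_parameters  # assumed import path

# proj is assumed to be a PEFT-wrapped linear projection from a loaded model.
(W, b, quant_state, A, B, s,
 lora_bias, dropout, magnitude) = get_lora_parameters(proj)

has_lora = A is not None and B is not None
use_dora = magnitude is not None
```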
kernels.lora.matmul_lora(
    X,
    W,
    b,
    W_quant,
    A,
    B,
    s,
    out=None,
    X_drop=None,
    lora_bias=None,
)
Efficient fused matmul + LoRA computation.
| Name | Type | Description | Default |
|---|---|---|---|
| out | torch.Tensor \| None | Optional output tensor for inplace operations | None |
| X_drop | torch.Tensor \| None | Optional dropout-applied input for LoRA path (if None, uses X) | None |
| lora_bias | torch.Tensor \| None | Optional LoRA B layer bias [out_features] | None |
| Name | Type | Description |
|---|---|---|
| - | torch.Tensor | Result of X @ W + s * X_drop @ A @ B + b + s * lora_bias |
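For orientation, an unfused sketch of that documented result; the explicit transposes and weight layouts (W: [out, in], A: [r, in], B: [out, r]) are assumptions here, and W_quant dequantization plus the preallocated out buffer are omitted.

```python
def matmul_lora_reference(X, W, b, A, B, s, X_drop=None, lora_bias=None):
    # Unfused sketch of: X @ W + s * X_drop @ A @ B + b + s * lora_bias
    if X_drop is None:
        X_drop = X                                  # no dropout on the LoRA path
    out = X @ W.t()                                 # base projection
    if A is not None and B is not None:
        out = out + s * (X_drop @ A.t()) @ B.t()    # low-rank correction
    if b is not None:
        out = out + b                               # base bias
    if lora_bias is not None:
        out = out + s * lora_bias                   # scaled LoRA B bias
    return out
```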