diff --git a/.nojekyll b/.nojekyll
index 50e4ddf8e..7052830ab 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-d0e306c9
\ No newline at end of file
+463de0cb
\ No newline at end of file
diff --git a/docs/api/kernels.lora.html b/docs/api/kernels.lora.html
index e858b9c75..afeaeefac 100644
--- a/docs/api/kernels.lora.html
+++ b/docs/api/kernels.lora.html
@@ -639,11 +639,11 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
-None for weights/quantization states
+None for weights/biases/quantization states
-None)
+None for activation functions and flags
Forward pass for LoRA MLP.
|
-| tuple[torch.Tensor, None, None, torch.Tensor | None, torch.Tensor | None, None] |
+tuple[torch.Tensor, None, None, None, torch.Tensor, torch.Tensor, None] |
| Tuple containing gradients for all forward inputs | required |
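The widened gradient tuples in this diff follow the `torch.autograd.Function` contract: `backward` must return exactly one entry per `forward` input, with `None` in the slots of non-differentiable inputs such as quantization states, scales, and flags, so adding a bias input adds one more slot. A minimal illustrative sketch of that pattern (a toy function, not the library's kernel; all names here are hypothetical):

```python
import torch

class BiasedMatmul(torch.autograd.Function):
    """Toy autograd Function showing the one-gradient-per-input contract.

    forward(ctx, X, W, b, s) takes tensors X, W, b and a non-differentiable
    float scale s; backward therefore returns a 4-tuple (dX, dW, db, None).
    """

    @staticmethod
    def forward(ctx, X, W, b, s):
        ctx.save_for_backward(X, W)
        ctx.s = s
        return s * (X @ W.t()) + b

    @staticmethod
    def backward(ctx, dY):
        X, W = ctx.saved_tensors
        dX = ctx.s * dY @ W          # gradient w.r.t. X
        dW = ctx.s * dY.t() @ X      # gradient w.r.t. W
        db = dY.sum(dim=0)           # bias gradient: sum over the batch dim
        return dX, dW, db, None      # None: the float scale gets no gradient
```

In the documented kernels the same rule produces the long `tuple[torch.Tensor, None, None, ...]` annotations: every `None` lines up with a weight state, scale, or flag argument of the corresponding `forward`.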
| b |
+torch.Tensor |
+Output projection bias |
+required |
+|
| W_quant | QuantState | None | Weight quantization state | required | |
| A |
-torch.Tensor | None |
+torch.Tensor |
LoRA A matrix | required |
| B |
-torch.Tensor | None |
+torch.Tensor |
LoRA B matrix | required |
| S | +||||
| s | float | LoRA scaling factor | required |
@@ -1020,7 +1041,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
|
| torch.Tensor |
-Output projection tensor |
+Output projection result |
|
-| tuple[torch.Tensor, None, None, torch.Tensor | None, torch.Tensor | None, None, None, None, torch.Tensor | None, torch.Tensor | None, None, None, None, torch.Tensor | None, torch.Tensor | None, None, None] |
+tuple[torch.Tensor, None, None, None, torch.Tensor | None, torch.Tensor | None, None, None, None, None, torch.Tensor | None, torch.Tensor | None, None, None, None, None, torch.Tensor | None, torch.Tensor | None, None, None] |
| Tuple containing gradients for all forward inputs | required |
| q_bias |
+torch.Tensor | None |
+Query projection bias |
+required |
+|
| q_quant | QuantState | None | Query quantization state | required | |
| q_A | torch.Tensor | None | Query LoRA A matrix | required | |
| q_B | torch.Tensor | None | Query LoRA B matrix | required | |
| q_scale | float | Query LoRA scale | required | |
| k_weight | torch.Tensor | Key projection weight | required | |
| k_bias |
+torch.Tensor | None |
+Key projection bias |
+required |
+|
| k_quant | QuantState | None |
@@ -1248,30 +1284,36 @@ supporting quantization and memory optimization.
required |
| v_bias |
+torch.Tensor | None |
+Value projection bias |
+required |
+|
| v_quant | QuantState | None | Value quantization state | required | |
| v_A | torch.Tensor | None | Value LoRA A matrix | required | |
| v_B | torch.Tensor | None | Value LoRA B matrix | required | |
| v_scale | float | Value LoRA scale | required | |
| inplace | bool | Whether to perform operations in-place |
@@ -1625,17 +1667,17 @@ supporting quantization and memory optimization.
|
| torch.Tensor |
-A tuple containing the base weight matrix, quantization state, LoRA A matrix, |
+A tuple containing the base weights, quantization state, LoRA A and B weights, |
|
-| QuantState | None |
-LoRA B matrix, and scaling factor. States and matrices may be None if not |
+torch.Tensor | None |
+scaling factor, and base layer bias. Quant state, weights, and bias may be |
|
-| torch.Tensor | None |
-available. |
+QuantState | None |
+None if not available. |
-kernels.lora.matmul_lora(X, W, W_quant, A, B, s, out=None)
+kernels.lora.matmul_lora(X, W, b, W_quant, A, B, s, out=None)
Efficient fused matmul + LoRA computation.
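The updated signature adds a bias term `b` after the base weight. Mathematically the fused computation is `X @ W.T + b + s * (X @ A.T) @ B.T`; the sketch below spells that out in plain PyTorch as a reference. It is an illustration of the semantics, not the library's implementation: it assumes `W_quant is None` (dequantization of quantized weights is out of scope here) and that `A`/`B` may be `None` when no LoRA adapter is attached.

```python
import torch

def matmul_lora(X, W, b, W_quant, A, B, s, out=None):
    """Unfused reference for matmul + LoRA: X @ W.T + b + s * (X @ A.T) @ B.T.

    Sketch only: W_quant must be None (no dequantization handled), and the
    LoRA update is skipped when A or B is None.
    """
    assert W_quant is None, "quantized weights not handled in this sketch"
    out = torch.matmul(X, W.t(), out=out)
    if b is not None:
        out += b
    if A is not None and B is not None:
        # Low-rank update: project down with A, back up with B, scale by s.
        out += s * torch.matmul(torch.matmul(X, A.t()), B.t())
    return out
```

Computing the low-rank product as `(X @ A.T) @ B.T` rather than materializing `B @ A` keeps the cost proportional to the LoRA rank, which is the point of the fused kernel this mirrors.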