doc fix (#3187)
This commit is contained in:
@@ -5,10 +5,11 @@ description: "Custom autograd functions and Triton kernels in Axolotl for optimi
|
||||
|
||||
Inspired by [Unsloth](https://github.com/unslothai/unsloth), we've implemented two
|
||||
optimizations for LoRA and QLoRA fine-tuning, supporting both single GPU and multi-GPU
|
||||
(in the DDP and DeepSpeed settings) training. These include (1) SwiGLU and GEGLU activation function
|
||||
Triton kernels, and (2) LoRA MLP and attention custom autograd functions. Our goal was
|
||||
to leverage operator fusion and tensor re-use in order to improve speed and reduce
|
||||
memory usage during the forward and backward passes of these calculations.
|
||||
(including the DDP, DeepSpeed, and FSDP2 settings) training. These include (1) SwiGLU
|
||||
and GEGLU activation function Triton kernels, and (2) LoRA MLP and attention custom
|
||||
autograd functions. Our goal was to leverage operator fusion and tensor re-use in order
|
||||
to improve speed and reduce memory usage during the forward and backward passes of
|
||||
these calculations.
|
||||
|
||||
We currently support several common model architectures, including (but not limited to):
|
||||
|
||||
@@ -131,6 +132,5 @@ computation path.
|
||||
## Future Work
|
||||
|
||||
- Support for additional model architectures
|
||||
- Support for the FSDP setting
|
||||
- Support for dropout and bias
|
||||
- Additional operator fusions
|
||||
|
||||
Reference in New Issue
Block a user