Implement fused modules (#747)

* MLP: Memory saving

* Remove RMSNorm restrictions

* Map packed weights to original

* FusedAttention module

* Simplify code

* Move fused modules

* Fix critical typo

* Split inplace

* Add FFT config

* Add validation of fused arguments

* Add fused arguments to config

* Update docs

* Fix validation logic

* Add fused modules to flash attn

* Only fuse during training

* Remove timing

* Formatting

* Formatting

* Formatting

* chore: lint

* chore: lint

* add e2e tests for fused llama

* no lora for tests

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
Author: Casper, 2023-10-21 22:08:25 +02:00 (committed by GitHub)
Parent: a21935f07a
Commit: 15d3a654bf
10 changed files with 365 additions and 13 deletions
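The "FusedAttention module" and "Map packed weights to original" entries refer to packing separate projection weights into one matrix for training and mapping them back afterwards. The sketch below is a rough illustration only. It assumes the fusion stacks the Q, K and V weights along the output dimension, a common way to fuse them, and it uses plain Python lists in place of tensors. The helpers `linear`, `pack_qkv` and `unpack_qkv` are hypothetical and are not axolotl's API.

```python
from typing import List, Tuple

# Hypothetical illustration: plain lists stand in for weight tensors.
# Each matrix is row-major with shape (out_features, in_features),
# the same layout torch.nn.Linear uses for its weight.
Matrix = List[List[float]]


def linear(x: List[float], weight: Matrix) -> List[float]:
    """Bias-free projection y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def pack_qkv(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Stack Q, K and V along the output dimension into one packed matrix."""
    return [list(row) for row in q + k + v]


def unpack_qkv(qkv: Matrix, q_out: int, k_out: int) -> Tuple[Matrix, Matrix, Matrix]:
    """Split a packed matrix back into the original Q, K and V weights."""
    q = qkv[:q_out]
    k = qkv[q_out:q_out + k_out]
    v = qkv[q_out + k_out:]
    return q, k, v


if __name__ == "__main__":
    q_w = [[1.0, 0.0], [0.0, 1.0]]
    k_w = [[2.0, 1.0]]
    v_w = [[0.5, 0.5], [1.0, -1.0]]
    x = [3.0, 4.0]

    packed = pack_qkv(q_w, k_w, v_w)
    fused = linear(x, packed)  # one projection instead of three
    separate = linear(x, q_w) + linear(x, k_w) + linear(x, v_w)
    print(fused == separate)  # True
    print(unpack_qkv(packed, len(q_w), len(k_w)) == (q_w, k_w, v_w))  # True
```

Packing replaces three smaller matrix multiplies with one larger one, and the output is identical. Unpacking restores the original per-projection layout, so weights saved after training can still match the unfused architecture. This is presumably the motivation behind "Map packed weights to original" and "Only fuse during training", though the sketch does not reproduce the commit's actual code.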


@@ -9,12 +9,16 @@ gradient_accumulation_steps: 2
micro_batch_size: 1
```shell
accelerate launch scripts/finetune.py examples/llama-2/qlora.yml
accelerate launch -m axolotl.cli.train examples/llama-2/qlora.yml
```
or
```shell
accelerate launch scripts/finetune.py examples/llama-2/lora.yml
accelerate launch -m axolotl.cli.train examples/llama-2/lora.yml
```
To launch a full finetuning with 16-bit precision:
```shell
accelerate launch -m axolotl.cli.train examples/llama-2/fft_optimized.yml
```