---
title: "1.58-bit Finetuning"
back-to-top-navigation: true
toc: true
toc-expand: 2
toc-depth: 4
---

## Overview

1.58-bit finetuning allows you to finetune BitNet models when their prequantized weights are provided. In theory, any LLM could be finetuned in 1.58-bit format, but the performance degradation would be dramatic.

Axolotl supports 1.58-bit finetuning via the [`onebitllms`](https://github.com/tiiuae/onebitllms) library, which replaces standard linear layers with BitNet-compatible counterparts ready to use for training.

::: {.callout-note}
LoRA is not supported for BitNet models.
:::

## Installation

Install the `onebitllms` package before using this feature:

```bash
uv pip install onebitllms
```
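
As a quick sanity check, confirm the package imports cleanly (no output means success):

```bash
python -c "import onebitllms"
```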

Or install it from source:

```bash
uv pip install git+https://github.com/tiiuae/onebitllms
```

## Supported models

For now, only the `Falcon-E` series of models is supported. Make sure to use their `-prequantized` versions:

```bash
tiiuae/Falcon-E-3B-Base-prequantized
tiiuae/Falcon-E-1B-Base-prequantized
```
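
Optionally, you can pre-download one of these checkpoints with the Hugging Face CLI (Axolotl will otherwise fetch it on first use):

```bash
huggingface-cli download tiiuae/Falcon-E-1B-Base-prequantized
```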

In theory, any other model would 'work', but the performance degradation would be huge. This remains an area of exploration.

## Configuration

To enable 1.58-bit finetuning, set the following in your configuration file:

```yaml
base_model: tiiuae/Falcon-E-3B-Base-prequantized # A BitNet-compatible model

use_onebitllms: true
```

::: {.callout-note}
For BitNet models, it is recommended to use a higher learning rate than for standard models (typically around 10x higher).
:::
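
As a hypothetical illustration of that note: if a comparable full-precision finetune would use `2e-5`, a BitNet run might start near `2e-4`. The values below are illustrative, not tuned recommendations:

```yaml
# Illustrative values only -- tune for your own run
learning_rate: 2e-4  # roughly 10x a typical full-precision setting
lr_scheduler: cosine
warmup_ratio: 0.03
```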

## Considerations after training

Once your model has been trained with 1.58-bit finetuning, you can convert the trained model to ternary format using the `onebitllms` CLI:

```bash
onebitllms quantize_to_1bit INPUT_PATH OUTPUT_PATH
```
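
For example, if your finetuned checkpoint was saved to `./outputs/falcon-e-sft` (a hypothetical path used purely for illustration):

```bash
# Hypothetical paths; substitute your own checkpoint and destination
onebitllms quantize_to_1bit ./outputs/falcon-e-sft ./outputs/falcon-e-sft-1bit
```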

After that, you can use supported packages such as `llama.cpp` or Apple's MLX to run the trained model.

## Example Configuration

You can find example configurations in `examples/falcon-e`, which contains one configuration for SFT and one for DPO.
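
A run with one of those configurations would then look like the sketch below (`<config>.yaml` is a placeholder; check the directory for the actual SFT or DPO filename):

```bash
# <config>.yaml stands in for the actual example config file
axolotl train examples/falcon-e/<config>.yaml
```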