Add image. Add quickstart. Simplify dataset.
README.md (70 lines changed: 60 additions, 10 deletions)

````diff
@@ -1,10 +1,18 @@
 # Axolotl
 
-A centralized repo to train multiple architectures with different dataset types using a simple yaml file.
+<div align="center">
+<img src="image/axolotl.png" alt="axolotl" width="160">
+<div>
+<p>
+<b>One repo to finetune them all! </b>
+</p>
+<p>
+Go ahead and axolotl questions!!
+</p>
+</div>
+</div>
 
-Go ahead and axolotl questions!!
+## Axolotl supports
 
-## Support Matrix
-
 | | fp16/fp32 | fp16/fp32 w/ lora | 4bit-quant | 4bit-quant w/flash attention | flash attention | xformers attention |
 |----------|:----------|:------------------|------------|------------------------------|-----------------|--------------------|
````

````diff
@@ -14,7 +22,22 @@ Go ahead and axolotl questions!!
 | mpt | ✅ | ❌ | ❌ | ❌ | ❌ | ❓ |
 
 
-## Getting Started
+## Quick start
+
+**Requirements**: Python 3.9.
+
+```bash
+git clone https://github.com/OpenAccess-AI-Collective/axolotl
+
+pip3 install -e .[int4]
+
+accelerate config
+accelerate launch scripts/finetune.py examples/4bit-lora-7b/config.yml
+```
+
+
+
+## Requirements and Installation
 
 ### Environment
 
````
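
The new quick start states a Python 3.9 requirement. Below is a minimal POSIX-shell sketch for checking it before running `pip3 install`. `python_version_ok` and `current_python_version` are hypothetical helper names (not part of axolotl), and the check assumes "Python 3.9" is meant as a minimum (3.9 or newer), which the README does not say explicitly.

```bash
# Sketch only: hypothetical helpers, not part of axolotl.
# Assumption: "Python 3.9" is read as a minimum version (3.9 or newer).

# Succeed if the MAJOR.MINOR[.PATCH] version string in $1 is >= 3.9.
python_version_ok() {
  major=${1%%.*}      # "3.10.4" -> "3"
  rest=${1#*.}        # "3.10.4" -> "10.4"
  minor=${rest%%.*}   # "10.4"   -> "10"
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 9 ]; }
}

# Print the MAJOR.MINOR version of the python3 found on PATH, e.g. "3.9".
current_python_version() {
  python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])'
}
```

For example, `python_version_ok "$(current_python_version)" || echo "Python 3.9 or newer is required" >&2` could run before the install step. Tighten the comparison if you need exactly 3.9.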

````diff
@@ -39,6 +62,23 @@ Go ahead and axolotl questions!!
 
 Have dataset(s) in one of the following format (JSONL recommended):
 
+- `alpaca`: instruction; input(optional)
+```json
+{"instruction": "...", "input": "...", "output": "..."}
+```
+- `sharegpt`: conversations
+```json
+{"conversations": [{"from": "...", "value": "..."}]}
+```
+- `completion`: raw corpus
+```json
+{"text": "..."}
+```
+
+<details>
+
+<summary>See all formats</summary>
+
 - `alpaca`: instruction; input(optional)
 ```json
 {"instruction": "...", "input": "...", "output": "..."}
````
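
In each of the formats above, a JSONL dataset stores one JSON object per line. Below is a minimal shell sketch that writes a tiny file in each format and runs a crude key check. The file names, the sample record text, the `human`/`gpt` role names (the common ShareGPT convention, assumed here), and the `check_jsonl` helper are all illustrative assumptions. The check only greps for the key string and is not a JSON validator.

```bash
# Sketch only: file names, record text, and check_jsonl are hypothetical.
dir=$(mktemp -d)

# alpaca: instruction, optional input, output; one object per line
cat > "$dir/alpaca.jsonl" <<'EOF'
{"instruction": "Name a primary color.", "input": "", "output": "Red"}
{"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"}
EOF

# sharegpt: a list of conversation turns (role names are an assumption)
cat > "$dir/sharegpt.jsonl" <<'EOF'
{"conversations": [{"from": "human", "value": "Hi"}, {"from": "gpt", "value": "Hello!"}]}
EOF

# completion: raw text
cat > "$dir/completion.jsonl" <<'EOF'
{"text": "Some raw training text."}
EOF

# Fail if any non-blank line in FILE ($1) lacks the key "KEY" ($2).
# Crude text match on "KEY": anywhere on the line; no real JSON parsing.
check_jsonl() {
  if grep -v '^[[:space:]]*$' "$1" | grep -v -q "\"$2\"[[:space:]]*:"; then
    echo "$1: at least one line lacks \"$2\"" >&2
    return 1
  fi
}

check_jsonl "$dir/alpaca.jsonl" instruction
check_jsonl "$dir/sharegpt.jsonl" conversations
check_jsonl "$dir/completion.jsonl" text
```

A real pipeline would parse each line as JSON instead. The grep check only catches gross mistakes such as pointing an `alpaca` config at a `completion` file.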

````diff
@@ -68,11 +108,13 @@ Have dataset(s) in one of the following format (JSONL recommended):
 {"text": "..."}
 ```
 
+</details>
+
 Optionally, download some datasets, see [data/README.md](data/README.md)
 
 ### Config
 
-See sample configs in [configs](configs) folder. It is recommended to duplicate and modify to your needs. The most important options are:
+See sample configs in [configs](configs) folder or [examples](examples) for quick start. It is recommended to duplicate and modify to your needs. The most important options are:
 
 - model
 ```yaml
````

````diff
@@ -84,7 +126,7 @@ See sample configs in [configs](configs) folder. It is recommended to duplicate
 ```yaml
 datasets:
   - path: vicgalle/alpaca-gpt4 # local or huggingface repo
-    type: alpaca # format from above
+    type: alpaca # format from earlier
 ```
 
 - loading
````
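
The Config section recommends duplicating a sample config and modifying it. Below is a minimal sketch of that step. It copies a config and points its dataset at a different source. `use_dataset` is a hypothetical helper. The line-based `sed` edit assumes the dataset path sits on a single `- path: ...` line, as in the `datasets:` snippet above. It is not a YAML parser: it rewrites every such line and drops any trailing comment on it.

```bash
# Sketch only: use_dataset is a hypothetical helper, not part of axolotl.
# Copy SRC ($1) to DEST ($2), replacing the value on every "- path:" line with NEW_PATH ($3).
# Line-based sed edit, not a YAML parser; NEW_PATH must not contain "|", "&" or "\".
use_dataset() {
  sed "s|^\([[:space:]]*- path:\).*|\1 $3|" "$1" > "$2"
}
```

For example, `use_dataset examples/4bit-lora-7b/config.yml my-config.yml data/my-alpaca.jsonl` followed by `accelerate launch scripts/finetune.py my-config.yml`. The `type:` line is left untouched, so it still has to match the format of the new file.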

````diff
@@ -147,6 +189,8 @@ datasets:
   - path: vicgalle/alpaca-gpt4
     # The type of prompt to use for training. [alpaca, sharegpt, gpteacher, oasst, reflection]
     type: alpaca
+    data_files: # path to source data files
+
 # axolotl attempts to save the dataset as an arrow after packing the data together so
 # subsequent training attempts load faster, relative path
 dataset_prepared_path: data/last_run_prepared
````

````diff
@@ -260,7 +304,13 @@ debug:
 
 ### Accelerate
 
-Configure accelerate using `accelerate config` or update `~/.cache/huggingface/accelerate/default_config.yaml`
+Configure accelerate
 
+```bash
+accelerate config
+
+# nano ~/.cache/huggingface/accelerate/default_config.yaml
+```
+
 ### Train
 
````
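
`accelerate config` writes its answers to the default config file named in the commented `nano` line above. Below is a minimal sketch that checks for that file before launching. `accelerate_config_file` and `has_accelerate_config` are hypothetical helpers. They look only at the default path under `$HOME` given in the README, so a relocated Hugging Face cache directory is not detected.

```bash
# Sketch only: hypothetical helpers; checks only the README's default location.
accelerate_config_file() {
  printf '%s\n' "$HOME/.cache/huggingface/accelerate/default_config.yaml"
}

# Succeed if the default accelerate config exists; otherwise hint at `accelerate config`.
has_accelerate_config() {
  if [ -f "$(accelerate_config_file)" ]; then
    return 0
  fi
  echo "No accelerate config found; run: accelerate config" >&2
  return 1
}
```

For example: `has_accelerate_config && accelerate launch scripts/finetune.py examples/4bit-lora-7b/config.yml`.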

````diff
@@ -275,10 +325,10 @@ Add `--inference` flag to train command above
 
 If you are inferencing a pretrained LORA, pass
 ```bash
---lora_model_dir path/to/lora
+--lora_model_dir ./completed-model
 ```
 
-### Merge LORA to base
+### Merge LORA to base (Dev branch 🔧 )
 
 Add `--merge_lora --lora_model_dir="path/to/lora"` flag to train command above
 
````
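
The inference and merge instructions both work by appending flags to the train command. Below is a minimal dry-run sketch that prints the combined command instead of running it. `finetune_cmd` and its mode names are hypothetical. The base command is the one from the quick start, and only the flags named in the README (`--inference`, `--lora_model_dir`, `--merge_lora`) are used.

```bash
# Sketch only: finetune_cmd is a hypothetical helper; it prints, it does not execute.
# Usage: finetune_cmd CONFIG MODE [LORA_DIR]   (MODE: train | inference | merge)
finetune_cmd() {
  config=$1 mode=$2 lora_dir=$3
  cmd="accelerate launch scripts/finetune.py $config"
  case $mode in
    train) ;;
    inference)
      cmd="$cmd --inference"
      # Only needed when running inference with a pretrained LoRA.
      if [ -n "$lora_dir" ]; then
        cmd="$cmd --lora_model_dir $lora_dir"
      fi
      ;;
    merge)
      # Merging always needs the LoRA directory.
      if [ -z "$lora_dir" ]; then
        echo "merge needs LORA_DIR" >&2
        return 1
      fi
      cmd="$cmd --merge_lora --lora_model_dir=$lora_dir"
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$cmd"
}
```

For example, `finetune_cmd examples/4bit-lora-7b/config.yml inference ./completed-model` prints the inference command for the LoRA directory used above. Printing rather than executing keeps path quoting visible before anything is launched.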