# StableLM 2
This repository contains examples for preprocessing and training with StableLM 2. It also includes a section to help you estimate the GPU requirements for your specific use case.
## Estimating GPU Requirements
| type          | deepspeed | batch size | context length | GPU vRAM (GB) |
|---------------|-----------|------------|----------------|---------------|
| full finetune | N/A       | 1          | 4096           | ~21.5         |
| full finetune | zero2     | 1          | 4096           | ~20           |
| lora          | N/A       | 1          | 4096           | ~16.6         |

The above are estimates and might differ slightly depending on your setup, for example whether or not you pack your sequences (the above assumes packing to length 4096).
This blog post by Hamel Husain was a great resource for estimating these numbers: https://hamel.dev/notes/llm/03_estimating_vram.html
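
As a rough back-of-envelope check (a sketch following the approach in that post, not a measurement; the bytes-per-parameter figures are the usual mixed-precision assumptions, and activation memory is ignored):

```shell
# Rough vRAM sketch for full finetuning a 1.6B-parameter model with mixed precision:
# bf16 weights (2 B/param) + bf16 gradients (2 B/param) + fp32 Adam moments (8 B/param).
# These byte counts are assumptions; activations add a few more GB on top.
PARAMS=1600000000
echo "$(( (2 + 2 + 8) * PARAMS / 1000000000 )) GB before activations"  # ~19 GB
```

That lands roughly in line with the ~21.5 GB measured above once activations are added.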
## Training
We have example configs here for both full finetuning and LoRA using the popular Alpaca dataset:
```shell
# preprocess the dataset
CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess examples/stablelm-2/1.6b/lora.yml
```
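
If you want to eyeball how examples get tokenized before committing to a training run, preprocessing can also be run with `--debug` (this flag comes from the main axolotl README; its behavior is assumed to be the same here):

```shell
# optionally inspect a sample of the tokenized dataset while preprocessing
# (--debug usage taken from the main axolotl README)
CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess examples/stablelm-2/1.6b/lora.yml --debug
```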
Single GPU Training:
```shell
python -m axolotl.cli.train examples/stablelm-2/1.6b/fft.yml --deepspeed deepspeed_configs/zero2.json
# OR
python -m axolotl.cli.train examples/stablelm-2/1.6b/lora.yml
```
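
After a LoRA run finishes, you can sanity-check the trained adapter with axolotl's inference entry point (the `--lora_model_dir` pattern comes from the main axolotl README; `./lora-out` is a placeholder for whatever `output_dir` your config sets):

```shell
# quick sanity check of the trained adapter;
# ./lora-out is a placeholder for the output_dir in your config
python -m axolotl.cli.inference examples/stablelm-2/1.6b/lora.yml --lora_model_dir="./lora-out"
```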
Multi-node GPU Training with `accelerate`:
```shell
# make sure you've configured accelerate properly
accelerate launch -m axolotl.cli.train examples/stablelm-2/1.6b/fft.yml --deepspeed deepspeed_configs/zero2.json
```
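
If you haven't run `accelerate config` on each node, the multi-node settings can instead be passed as launch flags (a sketch; the node count, IP, and port below are placeholders for your cluster):

```shell
# on the main node (rank 0); 10.0.0.1 and 29500 are placeholder values
accelerate launch --num_machines 2 --machine_rank 0 \
  --main_process_ip 10.0.0.1 --main_process_port 29500 \
  -m axolotl.cli.train examples/stablelm-2/1.6b/fft.yml --deepspeed deepspeed_configs/zero2.json
# on the second node, repeat with --machine_rank 1 and the same IP/port
```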