add docs and tweak yml
docs/llava.md (36 lines, new file)
@@ -0,0 +1,36 @@
# LLaVA

### Installing dependencies

```shell
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install --no-deps -e .
```

### Downloading assets

LLaVA doesn't support remote datasets, so both the JSON and the image assets need to be downloaded locally.

```shell
mkdir llava
mkdir data

cd llava
curl -L -O https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/resolve/main/images.zip
unzip images.zip

cd ../data
curl -L -O https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/resolve/main/blip_laion_cc_sbu_558k.json
```
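
For orientation, here is a rough sketch of what a single record in the downloaded JSON might look like. The field names (`id`, `image`, `conversations`, `from`, `value`) and the sample values are assumptions based on LLaVA's conversation data format, not read from the actual file:

```python
# Hypothetical example of one record in blip_laion_cc_sbu_558k.json.
# All field names and values below are assumptions, not taken from the file.
record = {
    "id": "004539375",                   # hypothetical sample id
    "image": "00453/004539375.jpg",      # path relative to the unzipped images
    "conversations": [
        {"from": "human", "value": "<image>\nDescribe the photo briefly."},
        {"from": "gpt", "value": "a living room furnished with modern sofas"},
    ],
}

def looks_like_llava_record(rec: dict) -> bool:
    """Sanity-check the minimal fields a conversation-format loader would need."""
    has_keys = {"id", "image", "conversations"} <= rec.keys()
    turns_ok = all(
        t.get("from") in {"human", "gpt"} and "value" in t
        for t in rec.get("conversations", [])
    )
    return has_keys and turns_ok

print(looks_like_llava_record(record))  # True
```

If the real records differ, adjust the check accordingly; the point is only to show the conversation-style shape the pretraining step consumes.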

### Pretraining

Pretraining aligns the vision model with the language model.

```shell
accelerate launch -m axolotl.cli.train_mm examples/multimodal/pretrain-llava-llama.yml
```
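
The settings live in `examples/multimodal/pretrain-llava-llama.yml` in the repo. As a heavily hedged sketch of the kind of fields an axolotl config typically carries (every key and value below is an assumption; consult the actual file for the real settings):

```yaml
# Hypothetical sketch only; see examples/multimodal/pretrain-llava-llama.yml
# for the real configuration. All keys and values here are assumptions.
base_model: llama-base-checkpoint            # placeholder model path
datasets:
  - path: data/blip_laion_cc_sbu_558k.json   # the JSON downloaded above
output_dir: ./outputs/llava-pretrain
micro_batch_size: 1
num_epochs: 1
learning_rate: 0.001
```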

### Finetuning

TBD