Pythia 12B
- Runs on a single A100 GPU only (unconfirmed)
python scripts/finetune.py examples/pythia-12b/config.yml
⚠️ Multi-GPU A100: training doesn't seem to work across multiple GPUs without running out of memory (OOM)! ⚠️
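
On a machine with several GPUs, one way to stay on the single-GPU path is to expose only one device to the process. This is a minimal sketch, assuming an NVIDIA/CUDA setup where the `CUDA_VISIBLE_DEVICES` environment variable is honored. The device index `0` is an arbitrary choice, and this has not been verified against this config.

```shell
# Assumption: GPU index 0 is the A100 you want to train on.
# Hiding the other devices makes the process see a single GPU.
export CUDA_VISIBLE_DEVICES=0
python scripts/finetune.py examples/pythia-12b/config.yml
```

Unset the variable (`unset CUDA_VISIBLE_DEVICES`) afterwards to make all devices visible again.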