* `pad_to_worst_case_seq_len` boolean, for testing memory limits

* remove `collator_pad_to_longest` option since it does nothing

  see docs: https://huggingface.co/docs/transformers/main_classes/data_collator#transformers.DataCollatorWithPadding.padding

  `True` and `"longest"` mean the same thing

* rename to `pad_to_sequence_len`, and ensure 64 alignment

---------

Co-authored-by: Aman Karmani <aman@tmm1.net>
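The "ensure 64 alignment" step above amounts to rounding the padded sequence length up to the next multiple of 64. A minimal sketch of that rounding (the helper name `pad_to_multiple` is hypothetical, not the actual function in the codebase):

```python
def pad_to_multiple(seq_len: int, multiple: int = 64) -> int:
    """Round seq_len up to the nearest multiple of `multiple`.

    Padding sequence lengths to a multiple of 64 tends to hit
    efficient kernel shapes on GPU (e.g. tensor cores).
    """
    return ((seq_len + multiple - 1) // multiple) * multiple


print(pad_to_multiple(100))  # 128
print(pad_to_multiple(64))   # 64
print(pad_to_multiple(65))   # 128
```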
Pythia 12B
- Single-GPU A100 only (?)
```shell
python scripts/finetune.py examples/pythia-12b/config.yml
```
⚠️ Multi-GPU A100 - doesn't seem to work with multiple GPUs without causing OOM! ⚠️