pad_to_worst_case_seq_len boolean, for testing memory limits (#498)

* pad_to_worst_case_seq_len boolean, for testing memory limits

* remove collator_pad_to_longest option since it does nothing

see docs: https://huggingface.co/docs/transformers/main_classes/data_collator#transformers.DataCollatorWithPadding.padding

True and "longest" mean the same thing
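Both values resolve to the same pad-to-longest-in-batch strategy, which is why the option was redundant. A minimal pure-Python sketch of that behavior (`pad_longest` and `pad_id` are illustrative names, not the real transformers API):

```python
def pad_longest(batch, pad_id=0):
    """Pad every sequence to the longest one in the batch.

    Sketch of what both padding=True and padding="longest" mean for
    transformers' DataCollatorWithPadding (not the actual collator).
    """
    width = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (width - len(seq)) for seq in batch]
```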

* rename to `pad_to_sequence_len`, and ensure 64 alignment
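The alignment in the new option works out to `64 * round(cfg.sequence_len / 64)`; a sketch of what that computes (`align_to_64` is an illustrative name, not from the codebase):

```python
def align_to_64(sequence_len):
    # Round sequence_len to the NEAREST multiple of 64, mirroring the
    # expression used in the pad_to_sequence_len branch of the diff.
    return 64 * round(sequence_len / 64)
```

Note that `round` goes to the nearest multiple rather than up, so a length just past a boundary rounds down (e.g. 65 aligns to 64).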

---------

Co-authored-by: Aman Karmani <aman@tmm1.net>
Author: Birch-san
Date: 2023-08-28 23:47:16 +01:00
Committed by: GitHub
parent 267b7b24e5
commit 8e197f6fb4
3 changed files with 6 additions and 7 deletions


@@ -585,10 +585,10 @@ def setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer, total_num_
         callbacks.append(SaveBetterTransformerModelCallback)
     data_collator_kwargs = {
-        "padding": True,
+        "padding": True,  # True/"longest" is the default
     }
-    if cfg.collator_pad_to_longest:
-        data_collator_kwargs["padding"] = "longest"
+    if cfg.pad_to_sequence_len:
+        data_collator_kwargs["pad_to_multiple_of"] = 64 * round(cfg.sequence_len / 64)
+    else:
+        # A100 is best at 64, while others at 8. Let's use the larger so we don't have to check
+        # https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html
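The `pad_to_multiple_of` kwarg makes the collator pad to the longest sequence in the batch and then round that width up to the given multiple. A minimal sketch of that behavior (`pad_batch` and `pad_id` are illustrative names, not the transformers implementation):

```python
def pad_batch(batch, multiple, pad_id=0):
    # Pad to the longest sequence, then round the padded width UP to the
    # next multiple (sketch of DataCollatorWithPadding's pad_to_multiple_of).
    width = max(len(seq) for seq in batch)
    width = ((width + multiple - 1) // multiple) * multiple
    return [seq + [pad_id] * (width - len(seq)) for seq in batch]
```

With `multiple=8`, a batch whose longest sequence is 3 tokens is padded out to width 8; with `pad_to_sequence_len` enabled, the multiple is instead the 64-aligned sequence length, so every batch pads to the worst-case width.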