limit num_proc when saving datasets to disk (#2948) [skip ci]

* limit num_proc when saving datasets to disk

* enforce at least 1 process in case the value rounds down to 0, and use a sane divisor of at least 8 rows per worker when saving

* update fixtures with dataset processes since that should never be NoneType

* improve reusability for tests
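
The capping rule described above could be sketched roughly as follows. This is only an illustration, not the code from this commit: the function name `limit_save_num_proc` and its parameters are hypothetical, and the sketch assumes the worker count is derived by dividing the row count by 8 and clamping the result to at least 1.

```python
def limit_save_num_proc(
    num_rows: int,
    requested_procs: int,
    min_rows_per_proc: int = 8,  # assumed divisor, taken from the commit message
) -> int:
    """Return a worker count suitable for saving a dataset of num_rows rows.

    Hypothetical helper: caps the requested process count so that each
    worker gets at least min_rows_per_proc rows, and never returns less
    than 1 even when the integer division rounds down to 0.
    """
    # Largest number of workers that still leaves min_rows_per_proc rows each.
    max_by_rows = num_rows // min_rows_per_proc
    # Cap the requested value by the row-based limit, then enforce a floor of 1.
    return max(1, min(requested_procs, max_by_rows))


if __name__ == "__main__":
    for rows in (1000, 20, 3):
        print(rows, limit_save_num_proc(rows, requested_procs=4))
```

With a configured `dataset_processes` of 4, a large dataset keeps all 4 workers, while a tiny test fixture falls back to fewer workers or a single process.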
Wing Lian
2025-07-21 11:39:38 -04:00
committed by GitHub
parent 8e5f146701
commit db5f6f4693
7 changed files with 27 additions and 9 deletions

@@ -210,6 +210,7 @@ class TestDeduplicateRLDataset:
ALPACA_MESSAGES_CONFIG_REVISION,
ALPACA_MESSAGES_CONFIG_REVISION,
],
+"dataset_processes": 4,
}
)
yield fixture