Switch to parallel FFD bin packing algorithm. (#1619)

* Switch to parallel FFD bin packing algorithm.

Add support for packing in a distributed context.
Add packing efficiency estimate back.
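First-fit decreasing (FFD) sorts items from longest to shortest and places each one into the first bin that still has room. The snippet below is a minimal sketch of a parallel variant. It assumes the sequence lengths are split into independent groups of `group_size` that are packed concurrently, and that each bin may hold at most `bin_size` sequences. `ffd_pack_group` and `pack_parallel` are hypothetical names chosen for illustration. They are not axolotl's actual functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Sequence


def ffd_pack_group(
    lengths: Sequence[int], offset: int, bin_capacity: int, bin_size: int
) -> List[List[int]]:
    """First-fit decreasing over one group; returns bins of *global* sample indices."""
    # "Decreasing": visit the longest sequences first (stable for ties).
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    remaining: List[int] = []  # free token space left in each bin
    bins: List[List[int]] = []  # global indices assigned to each bin
    for i in order:
        size = lengths[i]
        if size > bin_capacity:
            raise ValueError(f"sample {offset + i} ({size}) exceeds capacity {bin_capacity}")
        # "First fit": take the first bin with enough space and fewer than bin_size samples.
        for b, space in enumerate(remaining):
            if space >= size and len(bins[b]) < bin_size:
                remaining[b] -= size
                bins[b].append(offset + i)
                break
        else:  # no existing bin fits, so open a new one
            remaining.append(bin_capacity - size)
            bins.append([offset + i])
    return bins


def pack_parallel(
    lengths: Sequence[int],
    bin_capacity: int,
    group_size: int,
    bin_size: int,
    max_workers: Optional[int] = None,
) -> List[List[int]]:
    """Split lengths into groups of group_size, pack each group concurrently, concatenate."""
    starts = range(0, len(lengths), group_size)
    # Threads keep the sketch portable; a ProcessPoolExecutor would give true CPU parallelism.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(ffd_pack_group, lengths[s : s + group_size], s, bin_capacity, bin_size)
            for s in starts
        ]
        results = [f.result() for f in futures]  # preserves group order
    return [b for group_bins in results for b in group_bins]


print(pack_parallel([6, 4, 3, 7, 2, 8], bin_capacity=10, group_size=3, bin_size=2))
```

In this sketch, bins never span two groups. That trade-off is what makes the groups independent enough to pack in parallel: a larger group packs slightly tighter, but each group then takes longer to pack.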

* revert changes to distributed code

* chore: lint

* fix config with new params for packing test

* add sample_packing_group_size and sample_packing_bin_size to cfg schema
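The two new config keys can be pictured as optional positive integers. The dataclass below is an illustrative sketch only. It assumes `sample_packing_group_size` is the number of samples packed together in one group and `sample_packing_bin_size` is the maximum number of samples per packed bin. `PackingConfig` is a hypothetical name, and the repository's real cfg schema may be defined and validated differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PackingConfig:
    """Hypothetical schema fragment for the packing-related config keys."""

    sample_packing: bool = False
    sample_packing_group_size: Optional[int] = None  # samples packed together per group
    sample_packing_bin_size: Optional[int] = None  # max samples placed in one bin

    def __post_init__(self) -> None:
        # Reject non-integer or non-positive values when a key is set.
        for name in ("sample_packing_group_size", "sample_packing_bin_size"):
            value = getattr(self, name)
            if value is not None and (not isinstance(value, int) or value <= 0):
                raise ValueError(f"{name} must be a positive integer, got {value!r}")


cfg = PackingConfig(sample_packing=True, sample_packing_group_size=100000, sample_packing_bin_size=200)
print(cfg)
```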

* fix lambda function

* fix sampler/dataloader calculations for packing
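A sampler that packs lazily still has to report a length before packing finishes. One common approach is to estimate that length from a packing efficiency, defined as used tokens divided by total bin capacity. The sketch below shows that arithmetic. It assumes the bins are split evenly across `world_size` devices for the distributed case. `packing_efficiency` and `estimate_num_batches` are hypothetical helpers, not the commit's actual sampler code.

```python
import math
from typing import List, Sequence


def packing_efficiency(bins: List[List[int]], lengths: Sequence[int], bin_capacity: int) -> float:
    """Fraction of the allocated bin capacity actually filled with tokens."""
    if not bins:
        return 0.0
    used = sum(lengths[i] for b in bins for i in b)
    return used / (len(bins) * bin_capacity)


def estimate_num_batches(
    total_tokens: int, efficiency: float, bin_capacity: int, batch_size: int, world_size: int = 1
) -> int:
    """Estimated batches per device: bins ~= tokens / (efficiency * capacity)."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError(f"efficiency must be in (0, 1], got {efficiency}")
    est_bins = math.ceil(total_tokens / (efficiency * bin_capacity))
    # Each device gets est_bins // world_size bins, grouped batch_size at a time.
    return est_bins // (batch_size * world_size)


print(packing_efficiency([[0, 1], [2]], [5, 5, 10], bin_capacity=10))
print(estimate_num_batches(100, 0.5, bin_capacity=10, batch_size=2, world_size=2))
```

Rounding the bin count up and the per-device batch count down keeps the estimate conservative. Each device is never promised more batches than it can actually fill.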

---------

Co-authored-by: dsesclei <dave@sescleifer.com>
Wing Lian
2024-05-23 17:32:14 -04:00
committed by GitHub
parent bbfed318bc
commit 367b2e879b
8 changed files with 175 additions and 225 deletions


@@ -42,6 +42,8 @@ class TestPretrainingPacking(unittest.TestCase):
                 "pad_to_sequence_len": True,
                 "sequence_len": 2048,
                 "micro_batch_size": 2,
+                "sample_packing_group_size": 100000,
+                "sample_packing_bin_size": 200,
             }
         )