utils.samplers.multipack
Multipack Batch Sampler
Classes
| Name | Description |
|---|---|
| MultipackBatchSampler | Batch sampler class for multipack |
MultipackBatchSampler
utils.samplers.multipack.MultipackBatchSampler(
self,
sampler,
batch_size,
batch_max_len,
lengths,
packing_efficiency_estimate=1.0,
drop_last=False,
num_count_samples=16,
sequential=False,
**kwargs,
)

Batch sampler class for multipack
Functions
| Name | Description |
|---|---|
| allocate_sequentially | Sequential allocator that preserves example order |
allocate_sequentially
utils.samplers.multipack.allocate_sequentially(lengths, rank, c, n)

Sequential allocator that preserves example order
Parameters:
- lengths: The lengths of all examples
- rank: The current rank (for distributed training)
- c: The capacity of each bin (maximum sequence length)
- n: Number of ranks

Returns:
- result: List of batches for the current rank
- total_used: Number of actual example tokens
- total_slots: Maximum theoretical number of example tokens (number of bins * bin capacity)
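
The parameters and return values above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `allocate_sequentially_sketch` is hypothetical, and the round-robin assignment of bins to ranks and the handling of over-long examples are assumptions made for the sake of the example.

```python
from typing import List, Sequence, Tuple


def allocate_sequentially_sketch(
    lengths: Sequence[int], rank: int, c: int, n: int
) -> Tuple[List[List[int]], int, int]:
    """Hypothetical sketch of an order-preserving allocator.

    Walks the examples in their original order and opens a new bin
    whenever the next example does not fit in the current one.
    """
    all_bins: List[List[int]] = []
    current_bin: List[int] = []
    remaining = c
    total_used = 0

    for idx, size in enumerate(lengths):
        if size > remaining and current_bin:
            # The example does not fit: close the current bin, start a new one.
            all_bins.append(current_bin)
            current_bin = []
            remaining = c
        # Assumption: an example longer than c still gets a bin of its own.
        current_bin.append(idx)
        remaining -= size
        total_used += size

    if current_bin:
        all_bins.append(current_bin)

    # Assumption: bins are dealt to ranks round-robin (bin i goes to rank i % n).
    result = all_bins[rank::n]
    total_slots = len(all_bins) * c
    return result, total_used, total_slots


lengths = [3, 4, 2, 5, 1]
batches, used, slots = allocate_sequentially_sketch(lengths, rank=0, c=6, n=2)
print(batches, used, slots)
```

In this sketch, `total_used / total_slots` gives the fraction of bin capacity filled with real tokens, which is one way to read the two counters the function returns.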