core.trainers.grpo.sampler
Repeat random sampler (similar to the one implemented in https://github.com/huggingface/trl/blob/main/trl/trainer/grpo_trainer.py) that adds sequence parallelism functionality; i.e., duplicating data across ranks in the same sequence parallel group.
Classes
| Name | Description |
|---|---|
| SequenceParallelRepeatRandomSampler | Sampler for GRPO training with sequence parallelism. |
SequenceParallelRepeatRandomSampler
```python
core.trainers.grpo.sampler.SequenceParallelRepeatRandomSampler(
    self,
    dataset,
    mini_repeat_count,
    world_size,
    rank,
    batch_size=1,
    repeat_count=1,
    sequence_parallel_degree=1,
    shuffle=True,
    seed=0,
    drop_last=False,
)
```

Sampler for GRPO training with sequence parallelism.
This sampler ensures:

- Ranks in the same sequence parallel (SP) group receive identical data.
- Each index is repeated multiple times for sampling different completions.
- Entire batches are repeated for reuse in multiple updates.
- Data is properly distributed across SP groups.
In the diagram below, the values represent dataset indices. Each SP group has `sequence_parallel_degree = 2` GPUs working together on the same data. There are 2 SP groups (SP0 and SP1), with `world_size = 4` total GPUs.
```
                                      Sequence Parallel Groups
                               |       SP0       |       SP1       |
                               | GPU 0  | GPU 1  | GPU 2  | GPU 3  |

                 global_step  step    <--->  mini_repeat_count=3
                                      <---------->  batch_size=2 per SP group
grad_accum=2  ▲   ▲   0        0    [0 0 0 1 1 1]  [2 2 2 3 3 3]  <- SP groups get different data
              ▼   |   0        1    [0 0 0 1 1 1]  [2 2 2 3 3 3]  <- Same data for each SP group GPU
                  |
                  |   1        2    [0 0 0 1 1 1]  [2 2 2 3 3 3]  <- Repeat same indices for iterations
num_iterations=2  ▼   1        3    [0 0 0 1 1 1]  [2 2 2 3 3 3]  <- When using gradient accumulation

                      2        4    [4 4 4 5 5 5]  [6 6 6 7 7 7]  <- New batch of data indices
                      2        5    [4 4 4 5 5 5]  [6 6 6 7 7 7]
                      ...
```
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Sized | Dataset to sample from. | required |
| mini_repeat_count | int | How many times to repeat each sample immediately. | required |
| world_size | int | Total number of processes. | required |
| rank | int | Rank of current process. | required |
| batch_size | int | Number of samples per batch. | 1 |
| repeat_count | int | How many times to repeat the full sampling process. | 1 |
| sequence_parallel_degree | int | Number of ranks in a sequence parallel group. | 1 |
| shuffle | bool | Whether to shuffle the dataset. | True |
| seed | int | Random seed for shuffling. | 0 |
| drop_last | bool | Whether to drop the last incomplete batch. | False |
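These parameters jointly determine how many indices each rank yields per pass over the dataset. A back-of-the-envelope helper, consistent with the diagram above (hypothetical arithmetic, assuming `drop_last=True` semantics; the sampler's exact length accounting may differ):

```python
def samples_per_rank(dataset_len, batch_size, mini_repeat_count, repeat_count,
                     world_size, sequence_parallel_degree):
    """Indices yielded by one rank per pass, assuming drop_last=True (hypothetical)."""
    num_sp_groups = world_size // sequence_parallel_degree
    distinct_per_step = batch_size * num_sp_groups  # unique indices consumed per step
    full_steps = dataset_len // distinct_per_step   # incomplete tail is dropped
    return full_steps * batch_size * mini_repeat_count * repeat_count

# The diagram's configuration: 8 samples, 2 SP groups of 2 GPUs each
print(samples_per_rank(8, batch_size=2, mini_repeat_count=3, repeat_count=2,
                       world_size=4, sequence_parallel_degree=2))  # 24
```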
Methods
| Name | Description |
|---|---|
| set_epoch | Sets the epoch for this sampler. |
set_epoch
```python
core.trainers.grpo.sampler.SequenceParallelRepeatRandomSampler.set_epoch(epoch)
```

Sets the epoch for this sampler.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| epoch | int | Epoch number to use for shuffling. | required |
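Distributed samplers conventionally fold the epoch into the shuffling seed so that every rank derives the same permutation while the order still changes between epochs. A minimal stdlib sketch of that convention (the sampler's actual seed-mixing internals may differ):

```python
import random

def epoch_permutation(num_samples, seed=0, epoch=0):
    """Deterministic per-epoch shuffle: same on every rank, different each epoch."""
    rng = random.Random(seed + epoch)   # hypothetical seed-mixing scheme
    indices = list(range(num_samples))
    rng.shuffle(indices)
    return indices

# Calling set_epoch(e) before each epoch would drive the `epoch` argument here
print(epoch_permutation(8, seed=0, epoch=0))
print(epoch_permutation(8, seed=0, epoch=1))
```

Because the permutation depends only on `seed` and `epoch`, all ranks stay in agreement about the data order without any communication.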