---
title: Stepwise Supervised Format
description: Format for datasets with stepwise completions and labels
order: 3
---

## Stepwise Supervised

The stepwise supervised format is designed for chain-of-thought (CoT) reasoning datasets in which each example contains a prompt, a list of completion steps, and one preference label (`true` or `false`) for each step.

### Example

Here's a simple example of a stepwise supervised dataset entry:

```json
{
  "prompt": "Which number is larger, 9.8 or 9.11?",
  "completions": [
    "The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11.",
    "Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8."
  ],
  "labels": [true, false]
}
```
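
To make the shape of an entry concrete, here is a minimal Python sketch that parses the entry above and pairs each step with its label. The helper name `validate_stepwise_entry` is hypothetical and not part of any library, and the checks (one boolean label per completion step) are assumptions inferred from the format description rather than the loader's actual validation logic.

```python
import json


def validate_stepwise_entry(entry: dict) -> list[tuple[str, bool]]:
    """Check one stepwise supervised record and return (step, label) pairs.

    Hypothetical helper for illustration only; the rules below are
    assumptions based on the format description.
    """
    prompt = entry["prompt"]
    completions = entry["completions"]
    labels = entry["labels"]

    if not isinstance(prompt, str):
        raise ValueError("'prompt' must be a string")
    if not all(isinstance(step, str) for step in completions):
        raise ValueError("every item in 'completions' must be a string")
    if not all(isinstance(label, bool) for label in labels):
        raise ValueError("every item in 'labels' must be true or false")
    if len(completions) != len(labels):
        raise ValueError(
            f"expected one label per step, got {len(completions)} steps "
            f"and {len(labels)} labels"
        )
    return list(zip(completions, labels))


raw = """
{
  "prompt": "Which number is larger, 9.8 or 9.11?",
  "completions": [
    "The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11.",
    "Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8."
  ],
  "labels": [true, false]
}
"""

entry = json.loads(raw)
pairs = validate_stepwise_entry(entry)

# Print each reasoning step next to its label.
for step, label in pairs:
    print(label, step)
```

In this entry the first step is labeled `true` and the second, which draws the wrong conclusion, is labeled `false`.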