chore(docs): add cookbook/blog link to docs (#2410) [skip ci]

NanoCode012
2025-03-17 19:38:19 +07:00
committed by GitHub
parent 4f5eb42a73
commit 7235123d44
3 changed files with 12 additions and 0 deletions


@@ -41,6 +41,10 @@ Bradley-Terry chat templates expect single-turn conversations in the following f
### Process Reward Models (PRM)

::: {.callout-tip}
Check out our [PRM blog](https://axolotlai.substack.com/p/process-reward-models).
:::

Process reward models are trained using data which contains preference annotations for each step in a series of interactions. Typically, PRMs are trained to provide reward signals over each step of a reasoning trace and are used for downstream reinforcement learning.

```yaml
base_model: Qwen/Qwen2.5-3B