chore(docs): add cookbook/blog link to docs (#2410) [skip ci]

2025-03-17 19:38:19 +07:00
parent 4f5eb42a73
commit 7235123d44
3 changed files with 12 additions and 0 deletions
--- a/docs/reward_modelling.qmd
+++ b/docs/reward_modelling.qmd
@@ -41,6 +41,10 @@ Bradley-Terry chat templates expect single-turn conversations in the following f

 ### Process Reward Models (PRM)

+::: {.callout-tip}
+Check out our [PRM blog](https://axolotlai.substack.com/p/process-reward-models).
+:::
+
 Process reward models are trained using data which contains preference annotations for each step in a series of interactions. Typically, PRMs are trained to provide reward signals over each step of a reasoning trace and are used for downstream reinforcement learning.
 ```yaml
 base_model: Qwen/Qwen2.5-3B