From a3c82e8cbbe4f8285a7ebb40891cd9289b087ece Mon Sep 17 00:00:00 2001
From: NanoCode012
Date: Fri, 13 Jun 2025 12:03:47 -0700
Subject: [PATCH] fix: grpo doc link (#2788) [skip ci]

---
 docs/rlhf.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/rlhf.qmd b/docs/rlhf.qmd
index b2687a8f9..0a189e3c1 100644
--- a/docs/rlhf.qmd
+++ b/docs/rlhf.qmd
@@ -500,7 +500,7 @@ The input format is a simple JSON input with customizable fields based on the ab
 ### GRPO
 
 ::: {.callout-tip}
-Check out our [GRPO cookbook](https://github.com/axolotl-ai-cloud/axolotl-cookbook/tree/main/grpo#training-an-r1-style-large-language-model-using-grpo).
+Check out our [GRPO cookbook](https://github.com/axolotl-ai-cloud/grpo_code).
 :::
 
 In the latest GRPO implementation, `vLLM` is used to significantly speedup trajectory generation during training. In this example, we're using 4 GPUs - 2 for training, and 2 for vLLM: