chore(docs): add cookbook/blog link to docs (#2410) [skip ci]

NanoCode012
2025-03-17 19:38:19 +07:00
committed by GitHub
parent 4f5eb42a73
commit 7235123d44
3 changed files with 12 additions and 0 deletions


@@ -66,6 +66,10 @@ logic to be compatible with more of them.
</details>
::: {.callout-tip}
Check out our [LoRA optimizations blog](https://axolotlai.substack.com/p/accelerating-lora-fine-tuning-with).
:::
## Usage
These optimizations can be enabled in your Axolotl config YAML file. The


@@ -41,6 +41,10 @@ Bradley-Terry chat templates expect single-turn conversations in the following f
### Process Reward Models (PRM)
::: {.callout-tip}
Check out our [PRM blog](https://axolotlai.substack.com/p/process-reward-models).
:::
Process reward models are trained using data which contains preference annotations for each step in a series of interactions. Typically, PRMs are trained to provide reward signals over each step of a reasoning trace and are used for downstream reinforcement learning.
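To make "annotations for each step" concrete, here is a minimal sketch of what one step-annotated record could look like. The field names (`prompt`, `completions`, `labels`) and the helper `first_error_step` are assumptions chosen for illustration, not something this commit or the surrounding docs define.

```python
from dataclasses import dataclass


@dataclass
class StepwiseExample:
    """Hypothetical record: one label per reasoning step."""

    prompt: str
    completions: list[str]  # one entry per reasoning step
    labels: list[bool]      # True if the matching step is judged correct

    def __post_init__(self):
        # Every step must carry exactly one annotation.
        if len(self.completions) != len(self.labels):
            raise ValueError("each step needs exactly one label")


def first_error_step(example: StepwiseExample):
    """Return the index of the first step labeled incorrect, or None."""
    for index, is_correct in enumerate(example.labels):
        if not is_correct:
            return index
    return None


example = StepwiseExample(
    prompt="What is 2 + 3 * 4?",
    completions=["3 * 4 = 12", "2 + 12 = 15"],
    labels=[True, False],
)
print(first_error_step(example))
```

A per-step layout like this is what lets a reward signal be attached to each step of a reasoning trace rather than only to the final answer.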
```yaml
base_model: Qwen/Qwen2.5-3B


@@ -497,6 +497,10 @@ The input format is a simple JSON input with customizable fields based on the ab
### GRPO
::: {.callout-tip}
Check out our [GRPO cookbook](https://github.com/axolotl-ai-cloud/axolotl-cookbook/tree/main/grpo#training-an-r1-style-large-language-model-using-grpo).
:::
GRPO uses custom reward functions and transformations. Please have them ready locally.
For example, to load OpenAI's GSM8K and use a random reward for completions:
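As a rough illustration only (the diff context is cut off here, so this is not the project's own example), a random reward and a GSM8K answer extractor might be sketched as below. The names `rand_reward_func` and `extract_gsm8k_answer` are hypothetical. The sketch assumes the reward function receives a list of completions plus arbitrary keyword arguments and returns one float per completion, and it relies on GSM8K answers ending in `#### <number>`.

```python
import random


def rand_reward_func(completions, **kwargs) -> list[float]:
    """Assign each completion a uniformly random reward in [0, 1].

    Useful only as a smoke test of the training loop: it carries no signal.
    """
    return [random.uniform(0.0, 1.0) for _ in completions]


def extract_gsm8k_answer(answer_text: str) -> str:
    """Pull the final numeric answer out of a GSM8K-style answer string."""
    # GSM8K solutions end with a line of the form "#### 1,234".
    final = answer_text.split("####")[-1].strip()
    return final.replace(",", "")


# Inline stand-in for one dataset row (no download needed for this sketch).
row = {
    "question": "A farm has 1,200 hens and buys 34 more. How many hens now?",
    "answer": "1200 + 34 = 1234\n#### 1,234",
}

rewards = rand_reward_func(["completion one", "completion two"])
label = extract_gsm8k_answer(row["answer"])
print(len(rewards), label)
```

A real reward would replace the random draw with a comparison between each completion and the extracted label.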