Built site for gh-pages

This commit is contained in:
Quarto GHA Workflow Runner
2025-03-21 17:30:33 +00:00
parent 486fc53c93
commit 127f9229b5
171 changed files with 127099 additions and 1001 deletions

View File

@@ -178,7 +178,7 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../docs/cli.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">CLI Reference</span></a>
<span class="menu-text">Command Line Interface (CLI)</span></a>
</div>
</li>
<li class="sidebar-item">
@@ -186,6 +186,12 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<a href="../docs/config.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Config Reference</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../docs/api" class="sidebar-item-text sidebar-link">
<span class="menu-text">API Reference</span></a>
</div>
</li>
</ul>
</li>
@@ -462,7 +468,8 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<section id="overview" class="level3">
<h3 class="anchored" data-anchor-id="overview">Overview</h3>
<p>Reward modelling is a technique used to train models to predict the reward or value of a given input. This is particularly useful in reinforcement learning scenarios where the model needs to evaluate the quality of its actions or predictions. We support the reward modelling techniques supported by <code>trl</code>.</p>
<p>Reward modelling is a technique used to train models to predict the reward or value of a given input. This is particularly useful in reinforcement learning scenarios where the model needs to evaluate the quality of its actions or predictions.
We support the reward modelling techniques supported by <code>trl</code>.</p>
</section>
<section id="outcome-reward-models" class="level3">
<h3 class="anchored" data-anchor-id="outcome-reward-models">(Outcome) Reward Models</h3>