Built site for gh-pages
This commit is contained in:
@@ -178,7 +178,7 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../docs/cli.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">CLI Reference</span></a>
|
||||
<span class="menu-text">Command Line Interface (CLI)</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
@@ -186,6 +186,12 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
|
||||
<a href="../docs/config.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Config Reference</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../docs/api" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">API Reference</span></a>
|
||||
</div>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
@@ -462,7 +468,8 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
|
||||
|
||||
<section id="overview" class="level3">
|
||||
<h3 class="anchored" data-anchor-id="overview">Overview</h3>
|
||||
<p>Reward modelling is a technique used to train models to predict the reward or value of a given input. This is particularly useful in reinforcement learning scenarios where the model needs to evaluate the quality of its actions or predictions. We support the reward modelling techniques supported by <code>trl</code>.</p>
|
||||
<p>Reward modelling is a technique used to train models to predict the reward or value of a given input. This is particularly useful in reinforcement learning scenarios where the model needs to evaluate the quality of its actions or predictions.
|
||||
We support the reward modelling techniques supported by <code>trl</code>.</p>
|
||||
</section>
|
||||
<section id="outcome-reward-models" class="level3">
|
||||
<h3 class="anchored" data-anchor-id="outcome-reward-models">(Outcome) Reward Models</h3>
|
||||
|
||||
Reference in New Issue
Block a user