Built site for gh-pages
@@ -168,6 +168,12 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<a href="../docs/installation.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Installation</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../docs/inference.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Inference and Merging</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
@@ -177,8 +183,8 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../docs/inference.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Inference</span></a>
<a href="../docs/config.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Config Reference</span></a>
</div>
</li>
</ul>
@@ -408,23 +414,6 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<a href="../docs/nccl.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">NCCL</span></a>
</div>
</li>
</ul>
</li>
<li class="sidebar-item sidebar-item-section">
<div class="sidebar-item-container">
<a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-8" role="navigation" aria-expanded="true">
<span class="menu-text">Reference</span></a>
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-8" role="navigation" aria-expanded="true" aria-label="Toggle section">
<i class="bi bi-chevron-right ms-2"></i>
</a>
</div>
<ul id="quarto-sidebar-section-8" class="collapse list-unstyled sidebar-section depth1 show">
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../docs/config.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Config options</span></a>
</div>
</li>
</ul>
</li>
@@ -501,6 +490,19 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
</section>
<section id="process-reward-models-prm" class="level3">
<h3 class="anchored" data-anchor-id="process-reward-models-prm">Process Reward Models (PRM)</h3>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p>Check out our <a href="https://axolotlai.substack.com/p/process-reward-models">PRM blog</a>.</p>
</div>
</div>
<p>Process reward models are trained on data that contains preference annotations for each step in a series of interactions. Typically, a PRM is trained to provide a reward signal for each step of a reasoning trace, and the trained model is then used for downstream reinforcement learning.</p>
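<p>To make the step-level idea concrete, here is a minimal, hypothetical sketch of how a reasoning trace could be turned into token-classification training data. It assumes that each step is followed by a separator token, that only the separator position carries the step's label (1 for a good step, 0 for a bad one), and that every other position uses the ignore value -100. The names <code>build_step_labels</code>, <code>trace_score</code>, and <code>sep_id</code> are illustrative, and taking the minimum as the trace score is one common aggregation choice. This is not necessarily how axolotl implements PRM training.</p>

```python
from typing import List, Tuple

# Assumption: label value skipped by the loss (PyTorch's CrossEntropyLoss default ignore_index).
IGNORE_INDEX = -100


def build_step_labels(
    step_token_ids: List[List[int]],
    step_labels: List[bool],
    sep_id: int,
) -> Tuple[List[int], List[int]]:
    """Flatten tokenized steps into one sequence with a separator after each step.

    Only the separator position carries the step's label. All other positions
    are set to IGNORE_INDEX so they do not contribute to the loss.
    """
    if len(step_token_ids) != len(step_labels):
        raise ValueError("need exactly one label per step")
    input_ids: List[int] = []
    labels: List[int] = []
    for tokens, is_good in zip(step_token_ids, step_labels):
        input_ids.extend(tokens)
        labels.extend([IGNORE_INDEX] * len(tokens))
        input_ids.append(sep_id)      # hypothetical step separator token
        labels.append(int(is_good))   # 1 = good step, 0 = bad step
    return input_ids, labels


def trace_score(step_probs: List[float]) -> float:
    """Score a whole trace by its weakest step (min of per-step 'good' probabilities)."""
    if not step_probs:
        raise ValueError("trace has no steps")
    return min(step_probs)


if __name__ == "__main__":
    ids, labels = build_step_labels([[11, 12], [13]], [True, False], sep_id=99)
    print(ids)     # [11, 12, 99, 13, 99]
    print(labels)  # [-100, -100, 1, -100, 0]
    print(trace_score([0.9, 0.4, 0.8]))  # 0.4
```

<p>Labelling only the separator keeps exactly one training signal per step, which matches a token-classification head such as the <code>AutoModelForTokenClassification</code> model type used in the config that follows.</p>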
<div class="sourceCode" id="cb3"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> Qwen/Qwen2.5-3B</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="fu">model_type</span><span class="kw">:</span><span class="at"> AutoModelForTokenClassification</span></span>