Built site for gh-pages

Quarto GHA Workflow Runner
2025-03-21 17:30:33 +00:00
parent 486fc53c93
commit 127f9229b5
171 changed files with 127099 additions and 1001 deletions

@@ -178,7 +178,7 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../docs/cli.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">CLI Reference</span></a>
<span class="menu-text">Command Line Interface (CLI)</span></a>
</div>
</li>
<li class="sidebar-item">
@@ -186,6 +186,12 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<a href="../docs/config.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Config Reference</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../docs/api" class="sidebar-item-text sidebar-link">
<span class="menu-text">API Reference</span></a>
</div>
</li>
</ul>
</li>
@@ -504,7 +510,8 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<section id="overview" class="level2">
<h2 class="anchored" data-anchor-id="overview">Overview</h2>
<p>Reinforcement Learning from Human Feedback is a method whereby a language model is optimized from data using human feedback. Methods include, but are not limited to:</p>
<p>Reinforcement Learning from Human Feedback is a method whereby a language model is optimized from data using human
feedback. Methods include, but are not limited to:</p>
<ul>
<li><a href="#dpo">Direct Preference Optimization (DPO)</a></li>
<li><a href="#ipo">Identity Preference Optimization (IPO)</a></li>