Built site for gh-pages

2025-03-21 17:30:33 +00:00
parent 486fc53c93
commit 127f9229b5
171 changed files with 127099 additions and 1001 deletions
--- a/docs/rlhf.html
+++ b/docs/rlhf.html
@@ -178,7 +178,7 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
          <li class="sidebar-item">
  <div class="sidebar-item-container"> 
  <a href="../docs/cli.html" class="sidebar-item-text sidebar-link">
- <span class="menu-text">CLI Reference</span></a>
+ <span class="menu-text">Command Line Interface (CLI)</span></a>
  </div>
 </li>
          <li class="sidebar-item">
@@ -186,6 +186,12 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
  <a href="../docs/config.html" class="sidebar-item-text sidebar-link">
 <span class="menu-text">Config Reference</span></a>
  </div>
+</li>
+          <li class="sidebar-item">
+  <div class="sidebar-item-container"> 
+  <a href="../docs/api" class="sidebar-item-text sidebar-link">
+ <span class="menu-text">API Reference</span></a>
+  </div>
 </li>
      </ul>
  </li>
@@ -504,7 +510,8 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin

 <section id="overview" class="level2">
 <h2 class="anchored" data-anchor-id="overview">Overview</h2>
-<p>Reinforcement Learning from Human Feedback is a method whereby a language model is optimized from data using human feedback. Various methods include, but not limited to:</p>
+<p>Reinforcement Learning from Human Feedback (RLHF) is a method whereby a language model is optimized using human
+feedback. Methods include, but are not limited to:</p>
 <ul>
 <li><a href="#dpo">Direct Preference Optimization (DPO)</a></li>
 <li><a href="#ipo">Identity Preference Optimization (IPO)</a></li>