Built site for gh-pages
@@ -793,7 +793,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
</tr>
<tr class="even">
<td><a href="#axolotl.cli.merge_lora.do_merge_lora">do_merge_lora</a></td>
<td>Calls <code>peft</code>’s <code>merge_and_unload</code> on the model given in the <code>axolotl</code> config</td>
<td>Merges LoRA adapters with base model using either memory-efficient or legacy approach.</td>
</tr>
</tbody>
</table>
@@ -864,8 +864,7 @@ config values will be overwritten to allow the LoRA merge logic to work as expec
<section id="axolotl.cli.merge_lora.do_merge_lora" class="level3">
<h3 class="anchored" data-anchor-id="axolotl.cli.merge_lora.do_merge_lora">do_merge_lora</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>cli.merge_lora.do_merge_lora(cfg)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>Calls <code>peft</code>’s <code>merge_and_unload</code> on the model given in the <code>axolotl</code> config
along with the LoRA adapters to combine them into a single base model.</p>
<p>Merges LoRA adapters with base model using either memory-efficient or legacy approach.</p>
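<p>As a rough illustration of how this function might be invoked from Python (a sketch only, not documented API usage: the config-loading step shown here is hypothetical, and this entry point is normally reached through the axolotl CLI):</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"># Hypothetical sketch: load an axolotl config into `cfg`, then merge.
# `load_cfg` is a stand-in for your own config loading, not a helper
# documented on this page.
from axolotl.cli import merge_lora

cfg = load_cfg("config.yml")   # hypothetical loader
merge_lora.do_merge_lora(cfg)  # merges LoRA adapters into the base model</code></pre></div>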
<section id="parameters-1" class="level4 doc-section doc-section-parameters">
<h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4>
<table class="caption-top table">
@@ -2191,211 +2191,214 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<span id="cb1-1407"><a href="#cb1-1407" aria-hidden="true" tabindex="-1"></a><span class="fu">loraplus_lr_embedding</span><span class="kw">:</span><span class="at"> float | None = 1e-06</span></span>
<span id="cb1-1408"><a href="#cb1-1408" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1409"><a href="#cb1-1409" aria-hidden="true" tabindex="-1"></a><span class="fu">merge_lora</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1410"><a href="#cb1-1410" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1411"><a href="#cb1-1411" aria-hidden="true" tabindex="-1"></a><span class="co"># Whether to use ReLoRA. Use with jagged_restart_*steps options.</span></span>
<span id="cb1-1412"><a href="#cb1-1412" aria-hidden="true" tabindex="-1"></a><span class="fu">relora</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1413"><a href="#cb1-1413" aria-hidden="true" tabindex="-1"></a><span class="co"># threshold for optimizer magnitude when pruning</span></span>
<span id="cb1-1414"><a href="#cb1-1414" aria-hidden="true" tabindex="-1"></a><span class="fu">relora_prune_ratio</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1415"><a href="#cb1-1415" aria-hidden="true" tabindex="-1"></a><span class="co"># True to perform lora weight merges on cpu during restarts, for modest gpu memory</span></span>
<span id="cb1-1416"><a href="#cb1-1416" aria-hidden="true" tabindex="-1"></a><span class="co"># savings</span></span>
<span id="cb1-1417"><a href="#cb1-1417" aria-hidden="true" tabindex="-1"></a><span class="fu">relora_cpu_offload</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1418"><a href="#cb1-1418" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1419"><a href="#cb1-1419" aria-hidden="true" tabindex="-1"></a><span class="co"># how often to reset for jagged restarts</span></span>
<span id="cb1-1420"><a href="#cb1-1420" aria-hidden="true" tabindex="-1"></a><span class="fu">jagged_restart_steps</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1421"><a href="#cb1-1421" aria-hidden="true" tabindex="-1"></a><span class="co"># how many warmup steps to take after reset for jagged restarts</span></span>
<span id="cb1-1422"><a href="#cb1-1422" aria-hidden="true" tabindex="-1"></a><span class="fu">jagged_restart_warmup_steps</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1423"><a href="#cb1-1423" aria-hidden="true" tabindex="-1"></a><span class="co"># how many anneal steps to take before reset for jagged restarts</span></span>
<span id="cb1-1424"><a href="#cb1-1424" aria-hidden="true" tabindex="-1"></a><span class="fu">jagged_restart_anneal_steps</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1425"><a href="#cb1-1425" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1426"><a href="#cb1-1426" aria-hidden="true" tabindex="-1"></a><span class="co"># If greater than 1, backpropagation will be skipped and the gradients will be</span></span>
<span id="cb1-1427"><a href="#cb1-1427" aria-hidden="true" tabindex="-1"></a><span class="co"># accumulated for the given number of steps.</span></span>
<span id="cb1-1428"><a href="#cb1-1428" aria-hidden="true" tabindex="-1"></a><span class="fu">gradient_accumulation_steps</span><span class="kw">:</span><span class="at"> int | None = 1</span></span>
<span id="cb1-1429"><a href="#cb1-1429" aria-hidden="true" tabindex="-1"></a><span class="co"># The number of samples to include in each batch. This is the number of samples sent to</span></span>
<span id="cb1-1430"><a href="#cb1-1430" aria-hidden="true" tabindex="-1"></a><span class="co"># each GPU. Batch size per gpu = micro_batch_size * gradient_accumulation_steps</span></span>
<span id="cb1-1431"><a href="#cb1-1431" aria-hidden="true" tabindex="-1"></a><span class="fu">micro_batch_size</span><span class="kw">:</span><span class="at"> int | None = 1</span></span>
<span id="cb1-1432"><a href="#cb1-1432" aria-hidden="true" tabindex="-1"></a><span class="co"># Total batch size; we do not recommend setting this manually</span></span>
<span id="cb1-1433"><a href="#cb1-1433" aria-hidden="true" tabindex="-1"></a><span class="fu">batch_size</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1434"><a href="#cb1-1434" aria-hidden="true" tabindex="-1"></a><span class="co"># per gpu micro batch size for evals, defaults to value of micro_batch_size</span></span>
<span id="cb1-1435"><a href="#cb1-1435" aria-hidden="true" tabindex="-1"></a><span class="fu">eval_batch_size</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1436"><a href="#cb1-1436" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1437"><a href="#cb1-1437" aria-hidden="true" tabindex="-1"></a><span class="co"># whether to find batch size that fits in memory. Passed to underlying transformers</span></span>
<span id="cb1-1438"><a href="#cb1-1438" aria-hidden="true" tabindex="-1"></a><span class="co"># Trainer</span></span>
<span id="cb1-1439"><a href="#cb1-1439" aria-hidden="true" tabindex="-1"></a><span class="fu">auto_find_batch_size</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1440"><a href="#cb1-1440" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1441"><a href="#cb1-1441" aria-hidden="true" tabindex="-1"></a><span class="co"># Whether to mask out or include the human's prompt from the training labels</span></span>
<span id="cb1-1442"><a href="#cb1-1442" aria-hidden="true" tabindex="-1"></a><span class="fu">train_on_inputs</span><span class="kw">:</span><span class="at"> bool | None = False</span></span>
<span id="cb1-1443"><a href="#cb1-1443" aria-hidden="true" tabindex="-1"></a><span class="co"># Group similarly sized data to minimize padding. May be slower to start, as it must</span></span>
<span id="cb1-1444"><a href="#cb1-1444" aria-hidden="true" tabindex="-1"></a><span class="co"># download and sort the entire dataset. Note that training loss may have an oscillating</span></span>
<span id="cb1-1445"><a href="#cb1-1445" aria-hidden="true" tabindex="-1"></a><span class="co"># pattern with this enabled.</span></span>
<span id="cb1-1446"><a href="#cb1-1446" aria-hidden="true" tabindex="-1"></a><span class="fu">group_by_length</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1447"><a href="#cb1-1447" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1448"><a href="#cb1-1448" aria-hidden="true" tabindex="-1"></a><span class="fu">learning_rate</span><span class="kw">:</span><span class="at"> str | float (required)</span></span>
<span id="cb1-1449"><a href="#cb1-1449" aria-hidden="true" tabindex="-1"></a><span class="fu">embedding_lr</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1450"><a href="#cb1-1450" aria-hidden="true" tabindex="-1"></a><span class="fu">embedding_lr_scale</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1451"><a href="#cb1-1451" aria-hidden="true" tabindex="-1"></a><span class="co"># Specify weight decay</span></span>
<span id="cb1-1452"><a href="#cb1-1452" aria-hidden="true" tabindex="-1"></a><span class="fu">weight_decay</span><span class="kw">:</span><span class="at"> float | None = 0.0</span></span>
<span id="cb1-1453"><a href="#cb1-1453" aria-hidden="true" tabindex="-1"></a><span class="co"># Specify optimizer</span></span>
<span id="cb1-1454"><a href="#cb1-1454" aria-hidden="true" tabindex="-1"></a><span class="fu">optimizer</span><span class="kw">:</span><span class="at"> OptimizerNames | CustomSupportedOptimizers | None = OptimizerNames.ADAMW_TORCH_FUSED</span></span>
<span id="cb1-1455"><a href="#cb1-1455" aria-hidden="true" tabindex="-1"></a><span class="co"># Dictionary of arguments to pass to the optimizer</span></span>
<span id="cb1-1456"><a href="#cb1-1456" aria-hidden="true" tabindex="-1"></a><span class="fu">optim_args</span><span class="kw">:</span><span class="at"> str | dict[str, Any] | None</span></span>
<span id="cb1-1457"><a href="#cb1-1457" aria-hidden="true" tabindex="-1"></a><span class="co"># The target modules to optimize, i.e. the module names that you would like to train;</span></span>
<span id="cb1-1458"><a href="#cb1-1458" aria-hidden="true" tabindex="-1"></a><span class="co"># right now this is used only for the GaLore algorithm</span></span>
<span id="cb1-1459"><a href="#cb1-1459" aria-hidden="true" tabindex="-1"></a><span class="fu">optim_target_modules</span><span class="kw">:</span><span class="at"> list[str] | Literal['all_linear'] | None</span></span>
<span id="cb1-1460"><a href="#cb1-1460" aria-hidden="true" tabindex="-1"></a><span class="co"># Path to torch distx for optim 'adamw_anyprecision'</span></span>
<span id="cb1-1461"><a href="#cb1-1461" aria-hidden="true" tabindex="-1"></a><span class="fu">torchdistx_path</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1462"><a href="#cb1-1462" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_scheduler</span><span class="kw">:</span><span class="at"> SchedulerType | Literal['one_cycle'] | Literal['rex'] | None = SchedulerType.COSINE</span></span>
<span id="cb1-1463"><a href="#cb1-1463" aria-hidden="true" tabindex="-1"></a><span class="co"># Specify a scheduler and kwargs to use with the optimizer</span></span>
<span id="cb1-1464"><a href="#cb1-1464" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_scheduler_kwargs</span><span class="kw">:</span><span class="at"> dict[str, Any] | None</span></span>
<span id="cb1-1465"><a href="#cb1-1465" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_quadratic_warmup</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1466"><a href="#cb1-1466" aria-hidden="true" tabindex="-1"></a><span class="co"># decay lr to some percentage of the peak lr, e.g. cosine_min_lr_ratio=0.1 for 10% of</span></span>
<span id="cb1-1467"><a href="#cb1-1467" aria-hidden="true" tabindex="-1"></a><span class="co"># peak lr</span></span>
<span id="cb1-1468"><a href="#cb1-1468" aria-hidden="true" tabindex="-1"></a><span class="fu">cosine_min_lr_ratio</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1469"><a href="#cb1-1469" aria-hidden="true" tabindex="-1"></a><span class="co"># freeze lr at some percentage of the step, e.g. cosine_constant_lr_ratio=0.8 means</span></span>
<span id="cb1-1470"><a href="#cb1-1470" aria-hidden="true" tabindex="-1"></a><span class="co"># start cosine_min_lr at 80% of training step</span></span>
<span id="cb1-1471"><a href="#cb1-1471" aria-hidden="true" tabindex="-1"></a><span class="fu">cosine_constant_lr_ratio</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1472"><a href="#cb1-1472" aria-hidden="true" tabindex="-1"></a><span class="co"># Learning rate div factor</span></span>
<span id="cb1-1473"><a href="#cb1-1473" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_div_factor</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1474"><a href="#cb1-1474" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1475"><a href="#cb1-1475" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_groups</span><span class="kw">:</span><span class="at"> list[LrGroup] | None</span></span>
<span id="cb1-1476"><a href="#cb1-1476" aria-hidden="true" tabindex="-1"></a><span class="co"> # For LrGroup:</span></span>
<span id="cb1-1477"><a href="#cb1-1477" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">name</span><span class="kw">:</span><span class="at"> str (required)</span></span>
<span id="cb1-1478"><a href="#cb1-1478" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">modules</span><span class="kw">:</span><span class="at"> list[str] (required)</span></span>
<span id="cb1-1479"><a href="#cb1-1479" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">lr</span><span class="kw">:</span><span class="at"> float (required)</span></span>
<span id="cb1-1480"><a href="#cb1-1480" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1481"><a href="#cb1-1481" aria-hidden="true" tabindex="-1"></a><span class="co"># adamw hyperparams</span></span>
<span id="cb1-1482"><a href="#cb1-1482" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_epsilon</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1483"><a href="#cb1-1483" aria-hidden="true" tabindex="-1"></a><span class="co"># only used for CAME Optimizer</span></span>
<span id="cb1-1484"><a href="#cb1-1484" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_epsilon2</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1485"><a href="#cb1-1485" aria-hidden="true" tabindex="-1"></a><span class="co"># adamw hyperparams</span></span>
<span id="cb1-1486"><a href="#cb1-1486" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_beta1</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1487"><a href="#cb1-1487" aria-hidden="true" tabindex="-1"></a><span class="co"># adamw hyperparams</span></span>
<span id="cb1-1488"><a href="#cb1-1488" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_beta2</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1489"><a href="#cb1-1489" aria-hidden="true" tabindex="-1"></a><span class="co"># only used for CAME Optimizer</span></span>
<span id="cb1-1490"><a href="#cb1-1490" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_beta3</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1491"><a href="#cb1-1491" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1492"><a href="#cb1-1492" aria-hidden="true" tabindex="-1"></a><span class="co"># Dion Optimizer learning rate</span></span>
<span id="cb1-1493"><a href="#cb1-1493" aria-hidden="true" tabindex="-1"></a><span class="fu">dion_lr</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1494"><a href="#cb1-1494" aria-hidden="true" tabindex="-1"></a><span class="co"># Dion Optimizer momentum</span></span>
<span id="cb1-1495"><a href="#cb1-1495" aria-hidden="true" tabindex="-1"></a><span class="fu">dion_momentum</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1496"><a href="#cb1-1496" aria-hidden="true" tabindex="-1"></a><span class="co"># Dion Optimizer: r/d fraction for low-rank approximation. Used to compute the low-rank</span></span>
<span id="cb1-1497"><a href="#cb1-1497" aria-hidden="true" tabindex="-1"></a><span class="co"># dimension.</span></span>
<span id="cb1-1498"><a href="#cb1-1498" aria-hidden="true" tabindex="-1"></a><span class="fu">dion_rank_fraction</span><span class="kw">:</span><span class="at"> float | None = 1.0</span></span>
<span id="cb1-1499"><a href="#cb1-1499" aria-hidden="true" tabindex="-1"></a><span class="co"># Dion Optimizer: Round up the low-rank dimension to a multiple of this number. This may</span></span>
<span id="cb1-1500"><a href="#cb1-1500" aria-hidden="true" tabindex="-1"></a><span class="co"># be useful to ensure even sharding.</span></span>
<span id="cb1-1501"><a href="#cb1-1501" aria-hidden="true" tabindex="-1"></a><span class="fu">dion_rank_multiple_of</span><span class="kw">:</span><span class="at"> int | None = 1</span></span>
<span id="cb1-1502"><a href="#cb1-1502" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1503"><a href="#cb1-1503" aria-hidden="true" tabindex="-1"></a><span class="co"># Gradient clipping max norm</span></span>
<span id="cb1-1504"><a href="#cb1-1504" aria-hidden="true" tabindex="-1"></a><span class="fu">max_grad_norm</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1505"><a href="#cb1-1505" aria-hidden="true" tabindex="-1"></a><span class="fu">num_epochs</span><span class="kw">:</span><span class="at"> float = 1.0</span></span>
<span id="cb1-1506"><a href="#cb1-1506" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1507"><a href="#cb1-1507" aria-hidden="true" tabindex="-1"></a><span class="fu">use_wandb</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1508"><a href="#cb1-1508" aria-hidden="true" tabindex="-1"></a><span class="co"># Set the name of your wandb run</span></span>
<span id="cb1-1509"><a href="#cb1-1509" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1510"><a href="#cb1-1510" aria-hidden="true" tabindex="-1"></a><span class="co"># Set the ID of your wandb run</span></span>
<span id="cb1-1511"><a href="#cb1-1511" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_run_id</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1512"><a href="#cb1-1512" aria-hidden="true" tabindex="-1"></a><span class="co"># "offline" to save run metadata locally and not sync to the server, "disabled" to turn</span></span>
<span id="cb1-1513"><a href="#cb1-1513" aria-hidden="true" tabindex="-1"></a><span class="co"># off wandb</span></span>
<span id="cb1-1514"><a href="#cb1-1514" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_mode</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1515"><a href="#cb1-1515" aria-hidden="true" tabindex="-1"></a><span class="co"># Your wandb project name</span></span>
<span id="cb1-1516"><a href="#cb1-1516" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_project</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1517"><a href="#cb1-1517" aria-hidden="true" tabindex="-1"></a><span class="co"># A wandb Team name if using a Team</span></span>
<span id="cb1-1518"><a href="#cb1-1518" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_entity</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1519"><a href="#cb1-1519" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_watch</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1520"><a href="#cb1-1520" aria-hidden="true" tabindex="-1"></a><span class="co"># "checkpoint" to log model to wandb Artifacts every `save_steps` or "end" to log only</span></span>
<span id="cb1-1521"><a href="#cb1-1521" aria-hidden="true" tabindex="-1"></a><span class="co"># at the end of training</span></span>
<span id="cb1-1522"><a href="#cb1-1522" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_log_model</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1523"><a href="#cb1-1523" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1524"><a href="#cb1-1524" aria-hidden="true" tabindex="-1"></a><span class="fu">use_mlflow</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1525"><a href="#cb1-1525" aria-hidden="true" tabindex="-1"></a><span class="co"># URI to mlflow</span></span>
<span id="cb1-1526"><a href="#cb1-1526" aria-hidden="true" tabindex="-1"></a><span class="fu">mlflow_tracking_uri</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1527"><a href="#cb1-1527" aria-hidden="true" tabindex="-1"></a><span class="co"># Your experiment name</span></span>
<span id="cb1-1528"><a href="#cb1-1528" aria-hidden="true" tabindex="-1"></a><span class="fu">mlflow_experiment_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1529"><a href="#cb1-1529" aria-hidden="true" tabindex="-1"></a><span class="co"># Your run name</span></span>
<span id="cb1-1530"><a href="#cb1-1530" aria-hidden="true" tabindex="-1"></a><span class="fu">mlflow_run_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1531"><a href="#cb1-1531" aria-hidden="true" tabindex="-1"></a><span class="co"># set to true to copy each saved checkpoint on each save to mlflow artifact registry</span></span>
<span id="cb1-1532"><a href="#cb1-1532" aria-hidden="true" tabindex="-1"></a><span class="fu">hf_mlflow_log_artifacts</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1533"><a href="#cb1-1533" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1534"><a href="#cb1-1534" aria-hidden="true" tabindex="-1"></a><span class="co"># Enable or disable Comet integration.</span></span>
<span id="cb1-1535"><a href="#cb1-1535" aria-hidden="true" tabindex="-1"></a><span class="fu">use_comet</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1536"><a href="#cb1-1536" aria-hidden="true" tabindex="-1"></a><span class="co"># API key for Comet. Recommended to set via `comet login`.</span></span>
<span id="cb1-1537"><a href="#cb1-1537" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_api_key</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1538"><a href="#cb1-1538" aria-hidden="true" tabindex="-1"></a><span class="co"># Workspace name in Comet. Defaults to the user's default workspace.</span></span>
<span id="cb1-1539"><a href="#cb1-1539" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_workspace</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1540"><a href="#cb1-1540" aria-hidden="true" tabindex="-1"></a><span class="co"># Project name in Comet. Defaults to Uncategorized.</span></span>
<span id="cb1-1541"><a href="#cb1-1541" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_project_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1542"><a href="#cb1-1542" aria-hidden="true" tabindex="-1"></a><span class="co"># Identifier for the experiment. Used to append data to an existing experiment or</span></span>
<span id="cb1-1543"><a href="#cb1-1543" aria-hidden="true" tabindex="-1"></a><span class="co"># control the key of new experiments. Defaults to a random key.</span></span>
<span id="cb1-1544"><a href="#cb1-1544" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_experiment_key</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1545"><a href="#cb1-1545" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a new experiment ("create") or log to an existing one ("get"). Default</span></span>
<span id="cb1-1546"><a href="#cb1-1546" aria-hidden="true" tabindex="-1"></a><span class="co"># ("get_or_create") auto-selects based on configuration.</span></span>
<span id="cb1-1547"><a href="#cb1-1547" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_mode</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1548"><a href="#cb1-1548" aria-hidden="true" tabindex="-1"></a><span class="co"># Set to True to log data to Comet server, or False for offline storage. Default is</span></span>
<span id="cb1-1549"><a href="#cb1-1549" aria-hidden="true" tabindex="-1"></a><span class="co"># True.</span></span>
<span id="cb1-1550"><a href="#cb1-1550" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_online</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1551"><a href="#cb1-1551" aria-hidden="true" tabindex="-1"></a><span class="co"># Dictionary for additional configuration settings, see the doc for more details.</span></span>
<span id="cb1-1552"><a href="#cb1-1552" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_experiment_config</span><span class="kw">:</span><span class="at"> dict[str, Any] | None</span></span>
<span id="cb1-1553"><a href="#cb1-1553" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1554"><a href="#cb1-1554" aria-hidden="true" tabindex="-1"></a><span class="fu">use_trackio</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1555"><a href="#cb1-1555" aria-hidden="true" tabindex="-1"></a><span class="co"># Your trackio project name</span></span>
<span id="cb1-1556"><a href="#cb1-1556" aria-hidden="true" tabindex="-1"></a><span class="fu">trackio_project_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1557"><a href="#cb1-1557" aria-hidden="true" tabindex="-1"></a><span class="co"># Set the name of your trackio run</span></span>
<span id="cb1-1558"><a href="#cb1-1558" aria-hidden="true" tabindex="-1"></a><span class="fu">trackio_run_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1559"><a href="#cb1-1559" aria-hidden="true" tabindex="-1"></a><span class="co"># Hugging Face Space ID to sync dashboard to (optional, runs locally if not provided)</span></span>
<span id="cb1-1560"><a href="#cb1-1560" aria-hidden="true" tabindex="-1"></a><span class="fu">trackio_space_id</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1561"><a href="#cb1-1561" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1562"><a href="#cb1-1562" aria-hidden="true" tabindex="-1"></a><span class="co"># Enable OpenTelemetry metrics collection and Prometheus export</span></span>
<span id="cb1-1563"><a href="#cb1-1563" aria-hidden="true" tabindex="-1"></a><span class="fu">use_otel_metrics</span><span class="kw">:</span><span class="at"> bool | None = False</span></span>
<span id="cb1-1564"><a href="#cb1-1564" aria-hidden="true" tabindex="-1"></a><span class="co"># Host to bind the OpenTelemetry metrics server to</span></span>
<span id="cb1-1565"><a href="#cb1-1565" aria-hidden="true" tabindex="-1"></a><span class="fu">otel_metrics_host</span><span class="kw">:</span><span class="at"> str | None = localhost</span></span>
<span id="cb1-1566"><a href="#cb1-1566" aria-hidden="true" tabindex="-1"></a><span class="co"># Port for the Prometheus metrics HTTP server</span></span>
<span id="cb1-1567"><a href="#cb1-1567" aria-hidden="true" tabindex="-1"></a><span class="fu">otel_metrics_port</span><span class="kw">:</span><span class="at"> int | None = 8000</span></span>
<span id="cb1-1568"><a href="#cb1-1568" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1569"><a href="#cb1-1569" aria-hidden="true" tabindex="-1"></a><span class="co"># the number of active layers in LISA</span></span>
<span id="cb1-1570"><a href="#cb1-1570" aria-hidden="true" tabindex="-1"></a><span class="fu">lisa_n_layers</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1571"><a href="#cb1-1571" aria-hidden="true" tabindex="-1"></a><span class="co"># how often to switch layers in LISA</span></span>
<span id="cb1-1572"><a href="#cb1-1572" aria-hidden="true" tabindex="-1"></a><span class="fu">lisa_step_interval</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1573"><a href="#cb1-1573" aria-hidden="true" tabindex="-1"></a><span class="co"># path under the model to access the layers</span></span>
<span id="cb1-1574"><a href="#cb1-1574" aria-hidden="true" tabindex="-1"></a><span class="fu">lisa_layers_attribute</span><span class="kw">:</span><span class="at"> str | None = model.layers</span></span>
<span id="cb1-1575"><a href="#cb1-1575" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1576"><a href="#cb1-1576" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_title</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1577"><a href="#cb1-1577" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_share</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1578"><a href="#cb1-1578" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_server_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1579"><a href="#cb1-1579" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_server_port</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1580"><a href="#cb1-1580" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_max_new_tokens</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1581"><a href="#cb1-1581" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_temperature</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1582"><a href="#cb1-1582" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1583"><a href="#cb1-1583" aria-hidden="true" tabindex="-1"></a><span class="fu">use_ray</span><span class="kw">:</span><span class="at"> bool = False</span></span>
<span id="cb1-1584"><a href="#cb1-1584" aria-hidden="true" tabindex="-1"></a><span class="fu">ray_run_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1585"><a href="#cb1-1585" aria-hidden="true" tabindex="-1"></a><span class="fu">ray_num_workers</span><span class="kw">:</span><span class="at"> int = 1</span></span>
<span id="cb1-1586"><a href="#cb1-1586" aria-hidden="true" tabindex="-1"></a><span class="fu">resources_per_worker</span><span class="kw">:</span><span class="at"> dict</span></span>
<span id="cb1-1587"><a href="#cb1-1587" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1588"><a href="#cb1-1588" aria-hidden="true" tabindex="-1"></a><span class="co"># The size of the image to resize to. It can be an integer (resized into padded-square</span></span>
<span id="cb1-1589"><a href="#cb1-1589" aria-hidden="true" tabindex="-1"></a><span class="co"># image) or a tuple (width, height). If not provided, we will attempt to load from</span></span>
<span id="cb1-1590"><a href="#cb1-1590" aria-hidden="true" tabindex="-1"></a><span class="co"># preprocessor.size, otherwise, images won't be resized.</span></span>
<span id="cb1-1591"><a href="#cb1-1591" aria-hidden="true" tabindex="-1"></a><span class="fu">image_size</span><span class="kw">:</span><span class="at"> int | tuple[int, int] | None</span></span>
<span id="cb1-1592"><a href="#cb1-1592" aria-hidden="true" tabindex="-1"></a><span class="co"># The resampling algorithm to use for image resizing. Default is bilinear. Please refer</span></span>
<span id="cb1-1593"><a href="#cb1-1593" aria-hidden="true" tabindex="-1"></a><span class="co"># to PIL.Image.Resampling for more details.</span></span>
<span id="cb1-1594"><a href="#cb1-1594" aria-hidden="true" tabindex="-1"></a><span class="fu">image_resize_algorithm</span><span class="kw">:</span><span class="at"> Literal['bilinear', 'bicubic', 'lanczos'] | Resampling | None</span></span>
<span id="cb1-1595"><a href="#cb1-1595" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1596"><a href="#cb1-1596" aria-hidden="true" tabindex="-1"></a><span class="co"># optional overrides to the base model configuration</span></span>
<span id="cb1-1597"><a href="#cb1-1597" aria-hidden="true" tabindex="-1"></a><span class="fu">overrides_of_model_config</span><span class="kw">:</span><span class="at"> dict[str, Any] | None</span></span>
<span id="cb1-1598"><a href="#cb1-1598" aria-hidden="true" tabindex="-1"></a><span class="co"># optional overrides to the base model loading from_pretrained</span></span>
<span id="cb1-1599"><a href="#cb1-1599" aria-hidden="true" tabindex="-1"></a><span class="fu">overrides_of_model_kwargs</span><span class="kw">:</span><span class="at"> dict[str, Any] | None</span></span>
<span id="cb1-1600"><a href="#cb1-1600" aria-hidden="true" tabindex="-1"></a><span class="co"># If you want to specify the type of model to load, AutoModelForCausalLM is a good</span></span>
<span id="cb1-1601"><a href="#cb1-1601" aria-hidden="true" tabindex="-1"></a><span class="co"># choice too</span></span>
<span id="cb1-1602"><a href="#cb1-1602" aria-hidden="true" tabindex="-1"></a><span class="fu">type_of_model</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1603"><a href="#cb1-1603" aria-hidden="true" tabindex="-1"></a><span class="co"># You can choose a specific model revision from the huggingface hub</span></span>
<span id="cb1-1604"><a href="#cb1-1604" aria-hidden="true" tabindex="-1"></a><span class="fu">revision_of_model</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1605"><a href="#cb1-1605" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1606"><a href="#cb1-1606" aria-hidden="true" tabindex="-1"></a><span class="fu">max_packed_sequence_len</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1607"><a href="#cb1-1607" aria-hidden="true" tabindex="-1"></a><span class="fu">rope_scaling</span><span class="kw">:</span><span class="at"> Any | None</span></span>
<span id="cb1-1608"><a href="#cb1-1608" aria-hidden="true" tabindex="-1"></a><span class="fu">noisy_embedding_alpha</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1609"><a href="#cb1-1609" aria-hidden="true" tabindex="-1"></a><span class="fu">dpo_beta</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1610"><a href="#cb1-1610" aria-hidden="true" tabindex="-1"></a><span class="fu">evaluation_strategy</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1611"><a href="#cb1-1611" aria-hidden="true" tabindex="-1"></a><span class="fu">eval_table_size</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1612"><a href="#cb1-1612" aria-hidden="true" tabindex="-1"></a><span class="fu">eval_max_new_tokens</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1613"><a href="#cb1-1613" aria-hidden="true" tabindex="-1"></a><span class="fu">dpo_use_logits_to_keep</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1614"><a href="#cb1-1614" aria-hidden="true" tabindex="-1"></a><span class="fu">dpo_generate_during_eval</span><span class="kw">:</span><span class="at"> bool | None</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<span id="cb1-1410"><a href="#cb1-1410" aria-hidden="true" tabindex="-1"></a><span class="co"># Method to use for LoRA merging. 'memory_efficient' (default) processes shards</span></span>
<span id="cb1-1411"><a href="#cb1-1411" aria-hidden="true" tabindex="-1"></a><span class="co"># individually to reduce memory usage, 'legacy' loads the full model into memory.</span></span>
<span id="cb1-1412"><a href="#cb1-1412" aria-hidden="true" tabindex="-1"></a><span class="fu">merge_method</span><span class="kw">:</span><span class="at"> Literal['legacy', 'memory_efficient'] | None = memory_efficient</span></span>
<span id="cb1-1413"><a href="#cb1-1413" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1414"><a href="#cb1-1414" aria-hidden="true" tabindex="-1"></a><span class="co"># Whether to use ReLoRA. Use with jagged_restart_*steps options.</span></span>
<span id="cb1-1415"><a href="#cb1-1415" aria-hidden="true" tabindex="-1"></a><span class="fu">relora</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1416"><a href="#cb1-1416" aria-hidden="true" tabindex="-1"></a><span class="co"># threshold for optimizer magnitude when pruning</span></span>
<span id="cb1-1417"><a href="#cb1-1417" aria-hidden="true" tabindex="-1"></a><span class="fu">relora_prune_ratio</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1418"><a href="#cb1-1418" aria-hidden="true" tabindex="-1"></a><span class="co"># True to perform lora weight merges on cpu during restarts, for modest gpu memory</span></span>
<span id="cb1-1419"><a href="#cb1-1419" aria-hidden="true" tabindex="-1"></a><span class="co"># savings</span></span>
<span id="cb1-1420"><a href="#cb1-1420" aria-hidden="true" tabindex="-1"></a><span class="fu">relora_cpu_offload</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1421"><a href="#cb1-1421" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1422"><a href="#cb1-1422" aria-hidden="true" tabindex="-1"></a><span class="co"># how often to reset for jagged restarts</span></span>
<span id="cb1-1423"><a href="#cb1-1423" aria-hidden="true" tabindex="-1"></a><span class="fu">jagged_restart_steps</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1424"><a href="#cb1-1424" aria-hidden="true" tabindex="-1"></a><span class="co"># how many warmup steps to take after reset for jagged restarts</span></span>
<span id="cb1-1425"><a href="#cb1-1425" aria-hidden="true" tabindex="-1"></a><span class="fu">jagged_restart_warmup_steps</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1426"><a href="#cb1-1426" aria-hidden="true" tabindex="-1"></a><span class="co"># how many anneal steps to take before reset for jagged restarts</span></span>
<span id="cb1-1427"><a href="#cb1-1427" aria-hidden="true" tabindex="-1"></a><span class="fu">jagged_restart_anneal_steps</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1428"><a href="#cb1-1428" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1429"><a href="#cb1-1429" aria-hidden="true" tabindex="-1"></a><span class="co"># If greater than 1, backpropagation will be skipped and the gradients will be</span></span>
<span id="cb1-1430"><a href="#cb1-1430" aria-hidden="true" tabindex="-1"></a><span class="co"># accumulated for the given number of steps.</span></span>
<span id="cb1-1431"><a href="#cb1-1431" aria-hidden="true" tabindex="-1"></a><span class="fu">gradient_accumulation_steps</span><span class="kw">:</span><span class="at"> int | None = 1</span></span>
<span id="cb1-1432"><a href="#cb1-1432" aria-hidden="true" tabindex="-1"></a><span class="co"># The number of samples to include in each batch. This is the number of samples sent to</span></span>
<span id="cb1-1433"><a href="#cb1-1433" aria-hidden="true" tabindex="-1"></a><span class="co"># each GPU. Batch size per gpu = micro_batch_size * gradient_accumulation_steps</span></span>
<span id="cb1-1434"><a href="#cb1-1434" aria-hidden="true" tabindex="-1"></a><span class="fu">micro_batch_size</span><span class="kw">:</span><span class="at"> int | None = 1</span></span>
<span id="cb1-1435"><a href="#cb1-1435" aria-hidden="true" tabindex="-1"></a><span class="co"># Total batch size; we do not recommend setting this manually</span></span>
<span id="cb1-1436"><a href="#cb1-1436" aria-hidden="true" tabindex="-1"></a><span class="fu">batch_size</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1437"><a href="#cb1-1437" aria-hidden="true" tabindex="-1"></a><span class="co"># per gpu micro batch size for evals, defaults to value of micro_batch_size</span></span>
<span id="cb1-1438"><a href="#cb1-1438" aria-hidden="true" tabindex="-1"></a><span class="fu">eval_batch_size</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1439"><a href="#cb1-1439" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1440"><a href="#cb1-1440" aria-hidden="true" tabindex="-1"></a><span class="co"># whether to find batch size that fits in memory. Passed to underlying transformers</span></span>
<span id="cb1-1441"><a href="#cb1-1441" aria-hidden="true" tabindex="-1"></a><span class="co"># Trainer</span></span>
<span id="cb1-1442"><a href="#cb1-1442" aria-hidden="true" tabindex="-1"></a><span class="fu">auto_find_batch_size</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1443"><a href="#cb1-1443" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1444"><a href="#cb1-1444" aria-hidden="true" tabindex="-1"></a><span class="co"># Whether to mask out or include the human's prompt from the training labels</span></span>
<span id="cb1-1445"><a href="#cb1-1445" aria-hidden="true" tabindex="-1"></a><span class="fu">train_on_inputs</span><span class="kw">:</span><span class="at"> bool | None = False</span></span>
<span id="cb1-1446"><a href="#cb1-1446" aria-hidden="true" tabindex="-1"></a><span class="co"># Group similarly sized data to minimize padding. May be slower to start, as it must</span></span>
<span id="cb1-1447"><a href="#cb1-1447" aria-hidden="true" tabindex="-1"></a><span class="co"># download and sort the entire dataset. Note that training loss may have an oscillating</span></span>
<span id="cb1-1448"><a href="#cb1-1448" aria-hidden="true" tabindex="-1"></a><span class="co"># pattern with this enabled.</span></span>
<span id="cb1-1449"><a href="#cb1-1449" aria-hidden="true" tabindex="-1"></a><span class="fu">group_by_length</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1450"><a href="#cb1-1450" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1451"><a href="#cb1-1451" aria-hidden="true" tabindex="-1"></a><span class="fu">learning_rate</span><span class="kw">:</span><span class="at"> str | float (required)</span></span>
<span id="cb1-1452"><a href="#cb1-1452" aria-hidden="true" tabindex="-1"></a><span class="fu">embedding_lr</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1453"><a href="#cb1-1453" aria-hidden="true" tabindex="-1"></a><span class="fu">embedding_lr_scale</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1454"><a href="#cb1-1454" aria-hidden="true" tabindex="-1"></a><span class="co"># Specify weight decay</span></span>
<span id="cb1-1455"><a href="#cb1-1455" aria-hidden="true" tabindex="-1"></a><span class="fu">weight_decay</span><span class="kw">:</span><span class="at"> float | None = 0.0</span></span>
<span id="cb1-1456"><a href="#cb1-1456" aria-hidden="true" tabindex="-1"></a><span class="co"># Specify optimizer</span></span>
<span id="cb1-1457"><a href="#cb1-1457" aria-hidden="true" tabindex="-1"></a><span class="fu">optimizer</span><span class="kw">:</span><span class="at"> OptimizerNames | CustomSupportedOptimizers | None = OptimizerNames.ADAMW_TORCH_FUSED</span></span>
<span id="cb1-1458"><a href="#cb1-1458" aria-hidden="true" tabindex="-1"></a><span class="co"># Dictionary of arguments to pass to the optimizer</span></span>
<span id="cb1-1459"><a href="#cb1-1459" aria-hidden="true" tabindex="-1"></a><span class="fu">optim_args</span><span class="kw">:</span><span class="at"> str | dict[str, Any] | None</span></span>
<span id="cb1-1460"><a href="#cb1-1460" aria-hidden="true" tabindex="-1"></a><span class="co"># The target modules to optimize, i.e. the module names that you would like to train;</span></span>
<span id="cb1-1461"><a href="#cb1-1461" aria-hidden="true" tabindex="-1"></a><span class="co"># right now this is used only for the GaLore algorithm</span></span>
<span id="cb1-1462"><a href="#cb1-1462" aria-hidden="true" tabindex="-1"></a><span class="fu">optim_target_modules</span><span class="kw">:</span><span class="at"> list[str] | Literal['all_linear'] | None</span></span>
<span id="cb1-1463"><a href="#cb1-1463" aria-hidden="true" tabindex="-1"></a><span class="co"># Path to torch distx for optim 'adamw_anyprecision'</span></span>
<span id="cb1-1464"><a href="#cb1-1464" aria-hidden="true" tabindex="-1"></a><span class="fu">torchdistx_path</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1465"><a href="#cb1-1465" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_scheduler</span><span class="kw">:</span><span class="at"> SchedulerType | Literal['one_cycle'] | Literal['rex'] | None = SchedulerType.COSINE</span></span>
<span id="cb1-1466"><a href="#cb1-1466" aria-hidden="true" tabindex="-1"></a><span class="co"># Specify a scheduler and kwargs to use with the optimizer</span></span>
<span id="cb1-1467"><a href="#cb1-1467" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_scheduler_kwargs</span><span class="kw">:</span><span class="at"> dict[str, Any] | None</span></span>
<span id="cb1-1468"><a href="#cb1-1468" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_quadratic_warmup</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1469"><a href="#cb1-1469" aria-hidden="true" tabindex="-1"></a><span class="co"># decay lr to some percentage of the peak lr, e.g. cosine_min_lr_ratio=0.1 for 10% of</span></span>
<span id="cb1-1470"><a href="#cb1-1470" aria-hidden="true" tabindex="-1"></a><span class="co"># peak lr</span></span>
<span id="cb1-1471"><a href="#cb1-1471" aria-hidden="true" tabindex="-1"></a><span class="fu">cosine_min_lr_ratio</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1472"><a href="#cb1-1472" aria-hidden="true" tabindex="-1"></a><span class="co"># freeze lr at some percentage of the step, e.g. cosine_constant_lr_ratio=0.8 means</span></span>
<span id="cb1-1473"><a href="#cb1-1473" aria-hidden="true" tabindex="-1"></a><span class="co"># start cosine_min_lr at 80% of training step</span></span>
<span id="cb1-1474"><a href="#cb1-1474" aria-hidden="true" tabindex="-1"></a><span class="fu">cosine_constant_lr_ratio</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1475"><a href="#cb1-1475" aria-hidden="true" tabindex="-1"></a><span class="co"># Learning rate div factor</span></span>
<span id="cb1-1476"><a href="#cb1-1476" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_div_factor</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1477"><a href="#cb1-1477" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1478"><a href="#cb1-1478" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_groups</span><span class="kw">:</span><span class="at"> list[LrGroup] | None</span></span>
<span id="cb1-1479"><a href="#cb1-1479" aria-hidden="true" tabindex="-1"></a><span class="co"> # For LrGroup:</span></span>
<span id="cb1-1480"><a href="#cb1-1480" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">name</span><span class="kw">:</span><span class="at"> str (required)</span></span>
<span id="cb1-1481"><a href="#cb1-1481" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">modules</span><span class="kw">:</span><span class="at"> list[str] (required)</span></span>
<span id="cb1-1482"><a href="#cb1-1482" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">lr</span><span class="kw">:</span><span class="at"> float (required)</span></span>
<span id="cb1-1483"><a href="#cb1-1483" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1484"><a href="#cb1-1484" aria-hidden="true" tabindex="-1"></a><span class="co"># adamw hyperparams</span></span>
<span id="cb1-1485"><a href="#cb1-1485" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_epsilon</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1486"><a href="#cb1-1486" aria-hidden="true" tabindex="-1"></a><span class="co"># only used for CAME Optimizer</span></span>
<span id="cb1-1487"><a href="#cb1-1487" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_epsilon2</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1488"><a href="#cb1-1488" aria-hidden="true" tabindex="-1"></a><span class="co"># adamw hyperparams</span></span>
<span id="cb1-1489"><a href="#cb1-1489" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_beta1</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1490"><a href="#cb1-1490" aria-hidden="true" tabindex="-1"></a><span class="co"># adamw hyperparams</span></span>
<span id="cb1-1491"><a href="#cb1-1491" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_beta2</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1492"><a href="#cb1-1492" aria-hidden="true" tabindex="-1"></a><span class="co"># only used for CAME Optimizer</span></span>
<span id="cb1-1493"><a href="#cb1-1493" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_beta3</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1494"><a href="#cb1-1494" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1495"><a href="#cb1-1495" aria-hidden="true" tabindex="-1"></a><span class="co"># Dion Optimizer learning rate</span></span>
<span id="cb1-1496"><a href="#cb1-1496" aria-hidden="true" tabindex="-1"></a><span class="fu">dion_lr</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1497"><a href="#cb1-1497" aria-hidden="true" tabindex="-1"></a><span class="co"># Dion Optimizer momentum</span></span>
<span id="cb1-1498"><a href="#cb1-1498" aria-hidden="true" tabindex="-1"></a><span class="fu">dion_momentum</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1499"><a href="#cb1-1499" aria-hidden="true" tabindex="-1"></a><span class="co"># Dion Optimizer: r/d fraction for low-rank approximation. Used to compute the low-rank</span></span>
<span id="cb1-1500"><a href="#cb1-1500" aria-hidden="true" tabindex="-1"></a><span class="co"># dimension.</span></span>
<span id="cb1-1501"><a href="#cb1-1501" aria-hidden="true" tabindex="-1"></a><span class="fu">dion_rank_fraction</span><span class="kw">:</span><span class="at"> float | None = 1.0</span></span>
<span id="cb1-1502"><a href="#cb1-1502" aria-hidden="true" tabindex="-1"></a><span class="co"># Dion Optimizer: Round up the low-rank dimension to a multiple of this number. This may</span></span>
<span id="cb1-1503"><a href="#cb1-1503" aria-hidden="true" tabindex="-1"></a><span class="co"># be useful to ensure even sharding.</span></span>
<span id="cb1-1504"><a href="#cb1-1504" aria-hidden="true" tabindex="-1"></a><span class="fu">dion_rank_multiple_of</span><span class="kw">:</span><span class="at"> int | None = 1</span></span>
<span id="cb1-1505"><a href="#cb1-1505" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1506"><a href="#cb1-1506" aria-hidden="true" tabindex="-1"></a><span class="co"># Gradient clipping max norm</span></span>
<span id="cb1-1507"><a href="#cb1-1507" aria-hidden="true" tabindex="-1"></a><span class="fu">max_grad_norm</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1508"><a href="#cb1-1508" aria-hidden="true" tabindex="-1"></a><span class="fu">num_epochs</span><span class="kw">:</span><span class="at"> float = 1.0</span></span>
<span id="cb1-1509"><a href="#cb1-1509" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1510"><a href="#cb1-1510" aria-hidden="true" tabindex="-1"></a><span class="fu">use_wandb</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1511"><a href="#cb1-1511" aria-hidden="true" tabindex="-1"></a><span class="co"># Set the name of your wandb run</span></span>
<span id="cb1-1512"><a href="#cb1-1512" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1513"><a href="#cb1-1513" aria-hidden="true" tabindex="-1"></a><span class="co"># Set the ID of your wandb run</span></span>
<span id="cb1-1514"><a href="#cb1-1514" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_run_id</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1515"><a href="#cb1-1515" aria-hidden="true" tabindex="-1"></a><span class="co"># "offline" to save run metadata locally and not sync to the server, "disabled" to turn</span></span>
<span id="cb1-1516"><a href="#cb1-1516" aria-hidden="true" tabindex="-1"></a><span class="co"># off wandb</span></span>
<span id="cb1-1517"><a href="#cb1-1517" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_mode</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1518"><a href="#cb1-1518" aria-hidden="true" tabindex="-1"></a><span class="co"># Your wandb project name</span></span>
<span id="cb1-1519"><a href="#cb1-1519" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_project</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1520"><a href="#cb1-1520" aria-hidden="true" tabindex="-1"></a><span class="co"># A wandb Team name if using a Team</span></span>
<span id="cb1-1521"><a href="#cb1-1521" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_entity</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1522"><a href="#cb1-1522" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_watch</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1523"><a href="#cb1-1523" aria-hidden="true" tabindex="-1"></a><span class="co"># "checkpoint" to log model to wandb Artifacts every `save_steps` or "end" to log only</span></span>
<span id="cb1-1524"><a href="#cb1-1524" aria-hidden="true" tabindex="-1"></a><span class="co"># at the end of training</span></span>
<span id="cb1-1525"><a href="#cb1-1525" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_log_model</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1526"><a href="#cb1-1526" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1527"><a href="#cb1-1527" aria-hidden="true" tabindex="-1"></a><span class="fu">use_mlflow</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1528"><a href="#cb1-1528" aria-hidden="true" tabindex="-1"></a><span class="co"># URI to mlflow</span></span>
<span id="cb1-1529"><a href="#cb1-1529" aria-hidden="true" tabindex="-1"></a><span class="fu">mlflow_tracking_uri</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1530"><a href="#cb1-1530" aria-hidden="true" tabindex="-1"></a><span class="co"># Your experiment name</span></span>
<span id="cb1-1531"><a href="#cb1-1531" aria-hidden="true" tabindex="-1"></a><span class="fu">mlflow_experiment_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1532"><a href="#cb1-1532" aria-hidden="true" tabindex="-1"></a><span class="co"># Your run name</span></span>
<span id="cb1-1533"><a href="#cb1-1533" aria-hidden="true" tabindex="-1"></a><span class="fu">mlflow_run_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1534"><a href="#cb1-1534" aria-hidden="true" tabindex="-1"></a><span class="co"># set to true to copy each saved checkpoint on each save to mlflow artifact registry</span></span>
|
||||
<span id="cb1-1535"><a href="#cb1-1535" aria-hidden="true" tabindex="-1"></a><span class="fu">hf_mlflow_log_artifacts</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1536"><a href="#cb1-1536" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1537"><a href="#cb1-1537" aria-hidden="true" tabindex="-1"></a><span class="co"># Enable or disable Comet integration.</span></span>
<span id="cb1-1538"><a href="#cb1-1538" aria-hidden="true" tabindex="-1"></a><span class="fu">use_comet</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1539"><a href="#cb1-1539" aria-hidden="true" tabindex="-1"></a><span class="co"># API key for Comet. Recommended to set via `comet login`.</span></span>
<span id="cb1-1540"><a href="#cb1-1540" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_api_key</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1541"><a href="#cb1-1541" aria-hidden="true" tabindex="-1"></a><span class="co"># Workspace name in Comet. Defaults to the user's default workspace.</span></span>
<span id="cb1-1542"><a href="#cb1-1542" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_workspace</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1543"><a href="#cb1-1543" aria-hidden="true" tabindex="-1"></a><span class="co"># Project name in Comet. Defaults to Uncategorized.</span></span>
<span id="cb1-1544"><a href="#cb1-1544" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_project_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1545"><a href="#cb1-1545" aria-hidden="true" tabindex="-1"></a><span class="co"># Identifier for the experiment. Used to append data to an existing experiment or</span></span>
<span id="cb1-1546"><a href="#cb1-1546" aria-hidden="true" tabindex="-1"></a><span class="co"># control the key of new experiments. Default to a random key.</span></span>
|
||||
<span id="cb1-1547"><a href="#cb1-1547" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_experiment_key</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1548"><a href="#cb1-1548" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a new experiment ("create") or log to an existing one ("get"). Default</span></span>
<span id="cb1-1549"><a href="#cb1-1549" aria-hidden="true" tabindex="-1"></a><span class="co"># ("get_or_create") auto-selects based on configuration.</span></span>
<span id="cb1-1550"><a href="#cb1-1550" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_mode</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1551"><a href="#cb1-1551" aria-hidden="true" tabindex="-1"></a><span class="co"># Set to True to log data to Comet server, or False for offline storage. Default is</span></span>
<span id="cb1-1552"><a href="#cb1-1552" aria-hidden="true" tabindex="-1"></a><span class="co"># True.</span></span>
<span id="cb1-1553"><a href="#cb1-1553" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_online</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1554"><a href="#cb1-1554" aria-hidden="true" tabindex="-1"></a><span class="co"># Dictionary for additional configuration settings, see the doc for more details.</span></span>
<span id="cb1-1555"><a href="#cb1-1555" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_experiment_config</span><span class="kw">:</span><span class="at"> dict[str, Any] | None</span></span>
<span id="cb1-1556"><a href="#cb1-1556" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1557"><a href="#cb1-1557" aria-hidden="true" tabindex="-1"></a><span class="fu">use_trackio</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1558"><a href="#cb1-1558" aria-hidden="true" tabindex="-1"></a><span class="co"># Your trackio project name</span></span>
<span id="cb1-1559"><a href="#cb1-1559" aria-hidden="true" tabindex="-1"></a><span class="fu">trackio_project_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1560"><a href="#cb1-1560" aria-hidden="true" tabindex="-1"></a><span class="co"># Set the name of your trackio run</span></span>
<span id="cb1-1561"><a href="#cb1-1561" aria-hidden="true" tabindex="-1"></a><span class="fu">trackio_run_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1562"><a href="#cb1-1562" aria-hidden="true" tabindex="-1"></a><span class="co"># Hugging Face Space ID to sync dashboard to (optional, runs locally if not provided)</span></span>
<span id="cb1-1563"><a href="#cb1-1563" aria-hidden="true" tabindex="-1"></a><span class="fu">trackio_space_id</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1564"><a href="#cb1-1564" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1565"><a href="#cb1-1565" aria-hidden="true" tabindex="-1"></a><span class="co"># Enable OpenTelemetry metrics collection and Prometheus export</span></span>
<span id="cb1-1566"><a href="#cb1-1566" aria-hidden="true" tabindex="-1"></a><span class="fu">use_otel_metrics</span><span class="kw">:</span><span class="at"> bool | None = False</span></span>
<span id="cb1-1567"><a href="#cb1-1567" aria-hidden="true" tabindex="-1"></a><span class="co"># Host to bind the OpenTelemetry metrics server to</span></span>
<span id="cb1-1568"><a href="#cb1-1568" aria-hidden="true" tabindex="-1"></a><span class="fu">otel_metrics_host</span><span class="kw">:</span><span class="at"> str | None = localhost</span></span>
<span id="cb1-1569"><a href="#cb1-1569" aria-hidden="true" tabindex="-1"></a><span class="co"># Port for the Prometheus metrics HTTP server</span></span>
<span id="cb1-1570"><a href="#cb1-1570" aria-hidden="true" tabindex="-1"></a><span class="fu">otel_metrics_port</span><span class="kw">:</span><span class="at"> int | None = 8000</span></span>
<span id="cb1-1571"><a href="#cb1-1571" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1572"><a href="#cb1-1572" aria-hidden="true" tabindex="-1"></a><span class="co"># the number of activate layers in LISA</span></span>
|
||||
<span id="cb1-1573"><a href="#cb1-1573" aria-hidden="true" tabindex="-1"></a><span class="fu">lisa_n_layers</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1574"><a href="#cb1-1574" aria-hidden="true" tabindex="-1"></a><span class="co"># how often to switch layers in LISA</span></span>
<span id="cb1-1575"><a href="#cb1-1575" aria-hidden="true" tabindex="-1"></a><span class="fu">lisa_step_interval</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1576"><a href="#cb1-1576" aria-hidden="true" tabindex="-1"></a><span class="co"># path under the model to access the layers</span></span>
<span id="cb1-1577"><a href="#cb1-1577" aria-hidden="true" tabindex="-1"></a><span class="fu">lisa_layers_attribute</span><span class="kw">:</span><span class="at"> str | None = model.layers</span></span>
<span id="cb1-1578"><a href="#cb1-1578" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1579"><a href="#cb1-1579" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_title</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1580"><a href="#cb1-1580" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_share</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1581"><a href="#cb1-1581" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_server_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1582"><a href="#cb1-1582" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_server_port</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1583"><a href="#cb1-1583" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_max_new_tokens</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1584"><a href="#cb1-1584" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_temperature</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1585"><a href="#cb1-1585" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1586"><a href="#cb1-1586" aria-hidden="true" tabindex="-1"></a><span class="fu">use_ray</span><span class="kw">:</span><span class="at"> bool = False</span></span>
<span id="cb1-1587"><a href="#cb1-1587" aria-hidden="true" tabindex="-1"></a><span class="fu">ray_run_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1588"><a href="#cb1-1588" aria-hidden="true" tabindex="-1"></a><span class="fu">ray_num_workers</span><span class="kw">:</span><span class="at"> int = 1</span></span>
<span id="cb1-1589"><a href="#cb1-1589" aria-hidden="true" tabindex="-1"></a><span class="fu">resources_per_worker</span><span class="kw">:</span><span class="at"> dict</span></span>
<span id="cb1-1590"><a href="#cb1-1590" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1591"><a href="#cb1-1591" aria-hidden="true" tabindex="-1"></a><span class="co"># The size of the image to resize to. It can be an integer (resized into padded-square</span></span>
|
||||
<span id="cb1-1592"><a href="#cb1-1592" aria-hidden="true" tabindex="-1"></a><span class="co"># image) or a tuple (width, height).If not provided, we will attempt to load from</span></span>
|
||||
<span id="cb1-1593"><a href="#cb1-1593" aria-hidden="true" tabindex="-1"></a><span class="co"># preprocessor.size, otherwise, images won't be resized.</span></span>
<span id="cb1-1594"><a href="#cb1-1594" aria-hidden="true" tabindex="-1"></a><span class="fu">image_size</span><span class="kw">:</span><span class="at"> int | tuple[int, int] | None</span></span>
<span id="cb1-1595"><a href="#cb1-1595" aria-hidden="true" tabindex="-1"></a><span class="co"># The resampling algorithm to use for image resizing. Default is bilinear. Please refer</span></span>
<span id="cb1-1596"><a href="#cb1-1596" aria-hidden="true" tabindex="-1"></a><span class="co"># to PIL.Image.Resampling for more details.</span></span>
<span id="cb1-1597"><a href="#cb1-1597" aria-hidden="true" tabindex="-1"></a><span class="fu">image_resize_algorithm</span><span class="kw">:</span><span class="at"> Literal['bilinear', 'bicubic', 'lanczos'] | Resampling | None</span></span>
<span id="cb1-1598"><a href="#cb1-1598" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1599"><a href="#cb1-1599" aria-hidden="true" tabindex="-1"></a><span class="co"># optional overrides to the base model configuration</span></span>
<span id="cb1-1600"><a href="#cb1-1600" aria-hidden="true" tabindex="-1"></a><span class="fu">overrides_of_model_config</span><span class="kw">:</span><span class="at"> dict[str, Any] | None</span></span>
<span id="cb1-1601"><a href="#cb1-1601" aria-hidden="true" tabindex="-1"></a><span class="co"># optional overrides the base model loading from_pretrained</span></span>
|
||||
<span id="cb1-1602"><a href="#cb1-1602" aria-hidden="true" tabindex="-1"></a><span class="fu">overrides_of_model_kwargs</span><span class="kw">:</span><span class="at"> dict[str, Any] | None</span></span>
<span id="cb1-1603"><a href="#cb1-1603" aria-hidden="true" tabindex="-1"></a><span class="co"># If you want to specify the type of model to load, AutoModelForCausalLM is a good</span></span>
|
||||
<span id="cb1-1604"><a href="#cb1-1604" aria-hidden="true" tabindex="-1"></a><span class="co"># choice too</span></span>
<span id="cb1-1605"><a href="#cb1-1605" aria-hidden="true" tabindex="-1"></a><span class="fu">type_of_model</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1606"><a href="#cb1-1606" aria-hidden="true" tabindex="-1"></a><span class="co"># You can specify to choose a specific model revision from huggingface hub</span></span>
|
||||
<span id="cb1-1607"><a href="#cb1-1607" aria-hidden="true" tabindex="-1"></a><span class="fu">revision_of_model</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1608"><a href="#cb1-1608" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1609"><a href="#cb1-1609" aria-hidden="true" tabindex="-1"></a><span class="fu">max_packed_sequence_len</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1610"><a href="#cb1-1610" aria-hidden="true" tabindex="-1"></a><span class="fu">rope_scaling</span><span class="kw">:</span><span class="at"> Any | None</span></span>
<span id="cb1-1611"><a href="#cb1-1611" aria-hidden="true" tabindex="-1"></a><span class="fu">noisy_embedding_alpha</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1612"><a href="#cb1-1612" aria-hidden="true" tabindex="-1"></a><span class="fu">dpo_beta</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1613"><a href="#cb1-1613" aria-hidden="true" tabindex="-1"></a><span class="fu">evaluation_strategy</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1614"><a href="#cb1-1614" aria-hidden="true" tabindex="-1"></a><span class="fu">eval_table_size</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1615"><a href="#cb1-1615" aria-hidden="true" tabindex="-1"></a><span class="fu">eval_max_new_tokens</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1616"><a href="#cb1-1616" aria-hidden="true" tabindex="-1"></a><span class="fu">dpo_use_logits_to_keep</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1617"><a href="#cb1-1617" aria-hidden="true" tabindex="-1"></a><span class="fu">dpo_generate_during_eval</span><span class="kw">:</span><span class="at"> bool | None</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
File diff suppressed because it is too large
463	docs/rlhf.html
@@ -813,6 +813,20 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<li><a href="#sequence-parallelism" id="toc-sequence-parallelism" class="nav-link" data-scroll-target="#sequence-parallelism">Sequence Parallelism</a></li>
</ul></li>
<li><a href="#simpo" id="toc-simpo" class="nav-link" data-scroll-target="#simpo">SimPO</a></li>
<li><a href="#ebft" id="toc-ebft" class="nav-link" data-scroll-target="#ebft">EBFT</a>
<ul class="collapse">
<li><a href="#structured-mode" id="toc-structured-mode" class="nav-link" data-scroll-target="#structured-mode">Structured Mode</a></li>
<li><a href="#strided-mode" id="toc-strided-mode" class="nav-link" data-scroll-target="#strided-mode">Strided Mode</a></li>
<li><a href="#ebft-configuration-reference" id="toc-ebft-configuration-reference" class="nav-link" data-scroll-target="#ebft-configuration-reference">EBFT Configuration Reference</a></li>
</ul></li>
<li><a href="#nemo-gym-integration" id="toc-nemo-gym-integration" class="nav-link" data-scroll-target="#nemo-gym-integration">NeMo Gym Integration</a>
<ul class="collapse">
<li><a href="#single-turn-simplest" id="toc-single-turn-simplest" class="nav-link" data-scroll-target="#single-turn-simplest">Single-Turn (Simplest)</a></li>
<li><a href="#multi-turn-with-async-grpo-recommended" id="toc-multi-turn-with-async-grpo-recommended" class="nav-link" data-scroll-target="#multi-turn-with-async-grpo-recommended">Multi-Turn with Async GRPO (Recommended)</a></li>
<li><a href="#nemo-gym-prerequisites" id="toc-nemo-gym-prerequisites" class="nav-link" data-scroll-target="#nemo-gym-prerequisites">NeMo Gym Prerequisites</a></li>
<li><a href="#nemo-gym-configuration-reference" id="toc-nemo-gym-configuration-reference" class="nav-link" data-scroll-target="#nemo-gym-configuration-reference">NeMo Gym Configuration Reference</a></li>
<li><a href="#reward-functions-2" id="toc-reward-functions-2" class="nav-link" data-scroll-target="#reward-functions-2">Reward Functions</a></li>
</ul></li>
<li><a href="#using-local-dataset-files" id="toc-using-local-dataset-files" class="nav-link" data-scroll-target="#using-local-dataset-files">Using local dataset files</a></li>
<li><a href="#trl-auto-unwrapping-for-peft" id="toc-trl-auto-unwrapping-for-peft" class="nav-link" data-scroll-target="#trl-auto-unwrapping-for-peft">TRL auto-unwrapping for PEFT</a></li>
</ul></li>
@@ -857,6 +871,8 @@ feedback. Various methods include, but not limited to:</p>
<li><a href="#orpo">Odds Ratio Preference Optimization (ORPO)</a></li>
<li><a href="#grpo">Group Relative Policy Optimization (GRPO)</a></li>
<li><a href="#gdpo">Group Reward-Decoupled Policy Optimization (GDPO)</a></li>
<li><a href="#ebft">Energy-Based Fine-Tuning (EBFT)</a></li>
<li><a href="#nemo-gym-integration">NeMo Gym Integration</a></li>
</ul>
</section>
<section id="rlhf-using-axolotl" class="level2">
@@ -1805,20 +1821,451 @@ Tip
<span id="cb64-4"><a href="#cb64-4" aria-hidden="true" tabindex="-1"></a><span class="fu">simpo_gamma</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.5</span><span class="co"> # default in CPOTrainer</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>This method uses the same dataset format as <a href="#dpo">DPO</a>.</p>
</section>
<section id="ebft" class="level3">
<h3 class="anchored" data-anchor-id="ebft">EBFT</h3>
<p>EBFT (Energy-Based Fine-Tuning) fine-tunes language models by optimizing a <strong>feature-matching loss</strong> rather than relying on external reward functions. A frozen copy of the model extracts embeddings from both generated and ground-truth completions, and the generator is updated via REINFORCE to match the ground-truth feature moments.</p>
<p>Paper: <a href="https://arxiv.org/abs/2603.12248">“Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models”</a> (Jelassi et al., 2026)</p>
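<p>Schematically, as a sketch inferred from the coefficient descriptions in the reference table below (not the exact loss from the paper): each sampled completion <code>i</code> earns a reward of roughly <code>alignment_coef * cos(f(gen_i), f(gt)) - diversity_coef * mean over j != i of dot(f(gen_i), f(gen_j))</code>, where <code>f</code> is the frozen feature extractor; this reward then drives the REINFORCE-style policy update.</p>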
<p><strong>Key advantages:</strong></p>
<ul>
<li>No reward model or verifier required — works on any (prompt, completion) data</li>
<li>Applicable to non-verifiable tasks (code, translation, creative writing)</li>
<li>Operates on model rollouts (not teacher forcing), reducing distribution shift</li>
</ul>
<p>EBFT supports two modes:</p>
<ul>
<li><strong>Structured mode</strong>: For QA/instruction data with prompt + completion pairs. Uses vLLM for generation (like GRPO).</li>
<li><strong>Strided mode</strong>: For unstructured text without prompt/completion splits. Uses strided block-parallel generation with flex_attention — no vLLM needed.</li>
</ul>
<section id="structured-mode" class="level4">
<h4 class="anchored" data-anchor-id="structured-mode">Structured Mode</h4>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb65"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb65-1"><a href="#cb65-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> Qwen/Qwen3-4B</span></span>
<span id="cb65-2"><a href="#cb65-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb65-3"><a href="#cb65-3" aria-hidden="true" tabindex="-1"></a><span class="fu">rl</span><span class="kw">:</span><span class="at"> ebft</span></span>
<span id="cb65-4"><a href="#cb65-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb65-5"><a href="#cb65-5" aria-hidden="true" tabindex="-1"></a><span class="fu">ebft</span><span class="kw">:</span></span>
<span id="cb65-6"><a href="#cb65-6" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">feature_layers</span><span class="kw">:</span><span class="at"> </span><span class="kw">[</span><span class="fl">0.25</span><span class="kw">,</span><span class="at"> </span><span class="fl">0.5</span><span class="kw">,</span><span class="at"> </span><span class="fl">0.75</span><span class="kw">]</span><span class="co"> # Extract features at 25%, 50%, 75% depth</span></span>
<span id="cb65-7"><a href="#cb65-7" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">embed_method</span><span class="kw">:</span><span class="at"> last_token</span></span>
<span id="cb65-8"><a href="#cb65-8" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_whitening</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span></span>
<span id="cb65-9"><a href="#cb65-9" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">alignment_coef</span><span class="kw">:</span><span class="at"> </span><span class="fl">1.0</span><span class="co"> # Cosine similarity reward weight</span></span>
<span id="cb65-10"><a href="#cb65-10" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">diversity_coef</span><span class="kw">:</span><span class="at"> </span><span class="fl">1.0</span><span class="co"> # Pairwise dot product penalty</span></span>
<span id="cb65-11"><a href="#cb65-11" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">ce_coef</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.0</span><span class="co"> # Cross-entropy on GT tokens (0 = off)</span></span>
<span id="cb65-12"><a href="#cb65-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb65-13"><a href="#cb65-13" aria-hidden="true" tabindex="-1"></a><span class="fu">trl</span><span class="kw">:</span></span>
<span id="cb65-14"><a href="#cb65-14" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">num_generations</span><span class="kw">:</span><span class="at"> </span><span class="dv">4</span></span>
<span id="cb65-15"><a href="#cb65-15" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">max_completion_length</span><span class="kw">:</span><span class="at"> </span><span class="dv">256</span></span>
<span id="cb65-16"><a href="#cb65-16" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">temperature</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.7</span></span>
<span id="cb65-17"><a href="#cb65-17" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_vllm</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb65-18"><a href="#cb65-18" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_server_host</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.0.0.0</span></span>
<span id="cb65-19"><a href="#cb65-19" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_server_port</span><span class="kw">:</span><span class="at"> </span><span class="dv">8000</span></span>
<span id="cb65-20"><a href="#cb65-20" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_lora_sync</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # LoRA adapter sync (recommended)</span></span>
<span id="cb65-21"><a href="#cb65-21" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_sync_interval</span><span class="kw">:</span><span class="at"> </span><span class="dv">3</span></span>
<span id="cb65-22"><a href="#cb65-22" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_data_producer</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb65-23"><a href="#cb65-23" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">async_prefetch</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # Set false for sync mode</span></span>
<span id="cb65-24"><a href="#cb65-24" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">scale_rewards</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb65-25"><a href="#cb65-25" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">loss_type</span><span class="kw">:</span><span class="at"> grpo</span></span>
<span id="cb65-26"><a href="#cb65-26" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">epsilon</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.2</span></span>
<span id="cb65-27"><a href="#cb65-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb65-28"><a href="#cb65-28" aria-hidden="true" tabindex="-1"></a><span class="fu">vllm</span><span class="kw">:</span></span>
<span id="cb65-29"><a href="#cb65-29" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">gpu_memory_utilization</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.5</span></span>
<span id="cb65-30"><a href="#cb65-30" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">max_model_len</span><span class="kw">:</span><span class="at"> </span><span class="dv">2048</span></span>
<span id="cb65-31"><a href="#cb65-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb65-32"><a href="#cb65-32" aria-hidden="true" tabindex="-1"></a><span class="fu">datasets</span><span class="kw">:</span></span>
<span id="cb65-33"><a href="#cb65-33" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">path</span><span class="kw">:</span><span class="at"> nvidia/OpenCodeInstruct</span></span>
<span id="cb65-34"><a href="#cb65-34" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">type</span><span class="kw">:</span><span class="at"> ebft_opencode.transform</span></span>
<span id="cb65-35"><a href="#cb65-35" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">split</span><span class="kw">:</span><span class="at"> train[:500]</span></span>
<span id="cb65-36"><a href="#cb65-36" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb65-37"><a href="#cb65-37" aria-hidden="true" tabindex="-1"></a><span class="fu">adapter</span><span class="kw">:</span><span class="at"> lora</span></span>
<span id="cb65-38"><a href="#cb65-38" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_r</span><span class="kw">:</span><span class="at"> </span><span class="dv">16</span></span>
<span id="cb65-39"><a href="#cb65-39" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_alpha</span><span class="kw">:</span><span class="at"> </span><span class="dv">32</span></span>
<span id="cb65-40"><a href="#cb65-40" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_target_linear</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb66"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb66-1"><a href="#cb66-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Terminal 1: Start vLLM</span></span>
<span id="cb66-2"><a href="#cb66-2" aria-hidden="true" tabindex="-1"></a><span class="va">CUDA_VISIBLE_DEVICES</span><span class="op">=</span>0 <span class="ex">axolotl</span> vllm-serve config.yaml</span>
<span id="cb66-3"><a href="#cb66-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb66-4"><a href="#cb66-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Terminal 2: Train</span></span>
<span id="cb66-5"><a href="#cb66-5" aria-hidden="true" tabindex="-1"></a><span class="va">CUDA_VISIBLE_DEVICES</span><span class="op">=</span>1 <span class="ex">axolotl</span> train config.yaml</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
</section>
<section id="strided-mode" class="level4">
<h4 class="anchored" data-anchor-id="strided-mode">Strided Mode</h4>
<p>For unstructured text (raw code, prose). No vLLM needed — runs on a single GPU.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb67"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb67-1"><a href="#cb67-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> meta-llama/Llama-3.2-1B</span></span>
<span id="cb67-2"><a href="#cb67-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb67-3"><a href="#cb67-3" aria-hidden="true" tabindex="-1"></a><span class="fu">rl</span><span class="kw">:</span><span class="at"> ebft</span></span>
<span id="cb67-4"><a href="#cb67-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb67-5"><a href="#cb67-5" aria-hidden="true" tabindex="-1"></a><span class="fu">ebft</span><span class="kw">:</span></span>
<span id="cb67-6"><a href="#cb67-6" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">mode</span><span class="kw">:</span><span class="at"> strided</span></span>
<span id="cb67-7"><a href="#cb67-7" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">stride</span><span class="kw">:</span><span class="at"> </span><span class="dv">8</span></span>
<span id="cb67-8"><a href="#cb67-8" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">context_length</span><span class="kw">:</span><span class="at"> </span><span class="dv">8</span></span>
<span id="cb67-9"><a href="#cb67-9" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">generate_max_len</span><span class="kw">:</span><span class="at"> </span><span class="dv">8</span></span>
<span id="cb67-10"><a href="#cb67-10" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">n_samples_per_prompt</span><span class="kw">:</span><span class="at"> </span><span class="dv">4</span></span>
<span id="cb67-11"><a href="#cb67-11" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">temperature</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.6</span></span>
<span id="cb67-12"><a href="#cb67-12" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">feature_layers</span><span class="kw">:</span><span class="at"> </span><span class="kw">[</span><span class="fl">0.25</span><span class="kw">,</span><span class="at"> </span><span class="fl">0.5</span><span class="kw">,</span><span class="at"> </span><span class="fl">0.75</span><span class="kw">]</span></span>
<span id="cb67-13"><a href="#cb67-13" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">embed_method</span><span class="kw">:</span><span class="at"> last_token</span></span>
<span id="cb67-14"><a href="#cb67-14" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_whitening</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb67-15"><a href="#cb67-15" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">alignment_coef</span><span class="kw">:</span><span class="at"> </span><span class="fl">1.0</span></span>
<span id="cb67-16"><a href="#cb67-16" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">diversity_coef</span><span class="kw">:</span><span class="at"> </span><span class="fl">1.0</span></span>
<span id="cb67-17"><a href="#cb67-17" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">rl_coef</span><span class="kw">:</span><span class="at"> </span><span class="fl">1.0</span></span>
<span id="cb67-18"><a href="#cb67-18" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">ce_coef</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.03</span></span>
<span id="cb67-19"><a href="#cb67-19" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">advantage_estimator</span><span class="kw">:</span><span class="at"> rloo</span></span>
<span id="cb67-20"><a href="#cb67-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb67-21"><a href="#cb67-21" aria-hidden="true" tabindex="-1"></a><span class="fu">datasets</span><span class="kw">:</span></span>
<span id="cb67-22"><a href="#cb67-22" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">path</span><span class="kw">:</span><span class="at"> nvidia/OpenCodeInstruct</span></span>
<span id="cb67-23"><a href="#cb67-23" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">type</span><span class="kw">:</span><span class="at"> ebft_strided_structured.transform</span></span>
<span id="cb67-24"><a href="#cb67-24" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">split</span><span class="kw">:</span><span class="at"> train[:1%]</span></span>
<span id="cb67-25"><a href="#cb67-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb67-26"><a href="#cb67-26" aria-hidden="true" tabindex="-1"></a><span class="fu">flash_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span></span>
<span id="cb67-27"><a href="#cb67-27" aria-hidden="true" tabindex="-1"></a><span class="fu">flex_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # Strided mode uses flex_attention</span></span>
<span id="cb67-28"><a href="#cb67-28" aria-hidden="true" tabindex="-1"></a><span class="fu">gradient_checkpointing</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb67-29"><a href="#cb67-29" aria-hidden="true" tabindex="-1"></a><span class="fu">gradient_checkpointing_kwargs</span><span class="kw">:</span></span>
<span id="cb67-30"><a href="#cb67-30" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_reentrant</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # Required for flex_attention</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb68"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb68-1"><a href="#cb68-1" aria-hidden="true" tabindex="-1"></a><span class="va">CUDA_VISIBLE_DEVICES</span><span class="op">=</span>0 <span class="ex">axolotl</span> train config.yaml</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p>See <code>examples/ebft/</code> for complete example configs covering Llama 1B/3B/8B and Qwen3 4B/8B models in both modes.</p>
</div>
</div>
</section>
<section id="ebft-configuration-reference" class="level4">
<h4 class="anchored" data-anchor-id="ebft-configuration-reference">EBFT Configuration Reference</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 33%">
<col style="width: 27%">
<col style="width: 39%">
</colgroup>
<thead>
<tr class="header">
<th>Parameter</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>ebft.feature_layers</code></td>
<td><code>[0.25, 0.5, 0.75]</code></td>
<td>Layer depths for feature extraction (fractional)</td>
</tr>
<tr class="even">
<td><code>ebft.embed_method</code></td>
<td><code>last_token</code></td>
<td>Feature pooling: <code>last_token</code>, <code>mean_pooling</code>, <code>concat</code></td>
</tr>
<tr class="odd">
<td><code>ebft.use_whitening</code></td>
<td><code>false</code></td>
<td>SVD whitening of feature dimensions</td>
</tr>
<tr class="even">
<td><code>ebft.alignment_coef</code></td>
<td><code>1.0</code></td>
<td>Cosine similarity reward weight</td>
</tr>
<tr class="odd">
<td><code>ebft.diversity_coef</code></td>
<td><code>1.0</code></td>
<td>Pairwise dot product penalty weight</td>
</tr>
<tr class="even">
<td><code>ebft.ce_coef</code></td>
<td><code>0.0</code></td>
<td>Cross-entropy loss on ground-truth tokens</td>
</tr>
<tr class="odd">
<td><code>ebft.mode</code></td>
<td><code>structured</code></td>
<td><code>structured</code> (vLLM) or <code>strided</code> (no vLLM)</td>
</tr>
<tr class="even">
<td><code>ebft.stride</code></td>
<td>—</td>
<td>Tokens between anchor points (strided mode)</td>
</tr>
<tr class="odd">
<td><code>ebft.context_length</code></td>
<td>—</td>
<td>Context window per block (strided mode)</td>
</tr>
<tr class="even">
<td><code>ebft.generate_max_len</code></td>
<td>—</td>
<td>Tokens to generate per block (strided mode)</td>
</tr>
<tr class="odd">
<td><code>ebft.n_samples_per_prompt</code></td>
<td>—</td>
<td>Rollouts per document (strided mode)</td>
</tr>
<tr class="even">
<td><code>ebft.advantage_estimator</code></td>
<td><code>grpo</code></td>
<td><code>grpo</code> or <code>rloo</code> (strided mode)</td>
</tr>
</tbody>
</table>
</section>
</section>
<section id="nemo-gym-integration" class="level3">
<h3 class="anchored" data-anchor-id="nemo-gym-integration">NeMo Gym Integration</h3>
<p><a href="https://github.com/NVIDIA-NeMo/Gym">NeMo Gym</a> provides 50+ verified RL environments (math, coding, tool-use, reasoning) with deterministic reward signals. The axolotl integration supports both <strong>single-turn</strong> (call <code>/verify</code> after generation) and <strong>multi-turn</strong> (agent-based tool execution via <code>/run</code>).</p>
<section id="single-turn-simplest" class="level4">
<h4 class="anchored" data-anchor-id="single-turn-simplest">Single-Turn (Simplest)</h4>
<p>For environments that only need answer verification (math, coding challenges). No agent server needed — the reward function calls <code>/verify</code> directly on the resource server.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb69"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb69-1"><a href="#cb69-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> Qwen/Qwen2.5-0.5B-Instruct</span></span>
<span id="cb69-2"><a href="#cb69-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb69-3"><a href="#cb69-3" aria-hidden="true" tabindex="-1"></a><span class="fu">rl</span><span class="kw">:</span><span class="at"> grpo</span></span>
<span id="cb69-4"><a href="#cb69-4" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> tokenizer_default</span></span>
<span id="cb69-5"><a href="#cb69-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb69-6"><a href="#cb69-6" aria-hidden="true" tabindex="-1"></a><span class="fu">trl</span><span class="kw">:</span></span>
<span id="cb69-7"><a href="#cb69-7" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_vllm</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span><span class="co"> # Colocate mode (single GPU)</span></span>
<span id="cb69-8"><a href="#cb69-8" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">num_generations</span><span class="kw">:</span><span class="at"> </span><span class="dv">4</span></span>
<span id="cb69-9"><a href="#cb69-9" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">max_completion_length</span><span class="kw">:</span><span class="at"> </span><span class="dv">128</span></span>
<span id="cb69-10"><a href="#cb69-10" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">temperature</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.9</span></span>
<span id="cb69-11"><a href="#cb69-11" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">reward_funcs</span><span class="kw">:</span></span>
<span id="cb69-12"><a href="#cb69-12" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> axolotl.integrations.nemo_gym.rewards.reward_nemo_gym_verify</span></span>
<span id="cb69-13"><a href="#cb69-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb69-14"><a href="#cb69-14" aria-hidden="true" tabindex="-1"></a><span class="fu">plugins</span><span class="kw">:</span></span>
<span id="cb69-15"><a href="#cb69-15" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> axolotl.integrations.nemo_gym.NemoGymPlugin</span></span>
<span id="cb69-16"><a href="#cb69-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb69-17"><a href="#cb69-17" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_enabled</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb69-18"><a href="#cb69-18" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_dir</span><span class="kw">:</span><span class="at"> ~/Gym</span></span>
<span id="cb69-19"><a href="#cb69-19" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_auto_start</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span></span>
<span id="cb69-20"><a href="#cb69-20" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_head_port</span><span class="kw">:</span><span class="at"> </span><span class="dv">11000</span></span>
<span id="cb69-21"><a href="#cb69-21" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_datasets</span><span class="kw">:</span></span>
<span id="cb69-22"><a href="#cb69-22" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">path</span><span class="kw">:</span><span class="at"> resources_servers/reasoning_gym/data/train_basic_arithmetic.jsonl</span></span>
<span id="cb69-23"><a href="#cb69-23" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">server_name</span><span class="kw">:</span><span class="at"> reasoning_gym</span></span>
<span id="cb69-24"><a href="#cb69-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb69-25"><a href="#cb69-25" aria-hidden="true" tabindex="-1"></a><span class="fu">datasets</span><span class="kw">:</span></span>
<span id="cb69-26"><a href="#cb69-26" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">path</span><span class="kw">:</span><span class="at"> ~/Gym/resources_servers/reasoning_gym/data/train_basic_arithmetic.jsonl</span></span>
<span id="cb69-27"><a href="#cb69-27" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">type</span><span class="kw">:</span><span class="at"> chat_template</span></span>
<span id="cb69-28"><a href="#cb69-28" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">field_messages</span><span class="kw">:</span><span class="at"> responses_create_params.input</span></span>
<span id="cb69-29"><a href="#cb69-29" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">message_field_content</span><span class="kw">:</span><span class="at"> content</span></span>
<span id="cb69-30"><a href="#cb69-30" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">message_field_role</span><span class="kw">:</span><span class="at"> role</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb70"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb70-1"><a href="#cb70-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Terminal 1: Start NeMo Gym resource server</span></span>
<span id="cb70-2"><a href="#cb70-2" aria-hidden="true" tabindex="-1"></a><span class="bu">cd</span> ~/Gym <span class="kw">&&</span> <span class="ex">.venv/bin/ng_run</span> <span class="dt">\</span></span>
<span id="cb70-3"><a href="#cb70-3" aria-hidden="true" tabindex="-1"></a> <span class="st">"+config_paths=[resources_servers/reasoning_gym/configs/resources_only.yaml]"</span> <span class="dt">\</span></span>
<span id="cb70-4"><a href="#cb70-4" aria-hidden="true" tabindex="-1"></a> <span class="st">"+skip_venv_if_present=true"</span></span>
<span id="cb70-5"><a href="#cb70-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb70-6"><a href="#cb70-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Terminal 2: Train</span></span>
<span id="cb70-7"><a href="#cb70-7" aria-hidden="true" tabindex="-1"></a><span class="va">CUDA_VISIBLE_DEVICES</span><span class="op">=</span>0 <span class="ex">axolotl</span> train config.yaml</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p><code>nemo_gym_datasets.path</code> is relative to <code>nemo_gym_dir</code>. Don’t use absolute paths or they will be double-joined.</p>
</div>
</div>
</section>
<section id="multi-turn-with-async-grpo-recommended" class="level4">
<h4 class="anchored" data-anchor-id="multi-turn-with-async-grpo-recommended">Multi-Turn with Async GRPO (Recommended)</h4>
<p>For environments with tool-use (weather, search, databases). An agent server orchestrates multi-turn interactions: generate → parse tool calls → execute tools → feed results back → repeat until done.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb71"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb71-1"><a href="#cb71-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> Qwen/Qwen3-0.6B</span></span>
<span id="cb71-2"><a href="#cb71-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb71-3"><a href="#cb71-3" aria-hidden="true" tabindex="-1"></a><span class="fu">rl</span><span class="kw">:</span><span class="at"> grpo</span></span>
<span id="cb71-4"><a href="#cb71-4" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> tokenizer_default</span></span>
<span id="cb71-5"><a href="#cb71-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb71-6"><a href="#cb71-6" aria-hidden="true" tabindex="-1"></a><span class="fu">adapter</span><span class="kw">:</span><span class="at"> lora</span></span>
<span id="cb71-7"><a href="#cb71-7" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_r</span><span class="kw">:</span><span class="at"> </span><span class="dv">16</span></span>
<span id="cb71-8"><a href="#cb71-8" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_alpha</span><span class="kw">:</span><span class="at"> </span><span class="dv">32</span></span>
<span id="cb71-9"><a href="#cb71-9" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_target_modules</span><span class="kw">:</span><span class="at"> </span><span class="kw">[</span><span class="at">q_proj</span><span class="kw">,</span><span class="at"> k_proj</span><span class="kw">,</span><span class="at"> v_proj</span><span class="kw">,</span><span class="at"> o_proj</span><span class="kw">,</span><span class="at"> gate_proj</span><span class="kw">,</span><span class="at"> up_proj</span><span class="kw">,</span><span class="at"> down_proj</span><span class="kw">]</span></span>
<span id="cb71-10"><a href="#cb71-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb71-11"><a href="#cb71-11" aria-hidden="true" tabindex="-1"></a><span class="fu">trl</span><span class="kw">:</span></span>
<span id="cb71-12"><a href="#cb71-12" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_vllm</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb71-13"><a href="#cb71-13" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_mode</span><span class="kw">:</span><span class="at"> server</span></span>
<span id="cb71-14"><a href="#cb71-14" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_server_host</span><span class="kw">:</span><span class="at"> localhost</span></span>
<span id="cb71-15"><a href="#cb71-15" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_server_port</span><span class="kw">:</span><span class="at"> </span><span class="dv">8000</span></span>
<span id="cb71-16"><a href="#cb71-16" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_lora_sync</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb71-17"><a href="#cb71-17" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_sync_interval</span><span class="kw">:</span><span class="at"> </span><span class="dv">5</span></span>
<span id="cb71-18"><a href="#cb71-18" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_data_producer</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb71-19"><a href="#cb71-19" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">async_prefetch</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # 3x speedup</span></span>
<span id="cb71-20"><a href="#cb71-20" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">num_generations</span><span class="kw">:</span><span class="at"> </span><span class="dv">4</span></span>
<span id="cb71-21"><a href="#cb71-21" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">max_completion_length</span><span class="kw">:</span><span class="at"> </span><span class="dv">512</span></span>
<span id="cb71-22"><a href="#cb71-22" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">temperature</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.8</span></span>
<span id="cb71-23"><a href="#cb71-23" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">reward_funcs</span><span class="kw">:</span></span>
<span id="cb71-24"><a href="#cb71-24" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> axolotl.integrations.nemo_gym.rewards.reward_env</span></span>
<span id="cb71-25"><a href="#cb71-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb71-26"><a href="#cb71-26" aria-hidden="true" tabindex="-1"></a><span class="fu">plugins</span><span class="kw">:</span></span>
<span id="cb71-27"><a href="#cb71-27" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> axolotl.integrations.nemo_gym.NemoGymPlugin</span></span>
<span id="cb71-28"><a href="#cb71-28" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb71-29"><a href="#cb71-29" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_enabled</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb71-30"><a href="#cb71-30" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_auto_start</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span></span>
<span id="cb71-31"><a href="#cb71-31" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_head_port</span><span class="kw">:</span><span class="at"> </span><span class="dv">11000</span></span>
<span id="cb71-32"><a href="#cb71-32" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_multi_turn</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb71-33"><a href="#cb71-33" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_verify_timeout</span><span class="kw">:</span><span class="at"> </span><span class="dv">120</span></span>
<span id="cb71-34"><a href="#cb71-34" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_datasets</span><span class="kw">:</span></span>
<span id="cb71-35"><a href="#cb71-35" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">path</span><span class="kw">:</span><span class="at"> resources_servers/example_single_tool_call/data/weather_tool_calling.jsonl</span></span>
<span id="cb71-36"><a href="#cb71-36" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">server_name</span><span class="kw">:</span><span class="at"> example_single_tool_call</span></span>
<span id="cb71-37"><a href="#cb71-37" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb71-38"><a href="#cb71-38" aria-hidden="true" tabindex="-1"></a><span class="fu">datasets</span><span class="kw">:</span></span>
<span id="cb71-39"><a href="#cb71-39" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">path</span><span class="kw">:</span><span class="at"> ~/Gym/resources_servers/example_single_tool_call/data/weather_tool_calling.jsonl</span></span>
<span id="cb71-40"><a href="#cb71-40" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">type</span><span class="kw">:</span><span class="at"> chat_template</span></span>
<span id="cb71-41"><a href="#cb71-41" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">field_messages</span><span class="kw">:</span><span class="at"> responses_create_params.input</span></span>
<span id="cb71-42"><a href="#cb71-42" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">message_field_content</span><span class="kw">:</span><span class="at"> content</span></span>
<span id="cb71-43"><a href="#cb71-43" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">message_field_role</span><span class="kw">:</span><span class="at"> role</span></span>
<span id="cb71-44"><a href="#cb71-44" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb71-45"><a href="#cb71-45" aria-hidden="true" tabindex="-1"></a><span class="fu">vllm</span><span class="kw">:</span></span>
<span id="cb71-46"><a href="#cb71-46" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">gpu_memory_utilization</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.85</span></span>
<span id="cb71-47"><a href="#cb71-47" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">max_model_len</span><span class="kw">:</span><span class="at"> </span><span class="dv">2048</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>Multi-turn requires three services running:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb72"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb72-1"><a href="#cb72-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Terminal 1: vLLM with LoRA + tool calling</span></span>
<span id="cb72-2"><a href="#cb72-2" aria-hidden="true" tabindex="-1"></a><span class="va">VLLM_ALLOW_RUNTIME_LORA_UPDATING</span><span class="op">=</span>1 <span class="va">CUDA_VISIBLE_DEVICES</span><span class="op">=</span>0 <span class="dt">\</span></span>
<span id="cb72-3"><a href="#cb72-3" aria-hidden="true" tabindex="-1"></a> <span class="ex">python</span> <span class="at">-m</span> vllm.entrypoints.openai.api_server <span class="dt">\</span></span>
<span id="cb72-4"><a href="#cb72-4" aria-hidden="true" tabindex="-1"></a> <span class="at">--model</span> Qwen/Qwen3-0.6B <span class="at">--max-model-len</span> 2048 <span class="dt">\</span></span>
<span id="cb72-5"><a href="#cb72-5" aria-hidden="true" tabindex="-1"></a> <span class="at">--gpu-memory-utilization</span> 0.85 <span class="dt">\</span></span>
<span id="cb72-6"><a href="#cb72-6" aria-hidden="true" tabindex="-1"></a> <span class="at">--enable-lora</span> <span class="at">--max-lora-rank</span> 64 <span class="dt">\</span></span>
<span id="cb72-7"><a href="#cb72-7" aria-hidden="true" tabindex="-1"></a> <span class="at">--enable-auto-tool-choice</span> <span class="at">--tool-call-parser</span> hermes</span>
<span id="cb72-8"><a href="#cb72-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb72-9"><a href="#cb72-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Terminal 2: NeMo Gym servers (resource + model proxy + agent)</span></span>
<span id="cb72-10"><a href="#cb72-10" aria-hidden="true" tabindex="-1"></a><span class="bu">cd</span> ~/Gym <span class="kw">&&</span> <span class="ex">.venv/bin/ng_run</span> <span class="dt">\</span></span>
<span id="cb72-11"><a href="#cb72-11" aria-hidden="true" tabindex="-1"></a> <span class="st">"+config_paths=[configs/axolotl_tool_calling.yaml]"</span> <span class="dt">\</span></span>
<span id="cb72-12"><a href="#cb72-12" aria-hidden="true" tabindex="-1"></a> <span class="st">"+skip_venv_if_present=true"</span></span>
<span id="cb72-13"><a href="#cb72-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb72-14"><a href="#cb72-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Terminal 3: Training</span></span>
<span id="cb72-15"><a href="#cb72-15" aria-hidden="true" tabindex="-1"></a><span class="va">CUDA_VISIBLE_DEVICES</span><span class="op">=</span>1 <span class="ex">axolotl</span> train config.yaml</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="callout callout-style-default callout-important callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Important
</div>
</div>
<div class="callout-body-container callout-body">
<p>Multi-turn requires a NeMo Gym agent config YAML that defines three components: a resource server (tools + <code>/verify</code>), a model server proxy (forwards to your vLLM), and an agent server (orchestrates <code>/run</code>). See the <a href="https://github.com/NVIDIA-NeMo/Gym">NeMo Gym README</a> for agent config format.</p>
</div>
</div>
</section>
<section id="nemo-gym-prerequisites" class="level4">
<h4 class="anchored" data-anchor-id="nemo-gym-prerequisites">NeMo Gym Prerequisites</h4>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb73"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb73-1"><a href="#cb73-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Clone and set up NeMo Gym</span></span>
<span id="cb73-2"><a href="#cb73-2" aria-hidden="true" tabindex="-1"></a><span class="fu">git</span> clone https://github.com/NVIDIA-NeMo/Gym.git ~/Gym</span>
<span id="cb73-3"><a href="#cb73-3" aria-hidden="true" tabindex="-1"></a><span class="bu">cd</span> ~/Gym</span>
<span id="cb73-4"><a href="#cb73-4" aria-hidden="true" tabindex="-1"></a><span class="ex">uv</span> venv <span class="at">--python</span> 3.12 <span class="kw">&&</span> <span class="bu">source</span> .venv/bin/activate <span class="kw">&&</span> <span class="ex">uv</span> sync</span>
<span id="cb73-5"><a href="#cb73-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb73-6"><a href="#cb73-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Fix pycosat build (GCC 13+)</span></span>
<span id="cb73-7"><a href="#cb73-7" aria-hidden="true" tabindex="-1"></a><span class="va">CFLAGS</span><span class="op">=</span><span class="st">""</span> <span class="ex">uv</span> pip install pycosat <span class="at">--python</span> .venv/bin/python <span class="at">--no-build-isolation</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
</section>
<section id="nemo-gym-configuration-reference" class="level4">
<h4 class="anchored" data-anchor-id="nemo-gym-configuration-reference">NeMo Gym Configuration Reference</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 28%">
<col style="width: 15%">
<col style="width: 23%">
<col style="width: 33%">
</colgroup>
<thead>
<tr class="header">
<th>Parameter</th>
<th>Type</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>nemo_gym_enabled</code></td>
<td>bool</td>
<td>—</td>
<td>Enable the NeMo Gym integration</td>
</tr>
<tr class="even">
<td><code>nemo_gym_dir</code></td>
<td>str</td>
<td><code>~/Gym</code></td>
<td>Path to NeMo Gym repo</td>
</tr>
<tr class="odd">
<td><code>nemo_gym_auto_start</code></td>
<td>bool</td>
<td><code>true</code></td>
<td>Auto-start resource servers</td>
</tr>
<tr class="even">
<td><code>nemo_gym_head_port</code></td>
<td>int</td>
<td><code>11000</code></td>
<td>Head server port</td>
</tr>
<tr class="odd">
<td><code>nemo_gym_multi_turn</code></td>
<td>bool</td>
<td><code>false</code></td>
<td>Enable multi-turn via agent <code>/run</code></td>
</tr>
<tr class="even">
<td><code>nemo_gym_verify_timeout</code></td>
<td>int</td>
<td><code>30</code></td>
<td>Per-request timeout (seconds)</td>
</tr>
<tr class="odd">
<td><code>nemo_gym_datasets</code></td>
<td>list</td>
<td>required</td>
<td>Dataset configs with <code>path</code> and <code>server_name</code></td>
</tr>
</tbody>
</table>
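<p>Assembled from the defaults above, a minimal plugin block might look like the following sketch; only <code>nemo_gym_enabled</code> and the required <code>nemo_gym_datasets</code> have no usable default and must be set explicitly.</p>
<div class="sourceCode"><pre class="sourceCode yaml"><code>nemo_gym_enabled: true
nemo_gym_dir: ~/Gym             # path to the NeMo Gym checkout
nemo_gym_auto_start: true       # let the plugin start resource servers
nemo_gym_head_port: 11000
nemo_gym_multi_turn: false      # single-turn /verify scoring
nemo_gym_verify_timeout: 30     # seconds per verify request
nemo_gym_datasets:              # required, no default
  - path: resources_servers/example_single_tool_call/data/weather_tool_calling.jsonl
    server_name: example_single_tool_call</code></pre></div>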
</section>
<section id="reward-functions-2" class="level4">
<h4 class="anchored" data-anchor-id="reward-functions-2">Reward Functions</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 34%">
<col style="width: 20%">
<col style="width: 44%">
</colgroup>
<thead>
<tr class="header">
<th>Function</th>
<th>Mode</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>axolotl.integrations.nemo_gym.rewards.reward_nemo_gym_verify</code></td>
<td>Single-turn</td>
<td>Calls <code>/verify</code>, returns binary reward</td>
</tr>
<tr class="even">
<td><code>axolotl.integrations.nemo_gym.rewards.reward_env</code></td>
<td>Multi-turn</td>
<td>Passthrough reward from agent <code>/run</code></td>
</tr>
</tbody>
</table>
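<p>Pick the reward function that matches your <code>nemo_gym_multi_turn</code> setting. A sketch of the two pairings, using the function paths from the table (choose one; shown here as two YAML documents):</p>
<div class="sourceCode"><pre class="sourceCode yaml"><code># Single-turn: each completion is scored via the resource server's /verify
nemo_gym_multi_turn: false
trl:
  reward_funcs:
    - axolotl.integrations.nemo_gym.rewards.reward_nemo_gym_verify
---
# Multi-turn: the reward is passed through from the agent's /run rollout
nemo_gym_multi_turn: true
trl:
  reward_funcs:
    - axolotl.integrations.nemo_gym.rewards.reward_env</code></pre></div>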
</section>
</section>
<section id="using-local-dataset-files" class="level3">
<h3 class="anchored" data-anchor-id="using-local-dataset-files">Using local dataset files</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb65"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb65-1"><a href="#cb65-1" aria-hidden="true" tabindex="-1"></a><span class="fu">datasets</span><span class="kw">:</span></span>
|
||||
<span id="cb65-2"><a href="#cb65-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">ds_type</span><span class="kw">:</span><span class="at"> json</span></span>
|
||||
<span id="cb65-3"><a href="#cb65-3" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">data_files</span><span class="kw">:</span></span>
|
||||
<span id="cb65-4"><a href="#cb65-4" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> orca_rlhf.jsonl</span></span>
|
||||
<span id="cb65-5"><a href="#cb65-5" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">split</span><span class="kw">:</span><span class="at"> train</span></span>
|
||||
<span id="cb65-6"><a href="#cb65-6" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">type</span><span class="kw">:</span><span class="at"> chatml.intel</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb74"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb74-1"><a href="#cb74-1" aria-hidden="true" tabindex="-1"></a><span class="fu">datasets</span><span class="kw">:</span></span>
<span id="cb74-2"><a href="#cb74-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">ds_type</span><span class="kw">:</span><span class="at"> json</span></span>
<span id="cb74-3"><a href="#cb74-3" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">data_files</span><span class="kw">:</span></span>
<span id="cb74-4"><a href="#cb74-4" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> orca_rlhf.jsonl</span></span>
<span id="cb74-5"><a href="#cb74-5" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">split</span><span class="kw">:</span><span class="at"> train</span></span>
<span id="cb74-6"><a href="#cb74-6" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">type</span><span class="kw">:</span><span class="at"> chatml.intel</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
</section>
<section id="trl-auto-unwrapping-for-peft" class="level3">
<h3 class="anchored" data-anchor-id="trl-auto-unwrapping-for-peft">TRL auto-unwrapping for PEFT</h3>
<p>TRL supports auto-unwrapping PEFT models for RL training paradigms that rely on a reference model. This significantly reduces memory pressure, since no additional reference model needs to be loaded: reference-model log-probabilities are obtained by simply disabling the PEFT adapters. Auto-unwrapping is enabled by default. To turn it off and load a dedicated reference model instead, pass the following config:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb66"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb66-1"><a href="#cb66-1" aria-hidden="true" tabindex="-1"></a><span class="co"># load ref model when adapter training.</span></span>
|
||||
<span id="cb66-2"><a href="#cb66-2" aria-hidden="true" tabindex="-1"></a><span class="fu">rl_adapter_ref_model</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb75"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb75-1"><a href="#cb75-1" aria-hidden="true" tabindex="-1"></a><span class="co"># load ref model when adapter training.</span></span>
<span id="cb75-2"><a href="#cb75-2" aria-hidden="true" tabindex="-1"></a><span class="fu">rl_adapter_ref_model</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
</section>