Built site for gh-pages

This commit is contained in:
Quarto GHA Workflow Runner
2026-03-25 12:49:03 +00:00
parent 4aa7721a0c
commit b9fe1393d6
7 changed files with 1836 additions and 821 deletions

View File

@@ -1 +1 @@
c438397e
fc32723f

View File

@@ -793,7 +793,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
</tr>
<tr class="even">
<td><a href="#axolotl.cli.merge_lora.do_merge_lora">do_merge_lora</a></td>
<td>Calls <code>transformers</code> <code>merge_and_unload</code> on the model given in the <code>axolotl</code> config</td>
<td>Merges LoRA adapters with base model using either memory-efficient or legacy approach.</td>
</tr>
</tbody>
</table>
@@ -864,8 +864,7 @@ config values will be overwritten to allow the LoRA merge logic to work as expec
<section id="axolotl.cli.merge_lora.do_merge_lora" class="level3">
<h3 class="anchored" data-anchor-id="axolotl.cli.merge_lora.do_merge_lora">do_merge_lora</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>cli.merge_lora.do_merge_lora(cfg)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>Calls <code>transformers</code> <code>merge_and_unload</code> on the model given in the <code>axolotl</code> config
along with the LoRA adapters to combine them into a single base model.</p>
<p>Merges LoRA adapters with base model using either memory-efficient or legacy approach.</p>
<section id="parameters-1" class="level4 doc-section doc-section-parameters">
<h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4>
<table class="caption-top table">

View File

@@ -2191,211 +2191,214 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<span id="cb1-1407"><a href="#cb1-1407" aria-hidden="true" tabindex="-1"></a><span class="fu">loraplus_lr_embedding</span><span class="kw">:</span><span class="at"> float | None = 1e-06</span></span>
<span id="cb1-1408"><a href="#cb1-1408" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1409"><a href="#cb1-1409" aria-hidden="true" tabindex="-1"></a><span class="fu">merge_lora</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1410"><a href="#cb1-1410" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1411"><a href="#cb1-1411" aria-hidden="true" tabindex="-1"></a><span class="co"># Whether to use ReLoRA. Use with jagged_restart_*steps options.</span></span>
<span id="cb1-1412"><a href="#cb1-1412" aria-hidden="true" tabindex="-1"></a><span class="fu">relora</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1413"><a href="#cb1-1413" aria-hidden="true" tabindex="-1"></a><span class="co"># threshold for optimizer magnitude when pruning</span></span>
<span id="cb1-1414"><a href="#cb1-1414" aria-hidden="true" tabindex="-1"></a><span class="fu">relora_prune_ratio</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1415"><a href="#cb1-1415" aria-hidden="true" tabindex="-1"></a><span class="co"># True to perform lora weight merges on cpu during restarts, for modest gpu memory</span></span>
<span id="cb1-1416"><a href="#cb1-1416" aria-hidden="true" tabindex="-1"></a><span class="co"># savings</span></span>
<span id="cb1-1417"><a href="#cb1-1417" aria-hidden="true" tabindex="-1"></a><span class="fu">relora_cpu_offload</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1418"><a href="#cb1-1418" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1419"><a href="#cb1-1419" aria-hidden="true" tabindex="-1"></a><span class="co"># how often to reset for jagged restarts</span></span>
<span id="cb1-1420"><a href="#cb1-1420" aria-hidden="true" tabindex="-1"></a><span class="fu">jagged_restart_steps</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1421"><a href="#cb1-1421" aria-hidden="true" tabindex="-1"></a><span class="co"># how many warmup steps to take after reset for jagged restarts</span></span>
<span id="cb1-1422"><a href="#cb1-1422" aria-hidden="true" tabindex="-1"></a><span class="fu">jagged_restart_warmup_steps</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1423"><a href="#cb1-1423" aria-hidden="true" tabindex="-1"></a><span class="co"># how many anneal steps to take before reset for jagged restarts</span></span>
<span id="cb1-1424"><a href="#cb1-1424" aria-hidden="true" tabindex="-1"></a><span class="fu">jagged_restart_anneal_steps</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1425"><a href="#cb1-1425" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1426"><a href="#cb1-1426" aria-hidden="true" tabindex="-1"></a><span class="co"># If greater than 1, backpropagation will be skipped and the gradients will be</span></span>
<span id="cb1-1427"><a href="#cb1-1427" aria-hidden="true" tabindex="-1"></a><span class="co"># accumulated for the given number of steps.</span></span>
<span id="cb1-1428"><a href="#cb1-1428" aria-hidden="true" tabindex="-1"></a><span class="fu">gradient_accumulation_steps</span><span class="kw">:</span><span class="at"> int | None = 1</span></span>
<span id="cb1-1429"><a href="#cb1-1429" aria-hidden="true" tabindex="-1"></a><span class="co"># The number of samples to include in each batch. This is the number of samples sent to</span></span>
<span id="cb1-1430"><a href="#cb1-1430" aria-hidden="true" tabindex="-1"></a><span class="co"># each GPU. Batch size per gpu = micro_batch_size * gradient_accumulation_steps</span></span>
<span id="cb1-1431"><a href="#cb1-1431" aria-hidden="true" tabindex="-1"></a><span class="fu">micro_batch_size</span><span class="kw">:</span><span class="at"> int | None = 1</span></span>
<span id="cb1-1432"><a href="#cb1-1432" aria-hidden="true" tabindex="-1"></a><span class="co"># Total batch size, we do not recommended setting this manually</span></span>
<span id="cb1-1433"><a href="#cb1-1433" aria-hidden="true" tabindex="-1"></a><span class="fu">batch_size</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1434"><a href="#cb1-1434" aria-hidden="true" tabindex="-1"></a><span class="co"># per gpu micro batch size for evals, defaults to value of micro_batch_size</span></span>
<span id="cb1-1435"><a href="#cb1-1435" aria-hidden="true" tabindex="-1"></a><span class="fu">eval_batch_size</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1436"><a href="#cb1-1436" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1437"><a href="#cb1-1437" aria-hidden="true" tabindex="-1"></a><span class="co"># whether to find batch size that fits in memory. Passed to underlying transformers</span></span>
<span id="cb1-1438"><a href="#cb1-1438" aria-hidden="true" tabindex="-1"></a><span class="co"># Trainer</span></span>
<span id="cb1-1439"><a href="#cb1-1439" aria-hidden="true" tabindex="-1"></a><span class="fu">auto_find_batch_size</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1440"><a href="#cb1-1440" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1441"><a href="#cb1-1441" aria-hidden="true" tabindex="-1"></a><span class="co"># Whether to mask out or include the human's prompt from the training labels</span></span>
<span id="cb1-1442"><a href="#cb1-1442" aria-hidden="true" tabindex="-1"></a><span class="fu">train_on_inputs</span><span class="kw">:</span><span class="at"> bool | None = False</span></span>
<span id="cb1-1443"><a href="#cb1-1443" aria-hidden="true" tabindex="-1"></a><span class="co"># Group similarly sized data to minimize padding. May be slower to start, as it must</span></span>
<span id="cb1-1444"><a href="#cb1-1444" aria-hidden="true" tabindex="-1"></a><span class="co"># download and sort the entire dataset. Note that training loss may have an oscillating</span></span>
<span id="cb1-1445"><a href="#cb1-1445" aria-hidden="true" tabindex="-1"></a><span class="co"># pattern with this enabled.</span></span>
<span id="cb1-1446"><a href="#cb1-1446" aria-hidden="true" tabindex="-1"></a><span class="fu">group_by_length</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1447"><a href="#cb1-1447" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1448"><a href="#cb1-1448" aria-hidden="true" tabindex="-1"></a><span class="fu">learning_rate</span><span class="kw">:</span><span class="at"> str | float (required)</span></span>
<span id="cb1-1449"><a href="#cb1-1449" aria-hidden="true" tabindex="-1"></a><span class="fu">embedding_lr</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1450"><a href="#cb1-1450" aria-hidden="true" tabindex="-1"></a><span class="fu">embedding_lr_scale</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1451"><a href="#cb1-1451" aria-hidden="true" tabindex="-1"></a><span class="co"># Specify weight decay</span></span>
<span id="cb1-1452"><a href="#cb1-1452" aria-hidden="true" tabindex="-1"></a><span class="fu">weight_decay</span><span class="kw">:</span><span class="at"> float | None = 0.0</span></span>
<span id="cb1-1453"><a href="#cb1-1453" aria-hidden="true" tabindex="-1"></a><span class="co"># Specify optimizer</span></span>
<span id="cb1-1454"><a href="#cb1-1454" aria-hidden="true" tabindex="-1"></a><span class="fu">optimizer</span><span class="kw">:</span><span class="at"> OptimizerNames | CustomSupportedOptimizers | None = OptimizerNames.ADAMW_TORCH_FUSED</span></span>
<span id="cb1-1455"><a href="#cb1-1455" aria-hidden="true" tabindex="-1"></a><span class="co"># Dictionary of arguments to pass to the optimizer</span></span>
<span id="cb1-1456"><a href="#cb1-1456" aria-hidden="true" tabindex="-1"></a><span class="fu">optim_args</span><span class="kw">:</span><span class="at"> str | dict[str, Any] | None</span></span>
<span id="cb1-1457"><a href="#cb1-1457" aria-hidden="true" tabindex="-1"></a><span class="co"># The target modules to optimize, i.e. the module names that you would like to train,</span></span>
<span id="cb1-1458"><a href="#cb1-1458" aria-hidden="true" tabindex="-1"></a><span class="co"># right now this is used only for GaLore algorithm</span></span>
<span id="cb1-1459"><a href="#cb1-1459" aria-hidden="true" tabindex="-1"></a><span class="fu">optim_target_modules</span><span class="kw">:</span><span class="at"> list[str] | Literal['all_linear'] | None</span></span>
<span id="cb1-1460"><a href="#cb1-1460" aria-hidden="true" tabindex="-1"></a><span class="co"># Path to torch distx for optim 'adamw_anyprecision'</span></span>
<span id="cb1-1461"><a href="#cb1-1461" aria-hidden="true" tabindex="-1"></a><span class="fu">torchdistx_path</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1462"><a href="#cb1-1462" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_scheduler</span><span class="kw">:</span><span class="at"> SchedulerType | Literal['one_cycle'] | Literal['rex'] | None = SchedulerType.COSINE</span></span>
<span id="cb1-1463"><a href="#cb1-1463" aria-hidden="true" tabindex="-1"></a><span class="co"># Specify a scheduler and kwargs to use with the optimizer</span></span>
<span id="cb1-1464"><a href="#cb1-1464" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_scheduler_kwargs</span><span class="kw">:</span><span class="at"> dict[str, Any] | None</span></span>
<span id="cb1-1465"><a href="#cb1-1465" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_quadratic_warmup</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1466"><a href="#cb1-1466" aria-hidden="true" tabindex="-1"></a><span class="co"># decay lr to some percentage of the peak lr, e.g. cosine_min_lr_ratio=0.1 for 10% of</span></span>
<span id="cb1-1467"><a href="#cb1-1467" aria-hidden="true" tabindex="-1"></a><span class="co"># peak lr</span></span>
<span id="cb1-1468"><a href="#cb1-1468" aria-hidden="true" tabindex="-1"></a><span class="fu">cosine_min_lr_ratio</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1469"><a href="#cb1-1469" aria-hidden="true" tabindex="-1"></a><span class="co"># freeze lr at some percentage of the step, e.g. cosine_constant_lr_ratio=0.8 means</span></span>
<span id="cb1-1470"><a href="#cb1-1470" aria-hidden="true" tabindex="-1"></a><span class="co"># start cosine_min_lr at 80% of training step</span></span>
<span id="cb1-1471"><a href="#cb1-1471" aria-hidden="true" tabindex="-1"></a><span class="fu">cosine_constant_lr_ratio</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1472"><a href="#cb1-1472" aria-hidden="true" tabindex="-1"></a><span class="co"># Learning rate div factor</span></span>
<span id="cb1-1473"><a href="#cb1-1473" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_div_factor</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1474"><a href="#cb1-1474" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1475"><a href="#cb1-1475" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_groups</span><span class="kw">:</span><span class="at"> list[LrGroup] | None</span></span>
<span id="cb1-1476"><a href="#cb1-1476" aria-hidden="true" tabindex="-1"></a><span class="co"> # For LrGroup:</span></span>
<span id="cb1-1477"><a href="#cb1-1477" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">name</span><span class="kw">:</span><span class="at"> str (required)</span></span>
<span id="cb1-1478"><a href="#cb1-1478" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">modules</span><span class="kw">:</span><span class="at"> list[str] (required)</span></span>
<span id="cb1-1479"><a href="#cb1-1479" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">lr</span><span class="kw">:</span><span class="at"> float (required)</span></span>
<span id="cb1-1480"><a href="#cb1-1480" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1481"><a href="#cb1-1481" aria-hidden="true" tabindex="-1"></a><span class="co"># adamw hyperparams</span></span>
<span id="cb1-1482"><a href="#cb1-1482" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_epsilon</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1483"><a href="#cb1-1483" aria-hidden="true" tabindex="-1"></a><span class="co"># only used for CAME Optimizer</span></span>
<span id="cb1-1484"><a href="#cb1-1484" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_epsilon2</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1485"><a href="#cb1-1485" aria-hidden="true" tabindex="-1"></a><span class="co"># adamw hyperparams</span></span>
<span id="cb1-1486"><a href="#cb1-1486" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_beta1</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1487"><a href="#cb1-1487" aria-hidden="true" tabindex="-1"></a><span class="co"># adamw hyperparams</span></span>
<span id="cb1-1488"><a href="#cb1-1488" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_beta2</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1489"><a href="#cb1-1489" aria-hidden="true" tabindex="-1"></a><span class="co"># only used for CAME Optimizer</span></span>
<span id="cb1-1490"><a href="#cb1-1490" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_beta3</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1491"><a href="#cb1-1491" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1492"><a href="#cb1-1492" aria-hidden="true" tabindex="-1"></a><span class="co"># Dion Optimizer learning rate</span></span>
<span id="cb1-1493"><a href="#cb1-1493" aria-hidden="true" tabindex="-1"></a><span class="fu">dion_lr</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1494"><a href="#cb1-1494" aria-hidden="true" tabindex="-1"></a><span class="co"># Dion Optimizer momentum</span></span>
<span id="cb1-1495"><a href="#cb1-1495" aria-hidden="true" tabindex="-1"></a><span class="fu">dion_momentum</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1496"><a href="#cb1-1496" aria-hidden="true" tabindex="-1"></a><span class="co"># Dion Optimizer: r/d fraction for low-rank approximation. Used to compute the low-rank</span></span>
<span id="cb1-1497"><a href="#cb1-1497" aria-hidden="true" tabindex="-1"></a><span class="co"># dimension.</span></span>
<span id="cb1-1498"><a href="#cb1-1498" aria-hidden="true" tabindex="-1"></a><span class="fu">dion_rank_fraction</span><span class="kw">:</span><span class="at"> float | None = 1.0</span></span>
<span id="cb1-1499"><a href="#cb1-1499" aria-hidden="true" tabindex="-1"></a><span class="co"># Dion Optimizer: Round up the low-rank dimension to a multiple of this number. This may</span></span>
<span id="cb1-1500"><a href="#cb1-1500" aria-hidden="true" tabindex="-1"></a><span class="co"># be useful to ensure even sharding.</span></span>
<span id="cb1-1501"><a href="#cb1-1501" aria-hidden="true" tabindex="-1"></a><span class="fu">dion_rank_multiple_of</span><span class="kw">:</span><span class="at"> int | None = 1</span></span>
<span id="cb1-1502"><a href="#cb1-1502" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1503"><a href="#cb1-1503" aria-hidden="true" tabindex="-1"></a><span class="co"># Gradient clipping max norm</span></span>
<span id="cb1-1504"><a href="#cb1-1504" aria-hidden="true" tabindex="-1"></a><span class="fu">max_grad_norm</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1505"><a href="#cb1-1505" aria-hidden="true" tabindex="-1"></a><span class="fu">num_epochs</span><span class="kw">:</span><span class="at"> float = 1.0</span></span>
<span id="cb1-1506"><a href="#cb1-1506" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1507"><a href="#cb1-1507" aria-hidden="true" tabindex="-1"></a><span class="fu">use_wandb</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1508"><a href="#cb1-1508" aria-hidden="true" tabindex="-1"></a><span class="co"># Set the name of your wandb run</span></span>
<span id="cb1-1509"><a href="#cb1-1509" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1510"><a href="#cb1-1510" aria-hidden="true" tabindex="-1"></a><span class="co"># Set the ID of your wandb run</span></span>
<span id="cb1-1511"><a href="#cb1-1511" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_run_id</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1512"><a href="#cb1-1512" aria-hidden="true" tabindex="-1"></a><span class="co"># "offline" to save run metadata locally and not sync to the server, "disabled" to turn</span></span>
<span id="cb1-1513"><a href="#cb1-1513" aria-hidden="true" tabindex="-1"></a><span class="co"># off wandb</span></span>
<span id="cb1-1514"><a href="#cb1-1514" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_mode</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1515"><a href="#cb1-1515" aria-hidden="true" tabindex="-1"></a><span class="co"># Your wandb project name</span></span>
<span id="cb1-1516"><a href="#cb1-1516" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_project</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1517"><a href="#cb1-1517" aria-hidden="true" tabindex="-1"></a><span class="co"># A wandb Team name if using a Team</span></span>
<span id="cb1-1518"><a href="#cb1-1518" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_entity</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1519"><a href="#cb1-1519" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_watch</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1520"><a href="#cb1-1520" aria-hidden="true" tabindex="-1"></a><span class="co"># "checkpoint" to log model to wandb Artifacts every `save_steps` or "end" to log only</span></span>
<span id="cb1-1521"><a href="#cb1-1521" aria-hidden="true" tabindex="-1"></a><span class="co"># at the end of training</span></span>
<span id="cb1-1522"><a href="#cb1-1522" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_log_model</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1523"><a href="#cb1-1523" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1524"><a href="#cb1-1524" aria-hidden="true" tabindex="-1"></a><span class="fu">use_mlflow</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1525"><a href="#cb1-1525" aria-hidden="true" tabindex="-1"></a><span class="co"># URI to mlflow</span></span>
<span id="cb1-1526"><a href="#cb1-1526" aria-hidden="true" tabindex="-1"></a><span class="fu">mlflow_tracking_uri</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1527"><a href="#cb1-1527" aria-hidden="true" tabindex="-1"></a><span class="co"># Your experiment name</span></span>
<span id="cb1-1528"><a href="#cb1-1528" aria-hidden="true" tabindex="-1"></a><span class="fu">mlflow_experiment_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1529"><a href="#cb1-1529" aria-hidden="true" tabindex="-1"></a><span class="co"># Your run name</span></span>
<span id="cb1-1530"><a href="#cb1-1530" aria-hidden="true" tabindex="-1"></a><span class="fu">mlflow_run_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1531"><a href="#cb1-1531" aria-hidden="true" tabindex="-1"></a><span class="co"># set to true to copy each saved checkpoint on each save to mlflow artifact registry</span></span>
<span id="cb1-1532"><a href="#cb1-1532" aria-hidden="true" tabindex="-1"></a><span class="fu">hf_mlflow_log_artifacts</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1533"><a href="#cb1-1533" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1534"><a href="#cb1-1534" aria-hidden="true" tabindex="-1"></a><span class="co"># Enable or disable Comet integration.</span></span>
<span id="cb1-1535"><a href="#cb1-1535" aria-hidden="true" tabindex="-1"></a><span class="fu">use_comet</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1536"><a href="#cb1-1536" aria-hidden="true" tabindex="-1"></a><span class="co"># API key for Comet. Recommended to set via `comet login`.</span></span>
<span id="cb1-1537"><a href="#cb1-1537" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_api_key</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1538"><a href="#cb1-1538" aria-hidden="true" tabindex="-1"></a><span class="co"># Workspace name in Comet. Defaults to the user's default workspace.</span></span>
<span id="cb1-1539"><a href="#cb1-1539" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_workspace</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1540"><a href="#cb1-1540" aria-hidden="true" tabindex="-1"></a><span class="co"># Project name in Comet. Defaults to Uncategorized.</span></span>
<span id="cb1-1541"><a href="#cb1-1541" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_project_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1542"><a href="#cb1-1542" aria-hidden="true" tabindex="-1"></a><span class="co"># Identifier for the experiment. Used to append data to an existing experiment or</span></span>
<span id="cb1-1543"><a href="#cb1-1543" aria-hidden="true" tabindex="-1"></a><span class="co"># control the key of new experiments. Default to a random key.</span></span>
<span id="cb1-1544"><a href="#cb1-1544" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_experiment_key</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1545"><a href="#cb1-1545" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a new experiment ("create") or log to an existing one ("get"). Default</span></span>
<span id="cb1-1546"><a href="#cb1-1546" aria-hidden="true" tabindex="-1"></a><span class="co"># ("get_or_create") auto-selects based on configuration.</span></span>
<span id="cb1-1547"><a href="#cb1-1547" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_mode</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1548"><a href="#cb1-1548" aria-hidden="true" tabindex="-1"></a><span class="co"># Set to True to log data to Comet server, or False for offline storage. Default is</span></span>
<span id="cb1-1549"><a href="#cb1-1549" aria-hidden="true" tabindex="-1"></a><span class="co"># True.</span></span>
<span id="cb1-1550"><a href="#cb1-1550" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_online</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1551"><a href="#cb1-1551" aria-hidden="true" tabindex="-1"></a><span class="co"># Dictionary for additional configuration settings, see the doc for more details.</span></span>
<span id="cb1-1552"><a href="#cb1-1552" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_experiment_config</span><span class="kw">:</span><span class="at"> dict[str, Any] | None</span></span>
<span id="cb1-1553"><a href="#cb1-1553" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1554"><a href="#cb1-1554" aria-hidden="true" tabindex="-1"></a><span class="fu">use_trackio</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1555"><a href="#cb1-1555" aria-hidden="true" tabindex="-1"></a><span class="co"># Your trackio project name</span></span>
<span id="cb1-1556"><a href="#cb1-1556" aria-hidden="true" tabindex="-1"></a><span class="fu">trackio_project_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1557"><a href="#cb1-1557" aria-hidden="true" tabindex="-1"></a><span class="co"># Set the name of your trackio run</span></span>
<span id="cb1-1558"><a href="#cb1-1558" aria-hidden="true" tabindex="-1"></a><span class="fu">trackio_run_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1559"><a href="#cb1-1559" aria-hidden="true" tabindex="-1"></a><span class="co"># Hugging Face Space ID to sync dashboard to (optional, runs locally if not provided)</span></span>
<span id="cb1-1560"><a href="#cb1-1560" aria-hidden="true" tabindex="-1"></a><span class="fu">trackio_space_id</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1561"><a href="#cb1-1561" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1562"><a href="#cb1-1562" aria-hidden="true" tabindex="-1"></a><span class="co"># Enable OpenTelemetry metrics collection and Prometheus export</span></span>
<span id="cb1-1563"><a href="#cb1-1563" aria-hidden="true" tabindex="-1"></a><span class="fu">use_otel_metrics</span><span class="kw">:</span><span class="at"> bool | None = False</span></span>
<span id="cb1-1564"><a href="#cb1-1564" aria-hidden="true" tabindex="-1"></a><span class="co"># Host to bind the OpenTelemetry metrics server to</span></span>
<span id="cb1-1565"><a href="#cb1-1565" aria-hidden="true" tabindex="-1"></a><span class="fu">otel_metrics_host</span><span class="kw">:</span><span class="at"> str | None = localhost</span></span>
<span id="cb1-1566"><a href="#cb1-1566" aria-hidden="true" tabindex="-1"></a><span class="co"># Port for the Prometheus metrics HTTP server</span></span>
<span id="cb1-1567"><a href="#cb1-1567" aria-hidden="true" tabindex="-1"></a><span class="fu">otel_metrics_port</span><span class="kw">:</span><span class="at"> int | None = 8000</span></span>
<span id="cb1-1568"><a href="#cb1-1568" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1569"><a href="#cb1-1569" aria-hidden="true" tabindex="-1"></a><span class="co"># the number of activate layers in LISA</span></span>
<span id="cb1-1570"><a href="#cb1-1570" aria-hidden="true" tabindex="-1"></a><span class="fu">lisa_n_layers</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1571"><a href="#cb1-1571" aria-hidden="true" tabindex="-1"></a><span class="co"># how often to switch layers in LISA</span></span>
<span id="cb1-1572"><a href="#cb1-1572" aria-hidden="true" tabindex="-1"></a><span class="fu">lisa_step_interval</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1573"><a href="#cb1-1573" aria-hidden="true" tabindex="-1"></a><span class="co"># path under the model to access the layers</span></span>
<span id="cb1-1574"><a href="#cb1-1574" aria-hidden="true" tabindex="-1"></a><span class="fu">lisa_layers_attribute</span><span class="kw">:</span><span class="at"> str | None = model.layers</span></span>
<span id="cb1-1575"><a href="#cb1-1575" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1576"><a href="#cb1-1576" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_title</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1577"><a href="#cb1-1577" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_share</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1578"><a href="#cb1-1578" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_server_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1579"><a href="#cb1-1579" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_server_port</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1580"><a href="#cb1-1580" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_max_new_tokens</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1581"><a href="#cb1-1581" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_temperature</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1582"><a href="#cb1-1582" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1583"><a href="#cb1-1583" aria-hidden="true" tabindex="-1"></a><span class="fu">use_ray</span><span class="kw">:</span><span class="at"> bool = False</span></span>
<span id="cb1-1584"><a href="#cb1-1584" aria-hidden="true" tabindex="-1"></a><span class="fu">ray_run_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1585"><a href="#cb1-1585" aria-hidden="true" tabindex="-1"></a><span class="fu">ray_num_workers</span><span class="kw">:</span><span class="at"> int = 1</span></span>
<span id="cb1-1586"><a href="#cb1-1586" aria-hidden="true" tabindex="-1"></a><span class="fu">resources_per_worker</span><span class="kw">:</span><span class="at"> dict</span></span>
<span id="cb1-1587"><a href="#cb1-1587" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1588"><a href="#cb1-1588" aria-hidden="true" tabindex="-1"></a><span class="co"># The size of the image to resize to. It can be an integer (resized into padded-square</span></span>
<span id="cb1-1589"><a href="#cb1-1589" aria-hidden="true" tabindex="-1"></a><span class="co"># image) or a tuple (width, height).If not provided, we will attempt to load from</span></span>
<span id="cb1-1590"><a href="#cb1-1590" aria-hidden="true" tabindex="-1"></a><span class="co"># preprocessor.size, otherwise, images won't be resized.</span></span>
<span id="cb1-1591"><a href="#cb1-1591" aria-hidden="true" tabindex="-1"></a><span class="fu">image_size</span><span class="kw">:</span><span class="at"> int | tuple[int, int] | None</span></span>
<span id="cb1-1592"><a href="#cb1-1592" aria-hidden="true" tabindex="-1"></a><span class="co"># The resampling algorithm to use for image resizing. Default is bilinear. Please refer</span></span>
<span id="cb1-1593"><a href="#cb1-1593" aria-hidden="true" tabindex="-1"></a><span class="co"># to PIL.Image.Resampling for more details.</span></span>
<span id="cb1-1594"><a href="#cb1-1594" aria-hidden="true" tabindex="-1"></a><span class="fu">image_resize_algorithm</span><span class="kw">:</span><span class="at"> Literal['bilinear', 'bicubic', 'lanczos'] | Resampling | None</span></span>
<span id="cb1-1595"><a href="#cb1-1595" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1596"><a href="#cb1-1596" aria-hidden="true" tabindex="-1"></a><span class="co"># optional overrides to the base model configuration</span></span>
<span id="cb1-1597"><a href="#cb1-1597" aria-hidden="true" tabindex="-1"></a><span class="fu">overrides_of_model_config</span><span class="kw">:</span><span class="at"> dict[str, Any] | None</span></span>
<span id="cb1-1598"><a href="#cb1-1598" aria-hidden="true" tabindex="-1"></a><span class="co"># optional overrides the base model loading from_pretrained</span></span>
<span id="cb1-1599"><a href="#cb1-1599" aria-hidden="true" tabindex="-1"></a><span class="fu">overrides_of_model_kwargs</span><span class="kw">:</span><span class="at"> dict[str, Any] | None</span></span>
<span id="cb1-1600"><a href="#cb1-1600" aria-hidden="true" tabindex="-1"></a><span class="co"># If you want to specify the type of model to load, AutoModelForCausalLM is a good</span></span>
<span id="cb1-1601"><a href="#cb1-1601" aria-hidden="true" tabindex="-1"></a><span class="co"># choice too</span></span>
<span id="cb1-1602"><a href="#cb1-1602" aria-hidden="true" tabindex="-1"></a><span class="fu">type_of_model</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1603"><a href="#cb1-1603" aria-hidden="true" tabindex="-1"></a><span class="co"># You can specify to choose a specific model revision from huggingface hub</span></span>
<span id="cb1-1604"><a href="#cb1-1604" aria-hidden="true" tabindex="-1"></a><span class="fu">revision_of_model</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1605"><a href="#cb1-1605" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1606"><a href="#cb1-1606" aria-hidden="true" tabindex="-1"></a><span class="fu">max_packed_sequence_len</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1607"><a href="#cb1-1607" aria-hidden="true" tabindex="-1"></a><span class="fu">rope_scaling</span><span class="kw">:</span><span class="at"> Any | None</span></span>
<span id="cb1-1608"><a href="#cb1-1608" aria-hidden="true" tabindex="-1"></a><span class="fu">noisy_embedding_alpha</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1609"><a href="#cb1-1609" aria-hidden="true" tabindex="-1"></a><span class="fu">dpo_beta</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1610"><a href="#cb1-1610" aria-hidden="true" tabindex="-1"></a><span class="fu">evaluation_strategy</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1611"><a href="#cb1-1611" aria-hidden="true" tabindex="-1"></a><span class="fu">eval_table_size</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1612"><a href="#cb1-1612" aria-hidden="true" tabindex="-1"></a><span class="fu">eval_max_new_tokens</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1613"><a href="#cb1-1613" aria-hidden="true" tabindex="-1"></a><span class="fu">dpo_use_logits_to_keep</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1614"><a href="#cb1-1614" aria-hidden="true" tabindex="-1"></a><span class="fu">dpo_generate_during_eval</span><span class="kw">:</span><span class="at"> bool | None</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<span id="cb1-1410"><a href="#cb1-1410" aria-hidden="true" tabindex="-1"></a><span class="co"># Method to use for LoRA merging. 'memory_efficient' (default) processes shards</span></span>
<span id="cb1-1411"><a href="#cb1-1411" aria-hidden="true" tabindex="-1"></a><span class="co"># individually to reduce memory usage, 'legacy' loads the full model into memory.</span></span>
<span id="cb1-1412"><a href="#cb1-1412" aria-hidden="true" tabindex="-1"></a><span class="fu">merge_method</span><span class="kw">:</span><span class="at"> Literal['legacy', 'memory_efficient'] | None = memory_efficient</span></span>
<span id="cb1-1413"><a href="#cb1-1413" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1414"><a href="#cb1-1414" aria-hidden="true" tabindex="-1"></a><span class="co"># Whether to use ReLoRA. Use with jagged_restart_*steps options.</span></span>
<span id="cb1-1415"><a href="#cb1-1415" aria-hidden="true" tabindex="-1"></a><span class="fu">relora</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1416"><a href="#cb1-1416" aria-hidden="true" tabindex="-1"></a><span class="co"># threshold for optimizer magnitude when pruning</span></span>
<span id="cb1-1417"><a href="#cb1-1417" aria-hidden="true" tabindex="-1"></a><span class="fu">relora_prune_ratio</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1418"><a href="#cb1-1418" aria-hidden="true" tabindex="-1"></a><span class="co"># True to perform lora weight merges on cpu during restarts, for modest gpu memory</span></span>
<span id="cb1-1419"><a href="#cb1-1419" aria-hidden="true" tabindex="-1"></a><span class="co"># savings</span></span>
<span id="cb1-1420"><a href="#cb1-1420" aria-hidden="true" tabindex="-1"></a><span class="fu">relora_cpu_offload</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1421"><a href="#cb1-1421" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1422"><a href="#cb1-1422" aria-hidden="true" tabindex="-1"></a><span class="co"># how often to reset for jagged restarts</span></span>
<span id="cb1-1423"><a href="#cb1-1423" aria-hidden="true" tabindex="-1"></a><span class="fu">jagged_restart_steps</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1424"><a href="#cb1-1424" aria-hidden="true" tabindex="-1"></a><span class="co"># how many warmup steps to take after reset for jagged restarts</span></span>
<span id="cb1-1425"><a href="#cb1-1425" aria-hidden="true" tabindex="-1"></a><span class="fu">jagged_restart_warmup_steps</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1426"><a href="#cb1-1426" aria-hidden="true" tabindex="-1"></a><span class="co"># how many anneal steps to take before reset for jagged restarts</span></span>
<span id="cb1-1427"><a href="#cb1-1427" aria-hidden="true" tabindex="-1"></a><span class="fu">jagged_restart_anneal_steps</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1428"><a href="#cb1-1428" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1429"><a href="#cb1-1429" aria-hidden="true" tabindex="-1"></a><span class="co"># If greater than 1, backpropagation will be skipped and the gradients will be</span></span>
<span id="cb1-1430"><a href="#cb1-1430" aria-hidden="true" tabindex="-1"></a><span class="co"># accumulated for the given number of steps.</span></span>
<span id="cb1-1431"><a href="#cb1-1431" aria-hidden="true" tabindex="-1"></a><span class="fu">gradient_accumulation_steps</span><span class="kw">:</span><span class="at"> int | None = 1</span></span>
<span id="cb1-1432"><a href="#cb1-1432" aria-hidden="true" tabindex="-1"></a><span class="co"># The number of samples to include in each batch. This is the number of samples sent to</span></span>
<span id="cb1-1433"><a href="#cb1-1433" aria-hidden="true" tabindex="-1"></a><span class="co"># each GPU. Batch size per gpu = micro_batch_size * gradient_accumulation_steps</span></span>
<span id="cb1-1434"><a href="#cb1-1434" aria-hidden="true" tabindex="-1"></a><span class="fu">micro_batch_size</span><span class="kw">:</span><span class="at"> int | None = 1</span></span>
<span id="cb1-1435"><a href="#cb1-1435" aria-hidden="true" tabindex="-1"></a><span class="co"># Total batch size, we do not recommended setting this manually</span></span>
<span id="cb1-1436"><a href="#cb1-1436" aria-hidden="true" tabindex="-1"></a><span class="fu">batch_size</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1437"><a href="#cb1-1437" aria-hidden="true" tabindex="-1"></a><span class="co"># per gpu micro batch size for evals, defaults to value of micro_batch_size</span></span>
<span id="cb1-1438"><a href="#cb1-1438" aria-hidden="true" tabindex="-1"></a><span class="fu">eval_batch_size</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1439"><a href="#cb1-1439" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1440"><a href="#cb1-1440" aria-hidden="true" tabindex="-1"></a><span class="co"># whether to find batch size that fits in memory. Passed to underlying transformers</span></span>
<span id="cb1-1441"><a href="#cb1-1441" aria-hidden="true" tabindex="-1"></a><span class="co"># Trainer</span></span>
<span id="cb1-1442"><a href="#cb1-1442" aria-hidden="true" tabindex="-1"></a><span class="fu">auto_find_batch_size</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1443"><a href="#cb1-1443" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1444"><a href="#cb1-1444" aria-hidden="true" tabindex="-1"></a><span class="co"># Whether to mask out or include the human's prompt from the training labels</span></span>
<span id="cb1-1445"><a href="#cb1-1445" aria-hidden="true" tabindex="-1"></a><span class="fu">train_on_inputs</span><span class="kw">:</span><span class="at"> bool | None = False</span></span>
<span id="cb1-1446"><a href="#cb1-1446" aria-hidden="true" tabindex="-1"></a><span class="co"># Group similarly sized data to minimize padding. May be slower to start, as it must</span></span>
<span id="cb1-1447"><a href="#cb1-1447" aria-hidden="true" tabindex="-1"></a><span class="co"># download and sort the entire dataset. Note that training loss may have an oscillating</span></span>
<span id="cb1-1448"><a href="#cb1-1448" aria-hidden="true" tabindex="-1"></a><span class="co"># pattern with this enabled.</span></span>
<span id="cb1-1449"><a href="#cb1-1449" aria-hidden="true" tabindex="-1"></a><span class="fu">group_by_length</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1450"><a href="#cb1-1450" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1451"><a href="#cb1-1451" aria-hidden="true" tabindex="-1"></a><span class="fu">learning_rate</span><span class="kw">:</span><span class="at"> str | float (required)</span></span>
<span id="cb1-1452"><a href="#cb1-1452" aria-hidden="true" tabindex="-1"></a><span class="fu">embedding_lr</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1453"><a href="#cb1-1453" aria-hidden="true" tabindex="-1"></a><span class="fu">embedding_lr_scale</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1454"><a href="#cb1-1454" aria-hidden="true" tabindex="-1"></a><span class="co"># Specify weight decay</span></span>
<span id="cb1-1455"><a href="#cb1-1455" aria-hidden="true" tabindex="-1"></a><span class="fu">weight_decay</span><span class="kw">:</span><span class="at"> float | None = 0.0</span></span>
<span id="cb1-1456"><a href="#cb1-1456" aria-hidden="true" tabindex="-1"></a><span class="co"># Specify optimizer</span></span>
<span id="cb1-1457"><a href="#cb1-1457" aria-hidden="true" tabindex="-1"></a><span class="fu">optimizer</span><span class="kw">:</span><span class="at"> OptimizerNames | CustomSupportedOptimizers | None = OptimizerNames.ADAMW_TORCH_FUSED</span></span>
<span id="cb1-1458"><a href="#cb1-1458" aria-hidden="true" tabindex="-1"></a><span class="co"># Dictionary of arguments to pass to the optimizer</span></span>
<span id="cb1-1459"><a href="#cb1-1459" aria-hidden="true" tabindex="-1"></a><span class="fu">optim_args</span><span class="kw">:</span><span class="at"> str | dict[str, Any] | None</span></span>
<span id="cb1-1460"><a href="#cb1-1460" aria-hidden="true" tabindex="-1"></a><span class="co"># The target modules to optimize, i.e. the module names that you would like to train,</span></span>
<span id="cb1-1461"><a href="#cb1-1461" aria-hidden="true" tabindex="-1"></a><span class="co"># right now this is used only for GaLore algorithm</span></span>
<span id="cb1-1462"><a href="#cb1-1462" aria-hidden="true" tabindex="-1"></a><span class="fu">optim_target_modules</span><span class="kw">:</span><span class="at"> list[str] | Literal['all_linear'] | None</span></span>
<span id="cb1-1463"><a href="#cb1-1463" aria-hidden="true" tabindex="-1"></a><span class="co"># Path to torch distx for optim 'adamw_anyprecision'</span></span>
<span id="cb1-1464"><a href="#cb1-1464" aria-hidden="true" tabindex="-1"></a><span class="fu">torchdistx_path</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1465"><a href="#cb1-1465" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_scheduler</span><span class="kw">:</span><span class="at"> SchedulerType | Literal['one_cycle'] | Literal['rex'] | None = SchedulerType.COSINE</span></span>
<span id="cb1-1466"><a href="#cb1-1466" aria-hidden="true" tabindex="-1"></a><span class="co"># Specify a scheduler and kwargs to use with the optimizer</span></span>
<span id="cb1-1467"><a href="#cb1-1467" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_scheduler_kwargs</span><span class="kw">:</span><span class="at"> dict[str, Any] | None</span></span>
<span id="cb1-1468"><a href="#cb1-1468" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_quadratic_warmup</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1469"><a href="#cb1-1469" aria-hidden="true" tabindex="-1"></a><span class="co"># decay lr to some percentage of the peak lr, e.g. cosine_min_lr_ratio=0.1 for 10% of</span></span>
<span id="cb1-1470"><a href="#cb1-1470" aria-hidden="true" tabindex="-1"></a><span class="co"># peak lr</span></span>
<span id="cb1-1471"><a href="#cb1-1471" aria-hidden="true" tabindex="-1"></a><span class="fu">cosine_min_lr_ratio</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1472"><a href="#cb1-1472" aria-hidden="true" tabindex="-1"></a><span class="co"># freeze lr at some percentage of the step, e.g. cosine_constant_lr_ratio=0.8 means</span></span>
<span id="cb1-1473"><a href="#cb1-1473" aria-hidden="true" tabindex="-1"></a><span class="co"># start cosine_min_lr at 80% of training step</span></span>
<span id="cb1-1474"><a href="#cb1-1474" aria-hidden="true" tabindex="-1"></a><span class="fu">cosine_constant_lr_ratio</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1475"><a href="#cb1-1475" aria-hidden="true" tabindex="-1"></a><span class="co"># Learning rate div factor</span></span>
<span id="cb1-1476"><a href="#cb1-1476" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_div_factor</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1477"><a href="#cb1-1477" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1478"><a href="#cb1-1478" aria-hidden="true" tabindex="-1"></a><span class="fu">lr_groups</span><span class="kw">:</span><span class="at"> list[LrGroup] | None</span></span>
<span id="cb1-1479"><a href="#cb1-1479" aria-hidden="true" tabindex="-1"></a><span class="co"> # For LrGroup:</span></span>
<span id="cb1-1480"><a href="#cb1-1480" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">name</span><span class="kw">:</span><span class="at"> str (required)</span></span>
<span id="cb1-1481"><a href="#cb1-1481" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">modules</span><span class="kw">:</span><span class="at"> list[str] (required)</span></span>
<span id="cb1-1482"><a href="#cb1-1482" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">lr</span><span class="kw">:</span><span class="at"> float (required)</span></span>
<span id="cb1-1483"><a href="#cb1-1483" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1484"><a href="#cb1-1484" aria-hidden="true" tabindex="-1"></a><span class="co"># adamw hyperparams</span></span>
<span id="cb1-1485"><a href="#cb1-1485" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_epsilon</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1486"><a href="#cb1-1486" aria-hidden="true" tabindex="-1"></a><span class="co"># only used for CAME Optimizer</span></span>
<span id="cb1-1487"><a href="#cb1-1487" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_epsilon2</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1488"><a href="#cb1-1488" aria-hidden="true" tabindex="-1"></a><span class="co"># adamw hyperparams</span></span>
<span id="cb1-1489"><a href="#cb1-1489" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_beta1</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1490"><a href="#cb1-1490" aria-hidden="true" tabindex="-1"></a><span class="co"># adamw hyperparams</span></span>
<span id="cb1-1491"><a href="#cb1-1491" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_beta2</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1492"><a href="#cb1-1492" aria-hidden="true" tabindex="-1"></a><span class="co"># only used for CAME Optimizer</span></span>
<span id="cb1-1493"><a href="#cb1-1493" aria-hidden="true" tabindex="-1"></a><span class="fu">adam_beta3</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1494"><a href="#cb1-1494" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1495"><a href="#cb1-1495" aria-hidden="true" tabindex="-1"></a><span class="co"># Dion Optimizer learning rate</span></span>
<span id="cb1-1496"><a href="#cb1-1496" aria-hidden="true" tabindex="-1"></a><span class="fu">dion_lr</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1497"><a href="#cb1-1497" aria-hidden="true" tabindex="-1"></a><span class="co"># Dion Optimizer momentum</span></span>
<span id="cb1-1498"><a href="#cb1-1498" aria-hidden="true" tabindex="-1"></a><span class="fu">dion_momentum</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1499"><a href="#cb1-1499" aria-hidden="true" tabindex="-1"></a><span class="co"># Dion Optimizer: r/d fraction for low-rank approximation. Used to compute the low-rank</span></span>
<span id="cb1-1500"><a href="#cb1-1500" aria-hidden="true" tabindex="-1"></a><span class="co"># dimension.</span></span>
<span id="cb1-1501"><a href="#cb1-1501" aria-hidden="true" tabindex="-1"></a><span class="fu">dion_rank_fraction</span><span class="kw">:</span><span class="at"> float | None = 1.0</span></span>
<span id="cb1-1502"><a href="#cb1-1502" aria-hidden="true" tabindex="-1"></a><span class="co"># Dion Optimizer: Round up the low-rank dimension to a multiple of this number. This may</span></span>
<span id="cb1-1503"><a href="#cb1-1503" aria-hidden="true" tabindex="-1"></a><span class="co"># be useful to ensure even sharding.</span></span>
<span id="cb1-1504"><a href="#cb1-1504" aria-hidden="true" tabindex="-1"></a><span class="fu">dion_rank_multiple_of</span><span class="kw">:</span><span class="at"> int | None = 1</span></span>
<span id="cb1-1505"><a href="#cb1-1505" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1506"><a href="#cb1-1506" aria-hidden="true" tabindex="-1"></a><span class="co"># Gradient clipping max norm</span></span>
<span id="cb1-1507"><a href="#cb1-1507" aria-hidden="true" tabindex="-1"></a><span class="fu">max_grad_norm</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1508"><a href="#cb1-1508" aria-hidden="true" tabindex="-1"></a><span class="fu">num_epochs</span><span class="kw">:</span><span class="at"> float = 1.0</span></span>
<span id="cb1-1509"><a href="#cb1-1509" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1510"><a href="#cb1-1510" aria-hidden="true" tabindex="-1"></a><span class="fu">use_wandb</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1511"><a href="#cb1-1511" aria-hidden="true" tabindex="-1"></a><span class="co"># Set the name of your wandb run</span></span>
<span id="cb1-1512"><a href="#cb1-1512" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1513"><a href="#cb1-1513" aria-hidden="true" tabindex="-1"></a><span class="co"># Set the ID of your wandb run</span></span>
<span id="cb1-1514"><a href="#cb1-1514" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_run_id</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1515"><a href="#cb1-1515" aria-hidden="true" tabindex="-1"></a><span class="co"># "offline" to save run metadata locally and not sync to the server, "disabled" to turn</span></span>
<span id="cb1-1516"><a href="#cb1-1516" aria-hidden="true" tabindex="-1"></a><span class="co"># off wandb</span></span>
<span id="cb1-1517"><a href="#cb1-1517" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_mode</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1518"><a href="#cb1-1518" aria-hidden="true" tabindex="-1"></a><span class="co"># Your wandb project name</span></span>
<span id="cb1-1519"><a href="#cb1-1519" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_project</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1520"><a href="#cb1-1520" aria-hidden="true" tabindex="-1"></a><span class="co"># A wandb Team name if using a Team</span></span>
<span id="cb1-1521"><a href="#cb1-1521" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_entity</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1522"><a href="#cb1-1522" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_watch</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1523"><a href="#cb1-1523" aria-hidden="true" tabindex="-1"></a><span class="co"># "checkpoint" to log model to wandb Artifacts every `save_steps` or "end" to log only</span></span>
<span id="cb1-1524"><a href="#cb1-1524" aria-hidden="true" tabindex="-1"></a><span class="co"># at the end of training</span></span>
<span id="cb1-1525"><a href="#cb1-1525" aria-hidden="true" tabindex="-1"></a><span class="fu">wandb_log_model</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1526"><a href="#cb1-1526" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1527"><a href="#cb1-1527" aria-hidden="true" tabindex="-1"></a><span class="fu">use_mlflow</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1528"><a href="#cb1-1528" aria-hidden="true" tabindex="-1"></a><span class="co"># URI to mlflow</span></span>
<span id="cb1-1529"><a href="#cb1-1529" aria-hidden="true" tabindex="-1"></a><span class="fu">mlflow_tracking_uri</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1530"><a href="#cb1-1530" aria-hidden="true" tabindex="-1"></a><span class="co"># Your experiment name</span></span>
<span id="cb1-1531"><a href="#cb1-1531" aria-hidden="true" tabindex="-1"></a><span class="fu">mlflow_experiment_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1532"><a href="#cb1-1532" aria-hidden="true" tabindex="-1"></a><span class="co"># Your run name</span></span>
<span id="cb1-1533"><a href="#cb1-1533" aria-hidden="true" tabindex="-1"></a><span class="fu">mlflow_run_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1534"><a href="#cb1-1534" aria-hidden="true" tabindex="-1"></a><span class="co"># set to true to copy each saved checkpoint on each save to mlflow artifact registry</span></span>
<span id="cb1-1535"><a href="#cb1-1535" aria-hidden="true" tabindex="-1"></a><span class="fu">hf_mlflow_log_artifacts</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1536"><a href="#cb1-1536" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1537"><a href="#cb1-1537" aria-hidden="true" tabindex="-1"></a><span class="co"># Enable or disable Comet integration.</span></span>
<span id="cb1-1538"><a href="#cb1-1538" aria-hidden="true" tabindex="-1"></a><span class="fu">use_comet</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1539"><a href="#cb1-1539" aria-hidden="true" tabindex="-1"></a><span class="co"># API key for Comet. Recommended to set via `comet login`.</span></span>
<span id="cb1-1540"><a href="#cb1-1540" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_api_key</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1541"><a href="#cb1-1541" aria-hidden="true" tabindex="-1"></a><span class="co"># Workspace name in Comet. Defaults to the user's default workspace.</span></span>
<span id="cb1-1542"><a href="#cb1-1542" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_workspace</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1543"><a href="#cb1-1543" aria-hidden="true" tabindex="-1"></a><span class="co"># Project name in Comet. Defaults to Uncategorized.</span></span>
<span id="cb1-1544"><a href="#cb1-1544" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_project_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1545"><a href="#cb1-1545" aria-hidden="true" tabindex="-1"></a><span class="co"># Identifier for the experiment. Used to append data to an existing experiment or</span></span>
<span id="cb1-1546"><a href="#cb1-1546" aria-hidden="true" tabindex="-1"></a><span class="co"># control the key of new experiments. Default to a random key.</span></span>
<span id="cb1-1547"><a href="#cb1-1547" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_experiment_key</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1548"><a href="#cb1-1548" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a new experiment ("create") or log to an existing one ("get"). Default</span></span>
<span id="cb1-1549"><a href="#cb1-1549" aria-hidden="true" tabindex="-1"></a><span class="co"># ("get_or_create") auto-selects based on configuration.</span></span>
<span id="cb1-1550"><a href="#cb1-1550" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_mode</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1551"><a href="#cb1-1551" aria-hidden="true" tabindex="-1"></a><span class="co"># Set to True to log data to Comet server, or False for offline storage. Default is</span></span>
<span id="cb1-1552"><a href="#cb1-1552" aria-hidden="true" tabindex="-1"></a><span class="co"># True.</span></span>
<span id="cb1-1553"><a href="#cb1-1553" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_online</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1554"><a href="#cb1-1554" aria-hidden="true" tabindex="-1"></a><span class="co"># Dictionary for additional configuration settings, see the doc for more details.</span></span>
<span id="cb1-1555"><a href="#cb1-1555" aria-hidden="true" tabindex="-1"></a><span class="fu">comet_experiment_config</span><span class="kw">:</span><span class="at"> dict[str, Any] | None</span></span>
<span id="cb1-1556"><a href="#cb1-1556" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1557"><a href="#cb1-1557" aria-hidden="true" tabindex="-1"></a><span class="fu">use_trackio</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1558"><a href="#cb1-1558" aria-hidden="true" tabindex="-1"></a><span class="co"># Your trackio project name</span></span>
<span id="cb1-1559"><a href="#cb1-1559" aria-hidden="true" tabindex="-1"></a><span class="fu">trackio_project_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1560"><a href="#cb1-1560" aria-hidden="true" tabindex="-1"></a><span class="co"># Set the name of your trackio run</span></span>
<span id="cb1-1561"><a href="#cb1-1561" aria-hidden="true" tabindex="-1"></a><span class="fu">trackio_run_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1562"><a href="#cb1-1562" aria-hidden="true" tabindex="-1"></a><span class="co"># Hugging Face Space ID to sync dashboard to (optional, runs locally if not provided)</span></span>
<span id="cb1-1563"><a href="#cb1-1563" aria-hidden="true" tabindex="-1"></a><span class="fu">trackio_space_id</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1564"><a href="#cb1-1564" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1565"><a href="#cb1-1565" aria-hidden="true" tabindex="-1"></a><span class="co"># Enable OpenTelemetry metrics collection and Prometheus export</span></span>
<span id="cb1-1566"><a href="#cb1-1566" aria-hidden="true" tabindex="-1"></a><span class="fu">use_otel_metrics</span><span class="kw">:</span><span class="at"> bool | None = False</span></span>
<span id="cb1-1567"><a href="#cb1-1567" aria-hidden="true" tabindex="-1"></a><span class="co"># Host to bind the OpenTelemetry metrics server to</span></span>
<span id="cb1-1568"><a href="#cb1-1568" aria-hidden="true" tabindex="-1"></a><span class="fu">otel_metrics_host</span><span class="kw">:</span><span class="at"> str | None = localhost</span></span>
<span id="cb1-1569"><a href="#cb1-1569" aria-hidden="true" tabindex="-1"></a><span class="co"># Port for the Prometheus metrics HTTP server</span></span>
<span id="cb1-1570"><a href="#cb1-1570" aria-hidden="true" tabindex="-1"></a><span class="fu">otel_metrics_port</span><span class="kw">:</span><span class="at"> int | None = 8000</span></span>
<span id="cb1-1571"><a href="#cb1-1571" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1572"><a href="#cb1-1572" aria-hidden="true" tabindex="-1"></a><span class="co"># the number of activate layers in LISA</span></span>
<span id="cb1-1573"><a href="#cb1-1573" aria-hidden="true" tabindex="-1"></a><span class="fu">lisa_n_layers</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1574"><a href="#cb1-1574" aria-hidden="true" tabindex="-1"></a><span class="co"># how often to switch layers in LISA</span></span>
<span id="cb1-1575"><a href="#cb1-1575" aria-hidden="true" tabindex="-1"></a><span class="fu">lisa_step_interval</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1576"><a href="#cb1-1576" aria-hidden="true" tabindex="-1"></a><span class="co"># path under the model to access the layers</span></span>
<span id="cb1-1577"><a href="#cb1-1577" aria-hidden="true" tabindex="-1"></a><span class="fu">lisa_layers_attribute</span><span class="kw">:</span><span class="at"> str | None = model.layers</span></span>
<span id="cb1-1578"><a href="#cb1-1578" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1579"><a href="#cb1-1579" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_title</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1580"><a href="#cb1-1580" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_share</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1581"><a href="#cb1-1581" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_server_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1582"><a href="#cb1-1582" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_server_port</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1583"><a href="#cb1-1583" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_max_new_tokens</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1584"><a href="#cb1-1584" aria-hidden="true" tabindex="-1"></a><span class="fu">gradio_temperature</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1585"><a href="#cb1-1585" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1586"><a href="#cb1-1586" aria-hidden="true" tabindex="-1"></a><span class="fu">use_ray</span><span class="kw">:</span><span class="at"> bool = False</span></span>
<span id="cb1-1587"><a href="#cb1-1587" aria-hidden="true" tabindex="-1"></a><span class="fu">ray_run_name</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1588"><a href="#cb1-1588" aria-hidden="true" tabindex="-1"></a><span class="fu">ray_num_workers</span><span class="kw">:</span><span class="at"> int = 1</span></span>
<span id="cb1-1589"><a href="#cb1-1589" aria-hidden="true" tabindex="-1"></a><span class="fu">resources_per_worker</span><span class="kw">:</span><span class="at"> dict</span></span>
<span id="cb1-1590"><a href="#cb1-1590" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1591"><a href="#cb1-1591" aria-hidden="true" tabindex="-1"></a><span class="co"># The size of the image to resize to. It can be an integer (resized into padded-square</span></span>
<span id="cb1-1592"><a href="#cb1-1592" aria-hidden="true" tabindex="-1"></a><span class="co"># image) or a tuple (width, height).If not provided, we will attempt to load from</span></span>
<span id="cb1-1593"><a href="#cb1-1593" aria-hidden="true" tabindex="-1"></a><span class="co"># preprocessor.size, otherwise, images won't be resized.</span></span>
<span id="cb1-1594"><a href="#cb1-1594" aria-hidden="true" tabindex="-1"></a><span class="fu">image_size</span><span class="kw">:</span><span class="at"> int | tuple[int, int] | None</span></span>
<span id="cb1-1595"><a href="#cb1-1595" aria-hidden="true" tabindex="-1"></a><span class="co"># The resampling algorithm to use for image resizing. Default is bilinear. Please refer</span></span>
<span id="cb1-1596"><a href="#cb1-1596" aria-hidden="true" tabindex="-1"></a><span class="co"># to PIL.Image.Resampling for more details.</span></span>
<span id="cb1-1597"><a href="#cb1-1597" aria-hidden="true" tabindex="-1"></a><span class="fu">image_resize_algorithm</span><span class="kw">:</span><span class="at"> Literal['bilinear', 'bicubic', 'lanczos'] | Resampling | None</span></span>
<span id="cb1-1598"><a href="#cb1-1598" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1599"><a href="#cb1-1599" aria-hidden="true" tabindex="-1"></a><span class="co"># optional overrides to the base model configuration</span></span>
<span id="cb1-1600"><a href="#cb1-1600" aria-hidden="true" tabindex="-1"></a><span class="fu">overrides_of_model_config</span><span class="kw">:</span><span class="at"> dict[str, Any] | None</span></span>
<span id="cb1-1601"><a href="#cb1-1601" aria-hidden="true" tabindex="-1"></a><span class="co"># optional overrides the base model loading from_pretrained</span></span>
<span id="cb1-1602"><a href="#cb1-1602" aria-hidden="true" tabindex="-1"></a><span class="fu">overrides_of_model_kwargs</span><span class="kw">:</span><span class="at"> dict[str, Any] | None</span></span>
<span id="cb1-1603"><a href="#cb1-1603" aria-hidden="true" tabindex="-1"></a><span class="co"># If you want to specify the type of model to load, AutoModelForCausalLM is a good</span></span>
<span id="cb1-1604"><a href="#cb1-1604" aria-hidden="true" tabindex="-1"></a><span class="co"># choice too</span></span>
<span id="cb1-1605"><a href="#cb1-1605" aria-hidden="true" tabindex="-1"></a><span class="fu">type_of_model</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1606"><a href="#cb1-1606" aria-hidden="true" tabindex="-1"></a><span class="co"># You can specify to choose a specific model revision from huggingface hub</span></span>
<span id="cb1-1607"><a href="#cb1-1607" aria-hidden="true" tabindex="-1"></a><span class="fu">revision_of_model</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1608"><a href="#cb1-1608" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-1609"><a href="#cb1-1609" aria-hidden="true" tabindex="-1"></a><span class="fu">max_packed_sequence_len</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1610"><a href="#cb1-1610" aria-hidden="true" tabindex="-1"></a><span class="fu">rope_scaling</span><span class="kw">:</span><span class="at"> Any | None</span></span>
<span id="cb1-1611"><a href="#cb1-1611" aria-hidden="true" tabindex="-1"></a><span class="fu">noisy_embedding_alpha</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1612"><a href="#cb1-1612" aria-hidden="true" tabindex="-1"></a><span class="fu">dpo_beta</span><span class="kw">:</span><span class="at"> float | None</span></span>
<span id="cb1-1613"><a href="#cb1-1613" aria-hidden="true" tabindex="-1"></a><span class="fu">evaluation_strategy</span><span class="kw">:</span><span class="at"> str | None</span></span>
<span id="cb1-1614"><a href="#cb1-1614" aria-hidden="true" tabindex="-1"></a><span class="fu">eval_table_size</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1615"><a href="#cb1-1615" aria-hidden="true" tabindex="-1"></a><span class="fu">eval_max_new_tokens</span><span class="kw">:</span><span class="at"> int | None</span></span>
<span id="cb1-1616"><a href="#cb1-1616" aria-hidden="true" tabindex="-1"></a><span class="fu">dpo_use_logits_to_keep</span><span class="kw">:</span><span class="at"> bool | None</span></span>
<span id="cb1-1617"><a href="#cb1-1617" aria-hidden="true" tabindex="-1"></a><span class="fu">dpo_generate_during_eval</span><span class="kw">:</span><span class="at"> bool | None</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>

File diff suppressed because it is too large Load Diff

View File

@@ -813,6 +813,20 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<li><a href="#sequence-parallelism" id="toc-sequence-parallelism" class="nav-link" data-scroll-target="#sequence-parallelism">Sequence Parallelism</a></li>
</ul></li>
<li><a href="#simpo" id="toc-simpo" class="nav-link" data-scroll-target="#simpo">SimPO</a></li>
<li><a href="#ebft" id="toc-ebft" class="nav-link" data-scroll-target="#ebft">EBFT</a>
<ul class="collapse">
<li><a href="#structured-mode" id="toc-structured-mode" class="nav-link" data-scroll-target="#structured-mode">Structured Mode</a></li>
<li><a href="#strided-mode" id="toc-strided-mode" class="nav-link" data-scroll-target="#strided-mode">Strided Mode</a></li>
<li><a href="#ebft-configuration-reference" id="toc-ebft-configuration-reference" class="nav-link" data-scroll-target="#ebft-configuration-reference">EBFT Configuration Reference</a></li>
</ul></li>
<li><a href="#nemo-gym-integration" id="toc-nemo-gym-integration" class="nav-link" data-scroll-target="#nemo-gym-integration">NeMo Gym Integration</a>
<ul class="collapse">
<li><a href="#single-turn-simplest" id="toc-single-turn-simplest" class="nav-link" data-scroll-target="#single-turn-simplest">Single-Turn (Simplest)</a></li>
<li><a href="#multi-turn-with-async-grpo-recommended" id="toc-multi-turn-with-async-grpo-recommended" class="nav-link" data-scroll-target="#multi-turn-with-async-grpo-recommended">Multi-Turn with Async GRPO (Recommended)</a></li>
<li><a href="#nemo-gym-prerequisites" id="toc-nemo-gym-prerequisites" class="nav-link" data-scroll-target="#nemo-gym-prerequisites">NeMo Gym Prerequisites</a></li>
<li><a href="#nemo-gym-configuration-reference" id="toc-nemo-gym-configuration-reference" class="nav-link" data-scroll-target="#nemo-gym-configuration-reference">NeMo Gym Configuration Reference</a></li>
<li><a href="#reward-functions-2" id="toc-reward-functions-2" class="nav-link" data-scroll-target="#reward-functions-2">Reward Functions</a></li>
</ul></li>
<li><a href="#using-local-dataset-files" id="toc-using-local-dataset-files" class="nav-link" data-scroll-target="#using-local-dataset-files">Using local dataset files</a></li>
<li><a href="#trl-auto-unwrapping-for-peft" id="toc-trl-auto-unwrapping-for-peft" class="nav-link" data-scroll-target="#trl-auto-unwrapping-for-peft">TRL auto-unwrapping for PEFT</a></li>
</ul></li>
@@ -857,6 +871,8 @@ feedback. Various methods include, but not limited to:</p>
<li><a href="#orpo">Odds Ratio Preference Optimization (ORPO)</a></li>
<li><a href="#grpo">Group Relative Policy Optimization (GRPO)</a></li>
<li><a href="#gdpo">Group Reward-Decoupled Policy Optimization (GDPO)</a></li>
<li><a href="#ebft">Energy-Based Fine-Tuning (EBFT)</a></li>
<li><a href="#nemo-gym-integration">NeMo Gym Integration</a></li>
</ul>
</section>
<section id="rlhf-using-axolotl" class="level2">
@@ -1805,20 +1821,451 @@ Tip
<span id="cb64-4"><a href="#cb64-4" aria-hidden="true" tabindex="-1"></a><span class="fu">simpo_gamma</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.5</span><span class="co"> # default in CPOTrainer</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>This method uses the same dataset format as <a href="#dpo">DPO</a>.</p>
</section>
<section id="ebft" class="level3">
<h3 class="anchored" data-anchor-id="ebft">EBFT</h3>
<p>EBFT (Energy-Based Fine-Tuning) fine-tunes language models by optimizing a <strong>feature-matching loss</strong> rather than relying on external reward functions. A frozen copy of the model extracts embeddings from both generated and ground-truth completions, and the generator is updated via REINFORCE to match the ground-truth feature moments.</p>
<p>Paper: <a href="https://arxiv.org/abs/2603.12248">“Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models”</a> (Jelassi et al., 2026)</p>
<p><strong>Key advantages:</strong></p>
<ul>
<li>No reward model or verifier required — works on any (prompt, completion) data</li>
<li>Applicable to non-verifiable tasks (code, translation, creative writing)</li>
<li>Operates on model rollouts (not teacher forcing), reducing distribution shift</li>
</ul>
<p>EBFT supports two modes:</p>
<ul>
<li><strong>Structured mode</strong>: For QA/instruction data with prompt + completion pairs. Uses vLLM for generation (like GRPO).</li>
<li><strong>Strided mode</strong>: For unstructured text without prompt/completion splits. Uses strided block-parallel generation with flex_attention — no vLLM needed.</li>
</ul>
<section id="structured-mode" class="level4">
<h4 class="anchored" data-anchor-id="structured-mode">Structured Mode</h4>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb65"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb65-1"><a href="#cb65-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> Qwen/Qwen3-4B</span></span>
<span id="cb65-2"><a href="#cb65-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb65-3"><a href="#cb65-3" aria-hidden="true" tabindex="-1"></a><span class="fu">rl</span><span class="kw">:</span><span class="at"> ebft</span></span>
<span id="cb65-4"><a href="#cb65-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb65-5"><a href="#cb65-5" aria-hidden="true" tabindex="-1"></a><span class="fu">ebft</span><span class="kw">:</span></span>
<span id="cb65-6"><a href="#cb65-6" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">feature_layers</span><span class="kw">:</span><span class="at"> </span><span class="kw">[</span><span class="fl">0.25</span><span class="kw">,</span><span class="at"> </span><span class="fl">0.5</span><span class="kw">,</span><span class="at"> </span><span class="fl">0.75</span><span class="kw">]</span><span class="co"> # Extract features at 25%, 50%, 75% depth</span></span>
<span id="cb65-7"><a href="#cb65-7" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">embed_method</span><span class="kw">:</span><span class="at"> last_token</span></span>
<span id="cb65-8"><a href="#cb65-8" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_whitening</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span></span>
<span id="cb65-9"><a href="#cb65-9" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">alignment_coef</span><span class="kw">:</span><span class="at"> </span><span class="fl">1.0</span><span class="co"> # Cosine similarity reward weight</span></span>
<span id="cb65-10"><a href="#cb65-10" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">diversity_coef</span><span class="kw">:</span><span class="at"> </span><span class="fl">1.0</span><span class="co"> # Pairwise dot product penalty</span></span>
<span id="cb65-11"><a href="#cb65-11" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">ce_coef</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.0</span><span class="co"> # Cross-entropy on GT tokens (0 = off)</span></span>
<span id="cb65-12"><a href="#cb65-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb65-13"><a href="#cb65-13" aria-hidden="true" tabindex="-1"></a><span class="fu">trl</span><span class="kw">:</span></span>
<span id="cb65-14"><a href="#cb65-14" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">num_generations</span><span class="kw">:</span><span class="at"> </span><span class="dv">4</span></span>
<span id="cb65-15"><a href="#cb65-15" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">max_completion_length</span><span class="kw">:</span><span class="at"> </span><span class="dv">256</span></span>
<span id="cb65-16"><a href="#cb65-16" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">temperature</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.7</span></span>
<span id="cb65-17"><a href="#cb65-17" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_vllm</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb65-18"><a href="#cb65-18" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_server_host</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.0.0.0</span></span>
<span id="cb65-19"><a href="#cb65-19" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_server_port</span><span class="kw">:</span><span class="at"> </span><span class="dv">8000</span></span>
<span id="cb65-20"><a href="#cb65-20" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_lora_sync</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # LoRA adapter sync (recommended)</span></span>
<span id="cb65-21"><a href="#cb65-21" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_sync_interval</span><span class="kw">:</span><span class="at"> </span><span class="dv">3</span></span>
<span id="cb65-22"><a href="#cb65-22" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_data_producer</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb65-23"><a href="#cb65-23" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">async_prefetch</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # Set false for sync mode</span></span>
<span id="cb65-24"><a href="#cb65-24" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">scale_rewards</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb65-25"><a href="#cb65-25" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">loss_type</span><span class="kw">:</span><span class="at"> grpo</span></span>
<span id="cb65-26"><a href="#cb65-26" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">epsilon</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.2</span></span>
<span id="cb65-27"><a href="#cb65-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb65-28"><a href="#cb65-28" aria-hidden="true" tabindex="-1"></a><span class="fu">vllm</span><span class="kw">:</span></span>
<span id="cb65-29"><a href="#cb65-29" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">gpu_memory_utilization</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.5</span></span>
<span id="cb65-30"><a href="#cb65-30" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">max_model_len</span><span class="kw">:</span><span class="at"> </span><span class="dv">2048</span></span>
<span id="cb65-31"><a href="#cb65-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb65-32"><a href="#cb65-32" aria-hidden="true" tabindex="-1"></a><span class="fu">datasets</span><span class="kw">:</span></span>
<span id="cb65-33"><a href="#cb65-33" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">path</span><span class="kw">:</span><span class="at"> nvidia/OpenCodeInstruct</span></span>
<span id="cb65-34"><a href="#cb65-34" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">type</span><span class="kw">:</span><span class="at"> ebft_opencode.transform</span></span>
<span id="cb65-35"><a href="#cb65-35" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">split</span><span class="kw">:</span><span class="at"> train[:500]</span></span>
<span id="cb65-36"><a href="#cb65-36" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb65-37"><a href="#cb65-37" aria-hidden="true" tabindex="-1"></a><span class="fu">adapter</span><span class="kw">:</span><span class="at"> lora</span></span>
<span id="cb65-38"><a href="#cb65-38" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_r</span><span class="kw">:</span><span class="at"> </span><span class="dv">16</span></span>
<span id="cb65-39"><a href="#cb65-39" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_alpha</span><span class="kw">:</span><span class="at"> </span><span class="dv">32</span></span>
<span id="cb65-40"><a href="#cb65-40" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_target_linear</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb66"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb66-1"><a href="#cb66-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Terminal 1: Start vLLM</span></span>
<span id="cb66-2"><a href="#cb66-2" aria-hidden="true" tabindex="-1"></a><span class="va">CUDA_VISIBLE_DEVICES</span><span class="op">=</span>0 <span class="ex">axolotl</span> vllm-serve config.yaml</span>
<span id="cb66-3"><a href="#cb66-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb66-4"><a href="#cb66-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Terminal 2: Train</span></span>
<span id="cb66-5"><a href="#cb66-5" aria-hidden="true" tabindex="-1"></a><span class="va">CUDA_VISIBLE_DEVICES</span><span class="op">=</span>1 <span class="ex">axolotl</span> train config.yaml</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
</section>
<section id="strided-mode" class="level4">
<h4 class="anchored" data-anchor-id="strided-mode">Strided Mode</h4>
<p>For unstructured text (raw code, prose). No vLLM needed — runs on a single GPU.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb67"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb67-1"><a href="#cb67-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> meta-llama/Llama-3.2-1B</span></span>
<span id="cb67-2"><a href="#cb67-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb67-3"><a href="#cb67-3" aria-hidden="true" tabindex="-1"></a><span class="fu">rl</span><span class="kw">:</span><span class="at"> ebft</span></span>
<span id="cb67-4"><a href="#cb67-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb67-5"><a href="#cb67-5" aria-hidden="true" tabindex="-1"></a><span class="fu">ebft</span><span class="kw">:</span></span>
<span id="cb67-6"><a href="#cb67-6" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">mode</span><span class="kw">:</span><span class="at"> strided</span></span>
<span id="cb67-7"><a href="#cb67-7" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">stride</span><span class="kw">:</span><span class="at"> </span><span class="dv">8</span></span>
<span id="cb67-8"><a href="#cb67-8" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">context_length</span><span class="kw">:</span><span class="at"> </span><span class="dv">8</span></span>
<span id="cb67-9"><a href="#cb67-9" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">generate_max_len</span><span class="kw">:</span><span class="at"> </span><span class="dv">8</span></span>
<span id="cb67-10"><a href="#cb67-10" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">n_samples_per_prompt</span><span class="kw">:</span><span class="at"> </span><span class="dv">4</span></span>
<span id="cb67-11"><a href="#cb67-11" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">temperature</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.6</span></span>
<span id="cb67-12"><a href="#cb67-12" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">feature_layers</span><span class="kw">:</span><span class="at"> </span><span class="kw">[</span><span class="fl">0.25</span><span class="kw">,</span><span class="at"> </span><span class="fl">0.5</span><span class="kw">,</span><span class="at"> </span><span class="fl">0.75</span><span class="kw">]</span></span>
<span id="cb67-13"><a href="#cb67-13" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">embed_method</span><span class="kw">:</span><span class="at"> last_token</span></span>
<span id="cb67-14"><a href="#cb67-14" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_whitening</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb67-15"><a href="#cb67-15" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">alignment_coef</span><span class="kw">:</span><span class="at"> </span><span class="fl">1.0</span></span>
<span id="cb67-16"><a href="#cb67-16" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">diversity_coef</span><span class="kw">:</span><span class="at"> </span><span class="fl">1.0</span></span>
<span id="cb67-17"><a href="#cb67-17" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">rl_coef</span><span class="kw">:</span><span class="at"> </span><span class="fl">1.0</span></span>
<span id="cb67-18"><a href="#cb67-18" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">ce_coef</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.03</span></span>
<span id="cb67-19"><a href="#cb67-19" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">advantage_estimator</span><span class="kw">:</span><span class="at"> rloo</span></span>
<span id="cb67-20"><a href="#cb67-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb67-21"><a href="#cb67-21" aria-hidden="true" tabindex="-1"></a><span class="fu">datasets</span><span class="kw">:</span></span>
<span id="cb67-22"><a href="#cb67-22" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">path</span><span class="kw">:</span><span class="at"> nvidia/OpenCodeInstruct</span></span>
<span id="cb67-23"><a href="#cb67-23" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">type</span><span class="kw">:</span><span class="at"> ebft_strided_structured.transform</span></span>
<span id="cb67-24"><a href="#cb67-24" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">split</span><span class="kw">:</span><span class="at"> train[:1%]</span></span>
<span id="cb67-25"><a href="#cb67-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb67-26"><a href="#cb67-26" aria-hidden="true" tabindex="-1"></a><span class="fu">flash_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span></span>
<span id="cb67-27"><a href="#cb67-27" aria-hidden="true" tabindex="-1"></a><span class="fu">flex_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # Strided mode uses flex_attention</span></span>
<span id="cb67-28"><a href="#cb67-28" aria-hidden="true" tabindex="-1"></a><span class="fu">gradient_checkpointing</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb67-29"><a href="#cb67-29" aria-hidden="true" tabindex="-1"></a><span class="fu">gradient_checkpointing_kwargs</span><span class="kw">:</span></span>
<span id="cb67-30"><a href="#cb67-30" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_reentrant</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # Required for flex_attention</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb68"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb68-1"><a href="#cb68-1" aria-hidden="true" tabindex="-1"></a><span class="va">CUDA_VISIBLE_DEVICES</span><span class="op">=</span>0 <span class="ex">axolotl</span> train config.yaml</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p>See <code>examples/ebft/</code> for complete example configs covering Llama 1B/3B/8B and Qwen3 4B/8B models in both modes.</p>
</div>
</div>
</section>
<section id="ebft-configuration-reference" class="level4">
<h4 class="anchored" data-anchor-id="ebft-configuration-reference">EBFT Configuration Reference</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 33%">
<col style="width: 27%">
<col style="width: 39%">
</colgroup>
<thead>
<tr class="header">
<th>Parameter</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>ebft.feature_layers</code></td>
<td><code>[0.25, 0.5, 0.75]</code></td>
<td>Layer depths for feature extraction (fractional)</td>
</tr>
<tr class="even">
<td><code>ebft.embed_method</code></td>
<td><code>last_token</code></td>
<td>Feature pooling: <code>last_token</code>, <code>mean_pooling</code>, <code>concat</code></td>
</tr>
<tr class="odd">
<td><code>ebft.use_whitening</code></td>
<td><code>false</code></td>
<td>SVD whitening of feature dimensions</td>
</tr>
<tr class="even">
<td><code>ebft.alignment_coef</code></td>
<td><code>1.0</code></td>
<td>Cosine similarity reward weight</td>
</tr>
<tr class="odd">
<td><code>ebft.diversity_coef</code></td>
<td><code>1.0</code></td>
<td>Pairwise dot product penalty weight</td>
</tr>
<tr class="even">
<td><code>ebft.ce_coef</code></td>
<td><code>0.0</code></td>
<td>Cross-entropy loss on ground-truth tokens</td>
</tr>
<tr class="odd">
<td><code>ebft.mode</code></td>
<td><code>structured</code></td>
<td><code>structured</code> (vLLM) or <code>strided</code> (no vLLM)</td>
</tr>
<tr class="even">
<td><code>ebft.stride</code></td>
<td></td>
<td>Tokens between anchor points (strided mode)</td>
</tr>
<tr class="odd">
<td><code>ebft.context_length</code></td>
<td></td>
<td>Context window per block (strided mode)</td>
</tr>
<tr class="even">
<td><code>ebft.generate_max_len</code></td>
<td></td>
<td>Tokens to generate per block (strided mode)</td>
</tr>
<tr class="odd">
<td><code>ebft.n_samples_per_prompt</code></td>
<td></td>
<td>Rollouts per document (strided mode)</td>
</tr>
<tr class="even">
<td><code>ebft.advantage_estimator</code></td>
<td><code>grpo</code></td>
<td><code>grpo</code> or <code>rloo</code> (strided mode)</td>
</tr>
</tbody>
</table>
</section>
</section>
<section id="nemo-gym-integration" class="level3">
<h3 class="anchored" data-anchor-id="nemo-gym-integration">NeMo Gym Integration</h3>
<p><a href="https://github.com/NVIDIA-NeMo/Gym">NeMo Gym</a> provides 50+ verified RL environments (math, coding, tool-use, reasoning) with deterministic reward signals. The axolotl integration supports both <strong>single-turn</strong> (call <code>/verify</code> after generation) and <strong>multi-turn</strong> (agent-based tool execution via <code>/run</code>).</p>
<section id="single-turn-simplest" class="level4">
<h4 class="anchored" data-anchor-id="single-turn-simplest">Single-Turn (Simplest)</h4>
<p>For environments that only need answer verification (math, coding challenges). No agent server needed — the reward function calls <code>/verify</code> directly on the resource server.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb69"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb69-1"><a href="#cb69-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> Qwen/Qwen2.5-0.5B-Instruct</span></span>
<span id="cb69-2"><a href="#cb69-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb69-3"><a href="#cb69-3" aria-hidden="true" tabindex="-1"></a><span class="fu">rl</span><span class="kw">:</span><span class="at"> grpo</span></span>
<span id="cb69-4"><a href="#cb69-4" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> tokenizer_default</span></span>
<span id="cb69-5"><a href="#cb69-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb69-6"><a href="#cb69-6" aria-hidden="true" tabindex="-1"></a><span class="fu">trl</span><span class="kw">:</span></span>
<span id="cb69-7"><a href="#cb69-7" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_vllm</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span><span class="co"> # Colocate mode (single GPU)</span></span>
<span id="cb69-8"><a href="#cb69-8" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">num_generations</span><span class="kw">:</span><span class="at"> </span><span class="dv">4</span></span>
<span id="cb69-9"><a href="#cb69-9" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">max_completion_length</span><span class="kw">:</span><span class="at"> </span><span class="dv">128</span></span>
<span id="cb69-10"><a href="#cb69-10" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">temperature</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.9</span></span>
<span id="cb69-11"><a href="#cb69-11" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">reward_funcs</span><span class="kw">:</span></span>
<span id="cb69-12"><a href="#cb69-12" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> axolotl.integrations.nemo_gym.rewards.reward_nemo_gym_verify</span></span>
<span id="cb69-13"><a href="#cb69-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb69-14"><a href="#cb69-14" aria-hidden="true" tabindex="-1"></a><span class="fu">plugins</span><span class="kw">:</span></span>
<span id="cb69-15"><a href="#cb69-15" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> axolotl.integrations.nemo_gym.NemoGymPlugin</span></span>
<span id="cb69-16"><a href="#cb69-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb69-17"><a href="#cb69-17" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_enabled</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb69-18"><a href="#cb69-18" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_dir</span><span class="kw">:</span><span class="at"> ~/Gym</span></span>
<span id="cb69-19"><a href="#cb69-19" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_auto_start</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span></span>
<span id="cb69-20"><a href="#cb69-20" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_head_port</span><span class="kw">:</span><span class="at"> </span><span class="dv">11000</span></span>
<span id="cb69-21"><a href="#cb69-21" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_datasets</span><span class="kw">:</span></span>
<span id="cb69-22"><a href="#cb69-22" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">path</span><span class="kw">:</span><span class="at"> resources_servers/reasoning_gym/data/train_basic_arithmetic.jsonl</span></span>
<span id="cb69-23"><a href="#cb69-23" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">server_name</span><span class="kw">:</span><span class="at"> reasoning_gym</span></span>
<span id="cb69-24"><a href="#cb69-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb69-25"><a href="#cb69-25" aria-hidden="true" tabindex="-1"></a><span class="fu">datasets</span><span class="kw">:</span></span>
<span id="cb69-26"><a href="#cb69-26" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">path</span><span class="kw">:</span><span class="at"> ~/Gym/resources_servers/reasoning_gym/data/train_basic_arithmetic.jsonl</span></span>
<span id="cb69-27"><a href="#cb69-27" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">type</span><span class="kw">:</span><span class="at"> chat_template</span></span>
<span id="cb69-28"><a href="#cb69-28" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">field_messages</span><span class="kw">:</span><span class="at"> responses_create_params.input</span></span>
<span id="cb69-29"><a href="#cb69-29" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">message_field_content</span><span class="kw">:</span><span class="at"> content</span></span>
<span id="cb69-30"><a href="#cb69-30" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">message_field_role</span><span class="kw">:</span><span class="at"> role</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb70"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb70-1"><a href="#cb70-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Terminal 1: Start NeMo Gym resource server</span></span>
<span id="cb70-2"><a href="#cb70-2" aria-hidden="true" tabindex="-1"></a><span class="bu">cd</span> ~/Gym <span class="kw">&amp;&amp;</span> <span class="ex">.venv/bin/ng_run</span> <span class="dt">\</span></span>
<span id="cb70-3"><a href="#cb70-3" aria-hidden="true" tabindex="-1"></a> <span class="st">"+config_paths=[resources_servers/reasoning_gym/configs/resources_only.yaml]"</span> <span class="dt">\</span></span>
<span id="cb70-4"><a href="#cb70-4" aria-hidden="true" tabindex="-1"></a> <span class="st">"+skip_venv_if_present=true"</span></span>
<span id="cb70-5"><a href="#cb70-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb70-6"><a href="#cb70-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Terminal 2: Train</span></span>
<span id="cb70-7"><a href="#cb70-7" aria-hidden="true" tabindex="-1"></a><span class="va">CUDA_VISIBLE_DEVICES</span><span class="op">=</span>0 <span class="ex">axolotl</span> train config.yaml</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p><code>nemo_gym_datasets.path</code> is relative to <code>nemo_gym_dir</code>. Dont use absolute paths or they will be double-joined.</p>
</div>
</div>
</section>
<section id="multi-turn-with-async-grpo-recommended" class="level4">
<h4 class="anchored" data-anchor-id="multi-turn-with-async-grpo-recommended">Multi-Turn with Async GRPO (Recommended)</h4>
<p>For environments with tool-use (weather, search, databases). An agent server orchestrates multi-turn interactions: generate → parse tool calls → execute tools → feed results back → repeat until done.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb71"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb71-1"><a href="#cb71-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> Qwen/Qwen3-0.6B</span></span>
<span id="cb71-2"><a href="#cb71-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb71-3"><a href="#cb71-3" aria-hidden="true" tabindex="-1"></a><span class="fu">rl</span><span class="kw">:</span><span class="at"> grpo</span></span>
<span id="cb71-4"><a href="#cb71-4" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> tokenizer_default</span></span>
<span id="cb71-5"><a href="#cb71-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb71-6"><a href="#cb71-6" aria-hidden="true" tabindex="-1"></a><span class="fu">adapter</span><span class="kw">:</span><span class="at"> lora</span></span>
<span id="cb71-7"><a href="#cb71-7" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_r</span><span class="kw">:</span><span class="at"> </span><span class="dv">16</span></span>
<span id="cb71-8"><a href="#cb71-8" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_alpha</span><span class="kw">:</span><span class="at"> </span><span class="dv">32</span></span>
<span id="cb71-9"><a href="#cb71-9" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_target_modules</span><span class="kw">:</span><span class="at"> </span><span class="kw">[</span><span class="at">q_proj</span><span class="kw">,</span><span class="at"> k_proj</span><span class="kw">,</span><span class="at"> v_proj</span><span class="kw">,</span><span class="at"> o_proj</span><span class="kw">,</span><span class="at"> gate_proj</span><span class="kw">,</span><span class="at"> up_proj</span><span class="kw">,</span><span class="at"> down_proj</span><span class="kw">]</span></span>
<span id="cb71-10"><a href="#cb71-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb71-11"><a href="#cb71-11" aria-hidden="true" tabindex="-1"></a><span class="fu">trl</span><span class="kw">:</span></span>
<span id="cb71-12"><a href="#cb71-12" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_vllm</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb71-13"><a href="#cb71-13" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_mode</span><span class="kw">:</span><span class="at"> server</span></span>
<span id="cb71-14"><a href="#cb71-14" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_server_host</span><span class="kw">:</span><span class="at"> localhost</span></span>
<span id="cb71-15"><a href="#cb71-15" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_server_port</span><span class="kw">:</span><span class="at"> </span><span class="dv">8000</span></span>
<span id="cb71-16"><a href="#cb71-16" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_lora_sync</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb71-17"><a href="#cb71-17" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">vllm_sync_interval</span><span class="kw">:</span><span class="at"> </span><span class="dv">5</span></span>
<span id="cb71-18"><a href="#cb71-18" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_data_producer</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb71-19"><a href="#cb71-19" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">async_prefetch</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # 3x speedup</span></span>
<span id="cb71-20"><a href="#cb71-20" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">num_generations</span><span class="kw">:</span><span class="at"> </span><span class="dv">4</span></span>
<span id="cb71-21"><a href="#cb71-21" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">max_completion_length</span><span class="kw">:</span><span class="at"> </span><span class="dv">512</span></span>
<span id="cb71-22"><a href="#cb71-22" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">temperature</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.8</span></span>
<span id="cb71-23"><a href="#cb71-23" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">reward_funcs</span><span class="kw">:</span></span>
<span id="cb71-24"><a href="#cb71-24" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> axolotl.integrations.nemo_gym.rewards.reward_env</span></span>
<span id="cb71-25"><a href="#cb71-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb71-26"><a href="#cb71-26" aria-hidden="true" tabindex="-1"></a><span class="fu">plugins</span><span class="kw">:</span></span>
<span id="cb71-27"><a href="#cb71-27" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> axolotl.integrations.nemo_gym.NemoGymPlugin</span></span>
<span id="cb71-28"><a href="#cb71-28" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb71-29"><a href="#cb71-29" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_enabled</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb71-30"><a href="#cb71-30" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_auto_start</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span></span>
<span id="cb71-31"><a href="#cb71-31" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_head_port</span><span class="kw">:</span><span class="at"> </span><span class="dv">11000</span></span>
<span id="cb71-32"><a href="#cb71-32" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_multi_turn</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb71-33"><a href="#cb71-33" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_verify_timeout</span><span class="kw">:</span><span class="at"> </span><span class="dv">120</span></span>
<span id="cb71-34"><a href="#cb71-34" aria-hidden="true" tabindex="-1"></a><span class="fu">nemo_gym_datasets</span><span class="kw">:</span></span>
<span id="cb71-35"><a href="#cb71-35" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">path</span><span class="kw">:</span><span class="at"> resources_servers/example_single_tool_call/data/weather_tool_calling.jsonl</span></span>
<span id="cb71-36"><a href="#cb71-36" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">server_name</span><span class="kw">:</span><span class="at"> example_single_tool_call</span></span>
<span id="cb71-37"><a href="#cb71-37" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb71-38"><a href="#cb71-38" aria-hidden="true" tabindex="-1"></a><span class="fu">datasets</span><span class="kw">:</span></span>
<span id="cb71-39"><a href="#cb71-39" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">path</span><span class="kw">:</span><span class="at"> ~/Gym/resources_servers/example_single_tool_call/data/weather_tool_calling.jsonl</span></span>
<span id="cb71-40"><a href="#cb71-40" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">type</span><span class="kw">:</span><span class="at"> chat_template</span></span>
<span id="cb71-41"><a href="#cb71-41" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">field_messages</span><span class="kw">:</span><span class="at"> responses_create_params.input</span></span>
<span id="cb71-42"><a href="#cb71-42" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">message_field_content</span><span class="kw">:</span><span class="at"> content</span></span>
<span id="cb71-43"><a href="#cb71-43" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">message_field_role</span><span class="kw">:</span><span class="at"> role</span></span>
<span id="cb71-44"><a href="#cb71-44" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb71-45"><a href="#cb71-45" aria-hidden="true" tabindex="-1"></a><span class="fu">vllm</span><span class="kw">:</span></span>
<span id="cb71-46"><a href="#cb71-46" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">gpu_memory_utilization</span><span class="kw">:</span><span class="at"> </span><span class="fl">0.85</span></span>
<span id="cb71-47"><a href="#cb71-47" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">max_model_len</span><span class="kw">:</span><span class="at"> </span><span class="dv">2048</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>Multi-turn requires three services running:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb72"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb72-1"><a href="#cb72-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Terminal 1: vLLM with LoRA + tool calling</span></span>
<span id="cb72-2"><a href="#cb72-2" aria-hidden="true" tabindex="-1"></a><span class="va">VLLM_ALLOW_RUNTIME_LORA_UPDATING</span><span class="op">=</span>1 <span class="va">CUDA_VISIBLE_DEVICES</span><span class="op">=</span>0 <span class="dt">\</span></span>
<span id="cb72-3"><a href="#cb72-3" aria-hidden="true" tabindex="-1"></a> <span class="ex">python</span> <span class="at">-m</span> vllm.entrypoints.openai.api_server <span class="dt">\</span></span>
<span id="cb72-4"><a href="#cb72-4" aria-hidden="true" tabindex="-1"></a> <span class="at">--model</span> Qwen/Qwen3-0.6B <span class="at">--max-model-len</span> 2048 <span class="dt">\</span></span>
<span id="cb72-5"><a href="#cb72-5" aria-hidden="true" tabindex="-1"></a> <span class="at">--gpu-memory-utilization</span> 0.85 <span class="dt">\</span></span>
<span id="cb72-6"><a href="#cb72-6" aria-hidden="true" tabindex="-1"></a> <span class="at">--enable-lora</span> <span class="at">--max-lora-rank</span> 64 <span class="dt">\</span></span>
<span id="cb72-7"><a href="#cb72-7" aria-hidden="true" tabindex="-1"></a> <span class="at">--enable-auto-tool-choice</span> <span class="at">--tool-call-parser</span> hermes</span>
<span id="cb72-8"><a href="#cb72-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb72-9"><a href="#cb72-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Terminal 2: NeMo Gym servers (resource + model proxy + agent)</span></span>
<span id="cb72-10"><a href="#cb72-10" aria-hidden="true" tabindex="-1"></a><span class="bu">cd</span> ~/Gym <span class="kw">&amp;&amp;</span> <span class="ex">.venv/bin/ng_run</span> <span class="dt">\</span></span>
<span id="cb72-11"><a href="#cb72-11" aria-hidden="true" tabindex="-1"></a> <span class="st">"+config_paths=[configs/axolotl_tool_calling.yaml]"</span> <span class="dt">\</span></span>
<span id="cb72-12"><a href="#cb72-12" aria-hidden="true" tabindex="-1"></a> <span class="st">"+skip_venv_if_present=true"</span></span>
<span id="cb72-13"><a href="#cb72-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb72-14"><a href="#cb72-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Terminal 3: Training</span></span>
<span id="cb72-15"><a href="#cb72-15" aria-hidden="true" tabindex="-1"></a><span class="va">CUDA_VISIBLE_DEVICES</span><span class="op">=</span>1 <span class="ex">axolotl</span> train config.yaml</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="callout callout-style-default callout-important callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Important
</div>
</div>
<div class="callout-body-container callout-body">
<p>Multi-turn requires a NeMo Gym agent config YAML that defines three components: a resource server (tools + <code>/verify</code>), a model server proxy (forwards to your vLLM), and an agent server (orchestrates <code>/run</code>). See the <a href="https://github.com/NVIDIA-NeMo/Gym">NeMo Gym README</a> for agent config format.</p>
</div>
</div>
</section>
<section id="nemo-gym-prerequisites" class="level4">
<h4 class="anchored" data-anchor-id="nemo-gym-prerequisites">NeMo Gym Prerequisites</h4>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb73"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb73-1"><a href="#cb73-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Clone and set up NeMo Gym</span></span>
<span id="cb73-2"><a href="#cb73-2" aria-hidden="true" tabindex="-1"></a><span class="fu">git</span> clone https://github.com/NVIDIA-NeMo/Gym.git ~/Gym</span>
<span id="cb73-3"><a href="#cb73-3" aria-hidden="true" tabindex="-1"></a><span class="bu">cd</span> ~/Gym</span>
<span id="cb73-4"><a href="#cb73-4" aria-hidden="true" tabindex="-1"></a><span class="ex">uv</span> venv <span class="at">--python</span> 3.12 <span class="kw">&amp;&amp;</span> <span class="bu">source</span> .venv/bin/activate <span class="kw">&amp;&amp;</span> <span class="ex">uv</span> sync</span>
<span id="cb73-5"><a href="#cb73-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb73-6"><a href="#cb73-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Fix pycosat build (GCC 13+)</span></span>
<span id="cb73-7"><a href="#cb73-7" aria-hidden="true" tabindex="-1"></a><span class="va">CFLAGS</span><span class="op">=</span><span class="st">""</span> <span class="ex">uv</span> pip install pycosat <span class="at">--python</span> .venv/bin/python <span class="at">--no-build-isolation</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
</section>
<section id="nemo-gym-configuration-reference" class="level4">
<h4 class="anchored" data-anchor-id="nemo-gym-configuration-reference">NeMo Gym Configuration Reference</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 28%">
<col style="width: 15%">
<col style="width: 23%">
<col style="width: 33%">
</colgroup>
<thead>
<tr class="header">
<th>Parameter</th>
<th>Type</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>nemo_gym_enabled</code></td>
<td>bool</td>
<td></td>
<td>Enable the NeMo Gym integration</td>
</tr>
<tr class="even">
<td><code>nemo_gym_dir</code></td>
<td>str</td>
<td><code>~/Gym</code></td>
<td>Path to NeMo Gym repo</td>
</tr>
<tr class="odd">
<td><code>nemo_gym_auto_start</code></td>
<td>bool</td>
<td><code>true</code></td>
<td>Auto-start resource servers</td>
</tr>
<tr class="even">
<td><code>nemo_gym_head_port</code></td>
<td>int</td>
<td><code>11000</code></td>
<td>Head server port</td>
</tr>
<tr class="odd">
<td><code>nemo_gym_multi_turn</code></td>
<td>bool</td>
<td><code>false</code></td>
<td>Enable multi-turn via agent <code>/run</code></td>
</tr>
<tr class="even">
<td><code>nemo_gym_verify_timeout</code></td>
<td>int</td>
<td><code>30</code></td>
<td>Per-request timeout (seconds)</td>
</tr>
<tr class="odd">
<td><code>nemo_gym_datasets</code></td>
<td>list</td>
<td>required</td>
<td>Dataset configs with <code>path</code> and <code>server_name</code></td>
</tr>
</tbody>
</table>
</section>
<section id="reward-functions-2" class="level4">
<h4 class="anchored" data-anchor-id="reward-functions-2">Reward Functions</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 34%">
<col style="width: 20%">
<col style="width: 44%">
</colgroup>
<thead>
<tr class="header">
<th>Function</th>
<th>Mode</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>axolotl.integrations.nemo_gym.rewards.reward_nemo_gym_verify</code></td>
<td>Single-turn</td>
<td>Calls <code>/verify</code>, returns binary reward</td>
</tr>
<tr class="even">
<td><code>axolotl.integrations.nemo_gym.rewards.reward_env</code></td>
<td>Multi-turn</td>
<td>Passthrough reward from agent <code>/run</code></td>
</tr>
</tbody>
</table>
</section>
</section>
<section id="using-local-dataset-files" class="level3">
<h3 class="anchored" data-anchor-id="using-local-dataset-files">Using local dataset files</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb65"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb65-1"><a href="#cb65-1" aria-hidden="true" tabindex="-1"></a><span class="fu">datasets</span><span class="kw">:</span></span>
<span id="cb65-2"><a href="#cb65-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">ds_type</span><span class="kw">:</span><span class="at"> json</span></span>
<span id="cb65-3"><a href="#cb65-3" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">data_files</span><span class="kw">:</span></span>
<span id="cb65-4"><a href="#cb65-4" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> orca_rlhf.jsonl</span></span>
<span id="cb65-5"><a href="#cb65-5" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">split</span><span class="kw">:</span><span class="at"> train</span></span>
<span id="cb65-6"><a href="#cb65-6" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">type</span><span class="kw">:</span><span class="at"> chatml.intel</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb74"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb74-1"><a href="#cb74-1" aria-hidden="true" tabindex="-1"></a><span class="fu">datasets</span><span class="kw">:</span></span>
<span id="cb74-2"><a href="#cb74-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">ds_type</span><span class="kw">:</span><span class="at"> json</span></span>
<span id="cb74-3"><a href="#cb74-3" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">data_files</span><span class="kw">:</span></span>
<span id="cb74-4"><a href="#cb74-4" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> orca_rlhf.jsonl</span></span>
<span id="cb74-5"><a href="#cb74-5" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">split</span><span class="kw">:</span><span class="at"> train</span></span>
<span id="cb74-6"><a href="#cb74-6" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">type</span><span class="kw">:</span><span class="at"> chatml.intel</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
</section>
<section id="trl-auto-unwrapping-for-peft" class="level3">
<h3 class="anchored" data-anchor-id="trl-auto-unwrapping-for-peft">TRL auto-unwrapping for PEFT</h3>
<p>TRL supports auto-unwrapping PEFT models for RL training paradigms which rely on a reference model. This significantly reduces memory pressure as an additional refreference model does not need to be loaded, and reference model log-probabilities can be obtained by disabling PEFT adapters. This is enabled by default. To turn it off, pass the following config:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb66"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb66-1"><a href="#cb66-1" aria-hidden="true" tabindex="-1"></a><span class="co"># load ref model when adapter training.</span></span>
<span id="cb66-2"><a href="#cb66-2" aria-hidden="true" tabindex="-1"></a><span class="fu">rl_adapter_ref_model</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb75"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb75-1"><a href="#cb75-1" aria-hidden="true" tabindex="-1"></a><span class="co"># load ref model when adapter training.</span></span>
<span id="cb75-2"><a href="#cb75-2" aria-hidden="true" tabindex="-1"></a><span class="fu">rl_adapter_ref_model</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
</section>

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large Load Diff