Nemo gym integration (#3516) [skip ci]
* nemo gym integration with grpo wip
* mostly working
* cleanup
* simplify
* update docs
* nemo gym support wip
* cleanup
* chore: lint
* address PR review and add more tests
* chore: lint
* post merge lora fixes for CI (#3536) [skip ci]
* post merge lora fixes for CI
* handle lora kernel auto-enable for moe without grouped_mm
* prefer not to import torch in schema validation
* address pr comments, add timeout, add tests
* roundup_power2_divisions not needed with newer pytorch versions (#3540)
* roundup_power2_divisions not needed with newer pytorch versions
* remove typo
* update qwen3.5 moe 35b-a3b yaml for 5090
* more bug fixes
* fix tests to match updated trainer
* don't use fa2 for hooks test
* reset plugins on the instance
* retry download
* fix references to renamed axolotl_cfg property on trainer
* Fix ref to trainer cfg
* fix: robust handling of race condition on patching check (#3543) [skip ci]
* EBFT: Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models (#3527) [skip ci]
* EBFT wip
* fixes
* more fixes
* add missing strided module
* ebft fixes for multi-turn
* make ebft work with async
* add example for ebft w qwen3.5
* fix for split thinking and update yaml for lora over linear attention only
* enforce_eager for vllm arg in schema
* fix sync weights
* fix multi-gpu
* handle updated sig for mm
* ddp fixes
* improve multi-gpu handling, don't calculate logits, adaptive completion length
* chore: lint
* chore: lint
* support completion_mean
* Address code review feedback
* clamp min IS ratio
* Address PR code review
* more fixes identified
* address code review
* Fix property from rebase conflict
* fix for ebft sync and update docs
* make trainer loss patch check a solo test

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -47,14 +47,11 @@ For **unstructured text** without prompt/completion splits (e.g., raw code, pros
 ### Structured Mode (QA data + vLLM)
 
 ```bash
-# 1. Start vLLM server
-python -m trl.scripts.vllm_serve \
-    --model meta-llama/Llama-3.2-1B \
-    --host 0.0.0.0 --port 8000 \
-    --gpu-memory-utilization 0.3
+# 1. Start vLLM server (LoRA serve module auto-selected when vllm_lora_sync: true)
+CUDA_VISIBLE_DEVICES=0 axolotl vllm-serve examples/ebft/qwen3-4b-ebft-structured-async.yaml
 
-# 2. Train
-axolotl train examples/ebft/llama-1b-ebft-opencode.yaml
+# 2. Train on a separate GPU
+CUDA_VISIBLE_DEVICES=1 axolotl train examples/ebft/qwen3-4b-ebft-structured-async.yaml
 ```
 
 ### Strided Mode (unstructured text)