Nemo gym integration (#3516) [skip ci]
* nemo gym integration with grpo wip
* mostly working
* cleanup
* simplify
* update docs
* nemo gym support wip
* cleanup
* chore: lint
* address PR review and add more tests
* chore: lint
* post merge lora fixes for CI (#3536) [skip ci]
* post merge lora fixes for CI
* handle lora kernel auto-enable for moe without grouped_mm
* prefer not to import torch in schema validation
* address pr comments, add timeout, add tests
* roundup_power2_divisions not needed with newer pytorch versions (#3540)
* roundup_power2_divisions not needed with newer pytorch versions
* remove typo
* update qwen3.5 moe 35b-a3b yaml for 5090
* more bug fixes
* fix tests to match updated trainer
* don't use fa2 for hooks test
* reset plugins on the instance
* retry download
* fix references to renamed axolotl_cfg property on trainer
* Fix ref to trainer cfg
* fix: robust handling of race condition on patching check (#3543) [skip ci]
* EBFT: Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models (#3527) [skip ci]
* EBFT wip
* fixes
* more fixes
* add missing strided module
* ebft fixes for multi-turn
* make ebft work with async
* add example for ebft w qwen3.5
* fix for split thinking and update yaml for lora over linear attention only
* enforce_eager for vllm arg in schema
* fix sync weights
* fix multi-gpu
* handle updated sig for mm
* ddp fixes
* improve multi-gpu handling, don't calculate logits, adaptive completion length
* chore: lint
* chore: lint
* support completion_mean
* Address code review feedback
* clamp min IS ratio
* Address PR code review
* more fixes identified
* address code review
* Fix property from rebase conflict
* fix for ebft sync and update docs
* make trainer loss patch check a solo test

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -47,14 +47,11 @@ For **unstructured text** without prompt/completion splits (e.g., raw code, pros
 ### Structured Mode (QA data + vLLM)
 
 ```bash
-# 1. Start vLLM server
-python -m trl.scripts.vllm_serve \
-    --model meta-llama/Llama-3.2-1B \
-    --host 0.0.0.0 --port 8000 \
-    --gpu-memory-utilization 0.3
+# 1. Start vLLM server (LoRA serve module auto-selected when vllm_lora_sync: true)
+CUDA_VISIBLE_DEVICES=0 axolotl vllm-serve examples/ebft/qwen3-4b-ebft-structured-async.yaml
 
-# 2. Train
-axolotl train examples/ebft/llama-1b-ebft-opencode.yaml
+# 2. Train on a separate GPU
+CUDA_VISIBLE_DEVICES=1 axolotl train examples/ebft/qwen3-4b-ebft-structured-async.yaml
 ```
 
 ### Strided Mode (unstructured text)