axolotl

Files

Wing Lian 69f165b39b probe vLLM weight-sync routes and select transport per server

The plugin used to unconditionally monkey-patch
VLLMClient.init_communicator into a no-op AND silently skip
sync_weights when vllm_lora_sync was off. Combined, this turned the
trainer into a functional no-op (vLLM never received the updated
weights) whenever (a) the user ran NeMo Gym + LoRA without setting
vllm_lora_sync=true, or (b) the user ran NeMo Gym + full fine-tune,
which had no working sync path under the old code.

Replace both patches with:

1. A probe of the configured vLLM server's /openapi.json at
   pre_model_load. Three transports are recognized:
     - NCCL (/init_communicator/ + /update_named_param/) — TRL serve
       and axolotl vllm-serve both expose this
     - LoRA filesystem (/v1/load_lora_adapter or /set_lora_adapter/)
     - HTTP base64 full-weight (/http_update_weights/) — axolotl
       vllm-serve only

2. A pure-logic ``select_weight_sync_transport`` that picks the
   right one for (server caps × adapter type).

3. ``init_communicator`` is only patched out when the server has no
   NCCL routes; against TRL/axolotl serve modules it stays live so
   full-finetune NCCL sync works.

4. ``post_trainer_create`` uses the selection table to install LoRA
   filesystem sync, leave the standard NCCL flow alone, raise
   NotImplementedError (HTTP transport, still pending), or raise an
   error with a precise diagnosis when no transport is viable. No more
   silent no-op trainers.
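The route probe in step 1 can be sketched as a pure classification over the server's OpenAPI schema. This is a minimal sketch, not axolotl's code: it assumes the vLLM server is a FastAPI app that publishes its schema at /openapi.json, matches paths exactly as listed in this commit, and the helper names (`fetch_openapi`, `detect_transports`) and capability labels (`"nccl"`, `"lora_fs"`, `"http"`) are hypothetical.

```python
import json
import urllib.request

# Route sets copied from the commit message; the capability labels are hypothetical.
NCCL_ROUTES = {"/init_communicator/", "/update_named_param/"}
LORA_ROUTES = {"/v1/load_lora_adapter", "/set_lora_adapter/"}
HTTP_ROUTES = {"/http_update_weights/"}


def fetch_openapi(base_url: str, timeout: float = 5.0) -> dict:
    """Download the OpenAPI schema from a running server (assumed at /openapi.json)."""
    url = base_url.rstrip("/") + "/openapi.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def detect_transports(openapi: dict) -> set[str]:
    """Map the server's advertised paths onto the weight-sync transports it supports."""
    paths = set(openapi.get("paths", {}))
    caps = set()
    # NCCL needs both routes: one to join the process group, one to push tensors.
    if NCCL_ROUTES <= paths:
        caps.add("nccl")
    # Either adapter-loading route is enough for filesystem LoRA sync.
    if paths & LORA_ROUTES:
        caps.add("lora_fs")
    # Base64 full-weight upload over plain HTTP.
    if paths & HTTP_ROUTES:
        caps.add("http")
    return caps


if __name__ == "__main__":
    schema = {"paths": {"/init_communicator/": {}, "/update_named_param/": {}}}
    print(detect_transports(schema))  # → {'nccl'}
```

Keeping the classification separate from the HTTP fetch lets it be unit-tested against canned schemas; against a live server it would be called as `detect_transports(fetch_openapi(base_url))`, with `base_url` being whatever host and port the user configured.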
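The selection logic in steps 2 to 4 can be sketched as pure functions over a set of capability labels (`"nccl"`, `"lora_fs"`, `"http"`) plus the adapter type. Only the name `select_weight_sync_transport` comes from this commit. Its signature here, the `Transport` enum, the preference order (LoRA filesystem before NCCL for adapters, NCCL before HTTP for full fine-tunes), and the helpers `should_patch_init_communicator` and `plan_post_trainer_action` are assumptions for illustration.

```python
from enum import Enum


class Transport(Enum):
    """Hypothetical labels for the three transports recognized by the probe."""
    NCCL = "nccl"        # /init_communicator/ + /update_named_param/
    LORA_FS = "lora_fs"  # /v1/load_lora_adapter or /set_lora_adapter/
    HTTP = "http"        # /http_update_weights/


def select_weight_sync_transport(caps: set[str], is_lora: bool) -> Transport:
    """Pick a transport for (server caps x adapter type); preference order is assumed."""
    preferences = ["lora_fs", "nccl"] if is_lora else ["nccl", "http"]
    for name in preferences:
        if name in caps:
            return Transport(name)
    # No viable transport: fail loudly instead of training against stale vLLM weights.
    kind = "LoRA" if is_lora else "full fine-tune"
    raise RuntimeError(
        f"no weight-sync transport for {kind}: server advertises "
        f"{sorted(caps) or 'no sync routes'}, need one of {preferences}"
    )


def should_patch_init_communicator(caps: set[str]) -> bool:
    """Stub out init_communicator only when the server has no NCCL routes."""
    return "nccl" not in caps


def plan_post_trainer_action(transport: Transport) -> str:
    """Map the chosen transport onto what post_trainer_create would do (hypothetical labels)."""
    if transport is Transport.LORA_FS:
        return "install_lora_fs_sync"
    if transport is Transport.NCCL:
        return "keep_nccl_flow"
    raise NotImplementedError("HTTP full-weight sync is not implemented yet")
```

Under these assumptions, a LoRA run against a server exposing only NCCL routes selects `Transport.NCCL`, while a full fine-tune against a server exposing only adapter routes raises with a message naming what the server offers, instead of silently skipping sync.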

2026-04-15 13:27:30 +00:00

__init__.py

upgrade liger to 0.4.0 (#1973)

2024-11-07 12:53:34 -05:00

test_diffusion_callback.py

roundup_power2_divisions not needed with newer pytorch versions (#3540)

2026-03-24 15:40:05 -04:00

test_diffusion.py

roundup_power2_divisions not needed with newer pytorch versions (#3540)

2026-03-24 15:40:05 -04:00

test_gemma4_moe.py

gemma4 support (#3574)