Files
axolotl/tests/core
Wing Lian 7617b951a8 make _maybe_sync_vllm_weights actually fire in sync mode
Two bugs in ``AsyncGRPOTrainer._maybe_sync_vllm_weights`` plus a
companion bug in the sync-hook patch site that together neutralized
LoRA weight sync entirely whenever ``async_prefetch=False`` was
combined with NeMo Gym's data-producer path:

1. ``_maybe_sync_vllm_weights`` had ``if not async_prefetch: return``
   at the top. The original design assumed sync mode would fall back
   to TRL's stock per-step ``sync_weights`` call inside
   ``_generate_single_turn`` — true for vanilla GRPO but FALSE in
   NeMo Gym multi-turn, where ``NemoGymDataProducer`` calls the agent
   server directly and ``_generate_single_turn`` is never invoked.
   Result: no sync ever happened in NeMo Gym sync mode.

2. ``step % vllm_sync_interval`` would TypeError on the first call if
   ``vllm_sync_interval`` was unset (the default for any config that
   doesn't explicitly set it).

3. The ``_generate_single_turn`` patch installed
   ``vllm_generation.sync_weights = lambda: None`` unconditionally
   for vllm_lora_sync runs. That's correct in async-prefetch mode
   (BG thread can't safely sync) but wrong in sync mode: TRL's
   per-step auto-sync inside ``_generate_single_turn`` was the
   fallback that the early return in (1) was assuming, and the
   no-op patch was killing it.

Fix:
  - Drop the ``not async_prefetch`` early return; ``_maybe_sync_vllm_weights``
    is now the canonical sync trigger and runs in both modes from
    ``_prepare_inputs_with_data_producer`` / ``_prepare_inputs_legacy_async``.
  - Default ``vllm_sync_interval`` to 1 when unset.
  - In the ``_generate_single_turn`` patch, route sync_weights to
    ``_sync_lora_adapter`` in sync mode (and keep the lambda no-op
    in async mode for the BG-thread safety reason).
2026-04-15 13:27:30 +00:00
..