Two bugs in ``AsyncGRPOTrainer._maybe_sync_vllm_weights`` plus a
companion bug in the sync-hook patch site that together neutralized
LoRA weight sync entirely whenever ``async_prefetch=False`` was
combined with NeMo Gym's data-producer path:
1. ``_maybe_sync_vllm_weights`` had ``if not async_prefetch: return``
at the top. The original design assumed sync mode would fall back
to TRL's stock per-step ``sync_weights`` call inside
``_generate_single_turn`` — true for vanilla GRPO but FALSE in
NeMo Gym multi-turn, where ``NemoGymDataProducer`` calls the agent
server directly and ``_generate_single_turn`` is never invoked.
Result: no sync ever happened in NeMo Gym sync mode.
2. ``step % vllm_sync_interval`` raised a ``TypeError`` on the first
call if ``vllm_sync_interval`` was unset (the default for any config
that doesn't explicitly set it).
3. The ``_generate_single_turn`` patch installed
``vllm_generation.sync_weights = lambda: None`` unconditionally
for ``vllm_lora_sync`` runs. That is correct in async-prefetch mode
(the background thread can't safely sync) but wrong in sync mode:
TRL's per-step auto-sync inside ``_generate_single_turn`` was the
fallback that the early return in (1) relied on, and the no-op
patch disabled it.
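
Bug 2 can be reproduced in isolation. This is a minimal sketch, assuming the unset value is ``None`` (the description only says the option is unset) and that the check compares the remainder to zero. ``should_sync_buggy`` and ``raises_type_error`` are hypothetical names, not trainer code.

```python
def should_sync_buggy(step, vllm_sync_interval=None):
    # Pre-fix shape of the check: modulo against a possibly unset interval.
    # Assumption: "unset" means None, and the check is `== 0`.
    return step % vllm_sync_interval == 0


def raises_type_error(step):
    # Returns True if the buggy check fails with TypeError for this step.
    try:
        should_sync_buggy(step)
    except TypeError:
        return True
    return False
```

With an explicit interval the check works as intended; only the unset default trips the ``TypeError``.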
Fix:
- Drop the ``not async_prefetch`` early return; ``_maybe_sync_vllm_weights``
is now the canonical sync trigger and runs in both modes from
``_prepare_inputs_with_data_producer`` / ``_prepare_inputs_legacy_async``.
- Default ``vllm_sync_interval`` to 1 when unset.
- In the ``_generate_single_turn`` patch, route ``sync_weights`` to
``_sync_lora_adapter`` in sync mode (and keep the lambda no-op
in async mode, where the background thread can't safely sync).
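
The three fixes above can be sketched together as follows. This is an illustrative stand-in, not the actual ``AsyncGRPOTrainer`` code: the class ``SyncTriggerSketch``, the ``sync_calls`` list, the ``patched_sync_weights`` helper, and the ``== 0`` comparison are all assumptions made for the example.

```python
class SyncTriggerSketch:
    """Hypothetical stand-in for the trainer; not the real AsyncGRPOTrainer."""

    def __init__(self, async_prefetch, vllm_sync_interval=None):
        self.async_prefetch = async_prefetch
        self.vllm_sync_interval = vllm_sync_interval
        self.sync_calls = []  # records syncs instead of pushing weights

    def _sync_lora_adapter(self):
        self.sync_calls.append("lora")

    def _maybe_sync_vllm_weights(self, step):
        # No early return on `not async_prefetch`: runs in both modes.
        interval = self.vllm_sync_interval
        if interval is None:
            interval = 1  # default to syncing every step when unset
        if step % interval == 0:
            self._sync_lora_adapter()

    def patched_sync_weights(self):
        # Async mode keeps the no-op; sync mode routes to the adapter sync.
        if self.async_prefetch:
            return lambda: None
        return self._sync_lora_adapter
```

In this sketch, a sync-mode instance with no interval set syncs on every step, while an async-mode instance still gets a no-op from ``patched_sync_weights`` and relies on ``_maybe_sync_vllm_weights`` alone.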