Fix dead-air: stop VAD interruption broadcasts under half-duplex

Live call diagnosis (recording + log): replies were generated in <1s but a
false VAD trigger (background noise, no transcript) fired 0.7s later, and the
aggregator's broadcast_interruption silently discarded the queued TTS audio.
Caller heard 20-35s of silence, said "Hello?", repeated themselves. The
HalfDuplexGate only closes while the bot is audibly speaking, so the window
between generation start and first wire audio was unprotected. SilenceWatchdog
never fired because the cancelled reply never emitted BotStoppedSpeaking.

With HALF_DUPLEX on, build the user aggregator with enable_interruptions=False
on both turn-start strategies: strict turn-taking, nothing is ever cancelled.
UserStartedSpeakingFrame still flows, so watchdog resets keep working.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tocmo0nlord
2026-07-04 03:07:24 +00:00
parent 78ff978416
commit 3ed63d8ea9
2 changed files with 43 additions and 2 deletions
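
The failure described in the commit message can be reproduced in miniature. This is only a toy model, not pipecat code: `ToyPipeline`, `queue_reply`, `vad_turn_start`, `flush`, and `simulate` are hypothetical names invented for illustration, and the only behavior taken from the commit is that an interruption broadcast discards queued audio while the gate protects only audible speech.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyPipeline:
    """Toy stand-in for the reply queue; none of these names exist in pipecat."""

    enable_interruptions: bool
    bot_speaking: bool = False  # the half-duplex gate only closes while this is True
    queued_audio: List[str] = field(default_factory=list)
    played: List[str] = field(default_factory=list)

    def queue_reply(self, text: str) -> None:
        # The reply is generated and queued, but no audio is on the wire yet.
        self.queued_audio.append(text)

    def vad_turn_start(self) -> None:
        # Caller-side audio is blocked only while the bot is audibly speaking.
        if self.bot_speaking:
            return
        if self.enable_interruptions:
            # Models the interruption broadcast: queued audio is thrown away.
            self.queued_audio.clear()

    def flush(self) -> None:
        # Whatever is still queued reaches the caller.
        self.played.extend(self.queued_audio)
        self.queued_audio.clear()


def simulate(enable_interruptions: bool) -> List[str]:
    pipe = ToyPipeline(enable_interruptions=enable_interruptions)
    pipe.queue_reply("Sure, what is your phone number?")
    pipe.vad_turn_start()  # false VAD blip before the first wire audio
    pipe.flush()
    return pipe.played
```

With interruptions enabled the caller hears nothing (`simulate(True)` returns an empty list). With them disabled the queued reply survives the blip, which is the behavior this commit selects under `HALF_DUPLEX`.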


@@ -83,6 +83,19 @@ audio while the bot is speaking (+`ECHO_TAIL_SECS`, default 0.5s) so echo never
 Trade-off: half-duplex — the caller can't barge in mid-utterance (fine for short replies).
 `HALF_DUPLEX=false` restores barge-in. Keep it on for telephony.
+**Interruption broadcasts OFF under half-duplex (2026-07-04).** The gate left one window
+open: between "LLM starts generating" and "first audio on the wire" the bot isn't speaking
+yet, so caller-side audio still reaches the VAD — and a false VAD blip (breath/background
+noise, no transcript ever produced) made the user aggregator `broadcast_interruption`,
+silently discarding the queued reply. Live call showed 20-35s of dead air, the caller saying
+"Hello?" and repeating themselves; SilenceWatchdog never fired because the cancelled reply
+never produced a `BotStoppedSpeakingFrame` to arm it. Fix: when `HALF_DUPLEX` is on, the
+aggregator is built with `VADUserTurnStartStrategy(enable_interruptions=False)` +
+`TranscriptionUserTurnStartStrategy(enable_interruptions=False)` — strict turn-taking, no
+interruption broadcasts at all (there's nothing legitimate for them to do in a no-barge-in
+bot). `UserStartedSpeakingFrame` is still emitted, so the watchdog reset keeps working. If
+the caller talks over generation, both replies play in order instead of one being dropped.
 **`CallStateGroomer` (`callstate.py`) — deterministic slot memory (2026-07-03).** Fixes the
 8B re-asking for things the caller already gave (name, reason, phone — seen repeatedly in the
 historical call logs: "Didn't you say you had my phone number?", "I already gave you my full
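
The interruption note added in this hunk says the watchdog is armed by `BotStoppedSpeakingFrame` and reset by `UserStartedSpeakingFrame`. A minimal sketch of that arming logic shows why a cancelled reply left it silent. `ToyWatchdog`, its method names, and the 8-second timeout are assumptions made for illustration; the real `SilenceWatchdog` is not shown in this diff.

```python
from typing import Optional


class ToyWatchdog:
    """Hypothetical model: armed when the bot stops speaking, reset when the user starts."""

    def __init__(self, timeout_secs: float) -> None:
        self.timeout_secs = timeout_secs
        self.deadline: Optional[float] = None  # None means "not armed"

    def on_bot_stopped_speaking(self, now: float) -> None:
        # Arm: start counting silence from the end of the bot's utterance.
        self.deadline = now + self.timeout_secs

    def on_user_started_speaking(self) -> None:
        # Reset: the caller is talking, so there is no silence to flag.
        self.deadline = None

    def should_prompt(self, now: float) -> bool:
        # Fires only if armed and the deadline has passed.
        return self.deadline is not None and now >= self.deadline


# Cancelled reply: no bot-stopped event ever arrives, so the watchdog stays unarmed.
cancelled = ToyWatchdog(timeout_secs=8.0)
never_fired = cancelled.should_prompt(now=35.0)

# Completed reply: the bot-stopped event arms it and the timeout elapses.
completed = ToyWatchdog(timeout_secs=8.0)
completed.on_bot_stopped_speaking(now=2.0)
fired = completed.should_prompt(now=10.0)
```

In this model a discarded reply can leave the line silent for any length of time without `should_prompt` ever returning true, which matches the dead air described above.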

bot.py (32 lines changed)

@@ -39,7 +39,15 @@ from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+from pipecat.turns.user_start import (
+    TranscriptionUserTurnStartStrategy,
+    VADUserTurnStartStrategy,
+)
+from pipecat.turns.user_turn_strategies import UserTurnStrategies
 from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor
 from pipecat.processors.audio.vad_processor import VADProcessor
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
@@ -675,7 +683,27 @@ async def run_agent(transport, caller_number=None, call_sid=None, do_capture=Tru
     if ENABLE_TOOLS:
         context_kwargs["tools"] = _build_tools()
     context = LLMContext(**context_kwargs)
-    agg = LLMContextAggregatorPair(context)
+    # STRICT TURN-TAKING — no interruption broadcasts (live-call diagnosis 2026-07-04):
+    # interruptions are VAD-driven and fire on ANY turn start. HalfDuplexGate already blocks
+    # barge-in while the bot SPEAKS, but between "LLM starts generating" and "first audio on
+    # the wire" the gate is open — a false VAD blip (breath/background noise, no transcript) in that
+    # window broadcast an interruption that silently discarded the queued reply: caller heard
+    # 20-35s of dead air and said "Hello?". With HALF_DUPLEX there is nothing legitimate for
+    # an interruption to do, so don't broadcast them at all. UserStartedSpeakingFrame is still
+    # emitted (SilenceWatchdog reset keeps working); if the caller talks over generation, both
+    # replies simply play in order instead of one being thrown away.
+    if HALF_DUPLEX:
+        user_params = LLMUserAggregatorParams(
+            user_turn_strategies=UserTurnStrategies(
+                start=[
+                    VADUserTurnStartStrategy(enable_interruptions=False),
+                    TranscriptionUserTurnStartStrategy(enable_interruptions=False),
+                ],
+            ),
+        )
+        agg = LLMContextAggregatorPair(context, user_params=user_params)
+    else:
+        agg = LLMContextAggregatorPair(context)
     # Deterministic slot memory: merges fragmented user turns + injects the live
     # collected/needed checklist into the system message before each generation.
     groomer = CallStateGroomer(