Live call diagnosis (recording + log): replies were generated in <1s, but a false VAD trigger (background noise, no transcript) fired 0.7s later, and the aggregator's broadcast_interruption silently discarded the queued TTS audio. The caller heard 20-35s of silence, said "Hello?", and repeated themselves.

The HalfDuplexGate only closes while the bot is audibly speaking, so the window between generation start and first wire audio was unprotected. SilenceWatchdog never fired because the cancelled reply never emitted BotStoppedSpeaking.

With HALF_DUPLEX on, build the user aggregator with enable_interruptions=False on both turn-start strategies: strict turn-taking, nothing is ever cancelled. UserStartedSpeakingFrame still flows, so watchdog resets keep working.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
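
Illustrative sketch of the behaviour change described above, not the actual patch. `TurnStartStrategy`, `Aggregator`, `build_turn_start_strategies`, and the strategy names "vad" and "transcription" are hypothetical stand-ins and do not reproduce the real framework's API. Only `enable_interruptions` and the HALF_DUPLEX switch come from the message itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TurnStartStrategy:
    """Hypothetical stand-in for a turn-start strategy."""
    name: str
    enable_interruptions: bool = True


def build_turn_start_strategies(half_duplex: bool) -> List[TurnStartStrategy]:
    """With half-duplex on, neither strategy is allowed to interrupt."""
    allow = not half_duplex
    return [
        TurnStartStrategy("vad", enable_interruptions=allow),
        TurnStartStrategy("transcription", enable_interruptions=allow),
    ]


@dataclass
class Aggregator:
    """Toy aggregator: tracks queued TTS and watchdog resets only."""
    strategies: List[TurnStartStrategy]
    queued_tts: List[str] = field(default_factory=list)
    watchdog_resets: int = 0

    def on_user_started_speaking(self, strategy_name: str) -> None:
        # The "user started speaking" event always flows, so the
        # watchdog is reset regardless of the interruption setting.
        self.watchdog_resets += 1
        strategy = next(s for s in self.strategies if s.name == strategy_name)
        if strategy.enable_interruptions:
            # Old behaviour: an interruption discards queued audio.
            self.queued_tts.clear()


# Strict turn-taking: a false VAD trigger no longer drops the reply.
strict = Aggregator(build_turn_start_strategies(True), queued_tts=["reply-1"])
strict.on_user_started_speaking("vad")

# Interruptions enabled: the same trigger silently discards it.
legacy = Aggregator(build_turn_start_strategies(False), queued_tts=["reply-1"])
legacy.on_user_started_speaking("vad")
```

In this toy model the strict aggregator keeps its queued reply while still counting the watchdog reset, which is the property the change relies on.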