Fix echo-induced silence with a half-duplex audio gate

The agent's reply to a caller was generated but never heard. 0.65s after the agent started speaking, the VAD fired "user started speaking" (with NO transcript) and broadcast an interruption that cancelled the agent's audio, leaving ~24s of silence until the caller spoke again.

Cause: the agent's own TTS echoes back over the phone line, and the always-on VAD interruption treats it as a barge-in. (PipelineParams has no allow_interruptions in this pipecat build; setting it was a silent no-op.)

Fix: HalfDuplexGate, placed before the VAD, withholds inbound audio while the bot speaks (plus ECHO_TAIL_SECS, default 0.5s, afterwards), so echo can't trigger a false barge-in. Trade-off: the call is now half-duplex (no mid-utterance barge-in); set HALF_DUPLEX=false to restore barge-in.

Runtime-tested the gate: pass when idle / drop while speaking / drop in tail / resume.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 16:44:00 +00:00
parent ceea3d151c
commit 32a3bb7136
3 changed files with 50 additions and 0 deletions
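
The gating rule described in the commit message can be sketched independently of pipecat. This is a minimal illustration, not the `HalfDuplexGate` from `bot.py` (which is not shown in this excerpt): the class name `HalfDuplexGateSketch`, its methods, the injectable clock, and the `gate_from_env` helper are all hypothetical. Only the `HALF_DUPLEX` and `ECHO_TAIL_SECS` variable names and the 0.5s default come from the commit; the exact parsing of `HALF_DUPLEX` and the behaviour at the tail boundary are assumptions.

```python
import os
import time
from typing import Callable, Mapping, Optional


class HalfDuplexGateSketch:
    """Hypothetical stand-in for the gate: decides whether inbound audio may reach the VAD."""

    def __init__(
        self,
        echo_tail_secs: float = 0.5,
        enabled: bool = True,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.echo_tail_secs = echo_tail_secs
        self.enabled = enabled
        self._clock = clock  # injectable so the tail window can be tested without sleeping
        self._bot_speaking = False
        self._bot_stopped_at: Optional[float] = None

    def bot_started_speaking(self) -> None:
        self._bot_speaking = True
        self._bot_stopped_at = None

    def bot_stopped_speaking(self) -> None:
        self._bot_speaking = False
        self._bot_stopped_at = self._clock()

    def should_pass(self) -> bool:
        if not self.enabled:
            return True  # gate disabled: full duplex, barge-in possible
        if self._bot_speaking:
            return False  # drop while the bot is speaking
        if self._bot_stopped_at is None:
            return True  # bot has not spoken yet: idle, pass audio through
        # Drop during the echo tail, resume once it has elapsed (assumed inclusive).
        return self._clock() - self._bot_stopped_at >= self.echo_tail_secs

    def filter(self, chunk: bytes) -> Optional[bytes]:
        """Return the chunk if it may continue to the VAD, else None."""
        return chunk if self.should_pass() else None


def gate_from_env(env: Mapping[str, str] = os.environ) -> HalfDuplexGateSketch:
    # Assumption: any value other than "false" (case-insensitive) keeps the gate on.
    enabled = env.get("HALF_DUPLEX", "true").strip().lower() != "false"
    tail = float(env.get("ECHO_TAIL_SECS", "0.5"))
    return HalfDuplexGateSketch(echo_tail_secs=tail, enabled=enabled)
```

Driving the sketch with a fake clock reproduces the four cases listed in the commit message (pass idle, drop while speaking, drop in tail, resume) without real audio or real waiting.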
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -62,6 +62,15 @@ VAD side.
 **`AudioHeartbeat`** — diagnostic processor that distinguishes VAD failure from
 transport stall. Keep it.

+**`HalfDuplexGate` in `bot.py`** — fixes echo-induced mid-call silence. In this pipecat build
+interruptions are VAD-driven and always on (`PipelineParams.allow_interruptions` does NOT exist
+— it's silently ignored). On a phone line the agent's own TTS echoes back, the VAD reads it as
+the caller speaking (it produces NO transcript), and the broadcast interruption cancels the
+agent mid-reply → the caller hears silence. This gate sits BEFORE the VAD and withholds inbound
+audio while the bot is speaking (+`ECHO_TAIL_SECS`, default 0.5s) so echo never reaches the VAD.
+Trade-off: half-duplex — the caller can't barge in mid-utterance (fine for short replies).
+`HALF_DUPLEX=false` restores barge-in. Keep it on for telephony.
+
 **Post-call extraction (`extract.py`)** — single JSON-mode completion after call ends.
 Correctly uses `format: json`, uses verified Twilio caller-ID instead of trusting model
 output, falls back to JSONL if Odoo is unreachable. Keep it.