Fix echo-induced silence with a half-duplex audio gate

A caller's reply was generated but never heard: 0.65s after the agent started
speaking, the VAD fired "user started speaking" (NO transcript) and broadcast an
interruption that cancelled the agent's audio -> ~24s of silence until the caller
spoke again. Cause: the agent's own TTS echoes back over the phone line and the
always-on VAD interruption treats it as a barge-in. (PipelineParams has no
allow_interruptions in this pipecat build; setting it was a silent no-op.)

Fix: a HalfDuplexGate placed before the VAD withholds inbound audio while the bot
speaks, plus an ECHO_TAIL_SECS tail (default 0.5s) afterwards, so echo can't
trigger a false barge-in. Trade-off: this is half-duplex, so the caller cannot
barge in mid-utterance; set HALF_DUPLEX=false to restore barge-in.
Runtime-tested the gate (pass when idle / drop while speaking / drop in tail / resume).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tocmo0nlord
2026-06-27 16:44:00 +00:00
parent ceea3d151c
commit 32a3bb7136
3 changed files with 50 additions and 0 deletions

@@ -62,6 +62,15 @@ VAD side.
**`AudioHeartbeat`** — diagnostic processor that distinguishes VAD failure from
transport stall. Keep it.
**`HalfDuplexGate` in `bot.py`** — fixes echo-induced mid-call silence. In this pipecat build
interruptions are VAD-driven and always on (`PipelineParams.allow_interruptions` does NOT exist;
setting it is silently ignored). On a phone line the agent's own TTS echoes back, the VAD reads
it as the caller speaking (though it produces NO transcript), and the broadcast interruption
cancels the agent mid-reply, so the caller hears silence. This gate sits BEFORE the VAD and
withholds inbound audio while the bot is speaking, plus an `ECHO_TAIL_SECS` tail (default 0.5s)
after it stops, so echo never reaches the VAD.
Trade-off: half-duplex, so the caller can't barge in mid-utterance (fine for short replies).
`HALF_DUPLEX=false` restores barge-in. Keep the gate on for telephony.
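
The gating decision described above can be sketched in plain Python, independent of pipecat. This is a minimal illustration, not the code in `bot.py`: the class name `HalfDuplexGateSketch`, its method names, the `gate_from_env` helper, and the way the environment variables are parsed are all assumptions. The real gate is a pipeline processor that would call equivalent logic when it sees the bot start or stop speaking and when an inbound audio frame arrives.

```python
import os
import time
from typing import Callable, Mapping, Optional


class HalfDuplexGateSketch:
    """Hypothetical stand-in for the gating logic: decides whether an
    inbound audio frame may be forwarded to the VAD."""

    def __init__(
        self,
        echo_tail_secs: float = 0.5,
        enabled: bool = True,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.echo_tail_secs = echo_tail_secs
        self.enabled = enabled
        self._clock = clock  # injectable so the tail can be tested without sleeping
        self._bot_speaking = False
        self._tail_until: Optional[float] = None

    def bot_started_speaking(self) -> None:
        self._bot_speaking = True

    def bot_stopped_speaking(self) -> None:
        # Keep withholding audio for a short tail so late echo is dropped too.
        self._bot_speaking = False
        self._tail_until = self._clock() + self.echo_tail_secs

    def should_pass(self) -> bool:
        """Return True if inbound audio may reach the VAD right now."""
        if not self.enabled:
            return True  # full duplex: barge-in (and echo) reach the VAD
        if self._bot_speaking:
            return False
        if self._tail_until is not None and self._clock() < self._tail_until:
            return False
        return True


def gate_from_env(env: Mapping[str, str] = os.environ) -> HalfDuplexGateSketch:
    # Assumed parsing: any value other than "false" keeps the gate enabled.
    enabled = env.get("HALF_DUPLEX", "true").strip().lower() != "false"
    tail = float(env.get("ECHO_TAIL_SECS", "0.5"))
    return HalfDuplexGateSketch(echo_tail_secs=tail, enabled=enabled)
```

Injecting the clock makes the four cases (pass when idle, drop while speaking, drop in tail, resume) checkable deterministically, without real audio or sleeps.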
**Post-call extraction (`extract.py`)** — single JSON-mode completion after call ends.
Correctly uses `format: json`, uses verified Twilio caller-ID instead of trusting model
output, falls back to JSONL if Odoo is unreachable. Keep it.