Phone confirmation: state number, invite correction only (no "yes")

Call-recording analysis showed that the recurring silence after the phone question is NOT a
volume problem: the caller's reply arrived at full energy right after the long (~13 s) phone
question, but VAD never registered it (no "user started speaking" event), so the call stalled
until the caller repeated it. Relying on catching a "yes" right after a long agent utterance is
fragile (echo/gate timing).

Fix: stop requiring a "yes". AVA now states the number, invites only a correction ("...; if
that's not the best number, just let me know."), and moves on; the caller speaks only to correct
it. Updated the prompt step, the caller-ID injection, and the deterministic EndCallProcessor
line. Verified 4/4.

Docs: phone step, recording + watchdog entries, recording's post-gate limitation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tocmo0nlord
2026-06-27 17:57:54 +00:00
parent 80824a7ab0
commit 1cfdf562e2
2 changed files with 27 additions and 14 deletions


@@ -64,6 +64,16 @@ echo false-triggers. Addresses the repeat-yourself / missed-short-answer problem
**`AudioHeartbeat`** — diagnostic processor that distinguishes VAD failure from
transport stall. Keep it.
**Call recording (`AudioBufferProcessor`)** — every call is saved to `recordings/<ts>_<callSID>.wav`
as **stereo** (caller = left, agent = right) for review/debugging. It sits at the end of the
pipeline, so the caller channel is what the system *received* (post-`HalfDuplexGate`) — it does
NOT capture caller audio that arrived while the agent was speaking (gated). `RECORD_CALLS=false`
to disable. `recordings/` is gitignored.
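The stereo layout (caller = left, agent = right) can be illustrated with the standard-library `wave` module. This is a sketch, not how pipecat's `AudioBufferProcessor` is implemented. It assumes both tracks are mono 16-bit PCM at the same sample rate. `write_stereo_call` and the 8000 Hz default are hypothetical.

```python
import array
import wave


def write_stereo_call(path: str, caller: bytes, agent: bytes, sample_rate: int = 8000) -> None:
    """Interleave two mono 16-bit PCM tracks into one stereo WAV:
    caller on the left channel (0), agent on the right channel (1)."""
    left = array.array("h", caller)
    right = array.array("h", agent)
    # Pad the shorter track with silence so both channels have equal length.
    n = max(len(left), len(right))
    left.extend([0] * (n - len(left)))
    right.extend([0] * (n - len(right)))
    # Samples are copied as opaque 2-byte units, so host byte order doesn't matter.
    stereo = array.array("h", [0]) * (2 * n)
    stereo[0::2] = left   # even slots: caller
    stereo[1::2] = right  # odd slots: agent
    with wave.open(path, "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(stereo.tobytes())
```

Because the processor sits after `HalfDuplexGate`, the caller channel of a real recording would still lack audio that the gate dropped. Interleaving cannot restore it.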
**`SilenceWatchdog` in `bot.py`** — if the caller goes silent after the agent finishes, it
re-prompts ("are you still there?") after `SILENCE_REPROMPT_SECS` (7s), and after `MAX_REPROMPTS`
closes gracefully. Backstop against dead air; `silence_secs` must stay > `HANGUP_DELAY_SECS`.
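Below is a deterministic model of the watchdog behavior just described, written as plain Python so it can be stepped with fake timestamps. The 7 s re-prompt window comes from the docs. The following are assumptions:

- the `MAX_REPROMPTS` and `HANGUP_DELAY_SECS` values;
- the class name;
- resetting the re-prompt count when the caller speaks.

The constructor enforces the documented rule that the silence window stays longer than `HANGUP_DELAY_SECS`.

```python
from dataclasses import dataclass
from typing import Optional

SILENCE_REPROMPT_SECS = 7.0  # from the docs
MAX_REPROMPTS = 2            # hypothetical value
HANGUP_DELAY_SECS = 3.0      # hypothetical value


@dataclass
class SilenceWatchdogModel:
    """Hypothetical model: re-prompt after a silent gap, close after MAX_REPROMPTS."""
    silence_secs: float = SILENCE_REPROMPT_SECS
    max_reprompts: int = MAX_REPROMPTS
    reprompts: int = 0
    agent_done_at: Optional[float] = None  # when the agent last finished speaking

    def __post_init__(self) -> None:
        # Documented constraint: the silence window must outlast the hang-up delay.
        if self.silence_secs <= HANGUP_DELAY_SECS:
            raise ValueError("silence_secs must stay > HANGUP_DELAY_SECS")

    def agent_finished(self, now: float) -> None:
        self.agent_done_at = now

    def caller_spoke(self) -> None:
        # Assumption: any caller speech cancels the timer and resets the count.
        self.agent_done_at = None
        self.reprompts = 0

    def tick(self, now: float) -> Optional[str]:
        """Return "reprompt", "close", or None if nothing should happen yet."""
        if self.agent_done_at is None or now - self.agent_done_at < self.silence_secs:
            return None
        # Disarm until the agent finishes its next utterance.
        self.agent_done_at = None
        if self.reprompts < self.max_reprompts:
            self.reprompts += 1
            return "reprompt"  # e.g. "are you still there?"
        return "close"
```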
**`HalfDuplexGate` in `bot.py`** — fixes echo-induced mid-call silence. In this pipecat build
interruptions are VAD-driven and always on (`PipelineParams.allow_interruptions` does NOT exist
— it's silently ignored). On a phone line the agent's own TTS echoes back, the VAD reads it as
@@ -269,10 +279,12 @@ time, leading the call rather than waiting on the caller. Fixed order:
2. **Location** — ask city/area, confirm the matching office (don't offer others — see office rule).
3. **Caller info** — full name (ask last name if only a first is given), then **address the caller
by name** from there on; insurance (log only); preferred day/time in their words.
-4. **Verify phone** — near the end, state the caller-ID back in one line ("I have your number
-as <number> is that the best number?"), no asking permission first; if not, use the number
-they give. Never raised earlier. **Backed by a deterministic safety net** — if the agent
-skips it, `EndCallProcessor` injects the confirmation before hang-up (see "already solved").
+4. **Confirm phone (no "yes" needed)** — near the end, STATE the caller-ID back and invite a
+correction *only* ("I have your number as <number>; if that's not the best number, just let me
+know."), then flow on. **No yes/no question, no waiting** — depending on catching a "yes" right
+after a long utterance kept failing (echo/gate timing; verified via call recording — the caller's
+reply was received but VAD never registered it). Caller speaks only to correct it. Still backed
+by the deterministic `EndCallProcessor` safety net (also a "let me know if wrong" statement).
5. **Wrap up** — recap the booking **as a REQUEST** by name ("I've noted your request to come
in…"), make clear staff will call to confirm, then ask **"Is there anything else I can help
you with?"**
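The `EndCallProcessor` safety net in step 4 amounts to "state the number before hanging up if the agent never did." The sketch below models that in plain Python. It is not the pipecat `FrameProcessor` API. `PhoneSafetyNet`, the substring check used to detect that the number was already stated, and the exact wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical wording, following the docs' "let me know if wrong" statement.
CONFIRM_TEMPLATE = "I have your number as {number}; if that's not the best number, just let me know."


@dataclass
class PhoneSafetyNet:
    """Plain-Python model of a deterministic pre-hang-up phone confirmation."""
    spelled_number: str   # caller-ID, already pre-spelled for TTS
    stated: bool = False  # has the agent already stated the number?

    def on_agent_text(self, text: str) -> None:
        # Hypothetical detection: the agent's text contains the spelled number.
        if self.spelled_number in text:
            self.stated = True

    def before_hangup(self, goodbye: str) -> List[str]:
        """Lines to speak before ending the call; adds the confirmation if it was skipped."""
        lines: List[str] = []
        if not self.stated:
            lines.append(CONFIRM_TEMPLATE.format(number=self.spelled_number))
            self.stated = True
        lines.append(goodbye)
        return lines
```

The injected line is a statement rather than a question, matching the prompt step. The call can therefore still close on schedule even if VAD misses a reply.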
@@ -305,7 +317,7 @@ Replies are kept to one short sentence.
| Field | In-call behavior | Logged as |
|-------|------------------|-----------|
| Full name | Asks for last name if only a first is given | `patient_name` / lead `contact_name` |
-| Phone | Confirmed **near the end** (not led with); reads back the caller-ID injected pre-spelled so it's said digit-by-digit and if the caller declines, uses the number they give | `callback_number` (+ `phone_confirmed`) |
+| Phone | Confirmed **near the end** (not led with); STATES the caller-ID back (injected pre-spelled, digit-by-digit) and invites a correction only — **no "yes" required**; uses a different number only if the caller gives one | `callback_number` (+ `phone_confirmed`) |
| Office / city | Asks city/area; when the caller names a place that matches an office, **confirms that office and moves on** — never offers/compares other offices or asks them to choose; names the nearest only if nothing matches | folded into `reason` prefix |
| Reason | Captured from the conversation | `reason` |
| Insurance | **Log only, never suggest or guess** — asks open-endedly (no plan names read out), captures only what the caller says, never fills in/completes/guesses the plan (asks to repeat if unclear), never says "we accept/take" a plan, never promises/confirms/denies coverage or treatment even for a listed plan; staff verify on callback | `insurance` (note: "log only — staff to verify") |
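The Phone row's logging rule (keep the caller-ID unless the caller gives a different number) can be sketched as a small helper. It assumes the caller's correction has already been transcribed with digits. The 10-digit minimum for accepting a correction is a hypothetical threshold. `resolve_callback_number` is not a function from the codebase.

```python
import re
from typing import Optional


def resolve_callback_number(caller_id: str, correction: Optional[str] = None) -> str:
    """Digits to log as callback_number: the caller-ID unless the caller gave another number."""
    if correction:
        digits = re.sub(r"\D", "", correction)  # keep only the digits the caller said
        # Hypothetical threshold: treat it as a new number only if it has at least 10 digits.
        if len(digits) >= 10:
            return digits
    return re.sub(r"\D", "", caller_id)
```

A correction without a usable number (for example "no, wait") falls back to the caller-ID. This keeps the default path free of any confirmation step.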