Deterministic phone confirmation safety net + docs

EndCallProcessor now guarantees the callback number is confirmed on booking
calls: the 8B reads it back only ~half the time, so if a closing is reached on a
booking call (booking keyword seen) without the agent having spoken the number
(phone_marker absent from its replies), the hang-up is suppressed and a scripted
confirmation line (caller-ID spelled out) is injected as a TTSSpeakFrame first.
The agent's own readback satisfies the gate (no double-ask); info-only calls are
never asked for a number. Runtime-tested all four paths (inject / no-inject /
info-only / inject-then-end).
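
The four paths above can be sketched as a pure decision function. This is a simplified illustration, not the code in bot.py: `closing_action` and its `already_confirmed` flag are hypothetical names, the keyword tuples are copied from the diff, and a missing marker stands in for "no caller ID" (an assumption made for brevity).

```python
from typing import Optional

# Keyword tuples copied from EndCallProcessor in this commit.
CLOSINGS = ("goodbye", "good-bye", "good bye", "adiós", "adios", "hasta luego")
BOOKING_KWS = ("appointment", "schedule", "book", "insurance", "what day",
               "what time", "come in", "preferred")


def closing_action(reply: str, transcript: str, phone_marker: Optional[str],
                   already_confirmed: bool = False) -> str:
    """Return "continue", "hang_up", or "inject" for one finished agent reply.

    reply      -- text of the reply that just finished
    transcript -- everything the agent has said this call, including reply
    """
    if not any(c in reply.lower() for c in CLOSINGS):
        return "continue"  # not a closing, nothing to decide
    seen = transcript.lower()
    marker = (phone_marker or "").lower()
    # Assumption: no marker means no caller ID, so there is nothing to confirm.
    confirmed = already_confirmed or not marker or marker in seen
    booking = any(k in seen for k in BOOKING_KWS)
    # Hang up if the number was confirmed or this was an info-only call;
    # otherwise the scripted confirmation line has to be spoken first.
    return "hang_up" if confirmed or not booking else "inject"
```

Once the scripted line has been injected, the next closing would be evaluated with `already_confirmed=True`, which corresponds to the inject-then-end path.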

CLAUDE.md: document the safety net, the "never claim a booking" rule, the direct
phone-confirm phrasing, and the insurance "never say we accept" rule.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tocmo0nlord
2026-06-27 15:52:22 +00:00
parent 1e0472e864
commit d7bfe2dbe8
2 changed files with 79 additions and 24 deletions

@@ -40,6 +40,14 @@ Watches LLM text stream for closing keywords ("goodbye"), waits for TTS to finis
clipped, then pushes `EndTaskFrame` upstream. `TwilioFrameSerializer` with `auto_hang_up`
drops the carrier leg. Verified working in the Phase 1 gate (4/4 clean hang-ups).
It also **deterministically guarantees the callback number is confirmed** on booking calls:
the 8B reads the number back only ~half the time, so if a closing is reached on a booking
call (booking keyword seen) without the agent having spoken the number (`phone_marker` not
seen in its replies), the hang-up is suppressed and a scripted confirmation line
(`phone_confirm_line`, the caller-ID spelled out) is injected as a `TTSSpeakFrame` first.
The agent's own readback satisfies the gate, so there's no double-ask in the common case;
info-only calls (no booking keyword) are never asked for a number.
**Mulaw 8kHz ↔ 16kHz conversion** — handled internally by `TwilioFrameSerializer`.
`PIPELINE_SAMPLE_RATE = 16000`, `WIRE_SAMPLE_RATE = 8000` are already set correctly.
No custom audio module needed.
@@ -250,18 +258,25 @@ time, leading the call rather than waiting on the caller. Fixed order:
2. **Location** — ask city/area, confirm the matching office (don't offer others — see office rule).
3. **Caller info** — full name (ask last name if only a first is given), then **address the caller
by name** from there on; insurance (log only); preferred day/time in their words.
4. **Verify phone** — near the end, read the caller-ID back digit-by-digit and ask if it's best;
if not, use the number they give. Never raised earlier in the call.
5. **Wrap up** — recap the booking by name, then ask **"Is there anything else I can help you
with?"**
4. **Verify phone** — near the end, state the caller-ID back in one line ("I have your number
as <number> — is that the best number?"), no asking permission first; if not, use the number
they give. Never raised earlier. **Backed by a deterministic safety net** — if the agent
skips it, `EndCallProcessor` injects the confirmation before hang-up (see "already solved").
5. **Wrap up** — recap the booking **as a REQUEST** by name ("I've noted your request to come
in…"), make clear staff will call to confirm, then ask **"Is there anything else I can help
you with?"**
**Never claims a booking:** AVA must never say an appointment is "booked / scheduled / set /
confirmed" — everything is a request staff confirm on callback. **Insurance:** never say "we
accept/take" a plan (or invent one) — just note what the caller said; staff verify.
**Closing is gated:** the word "Goodbye" ends the call (triggers `EndCallProcessor` → hang-up),
so it is never said in the same turn as confirming details and never before the anything-else
question — only after the caller says they need nothing more.
> Reliability: this is prompt-driven on the local 8B, so order is followed well but not
> perfectly — the phone-readback step in particular varies (sometimes reads back, sometimes
> asks for the number), and it can re-ask a last name. Same model ceiling noted elsewhere.
> Reliability: the script is prompt-driven on the local 8B (order followed well, not perfectly;
> it can re-ask a last name). The phone-confirmation step is the exception — it's now
> **guaranteed** by the deterministic `EndCallProcessor` safety net.
## Call Data Capture
@@ -277,7 +292,7 @@ Replies are kept to one short sentence.
| Phone | Confirmed **near the end** (not led with); reads back the caller-ID — injected pre-spelled so it's said digit-by-digit — and if the caller declines, uses the number they give | `callback_number` (+ `phone_confirmed`) |
| Office / city | Asks city/area; when the caller names a place that matches an office, **confirms that office and moves on** — never offers/compares other offices or asks them to choose; names the nearest only if nothing matches | folded into `reason` prefix |
| Reason | Captured from the conversation | `reason` |
| Insurance | **Log only, never suggest or guess** — asks open-endedly (no plan names read out), captures only what the caller says, never fills in/completes/guesses the plan (asks them to repeat if unclear), never promises/confirms/denies coverage or treatment even for a listed plan; staff verify on callback | `insurance` (note: "log only — staff to verify") |
| Insurance | **Log only, never suggest or guess** — asks open-endedly (no plan names read out), captures only what the caller says, never fills in/completes/guesses the plan (asks to repeat if unclear), never says "we accept/take" a plan, never promises/confirms/denies coverage or treatment even for a listed plan; staff verify on callback | `insurance` (note: "log only — staff to verify") |
| Preferred day & time | **Capture & defer** — taken in the caller's own words; AVA does not compute or correct the date | `preferred_time` + best-effort resolved `YYYY-MM-DD` |
### Dates — capture & defer (do NOT compute in-call)

bot.py
@@ -203,20 +203,35 @@ def _build_tools() -> ToolsSchema:
class EndCallProcessor(FrameProcessor):
"""Lets the agent hang up. MUST sit between the LLM and the TTS: there it sees her reply
text (LLMTextFrame, flowing downstream) AND the upstream copy of BotStoppedSpeakingFrame
the output transport emits. It accumulates each reply; if the finished reply contains a
closing ('goodbye'/'adiós'), it waits until she's done speaking, pauses HANGUP_DELAY_SECS
so the caller isn't clipped, then pushes EndTaskFrame upstream — the task ends and
TwilioFrameSerializer (auto_hang_up) drops the call."""
"""Lets the agent hang up AND guarantees the callback number is confirmed once.
Sits between the LLM and the TTS: it sees reply text (LLMTextFrame, downstream) and the
upstream BotStoppedSpeakingFrame. On a closing ('goodbye'/'adiós') it waits for TTS to
finish, pauses HANGUP_DELAY_SECS so the caller isn't clipped, then pushes EndTaskFrame
(TwilioFrameSerializer auto_hang_up drops the call).
Deterministic phone confirmation: the prompt asks the agent to read the callback number
back, but the 8B skips it ~half the time. So if a closing is reached and the agent never
spoke the number this call (`phone_marker` not seen in its replies), we suppress the
hang-up and inject a scripted confirmation turn first — guaranteeing it happens exactly
once (the agent's own readback satisfies the gate, so no double-ask in the common case)."""
_CLOSINGS = ("goodbye", "good-bye", "good bye", "adiós", "adios", "hasta luego")
# Only force phone confirmation when a booking was actually underway (not info-only calls).
_BOOKING_KWS = ("appointment", "schedule", "book", "insurance", "what day", "what time",
"come in", "preferred")
def __init__(self):
def __init__(self, phone_confirm_line: str | None = None, phone_marker: str | None = None):
super().__init__()
self._buf = ""
self._should_end = False
self._end_task = None
self._phone_confirm_line = phone_confirm_line
self._phone_marker = (phone_marker or "").lower()
# Nothing to confirm (no caller ID) → treat as already handled.
self._phone_confirmed = not phone_confirm_line
self._assistant_seen = ""
self._pending_phone_inject = False
@classmethod
def _is_closing(cls, text: str) -> bool:
@@ -235,17 +250,31 @@ class EndCallProcessor(FrameProcessor):
await super().process_frame(frame, direction)
if isinstance(frame, LLMTextFrame):
self._buf += frame.text
self._assistant_seen += frame.text.lower()
if self._phone_marker and self._phone_marker in self._assistant_seen:
self._phone_confirmed = True # the agent read the number back itself
elif isinstance(frame, LLMFullResponseEndFrame):
if self._is_closing(self._buf):
self._should_end = True
logger.info(f"{AGENT_NAME} signalled closing -- will hang up "
f"{HANGUP_DELAY_SECS:.0f}s after she finishes speaking")
booking = any(k in self._assistant_seen for k in self._BOOKING_KWS)
if self._phone_confirmed or not booking:
self._should_end = True
logger.info(f"{AGENT_NAME} signalled closing -- will hang up "
f"{HANGUP_DELAY_SECS:.0f}s after she finishes speaking")
else:
# Booking call closing without the number confirmed — do it deterministically.
self._pending_phone_inject = True
logger.info(f"{AGENT_NAME} reached closing w/o phone confirmation -- injecting it")
self._buf = ""
elif isinstance(frame, BotStoppedSpeakingFrame) and self._should_end:
self._should_end = False
# Schedule the teardown so we don't block the pipeline during the grace pause.
if self._end_task is None:
self._end_task = asyncio.create_task(self._hang_up_after_delay())
elif isinstance(frame, BotStoppedSpeakingFrame):
if self._pending_phone_inject:
self._pending_phone_inject = False
self._phone_confirmed = True
await self.push_frame(TTSSpeakFrame(self._phone_confirm_line), FrameDirection.DOWNSTREAM)
elif self._should_end:
self._should_end = False
# Schedule the teardown so we don't block the pipeline during the grace pause.
if self._end_task is None:
self._end_task = asyncio.create_task(self._hang_up_after_delay())
await self.push_frame(frame, direction)
@@ -455,7 +484,18 @@ async def run_agent(transport, caller_number=None, call_sid=None, do_capture=Tru
context_kwargs["tools"] = _build_tools()
context = LLMContext(**context_kwargs)
agg = LLMContextAggregatorPair(context)
endcall = EndCallProcessor()
# Deterministic phone-confirmation safety net: if the agent reaches a closing without
# having read the caller-ID back, EndCallProcessor speaks this scripted line first.
if caller_number:
_spoken = _spoken_phone(caller_number)
phone_confirm_line = (
f"Before you go, let me make sure I have the best number to reach you: "
f"{_spoken}. Is that correct?"
)
phone_marker = _spoken.split(",")[0].strip() # e.g. "nine seven three"
else:
phone_confirm_line = phone_marker = None
endcall = EndCallProcessor(phone_confirm_line=phone_confirm_line, phone_marker=phone_marker)
pipeline = Pipeline(
[