Deterministic phone confirmation safety net + docs

EndCallProcessor now guarantees the callback number is confirmed on booking
calls: the 8B reads it back only ~half the time, so if a closing is reached on a
booking call (booking keyword seen) without the agent having spoken the number
(phone_marker absent from its replies), the hang-up is suppressed and a scripted
confirmation line (caller-ID spelled out) is injected as a TTSSpeakFrame first.
The agent's own readback satisfies the gate (no double-ask); info-only calls are
never asked for a number. Runtime-tested all four paths (inject / no-inject /
info-only / inject-then-end).
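
As a rough illustration of the gate described above, the following framework-free sketch models the four tested paths. The class `ConfirmGate`, its method names, and the `simulate` helper are hypothetical and exist only for this example. The substring-based closing check is an assumption, since the body of `_is_closing` is not shown in this diff. The real processor reacts to Pipecat frames and pushes a `TTSSpeakFrame` or schedules the hang-up instead of returning values.

```python
CLOSINGS = ("goodbye", "good-bye", "good bye", "adiós", "adios", "hasta luego")
BOOKING_KWS = ("appointment", "schedule", "book", "insurance", "what day", "what time",
               "come in", "preferred")


class ConfirmGate:
    """Hypothetical, framework-free model of the phone-confirmation gate."""

    def __init__(self, confirm_line=None, marker=None):
        self.confirm_line = confirm_line
        self.marker = (marker or "").lower()
        # No caller ID means there is nothing to confirm.
        self.confirmed = not confirm_line
        self.seen = ""    # all agent text so far, lower-cased
        self.reply = ""   # text of the reply currently being streamed
        self.pending_inject = False
        self.should_end = False

    def on_text(self, text):
        self.reply += text
        self.seen += text.lower()
        if self.marker and self.marker in self.seen:
            self.confirmed = True  # the agent read the number back itself

    def on_reply_end(self):
        # Assumption: a closing is detected by a simple substring match.
        if any(c in self.reply.lower() for c in CLOSINGS):
            booking = any(k in self.seen for k in BOOKING_KWS)
            if self.confirmed or not booking:
                self.should_end = True
            else:
                self.pending_inject = True
        self.reply = ""

    def on_bot_stopped(self):
        """Return "speak", "hang_up", or None for the action taken now."""
        if self.pending_inject:
            self.pending_inject = False
            self.confirmed = True
            return "speak"
        if self.should_end:
            self.should_end = False
            return "hang_up"
        return None


def simulate(gate, replies):
    """Feed whole agent replies through the gate and collect one action per reply."""
    actions = []
    for reply in replies:
        gate.on_text(reply)
        gate.on_reply_end()
        actions.append(gate.on_bot_stopped())
    return actions
```

In this model the scripted line is spoken once, and the hang-up only happens on a later closing, which mirrors the inject-then-end path listed above.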

CLAUDE.md: document the safety net, the "never claim a booking" rule, the direct
phone-confirm phrasing, and the insurance "never say we accept" rule.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tocmo0nlord
2026-06-27 15:52:22 +00:00
parent 1e0472e864
commit d7bfe2dbe8
2 changed files with 79 additions and 24 deletions


@@ -40,6 +40,14 @@ Watches LLM text stream for closing keywords ("goodbye"), waits for TTS to finis
 clipped, then pushes `EndTaskFrame` upstream. `TwilioFrameSerializer` with `auto_hang_up`
 drops the carrier leg. Verified working in the Phase 1 gate (4/4 clean hang-ups).
 
+It also **deterministically guarantees the callback number is confirmed** on booking calls:
+the 8B reads the number back only ~half the time, so if a closing is reached on a booking
+call (booking keyword seen) without the agent having spoken the number (`phone_marker` not
+seen in its replies), the hang-up is suppressed and a scripted confirmation line
+(`phone_confirm_line`, the caller-ID spelled out) is injected as a `TTSSpeakFrame` first.
+The agent's own readback satisfies the gate, so there's no double-ask in the common case;
+info-only calls (no booking keyword) are never asked for a number.
+
 **Mulaw 8kHz ↔ 16kHz conversion** — handled internally by `TwilioFrameSerializer`.
 `PIPELINE_SAMPLE_RATE = 16000`, `WIRE_SAMPLE_RATE = 8000` are already set correctly.
 No custom audio module needed.
@@ -250,18 +258,25 @@ time, leading the call rather than waiting on the caller. Fixed order:
 2. **Location** — ask city/area, confirm the matching office (don't offer others — see office rule).
 3. **Caller info** — full name (ask last name if only a first is given), then **address the caller
 by name** from there on; insurance (log only); preferred day/time in their words.
-4. **Verify phone** — near the end, read the caller-ID back digit-by-digit and ask if it's best;
-if not, use the number they give. Never raised earlier in the call.
-5. **Wrap up** — recap the booking by name, then ask **"Is there anything else I can help you
-with?"**
+4. **Verify phone** — near the end, state the caller-ID back in one line ("I have your number
+as <number> — is that the best number?"), no asking permission first; if not, use the number
+they give. Never raised earlier. **Backed by a deterministic safety net** — if the agent
+skips it, `EndCallProcessor` injects the confirmation before hang-up (see "already solved").
+5. **Wrap up** — recap the booking **as a REQUEST** by name ("I've noted your request to come
+in…"), make clear staff will call to confirm, then ask **"Is there anything else I can help
+you with?"**
 
+**Never claims a booking:** AVA must never say an appointment is "booked / scheduled / set /
+confirmed" — everything is a request staff confirm on callback. **Insurance:** never say "we
+accept/take" a plan (or invent one) — just note what the caller said; staff verify.
+
 **Closing is gated:** the word "Goodbye" ends the call (triggers `EndCallProcessor` → hang-up),
 so it is never said in the same turn as confirming details and never before the anything-else
 question — only after the caller says they need nothing more.
 
-> Reliability: this is prompt-driven on the local 8B, so order is followed well but not
-> perfectly — the phone-readback step in particular varies (sometimes reads back, sometimes
-> asks for the number), and it can re-ask a last name. Same model ceiling noted elsewhere.
+> Reliability: the script is prompt-driven on the local 8B (order followed well, not perfectly;
+> it can re-ask a last name). The phone-confirmation step is the exception — it's now
+> **guaranteed** by the deterministic `EndCallProcessor` safety net.
 
 ## Call Data Capture
@@ -277,7 +292,7 @@ Replies are kept to one short sentence.
 | Phone | Confirmed **near the end** (not led with); reads back the caller-ID — injected pre-spelled so it's said digit-by-digit — and if the caller declines, uses the number they give | `callback_number` (+ `phone_confirmed`) |
 | Office / city | Asks city/area; when the caller names a place that matches an office, **confirms that office and moves on** — never offers/compares other offices or asks them to choose; names the nearest only if nothing matches | folded into `reason` prefix |
 | Reason | Captured from the conversation | `reason` |
-| Insurance | **Log only, never suggest or guess** — asks open-endedly (no plan names read out), captures only what the caller says, never fills in/completes/guesses the plan (asks them to repeat if unclear), never promises/confirms/denies coverage or treatment even for a listed plan; staff verify on callback | `insurance` (note: "log only — staff to verify") |
+| Insurance | **Log only, never suggest or guess** — asks open-endedly (no plan names read out), captures only what the caller says, never fills in/completes/guesses the plan (asks to repeat if unclear), never says "we accept/take" a plan, never promises/confirms/denies coverage or treatment even for a listed plan; staff verify on callback | `insurance` (note: "log only — staff to verify") |
 | Preferred day & time | **Capture & defer** — taken in the caller's own words; AVA does not compute or correct the date | `preferred_time` + best-effort resolved `YYYY-MM-DD` |
 
 ### Dates — capture & defer (do NOT compute in-call)

72
bot.py

@@ -203,20 +203,35 @@ def _build_tools() -> ToolsSchema:
 class EndCallProcessor(FrameProcessor):
-    """Lets the agent hang up. MUST sit between the LLM and the TTS: there it sees her reply
-    text (LLMTextFrame, flowing downstream) AND the upstream copy of BotStoppedSpeakingFrame
-    the output transport emits. It accumulates each reply; if the finished reply contains a
-    closing ('goodbye'/'adiós'), it waits until she's done speaking, pauses HANGUP_DELAY_SECS
-    so the caller isn't clipped, then pushes EndTaskFrame upstream — the task ends and
-    TwilioFrameSerializer (auto_hang_up) drops the call."""
+    """Lets the agent hang up AND guarantees the callback number is confirmed once.
+
+    Sits between the LLM and the TTS: it sees reply text (LLMTextFrame, downstream) and the
+    upstream BotStoppedSpeakingFrame. On a closing ('goodbye'/'adiós') it waits for TTS to
+    finish, pauses HANGUP_DELAY_SECS so the caller isn't clipped, then pushes EndTaskFrame
+    (TwilioFrameSerializer auto_hang_up drops the call).
+
+    Deterministic phone confirmation: the prompt asks the agent to read the callback number
+    back, but the 8B skips it ~half the time. So if a closing is reached and the agent never
+    spoke the number this call (`phone_marker` not seen in its replies), we suppress the
+    hang-up and inject a scripted confirmation turn first — guaranteeing it happens exactly
+    once (the agent's own readback satisfies the gate, so no double-ask in the common case)."""
 
     _CLOSINGS = ("goodbye", "good-bye", "good bye", "adiós", "adios", "hasta luego")
+    # Only force phone confirmation when a booking was actually underway (not info-only calls).
+    _BOOKING_KWS = ("appointment", "schedule", "book", "insurance", "what day", "what time",
+                    "come in", "preferred")
 
-    def __init__(self):
+    def __init__(self, phone_confirm_line: str | None = None, phone_marker: str | None = None):
         super().__init__()
         self._buf = ""
         self._should_end = False
         self._end_task = None
+        self._phone_confirm_line = phone_confirm_line
+        self._phone_marker = (phone_marker or "").lower()
+        # Nothing to confirm (no caller ID) → treat as already handled.
+        self._phone_confirmed = not phone_confirm_line
+        self._assistant_seen = ""
+        self._pending_phone_inject = False
 
     @classmethod
     def _is_closing(cls, text: str) -> bool:
@@ -235,17 +250,31 @@ class EndCallProcessor(FrameProcessor):
         await super().process_frame(frame, direction)
         if isinstance(frame, LLMTextFrame):
             self._buf += frame.text
+            self._assistant_seen += frame.text.lower()
+            if self._phone_marker and self._phone_marker in self._assistant_seen:
+                self._phone_confirmed = True  # the agent read the number back itself
         elif isinstance(frame, LLMFullResponseEndFrame):
             if self._is_closing(self._buf):
-                self._should_end = True
-                logger.info(f"{AGENT_NAME} signalled closing -- will hang up "
-                            f"{HANGUP_DELAY_SECS:.0f}s after she finishes speaking")
+                booking = any(k in self._assistant_seen for k in self._BOOKING_KWS)
+                if self._phone_confirmed or not booking:
+                    self._should_end = True
+                    logger.info(f"{AGENT_NAME} signalled closing -- will hang up "
+                                f"{HANGUP_DELAY_SECS:.0f}s after she finishes speaking")
+                else:
+                    # Booking call closing without the number confirmed — do it deterministically.
+                    self._pending_phone_inject = True
+                    logger.info(f"{AGENT_NAME} reached closing w/o phone confirmation -- injecting it")
             self._buf = ""
-        elif isinstance(frame, BotStoppedSpeakingFrame) and self._should_end:
-            self._should_end = False
-            # Schedule the teardown so we don't block the pipeline during the grace pause.
-            if self._end_task is None:
-                self._end_task = asyncio.create_task(self._hang_up_after_delay())
+        elif isinstance(frame, BotStoppedSpeakingFrame):
+            if self._pending_phone_inject:
+                self._pending_phone_inject = False
+                self._phone_confirmed = True
+                await self.push_frame(TTSSpeakFrame(self._phone_confirm_line), FrameDirection.DOWNSTREAM)
+            elif self._should_end:
+                self._should_end = False
+                # Schedule the teardown so we don't block the pipeline during the grace pause.
+                if self._end_task is None:
+                    self._end_task = asyncio.create_task(self._hang_up_after_delay())
 
         await self.push_frame(frame, direction)
@@ -455,7 +484,18 @@ async def run_agent(transport, caller_number=None, call_sid=None, do_capture=Tru
     context_kwargs["tools"] = _build_tools()
     context = LLMContext(**context_kwargs)
     agg = LLMContextAggregatorPair(context)
-    endcall = EndCallProcessor()
+    # Deterministic phone-confirmation safety net: if the agent reaches a closing without
+    # having read the caller-ID back, EndCallProcessor speaks this scripted line first.
+    if caller_number:
+        _spoken = _spoken_phone(caller_number)
+        phone_confirm_line = (
+            f"Before you go, let me make sure I have the best number to reach you: "
+            f"{_spoken}. Is that correct?"
+        )
+        phone_marker = _spoken.split(",")[0].strip()  # e.g. "nine seven three"
+    else:
+        phone_confirm_line = phone_marker = None
+    endcall = EndCallProcessor(phone_confirm_line=phone_confirm_line, phone_marker=phone_marker)
 
     pipeline = Pipeline(
         [
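
The marker derivation in this hunk depends on how `_spoken_phone` formats the number, which the diff does not show. A minimal sketch, assuming it spells the last ten digits as comma-separated 3-3-4 groups; the `spoken_phone` helper, the `DIGIT_WORDS` table, and `phone_marker_from` below are hypothetical stand-ins, not the bot's actual code:

```python
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spoken_phone(number: str) -> str:
    """Hypothetical stand-in for `_spoken_phone`: spell the last ten digits in 3-3-4 groups."""
    digits = [c for c in number if c.isdigit()][-10:]
    groups = [digits[:3], digits[3:6], digits[6:]]
    return ", ".join(" ".join(DIGIT_WORDS[d] for d in g) for g in groups if g)


def phone_marker_from(spoken: str) -> str:
    """Same derivation as in the hunk: the text before the first comma."""
    return spoken.split(",")[0].strip()


spoken = spoken_phone("+19735550123")
marker = phone_marker_from(spoken)

# The gate compares the marker against the agent's lower-cased reply text.
agent_reply = "I have your number as " + spoken + ". Is that the best number?"
agent_confirmed = marker in agent_reply.lower()
```

Under this assumption the marker is just the spelled-out area code, so the gate is satisfied only when the agent's reply contains those words in that form. A reply that used numerals instead would not match, and the scripted line would still be injected.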