diff --git a/CLAUDE.md b/CLAUDE.md
index 180989c..bfe1d32 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -40,6 +40,14 @@ Watches LLM text stream for closing keywords ("goodbye"), waits for TTS to finis
 clipped, then pushes `EndTaskFrame` upstream. `TwilioFrameSerializer` with `auto_hang_up`
 drops the carrier leg. Verified working in the Phase 1 gate (4/4 clean hang-ups).
 
+It also **deterministically guarantees the callback number is confirmed** on booking calls:
+the 8B reads the number back only ~half the time, so if a closing is reached on a booking
+call (booking keyword seen) without the agent having spoken the number (`phone_marker` not
+seen in its replies), the hang-up is suppressed and a scripted confirmation line
+(`phone_confirm_line`, the caller-ID spelled out) is injected as a `TTSSpeakFrame` first.
+The agent's own readback satisfies the gate, so there's no double-ask in the common case;
+info-only calls (no booking keyword) are never asked for a number.
+
 **Mulaw 8kHz ↔ 16kHz conversion** — handled internally by `TwilioFrameSerializer`.
 `PIPELINE_SAMPLE_RATE = 16000`, `WIRE_SAMPLE_RATE = 8000` are already set correctly.
 No custom audio module needed.
@@ -250,18 +258,25 @@ time, leading the call rather than waiting on the caller. Fixed order:
 2. **Location** — ask city/area, confirm the matching office (don't offer others — see office rule).
 3. **Caller info** — full name (ask last name if only a first is given), then **address the
    caller by name** from there on; insurance (log only); preferred day/time in their words.
-4. **Verify phone** — near the end, read the caller-ID back digit-by-digit and ask if it's best;
-   if not, use the number they give. Never raised earlier in the call.
-5. **Wrap up** — recap the booking by name, then ask **"Is there anything else I can help you
-   with?"**
+4. **Verify phone** — near the end, state the caller-ID back in one line ("I have your number
+   as — is that the best number?"), no asking permission first; if not, use the number
+   they give. Never raised earlier. **Backed by a deterministic safety net** — if the agent
+   skips it, `EndCallProcessor` injects the confirmation before hang-up (see "already solved").
+5. **Wrap up** — recap the booking **as a REQUEST** by name ("I've noted your request to come
+   in…"), make clear staff will call to confirm, then ask **"Is there anything else I can help
+   you with?"**
+
+**Never claims a booking:** AVA must never say an appointment is "booked / scheduled / set /
+confirmed" — everything is a request staff confirm on callback. **Insurance:** never say "we
+accept/take" a plan (or invent one) — just note what the caller said; staff verify.
 
 **Closing is gated:** the word "Goodbye" ends the call (triggers `EndCallProcessor` → hang-up),
 so it is never said in the same turn as confirming details and never before the anything-else
 question — only after the caller says they need nothing more.
 
-> Reliability: this is prompt-driven on the local 8B, so order is followed well but not
-> perfectly — the phone-readback step in particular varies (sometimes reads back, sometimes
-> asks for the number), and it can re-ask a last name. Same model ceiling noted elsewhere.
+> Reliability: the script is prompt-driven on the local 8B (order followed well, not perfectly;
+> it can re-ask a last name). The phone-confirmation step is the exception — it's now
+> **guaranteed** by the deterministic `EndCallProcessor` safety net.
 
 ## Call Data Capture
 
@@ -277,7 +292,7 @@ Replies are kept to one short sentence.
 | Phone | Confirmed **near the end** (not led with); reads back the caller-ID — injected pre-spelled so it's said digit-by-digit — and if the caller declines, uses the number they give | `callback_number` (+ `phone_confirmed`) |
 | Office / city | Asks city/area; when the caller names a place that matches an office, **confirms that office and moves on** — never offers/compares other offices or asks them to choose; names the nearest only if nothing matches | folded into `reason` prefix |
 | Reason | Captured from the conversation | `reason` |
-| Insurance | **Log only, never suggest or guess** — asks open-endedly (no plan names read out), captures only what the caller says, never fills in/completes/guesses the plan (asks them to repeat if unclear), never promises/confirms/denies coverage or treatment even for a listed plan; staff verify on callback | `insurance` (note: "log only — staff to verify") |
+| Insurance | **Log only, never suggest or guess** — asks open-endedly (no plan names read out), captures only what the caller says, never fills in/completes/guesses the plan (asks to repeat if unclear), never says "we accept/take" a plan, never promises/confirms/denies coverage or treatment even for a listed plan; staff verify on callback | `insurance` (note: "log only — staff to verify") |
 | Preferred day & time | **Capture & defer** — taken in the caller's own words; AVA does not compute or correct the date | `preferred_time` + best-effort resolved `YYYY-MM-DD` |
 
 ### Dates — capture & defer (do NOT compute in-call)
diff --git a/bot.py b/bot.py
index 8e0c785..83a6f95 100644
--- a/bot.py
+++ b/bot.py
@@ -203,20 +203,35 @@ def _build_tools() -> ToolsSchema:
 
 
 class EndCallProcessor(FrameProcessor):
-    """Lets the agent hang up. MUST sit between the LLM and the TTS: there it sees her reply
-    text (LLMTextFrame, flowing downstream) AND the upstream copy of BotStoppedSpeakingFrame
-    the output transport emits. It accumulates each reply; if the finished reply contains a
-    closing ('goodbye'/'adiós'), it waits until she's done speaking, pauses HANGUP_DELAY_SECS
-    so the caller isn't clipped, then pushes EndTaskFrame upstream — the task ends and
-    TwilioFrameSerializer (auto_hang_up) drops the call."""
+    """Lets the agent hang up AND guarantees the callback number is confirmed once.
+
+    Sits between the LLM and the TTS: it sees reply text (LLMTextFrame, downstream) and the
+    upstream BotStoppedSpeakingFrame. On a closing ('goodbye'/'adiós') it waits for TTS to
+    finish, pauses HANGUP_DELAY_SECS so the caller isn't clipped, then pushes EndTaskFrame
+    (TwilioFrameSerializer auto_hang_up drops the call).
+
+    Deterministic phone confirmation: the prompt asks the agent to read the callback number
+    back, but the 8B skips it ~half the time. So if a closing is reached and the agent never
+    spoke the number this call (`phone_marker` not seen in its replies), we suppress the
+    hang-up and inject a scripted confirmation turn first — guaranteeing it happens exactly
+    once (the agent's own readback satisfies the gate, so no double-ask in the common case)."""
 
     _CLOSINGS = ("goodbye", "good-bye", "good bye", "adiós", "adios", "hasta luego")
+    # Only force phone confirmation when a booking was actually underway (not info-only calls).
+    _BOOKING_KWS = ("appointment", "schedule", "book", "insurance", "what day", "what time",
+                    "come in", "preferred")
 
-    def __init__(self):
+    def __init__(self, phone_confirm_line: str | None = None, phone_marker: str | None = None):
         super().__init__()
         self._buf = ""
         self._should_end = False
         self._end_task = None
+        self._phone_confirm_line = phone_confirm_line
+        self._phone_marker = (phone_marker or "").lower()
+        # Nothing to confirm (no caller ID) → treat as already handled.
+        self._phone_confirmed = not phone_confirm_line
+        self._assistant_seen = ""
+        self._pending_phone_inject = False
 
     @classmethod
     def _is_closing(cls, text: str) -> bool:
@@ -235,17 +250,31 @@ class EndCallProcessor(FrameProcessor):
         await super().process_frame(frame, direction)
         if isinstance(frame, LLMTextFrame):
             self._buf += frame.text
+            self._assistant_seen += frame.text.lower()
+            if self._phone_marker and self._phone_marker in self._assistant_seen:
+                self._phone_confirmed = True  # the agent read the number back itself
         elif isinstance(frame, LLMFullResponseEndFrame):
             if self._is_closing(self._buf):
-                self._should_end = True
-                logger.info(f"{AGENT_NAME} signalled closing -- will hang up "
-                            f"{HANGUP_DELAY_SECS:.0f}s after she finishes speaking")
+                booking = any(k in self._assistant_seen for k in self._BOOKING_KWS)
+                if self._phone_confirmed or not booking:
+                    self._should_end = True
+                    logger.info(f"{AGENT_NAME} signalled closing -- will hang up "
+                                f"{HANGUP_DELAY_SECS:.0f}s after she finishes speaking")
+                else:
+                    # Booking call closing without the number confirmed — do it deterministically.
+                    self._pending_phone_inject = True
+                    logger.info(f"{AGENT_NAME} reached closing w/o phone confirmation -- injecting it")
             self._buf = ""
-        elif isinstance(frame, BotStoppedSpeakingFrame) and self._should_end:
-            self._should_end = False
-            # Schedule the teardown so we don't block the pipeline during the grace pause.
-            if self._end_task is None:
-                self._end_task = asyncio.create_task(self._hang_up_after_delay())
+        elif isinstance(frame, BotStoppedSpeakingFrame):
+            if self._pending_phone_inject:
+                self._pending_phone_inject = False
+                self._phone_confirmed = True
+                await self.push_frame(TTSSpeakFrame(self._phone_confirm_line), FrameDirection.DOWNSTREAM)
+            elif self._should_end:
+                self._should_end = False
+                # Schedule the teardown so we don't block the pipeline during the grace pause.
+                if self._end_task is None:
+                    self._end_task = asyncio.create_task(self._hang_up_after_delay())
 
         await self.push_frame(frame, direction)
 
@@ -455,7 +484,18 @@ async def run_agent(transport, caller_number=None, call_sid=None, do_capture=Tru
         context_kwargs["tools"] = _build_tools()
     context = LLMContext(**context_kwargs)
     agg = LLMContextAggregatorPair(context)
-    endcall = EndCallProcessor()
+    # Deterministic phone-confirmation safety net: if the agent reaches a closing without
+    # having read the caller-ID back, EndCallProcessor speaks this scripted line first.
+    if caller_number:
+        _spoken = _spoken_phone(caller_number)
+        phone_confirm_line = (
+            f"Before you go, let me make sure I have the best number to reach you: "
+            f"{_spoken}. Is that correct?"
+        )
+        phone_marker = _spoken.split(",")[0].strip()  # e.g. "nine seven three"
+    else:
+        phone_confirm_line = phone_marker = None
+    endcall = EndCallProcessor(phone_confirm_line=phone_confirm_line, phone_marker=phone_marker)
 
     pipeline = Pipeline(
         [