Fix re-asking: deterministic slot memory + user-turn merge + reason-loop prompt
Historical calls showed the 8B re-asking for name/reason/phone it already had
("I already gave you my full name", the "I want an appointment" -> "what brings
you in?" loop) and VAD splitting one utterance into consecutive user turns.
- callstate.py: CallStateGroomer between agg.user() and the LLM. After each
agent turn (off the critical path) it extracts collected slots via one short
JSON-mode Ollama pass, then before each generation injects an ALREADY
COLLECTED / STILL NEEDED checklist into the system message and merges
VAD-fragmented consecutive user messages. Callback-type calls get an explicit
"no booking questions" line. CALL_STATE_TRACKING env (auto: on for ollama,
off for anthropic).
- bot.py prompt step 1: "I want an appointment" is the booking intent, not the
reason - ask the visit reason once, never twice.
- scripts/ab_replay.py: regression harness replaying the real failed calls.
llama3.1-8b raw = 3 failures; with CALL STATE = 0 failures across all
scenarios (chat latency 0.31s -> 0.55s median, well under the 3s gate).
Qwen3-14B A/B'd and rejected: no better raw, ~3s/turn, 11GB VRAM.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
22
CLAUDE.md
22
CLAUDE.md
@@ -83,7 +83,23 @@ audio while the bot is speaking (+`ECHO_TAIL_SECS`, default 0.5s) so echo never
|
||||
Trade-off: half-duplex — the caller can't barge in mid-utterance (fine for short replies).
|
||||
`HALF_DUPLEX=false` restores barge-in. Keep it on for telephony.
|
||||
|
||||
**Post-call extraction (`extract.py`)** — single JSON-mode completion after call ends.
|
||||
**`CallStateGroomer` (`callstate.py`) — deterministic slot memory (2026-07-03).** Fixes the
|
||||
8B re-asking for things the caller already gave (name, reason, phone — seen repeatedly in the
|
||||
historical call logs: "Didn't you say you had my phone number?", "I already gave you my full
|
||||
name", the "I want an appointment"→"what brings you in?" loop). Sits between `agg.user()` and
|
||||
the LLM. Two jobs: (1) on upstream `BotStoppedSpeakingFrame` (agent finished; Ollama idle,
|
||||
caller talking) it runs a ~1.2s JSON-mode extraction over the transcript-so-far — OFF the
|
||||
latency-critical path, result applied next turn; (2) on downstream `LLMContextFrame` (right
|
||||
before generation) it synchronously merges VAD-fragmented consecutive user messages
|
||||
("Monday" / "3 p.m." → one turn) and injects an explicit checklist into the system message:
|
||||
`CALL STATE ... ALREADY COLLECTED (NEVER ask again): name=Carlos Garcia ... STILL NEEDED:
|
||||
insurance, preferred day/time`. It also carries call type (`callback` → "do NOT ask booking
|
||||
questions"). Verified via `scripts/ab_replay.py` (replays the real failed calls): llama3.1-8B
|
||||
raw = 3 failures, +CALL STATE = **0 failures**, chat latency 0.31s→0.55s med (system-message
|
||||
churn re-evals the prompt; acceptable, still ≪ the 3s gate). Env: `CALL_STATE_TRACKING`
|
||||
(default: on for ollama, off for anthropic — Claude tracks state fine on its own; extraction
|
||||
always runs on the local Ollama model). Qwen3-14B was A/B'd as an alternative and rejected
|
||||
for now: no better raw, ~3s/turn with state, needs `think:false` handling, ~11GB VRAM.
|
||||
Correctly uses `format: json`, uses verified Twilio caller-ID instead of trusting model
|
||||
output, falls back to JSONL if Odoo is unreachable. Keep it.
|
||||
**Classifies `request_type`:** `appointment` (booking), `callback` (a non-booking request staff
|
||||
@@ -495,6 +511,10 @@ Beyond the three reverted changes, the following hardening is live (see git hist
|
||||
- **Reason capture** — post-call extractor broadened to capture the eye problem/symptom as the reason (not just visit types); reason now shown in the log line and the Odoo lead title.
|
||||
- **Hang-up** — `HANGUP_DELAY_SECS=4` grace pause before dropping the carrier leg.
|
||||
- **Office selection** — confirm the matching office; never offer/compare others.
|
||||
- **Re-ask fix (2026-07-03)** — `CallStateGroomer` slot-state checklist + user-turn merge (see
|
||||
component note above); prompt step 1 now says "I want an appointment" is intent not reason —
|
||||
ask the visit reason ONCE, then move on. Regression harness: `scripts/ab_replay.py [--state]
|
||||
<models...>` replays the historical failure scenarios and flags re-asks.
|
||||
|
||||
### Phase 2 — Accuracy (RAG + validation)
|
||||
|
||||
|
||||
Reference in New Issue
Block a user