- PLEASANTRIES: the 8B parroted the verbatim example ("I'm doing well, thank
you for asking") when the caller never asked how she was, then burned two
more turns "starting fresh". Rule is now strictly conditional with no canned
example: answer+ask-back only if the caller literally asks; never answer a
question that wasn't asked.
- callstate: extraction now captures the CALLBACK request note ("are my
glasses ready" -> "status of an order"), so the checklist stops the "what's
the reason for your call?" re-ask; callback wrap-up wording now says STATE
the caller-ID number, never ask for one (she asked "what's the best phone
number" despite having it); first-name-only callbacks still ask the last name.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Historical calls showed the 8B re-asking for name/reason/phone it already had
("I already gave you my full name", the "I want an appointment" -> "what brings
you in?" loop) and VAD splitting one utterance into consecutive user turns.
- callstate.py: CallStateGroomer between agg.user() and the LLM. After each
agent turn (off the critical path) it extracts collected slots via one short
JSON-mode Ollama pass, then before each generation injects an ALREADY
COLLECTED / STILL NEEDED checklist into the system message and merges
VAD-fragmented consecutive user messages. Callback-type calls get an explicit
"no booking questions" line. CALL_STATE_TRACKING env (auto: on for ollama,
off for anthropic).
- bot.py prompt step 1: "I want an appointment" is the booking intent, not the
reason - ask the visit reason once, never twice.
- scripts/ab_replay.py: regression harness replaying the real failed calls.
llama3.1-8b raw = 3 failures; with CALL STATE = 0 failures across all
scenarios (chat latency 0.31s -> 0.55s median, well under the 3s gate).
Qwen3-14B A/B'd and rejected: no better raw, ~3s/turn, 11GB VRAM.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>