Commit Graph

2 Commits

Author SHA1 Message Date
tocmo0nlord
54d707ceac Fix unasked pleasantries + callback re-asks (live call 2026-07-04 #3)
- PLEASANTRIES: the 8B parroted the verbatim example ("I'm doing well, thank
  you for asking") when the caller never asked how she was, then burned two
  more turns "starting fresh". Rule is now strictly conditional with no canned
  example: answer+ask-back only if the caller literally asks; never answer a
  question that wasn't asked.
- callstate: extraction now captures the CALLBACK request note ("are my
  glasses ready" -> "status of an order"), so the checklist stops the "what's
  the reason for your call?" re-ask; callback wrap-up wording now says STATE
  the caller-ID number, never ask for one (she asked "what's the best phone
  number" despite having it); first-name-only callbacks still ask the last name.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-04 03:58:15 +00:00
tocmo0nlord
a47f4b423c Fix re-asking: deterministic slot memory + user-turn merge + reason-loop prompt
Historical calls showed the 8B re-asking for name/reason/phone it already had
("I already gave you my full name", the "I want an appointment" -> "what brings
you in?" loop) and VAD splitting one utterance into consecutive user turns.

- callstate.py: CallStateGroomer between agg.user() and the LLM. After each
  agent turn (off the critical path) it extracts collected slots via one short
  JSON-mode Ollama pass, then before each generation injects an ALREADY
  COLLECTED / STILL NEEDED checklist into the system message and merges
  VAD-fragmented consecutive user messages. Callback-type calls get an explicit
  "no booking questions" line. CALL_STATE_TRACKING env (auto: on for ollama,
  off for anthropic).
- bot.py prompt step 1: "I want an appointment" is the booking intent, not the
  reason - ask the visit reason once, never twice.
- scripts/ab_replay.py: regression harness replaying the real failed calls.
  llama3.1-8b raw = 3 failures; with CALL STATE = 0 failures across all
  scenarios (chat latency 0.31s -> 0.55s median, well under the 3s gate).
  Qwen3-14B A/B'd and rejected: no better raw, ~3s/turn, 11GB VRAM.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-03 23:49:39 +00:00