Fix re-asking: deterministic slot memory + user-turn merge + reason-loop prompt

Historical calls showed the 8B re-asking for name/reason/phone it already had ("I already gave you my full name", the "I want an appointment" -> "what brings you in?" loop) and VAD splitting one utterance into consecutive user turns. - callstate.py: CallStateGroomer between agg.user() and the LLM. After each agent turn (off the critical path) it extracts collected slots via one short JSON-mode Ollama pass, then before each generation injects an ALREADY COLLECTED / STILL NEEDED checklist into the system message and merges VAD-fragmented consecutive user messages. Callback-type calls get an explicit "no booking questions" line. CALL_STATE_TRACKING env (auto: on for ollama, off for anthropic). - bot.py prompt step 1: "I want an appointment" is the booking intent, not the reason - ask the visit reason once, never twice. - scripts/ab_replay.py: regression harness replaying the real failed calls. llama3.1-8b raw = 3 failures; with CALL STATE = 0 failures across all scenarios (chat latency 0.31s -> 0.55s median, well under the 3s gate). Qwen3-14B A/B'd and rejected: no better raw, ~3s/turn, 11GB VRAM. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-03 23:49:39 +00:00
parent bae388420b
commit a47f4b423c
5 changed files with 445 additions and 2 deletions
--- a/.env.example
+++ b/.env.example
@@ -68,3 +68,7 @@ VAD_CONFIDENCE=0.5
 VAD_MIN_VOLUME=0.15
 VAD_START_SECS=0.1
 VAD_STOP_SECS=0.5
+# Deterministic slot memory (callstate.py): injects an ALREADY-COLLECTED / STILL-NEEDED
+# checklist into the system prompt each turn + merges VAD-fragmented user turns, so the
+# local 8B stops re-asking for name/reason/phone. Default: on for ollama, off for anthropic.
+#CALL_STATE_TRACKING=true