Fix re-asking: deterministic slot memory + user-turn merge + reason-loop prompt
Historical calls showed the 8B re-asking for name/reason/phone it already had
("I already gave you my full name", the "I want an appointment" -> "what brings
you in?" loop) and VAD splitting one utterance into consecutive user turns.
- callstate.py: CallStateGroomer between agg.user() and the LLM. After each
agent turn (off the critical path) it extracts collected slots via one short
JSON-mode Ollama pass, then before each generation injects an ALREADY
COLLECTED / STILL NEEDED checklist into the system message and merges
VAD-fragmented consecutive user messages. Callback-type calls get an explicit
"no booking questions" line. CALL_STATE_TRACKING env (auto: on for ollama,
off for anthropic).
- bot.py prompt step 1: "I want an appointment" is the booking intent, not the
reason - ask the visit reason once, never twice.
- scripts/ab_replay.py: regression harness replaying the real failed calls.
llama3.1-8b raw = 3 failures; with CALL STATE = 0 failures across all
scenarios (chat latency 0.31s -> 0.55s median, well under the 3s gate).
Qwen3-14B A/B'd and rejected: no better raw, ~3s/turn, 11GB VRAM.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
@@ -68,3 +68,7 @@ VAD_CONFIDENCE=0.5
|
||||
VAD_MIN_VOLUME=0.15
|
||||
VAD_START_SECS=0.1
|
||||
VAD_STOP_SECS=0.5
|
||||
# Deterministic slot memory (callstate.py): injects an ALREADY-COLLECTED / STILL-NEEDED
|
||||
# checklist into the system prompt each turn + merges VAD-fragmented user turns, so the
|
||||
# local 8B stops re-asking for name/reason/phone. Default: on for ollama, off for anthropic.
|
||||
#CALL_STATE_TRACKING=true
|
||||
|
||||
Reference in New Issue
Block a user