Fix re-asking: deterministic slot memory + user-turn merge + reason-loop prompt
Historical calls showed the 8B re-asking for name/reason/phone it already had
("I already gave you my full name", the "I want an appointment" -> "what brings
you in?" loop) and VAD splitting one utterance into consecutive user turns.
- callstate.py: CallStateGroomer between agg.user() and the LLM. After each
agent turn (off the critical path) it extracts collected slots via one short
JSON-mode Ollama pass, then before each generation injects an ALREADY
COLLECTED / STILL NEEDED checklist into the system message and merges
VAD-fragmented consecutive user messages. Callback-type calls get an explicit
"no booking questions" line. CALL_STATE_TRACKING env (auto: on for ollama,
off for anthropic).
- bot.py prompt step 1: "I want an appointment" is the booking intent, not the
reason - ask the visit reason once, never twice.
- scripts/ab_replay.py: regression harness replaying the real failed calls.
llama3.1-8b raw = 3 failures; with CALL STATE = 0 failures across all
scenarios (chat latency 0.31s -> 0.55s median, well under the 3s gate).
Qwen3-14B A/B'd and rejected: no better raw, ~3s/turn, 11GB VRAM.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
24
bot.py
24
bot.py
@@ -53,6 +53,7 @@ from pipecat.transports.websocket.fastapi import (
|
||||
FastAPIWebsocketTransport,
|
||||
)
|
||||
|
||||
from callstate import CallStateGroomer
|
||||
from practice import practice_summary
|
||||
|
||||
# ── Config (env-overridable) ─────────────────────────────────────────────────
|
||||
@@ -120,6 +121,17 @@ ECHO_TAIL_SECS = float(os.environ.get("ECHO_TAIL_SECS", "0.25"))
|
||||
SILENCE_WATCHDOG = os.environ.get("SILENCE_WATCHDOG", "true").lower() not in ("false", "0", "no")
|
||||
SILENCE_REPROMPT_SECS = float(os.environ.get("SILENCE_REPROMPT_SECS", "7.0"))
|
||||
MAX_REPROMPTS = int(os.environ.get("MAX_REPROMPTS", "2"))
|
||||
# Deterministic slot-state tracking (callstate.py): after each agent turn, extract what the
|
||||
# caller already provided and inject an explicit ALREADY-COLLECTED / STILL-NEEDED checklist
|
||||
# into the system message, plus merge VAD-fragmented user turns. Fixes the 8B re-asking for
|
||||
# name/reason/phone it was already given. Extraction runs on the local Ollama model, so it
|
||||
# auto-disables for the anthropic provider (Claude tracks state fine on its own).
|
||||
_call_state_env = os.environ.get("CALL_STATE_TRACKING")
|
||||
CALL_STATE_TRACKING = (
|
||||
_call_state_env.lower() in ("1", "true", "yes")
|
||||
if _call_state_env is not None
|
||||
else (LLM_PROVIDER == "ollama")
|
||||
)
|
||||
# Record each call to a stereo WAV (caller = left, agent = right) for review/debugging.
|
||||
RECORD_CALLS = os.environ.get("RECORD_CALLS", "true").lower() not in ("false", "0", "no")
|
||||
RECORDINGS_DIR = os.environ.get("RECORDINGS_DIR", os.path.join(HERE, "recordings"))
|
||||
@@ -165,7 +177,11 @@ SYSTEM_PROMPT = (
|
||||
"THIS case — switch to taking a message; never force booking questions on a non-booking caller.\n"
|
||||
" • A BOOKING (they want to schedule a visit) — work through these steps in order:\n"
|
||||
" 1. REASON FIRST — find out what they are calling about (the reason for the visit, or "
|
||||
"their question). If it is only a question, answer it.\n"
|
||||
"their question). If it is only a question, answer it. NOTE: 'I want an appointment' / 'I "
|
||||
"need to make an appointment' is the booking INTENT, not the reason — never treat it as a "
|
||||
"non-answer. Acknowledge it and ask ONCE what the visit is for, e.g. 'Happy to help — what "
|
||||
"would you like to be seen for?'. If they just say 'an appointment' again or give no medical "
|
||||
"reason, note it as a general visit and MOVE ON to location — NEVER ask the reason twice.\n"
|
||||
" 2. LOCATION — ask which city or area is most convenient, then confirm the matching "
|
||||
"office (see the office rule below).\n"
|
||||
" 3. CALLER INFO — get their FULL name (first and last; if they give only a first name, "
|
||||
@@ -656,6 +672,11 @@ async def run_agent(transport, caller_number=None, call_sid=None, do_capture=Tru
|
||||
context_kwargs["tools"] = _build_tools()
|
||||
context = LLMContext(**context_kwargs)
|
||||
agg = LLMContextAggregatorPair(context)
|
||||
# Deterministic slot memory: merges fragmented user turns + injects the live
|
||||
# collected/needed checklist into the system message before each generation.
|
||||
groomer = CallStateGroomer(
|
||||
context, base_system=system_content, ollama_url=OLLAMA_URL, model=OLLAMA_MODEL,
|
||||
) if CALL_STATE_TRACKING else None
|
||||
# Deterministic phone-confirmation safety net: if the agent reaches a closing without
|
||||
# having read the caller-ID back, EndCallProcessor speaks this scripted line first.
|
||||
if caller_number:
|
||||
@@ -687,6 +708,7 @@ async def run_agent(transport, caller_number=None, call_sid=None, do_capture=Tru
|
||||
vad,
|
||||
stt,
|
||||
agg.user(),
|
||||
*( [groomer] if groomer else [] ), # slot-state checklist + user-turn merge
|
||||
llm,
|
||||
endcall,
|
||||
*( [watchdog] if watchdog else [] ), # re-prompt on caller silence
|
||||
|
||||
Reference in New Issue
Block a user