avc-phone-ai

tocmo0nlord/avc-phone-ai

Fork 0

Commit Graph

Author	SHA1	Message	Date
tocmo0nlord	54d707ceac	Fix unasked pleasantries + callback re-asks (live call 2026-07-04 #3 ) - PLEASANTRIES: the 8B parroted the verbatim example ("I'm doing well, thank you for asking") when the caller never asked how she was, then burned two more turns "starting fresh". Rule is now strictly conditional with no canned example: answer+ask-back only if the caller literally asks; never answer a question that wasn't asked. - callstate: extraction now captures the CALLBACK request note ("are my glasses ready" -> "status of an order"), so the checklist stops the "what's the reason for your call?" re-ask; callback wrap-up wording now says STATE the caller-ID number, never ask for one (she asked "what's the best phone number" despite having it); first-name-only callbacks still ask the last name. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-07-04 03:58:15 +00:00
tocmo0nlord	a47f4b423c	Fix re-asking: deterministic slot memory + user-turn merge + reason-loop prompt Historical calls showed the 8B re-asking for name/reason/phone it already had ("I already gave you my full name", the "I want an appointment" -> "what brings you in?" loop) and VAD splitting one utterance into consecutive user turns. - callstate.py: CallStateGroomer between agg.user() and the LLM. After each agent turn (off the critical path) it extracts collected slots via one short JSON-mode Ollama pass, then before each generation injects an ALREADY COLLECTED / STILL NEEDED checklist into the system message and merges VAD-fragmented consecutive user messages. Callback-type calls get an explicit "no booking questions" line. CALL_STATE_TRACKING env (auto: on for ollama, off for anthropic). - bot.py prompt step 1: "I want an appointment" is the booking intent, not the reason - ask the visit reason once, never twice. - scripts/ab_replay.py: regression harness replaying the real failed calls. llama3.1-8b raw = 3 failures; with CALL STATE = 0 failures across all scenarios (chat latency 0.31s -> 0.55s median, well under the 3s gate). Qwen3-14B A/B'd and rejected: no better raw, ~3s/turn, 11GB VRAM. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-07-03 23:49:39 +00:00

Author

SHA1

Message

Date

tocmo0nlord

54d707ceac

Fix unasked pleasantries + callback re-asks (live call 2026-07-04 #3 )

- PLEASANTRIES: the 8B parroted the verbatim example ("I'm doing well, thank
  you for asking") when the caller never asked how she was, then burned two
  more turns "starting fresh". Rule is now strictly conditional with no canned
  example: answer+ask-back only if the caller literally asks; never answer a
  question that wasn't asked.
- callstate: extraction now captures the CALLBACK request note ("are my
  glasses ready" -> "status of an order"), so the checklist stops the "what's
  the reason for your call?" re-ask; callback wrap-up wording now says STATE
  the caller-ID number, never ask for one (she asked "what's the best phone
  number" despite having it); first-name-only callbacks still ask the last name.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-07-04 03:58:15 +00:00

tocmo0nlord

a47f4b423c

Fix re-asking: deterministic slot memory + user-turn merge + reason-loop prompt

Historical calls showed the 8B re-asking for name/reason/phone it already had
("I already gave you my full name", the "I want an appointment" -> "what brings
you in?" loop) and VAD splitting one utterance into consecutive user turns.

- callstate.py: CallStateGroomer between agg.user() and the LLM. After each
  agent turn (off the critical path) it extracts collected slots via one short
  JSON-mode Ollama pass, then before each generation injects an ALREADY
  COLLECTED / STILL NEEDED checklist into the system message and merges
  VAD-fragmented consecutive user messages. Callback-type calls get an explicit
  "no booking questions" line. CALL_STATE_TRACKING env (auto: on for ollama,
  off for anthropic).
- bot.py prompt step 1: "I want an appointment" is the booking intent, not the
  reason - ask the visit reason once, never twice.
- scripts/ab_replay.py: regression harness replaying the real failed calls.
  llama3.1-8b raw = 3 failures; with CALL STATE = 0 failures across all
  scenarios (chat latency 0.31s -> 0.55s median, well under the 3s gate).
  Qwen3-14B A/B'd and rejected: no better raw, ~3s/turn, 11GB VRAM.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-07-03 23:49:39 +00:00

2 Commits