Commit Graph

18 Commits

Author SHA1 Message Date
tocmo0nlord
a521dc168e Fix GPU OOM: share one Whisper model across calls (was leaking per call)
Calls were dropping right after answer with "CUDA failed with error out of
memory". Cause: each call constructed a new HintedWhisperSTTService -> new
ctranslate2 WhisperModel on the GPU, and that VRAM was never released when the
call ended. Over ~13 calls the python process grew to 9.7GB; with the pinned LLM
(6GB) the 16GB GPU filled (14 MiB free) and Whisper load failed on every call.

Fix: cache one WhisperModel per (model,device,compute) in _WHISPER_MODEL_CACHE
and reuse it across all calls; bake the fixed hotwords into the shared model's
transcribe() once (drops the racy per-call monkey-patch). VRAM now constant
(~6GB LLM + ~1.5GB Whisper). Verified: two instances share one model object;
GPU back to 6.0/16GB used after restart. Documented the VRAM budget.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 22:07:59 +00:00
tocmo0nlord
1cfdf562e2 Phone confirmation: state number, invite correction only (no "yes")
Call-recording analysis proved the recurring post-phone silence is NOT a volume problem:
the caller's reply was a full-energy sound right after the (long ~13s) phone
question, but VAD never registered it (no "user started speaking"), so the call
waited until the caller repeated it. Depending on catching a "yes" after a long
utterance is fragile (echo/gate timing).

Fix: stop requiring a "yes". AVA now states the number and invites a correction
only ("...; if that's not the best number, just let me know.") and flows on —
the caller only speaks to correct it. Updated the prompt step, the caller-ID
injection, and the deterministic EndCallProcessor line. Verified 4/4.

Docs: phone step, recording + watchdog entries, recording's post-gate limitation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 17:57:54 +00:00
tocmo0nlord
80824a7ab0 Add call recording (stereo WAV) + wire silence re-prompt watchdog
Stop debugging silence by guesswork: AudioBufferProcessor records every call to
recordings/<ts>_<callsid>.wav (caller=left, agent=right) so calls can be reviewed
with actual audio. (We had no audio before — that was the real gap; the earlier
"too quiet" explanation was unsupported.)

SilenceWatchdog: after the agent finishes, if the caller is silent for
SILENCE_REPROMPT_SECS (7s) it re-prompts ("are you still there?"); after
MAX_REPROMPTS it closes gracefully. This directly breaks the dead-silence
pattern (e.g. the 14s gap after the phone confirmation) instead of waiting.
Runtime-tested both. .gitignore already excludes recordings/.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 17:46:07 +00:00
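
The re-prompt behaviour in 80824a7ab0 amounts to a small timer state machine. The sketch below is one possible shape for it, assuming the timer is driven by explicit timestamps; the method names are hypothetical, and the `MAX_REPROMPTS` value of 2 is an assumption since the commit does not state it. Only `SILENCE_REPROMPT_SECS` (7s) is taken from the message.

```python
from dataclasses import dataclass
from typing import Optional

SILENCE_REPROMPT_SECS = 7.0  # value stated in the commit message
MAX_REPROMPTS = 2            # assumed value; the commit does not state it


@dataclass
class SilenceWatchdog:
    reprompt_secs: float = SILENCE_REPROMPT_SECS
    max_reprompts: int = MAX_REPROMPTS
    _deadline: Optional[float] = None
    _reprompts: int = 0

    def agent_finished(self, now: float) -> None:
        """Start (or restart) the silence timer when the agent stops talking."""
        self._deadline = now + self.reprompt_secs

    def caller_spoke(self) -> None:
        """Caller activity cancels the timer and resets the re-prompt count."""
        self._deadline = None
        self._reprompts = 0

    def tick(self, now: float) -> Optional[str]:
        """Return "reprompt", "close", or None for the current time."""
        if self._deadline is None or now < self._deadline:
            return None
        self._deadline = None  # wait for the next agent_finished() call
        if self._reprompts >= self.max_reprompts:
            return "close"
        self._reprompts += 1
        return "reprompt"
```

In this sketch the caller of `tick()` would speak the re-prompt line on "reprompt" and end the call on "close"; after each re-prompt finishes, `agent_finished()` restarts the timer.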
tocmo0nlord
b0df7fd5b0 Fix missed quiet "yes" after phone confirmation: more sensitive VAD
After the phone confirmation a caller's "yes" wasn't picked up (silence) until
they repeated it louder. Logs: line was live and the half-duplex gate had
reopened, but VAD never fired for ~14s — the quick/quiet "yes" was below
threshold (min_volume 0.3, start_secs 0.2).

Now that HalfDuplexGate gates out the agent's echo while it speaks, VAD can be
sensitive without echo false-triggers (it only listens hard on the caller's
turn). Lowered min_volume 0.3->0.15, start_secs 0.2->0.1, and trimmed the echo
tail 0.5->0.25 so an answer right after the agent stops isn't dropped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 17:36:20 +00:00
tocmo0nlord
32a3bb7136 Fix echo-induced silence with a half-duplex audio gate
A caller's reply was generated but never heard: 0.65s after the agent started
speaking, the VAD fired "user started speaking" (NO transcript) and broadcast an
interruption that cancelled the agent's audio -> ~24s of silence until the caller
spoke again. Cause: the agent's own TTS echoes back the phone line and the
always-on VAD interruption treats it as a barge-in. (PipelineParams has no
allow_interruptions in this pipecat build — it was a silent no-op.)

Fix: HalfDuplexGate before the VAD withholds inbound audio while the bot speaks
(+ECHO_TAIL_SECS, default 0.5s), so echo can't trigger a false barge-in.
This makes the line half-duplex (no mid-utterance barge-in); set HALF_DUPLEX=false to restore barge-in.
Runtime-tested the gate (pass idle / drop while speaking / drop in tail / resume).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 16:44:00 +00:00
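
The gating rule in 32a3bb7136 can be illustrated with a framework-free sketch. It assumes the gate is told when the bot starts and stops speaking and is asked, per inbound audio frame, whether to pass it on; the method names are hypothetical and the real processor's frame handling is not shown. `ECHO_TAIL_SECS` and its 0.5s default are from the commit message.

```python
from typing import Optional

ECHO_TAIL_SECS = 0.5  # default stated in the commit message


class HalfDuplexGate:
    """Decides whether inbound caller audio may reach the VAD."""

    def __init__(self, echo_tail_secs: float = ECHO_TAIL_SECS, enabled: bool = True) -> None:
        self.echo_tail_secs = echo_tail_secs
        self.enabled = enabled  # mirrors HALF_DUPLEX=false disabling the gate
        self._bot_speaking = False
        self._bot_stopped_at: Optional[float] = None

    def bot_started(self) -> None:
        self._bot_speaking = True

    def bot_stopped(self, now: float) -> None:
        self._bot_speaking = False
        self._bot_stopped_at = now

    def should_pass(self, now: float) -> bool:
        if not self.enabled:
            return True   # gate disabled: everything passes (barge-in allowed)
        if self._bot_speaking:
            return False  # drop while the bot is speaking
        if self._bot_stopped_at is not None:
            if now - self._bot_stopped_at < self.echo_tail_secs:
                return False  # drop during the echo tail
        return True       # idle, or tail elapsed: pass audio through
```

The four branches correspond to the four runtime checks listed in the commit: pass when idle, drop while speaking, drop in the tail, and resume afterwards.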
tocmo0nlord
ceea3d151c Fix mid-call silence: keep momentum after acknowledgments
A caller gave their insurance; AVA replied with a bare acknowledgment ("staff
will verify your coverage") and stopped, with no follow-up question. Both sides
then waited -> dead air (pipeline idle, no GPU/LLM activity, matching flat
memory/wattage). Caller had to break the silence with "what questions do you
have?". Root cause: the one-sentence brevity rule made AVA end a booking turn on
a dead-end statement.

Fix: prompt now requires, until the booking is complete, that every turn end
with the next question — acknowledgment + next question in the same turn (e.g.
insurance ack -> immediately ask day/time). Verified 4/4. Documented in CLAUDE.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 16:34:25 +00:00
tocmo0nlord
d7bfe2dbe8 Deterministic phone confirmation safety net + docs
EndCallProcessor now guarantees the callback number is confirmed on booking
calls: the 8B reads it back only ~half the time, so if a closing is reached on a
booking call (booking keyword seen) without the agent having spoken the number
(phone_marker absent from its replies), the hang-up is suppressed and a scripted
confirmation line (caller-ID spelled out) is injected as a TTSSpeakFrame first.
The agent's own readback satisfies the gate (no double-ask); info-only calls are
never asked for a number. Runtime-tested all four paths (inject / no-inject /
info-only / inject-then-end).

CLAUDE.md: document the safety net, the "never claim a booking" rule, the direct
phone-confirm phrasing, and the insurance "never say we accept" rule.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 15:52:22 +00:00
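
The decision made by the safety net in d7bfe2dbe8 can be written as a pure function. The sketch below is an assumption-laden illustration: `needs_phone_confirmation`, the `BOOKING_KEYWORDS` list, and the choice to look for booking keywords in the caller's turns are all hypothetical, since the commit only says a "booking keyword" must have been seen and that `phone_marker` must be absent from the agent's replies.

```python
from typing import Iterable

BOOKING_KEYWORDS = ("appointment", "schedule", "book")  # assumed keyword list


def needs_phone_confirmation(
    caller_turns: Iterable[str],
    agent_turns: Iterable[str],
    phone_marker: str,
    already_injected: bool = False,
) -> bool:
    """True when the hang-up should be held back to inject the scripted line."""
    if already_injected:
        return False  # inject-then-end: the next closing goes through
    is_booking = any(
        keyword in turn.lower()
        for turn in caller_turns
        for keyword in BOOKING_KEYWORDS
    )
    if not is_booking:
        return False  # info-only calls are never asked for a number
    agent_read_number = any(phone_marker in turn for turn in agent_turns)
    return not agent_read_number  # the agent's own readback satisfies the gate
```

Keeping the check deterministic means the confirmation no longer depends on the model remembering to read the number back.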
tocmo0nlord
1e0472e864 Never claim the appointment is confirmed; clean phone-confirm + insurance
Fixes from a test call:
- Contradiction: AVA said "staff will confirm" then later "we've got your
  appointment scheduled". Hardened the rule — never say booked/scheduled/set/
  confirmed (even in the recap); it's always a REQUEST that staff confirm on callback.
  Wrap-up recaps as "I've noted your request...".
- Phone: it asked "may I read your number back?" then read it anyway. Now states
  it directly in one line ("I have your number as <number> - is that best?"),
  no permission ask, don't skip.
- Insurance: stop saying "we accept/take <plan>" (it said "we accept All State",
  which isn't even a listed plan) — just note it, staff verify.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 15:33:27 +00:00
tocmo0nlord
8b52097713 Stop insurance hallucination: never suggest or guess the plan
A caller trailed off ("My insurance plan is...") and AVA filled in "CarePlus",
which got logged to the lead. Tightened the insurance rule: ask open-endedly,
do NOT read out/suggest plan names from the accepted list, capture only what the
caller says, never fill in/complete/guess the plan, and ask them to repeat if
unclear. Verified 4/4 on the trail-off case (asks to repeat, no fabricated plan).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 04:13:50 +00:00
tocmo0nlord
9d65fa9aaa Give AVA a clear ordered call workflow
Replace loose "gather these details" with a directed script so the call has
clear direction:
  1. Reason first — what are they calling about
  2. Location — city/area, confirm the matching office
  3. Caller info — full name, then address them by name; insurance (log only),
     preferred day/time
  4. Verify phone near the end by reading it back
  5. Wrap up — recap, then "Is there anything else I can help you with?"

Closing hardened: "Goodbye" (which ends the call) is gated behind the
anything-else question, never said in the same turn as confirming details.
Be warm but direct; one short turn at a time.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 03:44:03 +00:00
tocmo0nlord
703c902d0f Fix phone readback, lead-with-number flow, and AVA pronunciation
- Phone: inject the caller-ID into the prompt already spelled digit-by-digit so
  the model repeats clean words instead of mangling raw digits (it had emitted
  "197-three five seven three..." -> Kokoro read "one hundred ninety-seven").
- Flow: stop leading with the phone number. Prompt now flows naturally and
  saves the callback-number confirmation for the END; the caller-ID line says
  not to recite it early. Verified 3/3 openings no longer recite the number.
- Name: Kokoro spelled all-caps "AVA" as "A-V-A". Respell to AGENT_NAME_SPOKEN
  (default "Ava") in TTS only; logs/Odoo keep AGENT_NAME. Override e.g.
  AGENT_NAME_SPOKEN=Eva for an "EE-vuh" sound.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 02:08:52 +00:00
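
The name respelling in 703c902d0f could look something like the sketch below. The function name `respell_for_tts` is hypothetical, and passing the names as arguments (rather than reading `AGENT_NAME` and `AGENT_NAME_SPOKEN` from the environment) is a simplification for illustration.

```python
import re


def respell_for_tts(text: str, agent_name: str = "AVA", spoken_name: str = "Ava") -> str:
    """Replace the written agent name with its spoken form, whole words only."""
    pattern = rf"\b{re.escape(agent_name)}\b"
    # A function replacement avoids backslash handling in the spoken name.
    return re.sub(pattern, lambda _match: spoken_name, text)
```

Applying this only on the text handed to the synthesizer leaves logs and Odoo records with the written name, as the commit describes.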
tocmo0nlord
6010b136a7 Fix office selection: confirm the matching office, don't offer others
When a caller named a city matching an office ("I'm in Kendall"), AVA confirmed
Kendall then asked them to pick between unrelated offices ("Hollywood or
Miami?"), going off script. Tightened the prompt: on a city that matches an
office, confirm THAT office and move on; never offer/compare other offices or
ask the caller to choose; name the nearest only if nothing matches. Verified
3/3 on the failing scenario.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 01:30:35 +00:00
tocmo0nlord
92abe209f3 Fix SpokenKokoroTTSService.run_tts signature (broke all call audio)
run_tts is called as run_tts(self, text, context_id); the override only accepted
(self, text), so every utterance raised "takes 2 positional arguments but 3
were given" and produced no audio — callers heard nothing on every call since
the number-normalization change. Added context_id and pass it through. Verified
the service now emits audio (118KB for a sample) with digits normalized.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 01:13:45 +00:00
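
The signature mismatch fixed in 92abe209f3 is easy to reproduce with stand-in classes. In the sketch below, `BaseTTSService` is a hypothetical substitute for the real base class and `_normalize` is a placeholder for the number normalization; only the `run_tts(self, text, context_id)` call shape is taken from the commit message.

```python
import asyncio
from typing import AsyncGenerator


class BaseTTSService:
    """Hypothetical stand-in for the framework's TTS base class."""

    async def run_tts(self, text: str, context_id: str) -> AsyncGenerator[bytes, None]:
        # A real service would yield audio; this yields a tagged byte string.
        yield f"{context_id}:{text}".encode("utf-8")


def _normalize(text: str) -> str:
    """Placeholder normalization: the real one spells digits out."""
    return text.replace("-", " ")


class SpokenTTSService(BaseTTSService):
    async def run_tts(self, text: str, context_id: str) -> AsyncGenerator[bytes, None]:
        # Accept context_id and pass it through, matching the caller's signature.
        async for chunk in super().run_tts(_normalize(text), context_id):
            yield chunk


async def collect(service: BaseTTSService, text: str, context_id: str) -> list:
    return [chunk async for chunk in service.run_tts(text, context_id)]
```

Dropping `context_id` from the override in this sketch reproduces the same kind of "takes 2 positional arguments but 3 were given" error quoted in the commit.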
tocmo0nlord
1204d24340 Read phone numbers, street numbers, and zips digit-by-digit in TTS
Kokoro spoke "983-4969" as "nine hundred eighty-three dash forty-nine sixty-
nine". Added SpokenKokoroTTSService which normalizes text just before synthesis
(run_tts gets the full sentence): US phone patterns and 4-5 digit runs (street
numbers, zips) are spoken one digit at a time, country code dropped, no "dash"/
parens. Dates and times are left natural. Deterministic, so it's robust to
whatever the model emits.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 04:17:54 +00:00
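
A regex-based version of the normalization described in 1204d24340 might look like the sketch below. The patterns, the function names, and the comma placed between digit groups are assumptions chosen for illustration; the commit only specifies the behaviour (US phone patterns and 4-5 digit runs spoken digit by digit, country code dropped, no "dash" or parens, dates and times untouched).

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# US phone numbers: optional +1/1 country code, optional area code, then 3+4 digits.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?(?:\(?(\d{3})\)?[\s.-]?)?(\d{3})[\s.-]?(\d{4})(?!\d)"
)
# Standalone 4-5 digit runs (street numbers, zips); skip parts of dates and times.
RUN_RE = re.compile(r"(?<![\d:/-])\d{4,5}(?![\d:/-])")


def _spell(digits: str) -> str:
    return " ".join(DIGIT_WORDS[int(d)] for d in digits)


def _phone_repl(match: "re.Match[str]") -> str:
    # The country code is not captured, so it is dropped from the spoken form.
    groups = [g for g in match.groups() if g]
    return ", ".join(_spell(g) for g in groups)


def normalize_for_tts(text: str) -> str:
    text = PHONE_RE.sub(_phone_repl, text)  # phones first, so runs cannot split them
    return RUN_RE.sub(lambda m: _spell(m.group()), text)
```

Because the rewrite happens on the final text just before synthesis, it does not depend on how the model chose to format the number.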
tocmo0nlord
19728e1555 Fix bad-call regressions: drop in-call date computation, tighten replies
A real call derailed: AVA argued about today's date, parroted the canned date
example, hallucinated appointment availability, and rambled. Root cause was the
date-validation feature: the local 8B model got appointment dates wrong in
roughly 5 of 5 test runs, so having it state/correct dates is a liability.

- DATES: capture & defer — AVA takes the day/time in the caller's own words,
  never computes/states/corrects the calendar date, never argues about today;
  staff confirm the exact date on callback. Removed the 45-day calendar
  injection and _date_context()/datetime use.
- Hardened the no-availability rule (no "openings", no "check availability",
  no "I'll book").
- Brevity: one short sentence per reply (two at most).

Post-call extractor still records a best-effort resolved date (staff-verified).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 03:20:22 +00:00
tocmo0nlord
b8c71b15c2 Capture full appointment details + validate dates in-call
In-call (system prompt + per-call calendar injection):
- Gather full name (prompt asks for last name if only first given).
- Confirm the caller-ID number; if declined, use the number the caller gives.
- Ask for and LOG insurance only — never promise/confirm/deny coverage or
  treatment based on it; staff verify on callback.
- Validate the requested date against an injected 45-day calendar (recomputed
  per call since the server is long-running). Push back on impossible/mismatched
  dates, e.g. "Monday lands on the sixth — would you like that date?".
- AGENT_NAME=AVA; 4s grace pause before hang-up (HANGUP_DELAY_SECS).

Logging (post-call extraction -> Odoo):
- Extract full name, phone_confirmed, chosen callback (caller-ID or alternate),
  insurance, reason, and preferred time annotated with a resolved YYYY-MM-DD
  date (today's date is fed to the extractor).
- odoo_client: insurance row on the lead note (log only — staff verify).

.gitignore: ignore rotated avc_run.log* files.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 03:00:35 +00:00
tocmo0nlord
5ed641255c Revert Phase 1 STT/auth swaps: stay on Whisper + Twilio Auth Token
Deepgram and the Twilio Standard API Key were reverted per decision:
- bot.py: restore HintedWhisperSTTService (faster-whisper hotwords), default
  model medium; remove DeepgramSTTService import + DEEPGRAM_API_KEY.
- server.py: restore TWILIO_AUTH_TOKEN for X-Twilio-Signature validation and
  the serializer auto-hang-up. Twilio signs webhooks with the Auth Token, so
  an API Key Secret cannot validate signatures.
- .env.example: back to TWILIO_AUTH_TOKEN + Whisper STT vars.
- .gitignore: ignore runtime *.log (avc_run.log).

OLLAMA_MODEL stays activeblue-avc:latest (the existing pulled tag).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 01:06:24 +00:00
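
The reason an API Key Secret cannot validate webhooks follows from how the X-Twilio-Signature header is computed: an HMAC-SHA1 over the request URL plus the sorted POST parameters, keyed with the account's Auth Token and then base64-encoded. The standard-library sketch below illustrates that scheme for form-encoded POST requests; the function names are hypothetical, and server.py may well use Twilio's own helper library instead.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def compute_twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """HMAC-SHA1 of the URL plus sorted key/value pairs, base64-encoded."""
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: Mapping[str, str], header_signature: str
) -> bool:
    expected = compute_twilio_signature(auth_token, url, params)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, header_signature)
```

A signature computed with any other secret produces a different digest, so validation fails unless the server holds the same Auth Token that Twilio signed with.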
tocmo0nlord
c3c719b77e Initial commit: avc-phone-ai codebase + CLAUDE.md
2026-06-23 22:38:22 +00:00