Calls were dropping right after answer with "CUDA failed with error out of
memory". Cause: each call constructed a new HintedWhisperSTTService -> new
ctranslate2 WhisperModel on the GPU, and that VRAM was never released when the
call ended. Over ~13 calls the python process grew to 9.7GB; with the pinned LLM
(6GB) the 16GB GPU filled (14 MiB free) and Whisper load failed on every call.
Fix: cache one WhisperModel per (model,device,compute) in _WHISPER_MODEL_CACHE
and reuse it across all calls; bake the fixed hotwords into the shared model's
transcribe() once (drops the racy per-call monkey-patch). VRAM now constant
(~6GB LLM + ~1.5GB Whisper). Verified: two instances share one model object;
GPU back to 6.0/16GB used after restart. Documented the VRAM budget.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Call-recording analysis proved the repetitive post-phone silence is NOT volume:
the caller's reply was a full-energy sound right after the (long ~13s) phone
question, but VAD never registered it (no "user started speaking"), so the call
waited until the caller repeated it. Depending on catching a "yes" after a long
utterance is fragile (echo/gate timing).
Fix: stop requiring a "yes". AVA now states the number and invites a correction
only ("...; if that's not the best number, just let me know.") and flows on —
the caller only speaks to correct it. Updated the prompt step, the caller-ID
injection, and the deterministic EndCallProcessor line. Verified 4/4.
Docs: phone step, recording + watchdog entries, recording's post-gate limitation.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Stop debugging silence by guesswork: AudioBufferProcessor records every call to
recordings/<ts>_<callsid>.wav (caller=left, agent=right) so calls can be reviewed
with actual audio. (We had no audio before — that was the real gap; the earlier
"too quiet" explanation was unsupported.)
SilenceWatchdog: after the agent finishes, if the caller is silent for
SILENCE_REPROMPT_SECS (7s) it re-prompts ("are you still there?"); after
MAX_REPROMPTS it closes gracefully. This directly breaks the dead-silence
pattern (e.g. the 14s gap after the phone confirmation) instead of waiting.
Runtime-tested both. .gitignore already excludes recordings/.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
After the phone confirmation a caller's "yes" wasn't picked up (silence) until
they repeated it louder. Logs: line was live and the half-duplex gate had
reopened, but VAD never fired for ~14s — the quick/quiet "yes" was below
threshold (min_volume 0.3, start_secs 0.2).
Now that HalfDuplexGate gates out the agent's echo while it speaks, VAD can be
sensitive without echo false-triggers (it only listens hard on the caller's
turn). Lowered min_volume 0.3->0.15, start_secs 0.2->0.1, and trimmed the echo
tail 0.5->0.25 so an answer right after the agent stops isn't dropped.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A caller's reply was generated but never heard: 0.65s after the agent started
speaking, the VAD fired "user started speaking" (NO transcript) and broadcast an
interruption that cancelled the agent's audio -> ~24s of silence until the caller
spoke again. Cause: the agent's own TTS echoes back the phone line and the
always-on VAD interruption treats it as a barge-in. (PipelineParams has no
allow_interruptions in this pipecat build — it was a silent no-op.)
Fix: HalfDuplexGate before the VAD withholds inbound audio while the bot speaks
(+ECHO_TAIL_SECS, default 0.5s), so echo can't trigger a false barge-in.
Half-duplex (no mid-utterance barge-in); HALF_DUPLEX=false to restore it.
Runtime-tested the gate (pass idle / drop while speaking / drop in tail / resume).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A caller gave their insurance; AVA replied with a bare acknowledgment ("staff
will verify your coverage") and stopped, with no follow-up question. Both sides
then waited -> dead air (pipeline idle, no GPU/LLM activity, matching flat
memory/wattage). Caller had to break the silence with "what questions do you
have?". Root cause: the one-sentence brevity rule made AVA end a booking turn on
a dead-end statement.
Fix: prompt now requires, until the booking is complete, that every turn end
with the next question — acknowledgment + next question in the same turn (e.g.
insurance ack -> immediately ask day/time). Verified 4/4. Documented in CLAUDE.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
EndCallProcessor now guarantees the callback number is confirmed on booking
calls: the 8B reads it back only ~half the time, so if a closing is reached on a
booking call (booking keyword seen) without the agent having spoken the number
(phone_marker absent from its replies), the hang-up is suppressed and a scripted
confirmation line (caller-ID spelled out) is injected as a TTSSpeakFrame first.
The agent's own readback satisfies the gate (no double-ask); info-only calls are
never asked for a number. Runtime-tested all four paths (inject / no-inject /
info-only / inject-then-end).
CLAUDE.md: document the safety net, the "never claim a booking" rule, the direct
phone-confirm phrasing, and the insurance "never say we accept" rule.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fixes from a test call:
- Contradiction: AVA said "staff will confirm" then later "we've got your
appointment scheduled". Hardened the rule — never say booked/scheduled/set/
confirmed (even in the recap); it's always a REQUEST staff confirm on callback.
Wrap-up recaps as "I've noted your request...".
- Phone: it asked "may I read your number back?" then read it anyway. Now states
it directly in one line ("I have your number as <number> - is that best?"),
no permission ask, don't skip.
- Insurance: stop saying "we accept/take <plan>" (it said "we accept All State",
which isn't even a listed plan) — just note it, staff verify.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A caller trailed off ("My insurance plan is...") and AVA filled in "CarePlus",
which got logged to the lead. Tightened the insurance rule: ask open-endedly,
do NOT read out/suggest plan names from the accepted list, capture only what the
caller says, never fill in/complete/guess the plan, and ask them to repeat if
unclear. Verified 4/4 on the trail-off case (asks to repeat, no fabricated plan).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace loose "gather these details" with a directed script so the call has
clear direction:
1. Reason first — what are they calling about
2. Location — city/area, confirm the matching office
3. Caller info — full name, then address them by name; insurance (log only),
preferred day/time
4. Verify phone near the end by reading it back
5. Wrap up — recap, then "Is there anything else I can help you with?"
Closing hardened: "Goodbye" (which ends the call) is gated behind the
anything-else question, never said in the same turn as confirming details.
Be warm but direct; one short turn at a time.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Phone: inject the caller-ID into the prompt already spelled digit-by-digit so
the model repeats clean words instead of mangling raw digits (it had emitted
"197-three five seven three..." -> Kokoro read "one hundred ninety-seven").
- Flow: stop leading with the phone number. Prompt now flows naturally and
saves the callback-number confirmation for the END; the caller-ID line says
not to recite it early. Verified 3/3 openings no longer recite the number.
- Name: Kokoro spelled all-caps "AVA" as "A-V-A". Respell to AGENT_NAME_SPOKEN
(default "Ava") in TTS only; logs/Odoo keep AGENT_NAME. Override e.g.
AGENT_NAME_SPOKEN=Eva for an "EE-vuh" sound.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
When a caller named a city matching an office ("I'm in Kendall"), AVA confirmed
Kendall then asked them to pick between unrelated offices ("Hollywood or
Miami?"), going off script. Tightened the prompt: on a city that matches an
office, confirm THAT office and move on; never offer/compare other offices or
ask the caller to choose; name the nearest only if nothing matches. Verified
3/3 on the failing scenario.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
run_tts is called as run_tts(self, text, context_id); the override only accepted
(self, text), so every utterance raised "takes 2 positional arguments but 3
were given" and produced no audio — callers heard nothing on every call since
the number-normalization change. Added context_id and pass it through. Verified
the service now emits audio (118KB for a sample) with digits normalized.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Kokoro spoke "983-4969" as "nine hundred eighty-three dash forty-nine sixty-
nine". Added SpokenKokoroTTSService which normalizes text just before synthesis
(run_tts gets the full sentence): US phone patterns and 4-5 digit runs (street
numbers, zips) are spoken one digit at a time, country code dropped, no "dash"/
parens. Dates and times are left natural. Deterministic, so it's robust to
whatever the model emits.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A real call derailed: AVA argued about today's date, parroted the canned date
example, hallucinated appointment availability, and rambled. Root cause was the
date-validation feature — the local 8B model computes appointment dates wrong
~5/5 in testing, so having it state/correct dates is a liability.
- DATES: capture & defer — AVA takes the day/time in the caller's own words,
never computes/states/corrects the calendar date, never argues about today;
staff confirm the exact date on callback. Removed the 45-day calendar
injection and _date_context()/datetime use.
- Hardened the no-availability rule (no "openings", no "check availability",
no "I'll book").
- Brevity: one short sentence per reply (two at most).
Post-call extractor still records a best-effort resolved date (staff-verified).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
In-call (system prompt + per-call calendar injection):
- Gather full name (prompt asks for last name if only first given).
- Confirm the caller-ID number; if declined, use the number the caller gives.
- Ask for and LOG insurance only — never promise/confirm/deny coverage or
treatment based on it; staff verify on callback.
- Validate the requested date against an injected 45-day calendar (recomputed
per call since the server is long-running). Push back on impossible/mismatched
dates, e.g. "Monday lands on the sixth — would you like that date?".
- AGENT_NAME=AVA; 4s grace pause before hang-up (HANGUP_DELAY_SECS).
Logging (post-call extraction -> Odoo):
- Extract full name, phone_confirmed, chosen callback (caller-ID or alternate),
insurance, reason, and preferred time annotated with a resolved YYYY-MM-DD
date (today's date is fed to the extractor).
- odoo_client: insurance row on the lead note (log only — staff verify).
.gitignore: ignore rotated avc_run.log* files.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Deepgram and the Twilio Standard API Key were reverted per decision:
- bot.py: restore HintedWhisperSTTService (faster-whisper hotwords), default
model medium; remove DeepgramSTTService import + DEEPGRAM_API_KEY.
- server.py: restore TWILIO_AUTH_TOKEN for X-Twilio-Signature validation and
the serializer auto-hang-up. Twilio signs webhooks with the Auth Token, so
an API Key Secret cannot validate signatures.
- .env.example: back to TWILIO_AUTH_TOKEN + Whisper STT vars.
- .gitignore: ignore runtime *.log (avc_run.log).
OLLAMA_MODEL stays activeblue-avc:latest (the existing pulled tag).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>