avc-phone-ai

Author	SHA1	Message	Date
tocmo0nlord	a521dc168e	Fix GPU OOM: share one Whisper model across calls (was leaking per call) Calls were dropping right after answer with "CUDA failed with error out of memory". Cause: each call constructed a new HintedWhisperSTTService -> new ctranslate2 WhisperModel on the GPU, and that VRAM was never released when the call ended. Over ~13 calls the python process grew to 9.7GB; with the pinned LLM (6GB) the 16GB GPU filled (14 MiB free) and Whisper load failed on every call. Fix: cache one WhisperModel per (model,device,compute) in _WHISPER_MODEL_CACHE and reuse it across all calls; bake the fixed hotwords into the shared model's transcribe() once (drops the racy per-call monkey-patch). VRAM now constant (~6GB LLM + ~1.5GB Whisper). Verified: two instances share one model object; GPU back to 6.0/16GB used after restart. Documented the VRAM budget. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-27 22:07:59 +00:00
tocmo0nlord	1cfdf562e2	Phone confirmation: state number, invite correction only (no "yes") Call-recording analysis proved the repetitive post-phone silence is NOT volume: the caller's reply was a full-energy sound right after the (long ~13s) phone question, but VAD never registered it (no "user started speaking"), so the call waited until the caller repeated it. Depending on catching a "yes" after a long utterance is fragile (echo/gate timing). Fix: stop requiring a "yes". AVA now states the number and invites a correction only ("...; if that's not the best number, just let me know.") and flows on — the caller only speaks to correct it. Updated the prompt step, the caller-ID injection, and the deterministic EndCallProcessor line. Verified 4/4. Docs: phone step, recording + watchdog entries, recording's post-gate limitation. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-27 17:57:54 +00:00
tocmo0nlord	80824a7ab0	Add call recording (stereo WAV) + wire silence re-prompt watchdog Stop debugging silence by guesswork: AudioBufferProcessor records every call to recordings/<ts>_<callsid>.wav (caller=left, agent=right) so calls can be reviewed with actual audio. (We had no audio before — that was the real gap; the earlier "too quiet" explanation was unsupported.) SilenceWatchdog: after the agent finishes, if the caller is silent for SILENCE_REPROMPT_SECS (7s) it re-prompts ("are you still there?"); after MAX_REPROMPTS it closes gracefully. This directly breaks the dead-silence pattern (e.g. the 14s gap after the phone confirmation) instead of waiting. Runtime-tested both. .gitignore already excludes recordings/. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-27 17:46:07 +00:00
tocmo0nlord	b0df7fd5b0	Fix missed quiet "yes" after phone confirmation: more sensitive VAD After the phone confirmation a caller's "yes" wasn't picked up (silence) until they repeated it louder. Logs: line was live and the half-duplex gate had reopened, but VAD never fired for ~14s — the quick/quiet "yes" was below threshold (min_volume 0.3, start_secs 0.2). Now that HalfDuplexGate gates out the agent's echo while it speaks, VAD can be sensitive without echo false-triggers (it only listens hard on the caller's turn). Lowered min_volume 0.3->0.15, start_secs 0.2->0.1, and trimmed the echo tail 0.5->0.25 so an answer right after the agent stops isn't dropped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-27 17:36:20 +00:00
tocmo0nlord	32a3bb7136	Fix echo-induced silence with a half-duplex audio gate A caller's reply was generated but never heard: 0.65s after the agent started speaking, the VAD fired "user started speaking" (NO transcript) and broadcast an interruption that cancelled the agent's audio -> ~24s of silence until the caller spoke again. Cause: the agent's own TTS echoes back the phone line and the always-on VAD interruption treats it as a barge-in. (PipelineParams has no allow_interruptions in this pipecat build — it was a silent no-op.) Fix: HalfDuplexGate before the VAD withholds inbound audio while the bot speaks (+ECHO_TAIL_SECS, default 0.5s), so echo can't trigger a false barge-in. Half-duplex (no mid-utterance barge-in); HALF_DUPLEX=false to restore it. Runtime-tested the gate (pass idle / drop while speaking / drop in tail / resume). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-27 16:44:00 +00:00
tocmo0nlord	ceea3d151c	Fix mid-call silence: keep momentum after acknowledgments A caller gave their insurance; AVA replied with a bare acknowledgment ("staff will verify your coverage") and stopped, with no follow-up question. Both sides then waited -> dead air (pipeline idle, no GPU/LLM activity, matching flat memory/wattage). Caller had to break the silence with "what questions do you have?". Root cause: the one-sentence brevity rule made AVA end a booking turn on a dead-end statement. Fix: prompt now requires, until the booking is complete, that every turn end with the next question — acknowledgment + next question in the same turn (e.g. insurance ack -> immediately ask day/time). Verified 4/4. Documented in CLAUDE.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-27 16:34:25 +00:00
tocmo0nlord	d7bfe2dbe8	Deterministic phone confirmation safety net + docs EndCallProcessor now guarantees the callback number is confirmed on booking calls: the 8B reads it back only ~half the time, so if a closing is reached on a booking call (booking keyword seen) without the agent having spoken the number (phone_marker absent from its replies), the hang-up is suppressed and a scripted confirmation line (caller-ID spelled out) is injected as a TTSSpeakFrame first. The agent's own readback satisfies the gate (no double-ask); info-only calls are never asked for a number. Runtime-tested all four paths (inject / no-inject / info-only / inject-then-end). CLAUDE.md: document the safety net, the "never claim a booking" rule, the direct phone-confirm phrasing, and the insurance "never say we accept" rule. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-27 15:52:22 +00:00
tocmo0nlord	1e0472e864	Never claim the appointment is confirmed; clean phone-confirm + insurance Fixes from a test call: - Contradiction: AVA said "staff will confirm" then later "we've got your appointment scheduled". Hardened the rule — never say booked/scheduled/set/confirmed (even in the recap); it's always a REQUEST staff confirm on callback. Wrap-up recaps as "I've noted your request...". - Phone: it asked "may I read your number back?" then read it anyway. Now states it directly in one line ("I have your number as <number> - is that best?"), no permission ask, don't skip. - Insurance: stop saying "we accept/take <plan>" (it said "we accept All State", which isn't even a listed plan) — just note it, staff verify. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-27 15:33:27 +00:00
tocmo0nlord	8b52097713	Stop insurance hallucination: never suggest or guess the plan A caller trailed off ("My insurance plan is...") and AVA filled in "CarePlus", which got logged to the lead. Tightened the insurance rule: ask open-endedly, do NOT read out/suggest plan names from the accepted list, capture only what the caller says, never fill in/complete/guess the plan, and ask them to repeat if unclear. Verified 4/4 on the trail-off case (asks to repeat, no fabricated plan). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-27 04:13:50 +00:00
tocmo0nlord	9d65fa9aaa	Give AVA a clear ordered call workflow Replace loose "gather these details" with a directed script so the call has clear direction: 1. Reason first — what are they calling about 2. Location — city/area, confirm the matching office 3. Caller info — full name, then address them by name; insurance (log only), preferred day/time 4. Verify phone near the end by reading it back 5. Wrap up — recap, then "Is there anything else I can help you with?" Closing hardened: "Goodbye" (which ends the call) is gated behind the anything-else question, never said in the same turn as confirming details. Be warm but direct; one short turn at a time. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-27 03:44:03 +00:00
tocmo0nlord	703c902d0f	Fix phone readback, lead-with-number flow, and AVA pronunciation - Phone: inject the caller-ID into the prompt already spelled digit-by-digit so the model repeats clean words instead of mangling raw digits (it had emitted "197-three five seven three..." -> Kokoro read "one hundred ninety-seven"). - Flow: stop leading with the phone number. Prompt now flows naturally and saves the callback-number confirmation for the END; the caller-ID line says not to recite it early. Verified 3/3 openings no longer recite the number. - Name: Kokoro spelled all-caps "AVA" as "A-V-A". Respell to AGENT_NAME_SPOKEN (default "Ava") in TTS only; logs/Odoo keep AGENT_NAME. Override e.g. AGENT_NAME_SPOKEN=Eva for an "EE-vuh" sound. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 02:08:52 +00:00
tocmo0nlord	6010b136a7	Fix office selection: confirm the matching office, don't offer others When a caller named a city matching an office ("I'm in Kendall"), AVA confirmed Kendall then asked them to pick between unrelated offices ("Hollywood or Miami?"), going off script. Tightened the prompt: on a city that matches an office, confirm THAT office and move on; never offer/compare other offices or ask the caller to choose; name the nearest only if nothing matches. Verified 3/3 on the failing scenario. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 01:30:35 +00:00
tocmo0nlord	92abe209f3	Fix SpokenKokoroTTSService.run_tts signature (broke all call audio) run_tts is called as run_tts(self, text, context_id); the override only accepted (self, text), so every utterance raised "takes 2 positional arguments but 3 were given" and produced no audio — callers heard nothing on every call since the number-normalization change. Added context_id and pass it through. Verified the service now emits audio (118KB for a sample) with digits normalized. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 01:13:45 +00:00
tocmo0nlord	1204d24340	Read phone numbers, street numbers, and zips digit-by-digit in TTS Kokoro spoke "983-4969" as "nine hundred eighty-three dash forty-nine sixty-nine". Added SpokenKokoroTTSService which normalizes text just before synthesis (run_tts gets the full sentence): US phone patterns and 4-5 digit runs (street numbers, zips) are spoken one digit at a time, country code dropped, no "dash"/parens. Dates and times are left natural. Deterministic, so it's robust to whatever the model emits. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-25 04:17:54 +00:00
tocmo0nlord	19728e1555	Fix bad-call regressions: drop in-call date computation, tighten replies A real call derailed: AVA argued about today's date, parroted the canned date example, hallucinated appointment availability, and rambled. Root cause was the date-validation feature — the local 8B model computes appointment dates wrong ~5/5 in testing, so having it state/correct dates is a liability. - DATES: capture & defer — AVA takes the day/time in the caller's own words, never computes/states/corrects the calendar date, never argues about today; staff confirm the exact date on callback. Removed the 45-day calendar injection and _date_context()/datetime use. - Hardened the no-availability rule (no "openings", no "check availability", no "I'll book"). - Brevity: one short sentence per reply (two at most). Post-call extractor still records a best-effort resolved date (staff-verified). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-25 03:20:22 +00:00
tocmo0nlord	b8c71b15c2	Capture full appointment details + validate dates in-call In-call (system prompt + per-call calendar injection): - Gather full name (prompt asks for last name if only first given). - Confirm the caller-ID number; if declined, use the number the caller gives. - Ask for and LOG insurance only — never promise/confirm/deny coverage or treatment based on it; staff verify on callback. - Validate the requested date against an injected 45-day calendar (recomputed per call since the server is long-running). Push back on impossible/mismatched dates, e.g. "Monday lands on the sixth — would you like that date?". - AGENT_NAME=AVA; 4s grace pause before hang-up (HANGUP_DELAY_SECS). Logging (post-call extraction -> Odoo): - Extract full name, phone_confirmed, chosen callback (caller-ID or alternate), insurance, reason, and preferred time annotated with a resolved YYYY-MM-DD date (today's date is fed to the extractor). - odoo_client: insurance row on the lead note (log only — staff verify). .gitignore: ignore rotated avc_run.log* files. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-25 03:00:35 +00:00
tocmo0nlord	5ed641255c	Revert Phase 1 STT/auth swaps: stay on Whisper + Twilio Auth Token Deepgram and the Twilio Standard API Key were reverted per decision: - bot.py: restore HintedWhisperSTTService (faster-whisper hotwords), default model medium; remove DeepgramSTTService import + DEEPGRAM_API_KEY. - server.py: restore TWILIO_AUTH_TOKEN for X-Twilio-Signature validation and the serializer auto-hang-up. Twilio signs webhooks with the Auth Token, so an API Key Secret cannot validate signatures. - .env.example: back to TWILIO_AUTH_TOKEN + Whisper STT vars. - .gitignore: ignore runtime *.log (avc_run.log). OLLAMA_MODEL stays activeblue-avc:latest (the existing pulled tag). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-25 01:06:24 +00:00
tocmo0nlord	c3c719b77e	Initial commit: avc-phone-ai codebase + CLAUDE.md	2026-06-23 22:38:22 +00:00

18 Commits
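
Commit 1204d24340 describes normalizing text just before synthesis so that US phone numbers and 4-5 digit runs are spoken one digit at a time, with the country code, dashes, and parentheses dropped, while dates and times stay natural. A minimal sketch of that idea follows. The function name `normalize_for_speech` and both regular expressions are assumptions for illustration, not the contents of SpokenKokoroTTSService, and the sketch makes no attempt to tell a bare four-digit year from a street number.

```python
import re

_DIGIT_WORDS = ("zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine")

# Optional +1/1 country code, optional area code (parens allowed),
# then a 3-digit and a 4-digit group joined by a space, dot, or dash.
_PHONE_RE = re.compile(
    r"(?<!\d)"
    r"(?:\+?1[\s.-]?)?"
    r"(?:\(?(\d{3})\)?[\s.-]?)?"
    r"(\d{3})[\s.-](\d{4})"
    r"(?!\d)"
)

# Standalone 4-5 digit runs (street numbers, zips). Runs touching ':', '/',
# or '-' are skipped so times and ISO-style dates are left alone.
_RUN_RE = re.compile(r"(?<![\d:/-])\b\d{4,5}\b(?![\d:/-])")


def _spell(digits: str) -> str:
    """Turn '983' into 'nine eight three'."""
    return " ".join(_DIGIT_WORDS[int(d)] for d in digits)


def normalize_for_speech(text: str) -> str:
    """Spell phone numbers and 4-5 digit runs digit by digit."""
    # Phones first: only the captured groups are kept, so the country code
    # and punctuation vanish from the spoken form.
    text = _PHONE_RE.sub(
        lambda m: _spell("".join(g for g in m.groups() if g)), text)
    return _RUN_RE.sub(lambda m: _spell(m.group(0)), text)


print(normalize_for_speech("Call 983-4969 today"))
```

Since the rewrite is plain string substitution, it behaves the same however the model formats a number, which is the robustness property the commit message highlights.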