20 Commits

Author SHA1 Message Date
tocmo0nlord
a521dc168e Fix GPU OOM: share one Whisper model across calls (was leaking per call)
Calls were dropping right after answer with "CUDA failed with error out of
memory". Cause: each call constructed a new HintedWhisperSTTService -> new
ctranslate2 WhisperModel on the GPU, and that VRAM was never released when the
call ended. Over ~13 calls the Python process grew to 9.7GB; with the pinned LLM
(6GB) the 16GB GPU filled (14 MiB free) and Whisper load failed on every call.

Fix: cache one WhisperModel per (model,device,compute) in _WHISPER_MODEL_CACHE
and reuse it across all calls; bake the fixed hotwords into the shared model's
transcribe() once (drops the racy per-call monkey-patch). VRAM now constant
(~6GB LLM + ~1.5GB Whisper). Verified: two instances share one model object;
GPU back to 6.0/16GB used after restart. Documented the VRAM budget.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 22:07:59 +00:00
tocmo0nlord
ab15023651 Verify capacity gating; add call scorer for the 10-call gate run
Capacity gating verified deterministically: atomic _reserve_call_slot grants
exactly MAX_CONCURRENT_CALLS (2), refuses the 3rd, frees on hangup, and 10
simultaneous attempts grant only 2 (no race); /voice returns BUSY + Hangup at
cap. Marked the gate item done (end-to-end 3-phone test optional).

Add scripts/score_calls.py: grades recent calls from the server log against the
Phase 1 gate (turns, latency LLM->TTS, AVC-side hangup, leads, watchdog
re-prompts, errors) — for scoring the 10-call run once placed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 18:31:49 +00:00
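
The capacity gate verified in ab15023651 can be approximated by the following sketch. Only `MAX_CONCURRENT_CALLS` (2) and the reserve/refuse/free behaviour come from the commit message. The `CallSlots` class, its method names, and the thread-based burst test are hypothetical, and the real `_reserve_call_slot` may well be asyncio-based rather than thread-based.

```python
import threading

MAX_CONCURRENT_CALLS = 2  # value stated in the commit message


class CallSlots:
    """Hypothetical atomic slot counter: check and increment under one lock."""

    def __init__(self, limit: int = MAX_CONCURRENT_CALLS) -> None:
        self._limit = limit
        self._active = 0
        self._lock = threading.Lock()

    def reserve(self) -> bool:
        """Grant a slot if one is free; never exceed the limit."""
        with self._lock:
            if self._active >= self._limit:
                return False
            self._active += 1
            return True

    def release(self) -> None:
        """Free a slot on hangup; never drop below zero."""
        with self._lock:
            if self._active > 0:
                self._active -= 1


def simulate_burst(slots: CallSlots, attempts: int) -> int:
    """Start `attempts` reservations at the same moment and count the grants."""
    results = []
    barrier = threading.Barrier(attempts)

    def attempt() -> None:
        barrier.wait()  # line all threads up so they race for the slots
        results.append(slots.reserve())

    threads = [threading.Thread(target=attempt) for _ in range(attempts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


slots = CallSlots()
print(simulate_burst(slots, 10))  # number of granted slots out of 10 attempts
```

Doing the comparison and the increment inside the same critical section is what rules out the race where two callers both observe a free slot.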
tocmo0nlord
1cfdf562e2 Phone confirmation: state number, invite correction only (no "yes")
Call-recording analysis showed the recurring post-phone silence is NOT a volume issue:
the caller's reply was a full-energy sound right after the (long ~13s) phone
question, but VAD never registered it (no "user started speaking"), so the call
waited until the caller repeated it. Depending on catching a "yes" after a long
utterance is fragile (echo/gate timing).

Fix: stop requiring a "yes". AVA now states the number and invites a correction
only ("...; if that's not the best number, just let me know.") and flows on —
the caller only speaks to correct it. Updated the prompt step, the caller-ID
injection, and the deterministic EndCallProcessor line. Verified 4/4.

Docs: phone step, recording + watchdog entries, recording's post-gate limitation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 17:57:54 +00:00
tocmo0nlord
b0df7fd5b0 Fix missed quiet "yes" after phone confirmation: more sensitive VAD
After the phone confirmation a caller's "yes" wasn't picked up (silence) until
they repeated it louder. Logs: line was live and the half-duplex gate had
reopened, but VAD never fired for ~14s — the quick/quiet "yes" was below
threshold (min_volume 0.3, start_secs 0.2).

Now that HalfDuplexGate gates out the agent's echo while it speaks, VAD can be
sensitive without echo false-triggers (it only listens hard on the caller's
turn). Lowered min_volume 0.3->0.15, start_secs 0.2->0.1, and trimmed the echo
tail 0.5->0.25 so an answer right after the agent stops isn't dropped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 17:36:20 +00:00
tocmo0nlord
32a3bb7136 Fix echo-induced silence with a half-duplex audio gate
A reply to the caller was generated but never heard: 0.65s after the agent started
speaking, the VAD fired "user started speaking" (NO transcript) and broadcast an
interruption that cancelled the agent's audio -> ~24s of silence until the caller
spoke again. Cause: the agent's own TTS echoes back over the phone line and the
always-on VAD interruption treats it as a barge-in. (PipelineParams has no
allow_interruptions in this pipecat build — it was a silent no-op.)

Fix: HalfDuplexGate before the VAD withholds inbound audio while the bot speaks
(+ECHO_TAIL_SECS, default 0.5s), so echo can't trigger a false barge-in.
Half-duplex (no mid-utterance barge-in); HALF_DUPLEX=false to restore it.
Runtime-tested the gate (pass idle / drop while speaking / drop in tail / resume).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 16:44:00 +00:00
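
The gating rule from 32a3bb7136 (pass when idle, drop while the bot speaks, drop during the echo tail, then resume) can be sketched independently of pipecat. The class below is an assumption-laden illustration: `HalfDuplexGateSketch`, its method names, and the injectable clock are hypothetical, and it does not use or imitate pipecat's frame-processor API. Only the 0.5s default tail comes from the commit message.

```python
import time
from typing import Callable, Optional


class HalfDuplexGateSketch:
    """Decides whether inbound caller audio should reach the VAD."""

    def __init__(self, echo_tail_secs: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._tail = echo_tail_secs
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._bot_speaking = False
        self._bot_stopped_at: Optional[float] = None

    def bot_started_speaking(self) -> None:
        self._bot_speaking = True

    def bot_stopped_speaking(self) -> None:
        self._bot_speaking = False
        self._bot_stopped_at = self._clock()

    def allow_inbound(self) -> bool:
        """True if inbound audio may pass; False while speaking or inside the tail."""
        if self._bot_speaking:
            return False
        if self._bot_stopped_at is None:
            return True  # the bot has not spoken yet
        return self._clock() - self._bot_stopped_at >= self._tail


now = [0.0]  # fake clock for the demonstration
gate = HalfDuplexGateSketch(echo_tail_secs=0.5, clock=lambda: now[0])
print(gate.allow_inbound())   # idle: audio passes
gate.bot_started_speaking()
print(gate.allow_inbound())   # bot speaking: audio dropped
```

The trade-off named in the commit applies here too: while the gate is closed the caller cannot barge in mid-utterance.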
tocmo0nlord
ceea3d151c Fix mid-call silence: keep momentum after acknowledgments
A caller gave their insurance; AVA replied with a bare acknowledgment ("staff
will verify your coverage") and stopped, with no follow-up question. Both sides
then waited -> dead air (pipeline idle, no GPU/LLM activity, matching flat
memory/wattage). Caller had to break the silence with "what questions do you
have?". Root cause: the one-sentence brevity rule made AVA end a booking turn on
a dead-end statement.

Fix: prompt now requires, until the booking is complete, that every turn end
with the next question — acknowledgment + next question in the same turn (e.g.
insurance ack -> immediately ask day/time). Verified 4/4. Documented in CLAUDE.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 16:34:25 +00:00
tocmo0nlord
d7bfe2dbe8 Deterministic phone confirmation safety net + docs
EndCallProcessor now guarantees the callback number is confirmed on booking
calls: the 8B reads it back only ~half the time, so if a closing is reached on a
booking call (booking keyword seen) without the agent having spoken the number
(phone_marker absent from its replies), the hang-up is suppressed and a scripted
confirmation line (caller-ID spelled out) is injected as a TTSSpeakFrame first.
The agent's own readback satisfies the gate (no double-ask); info-only calls are
never asked for a number. Runtime-tested all four paths (inject / no-inject /
info-only / inject-then-end).

CLAUDE.md: document the safety net, the "never claim a booking" rule, the direct
phone-confirm phrasing, and the insurance "never say we accept" rule.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 15:52:22 +00:00
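
The decision made by the safety net in d7bfe2dbe8 can be illustrated with a small pure function covering the four tested paths (inject, no-inject, info-only, inject-then-end). This is a sketch under assumptions: `CallState`, `on_closing_reached`, and the `("speak" | "hangup", text)` return shape are hypothetical, and the phone marker is treated as a simple substring check. The real processor injects a TTSSpeakFrame rather than returning a tuple.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CallState:
    """Hypothetical per-call bookkeeping."""
    booking_keyword_seen: bool = False
    agent_replies: List[str] = field(default_factory=list)
    confirmation_injected: bool = False


def on_closing_reached(state: CallState, phone_marker: str,
                       scripted_line: str) -> Tuple[str, str]:
    """Decide what to do when the agent reaches its closing line."""
    if not state.booking_keyword_seen:
        return ("hangup", "")  # info-only call: never ask for a number
    already_spoken = any(phone_marker in reply for reply in state.agent_replies)
    if already_spoken or state.confirmation_injected:
        return ("hangup", "")  # the readback happened, so no double-ask
    state.confirmation_injected = True
    return ("speak", scripted_line)  # suppress the hang-up and confirm first


state = CallState(booking_keyword_seen=True, agent_replies=["What day works for you?"])
line = "I have your number as three zero five, five five five, zero one two three."
print(on_closing_reached(state, "three zero five", line)[0])  # first closing attempt
print(on_closing_reached(state, "three zero five", line)[0])  # second closing attempt
```

Recording `confirmation_injected` on the state is what lets the second closing attempt end the call instead of looping on the confirmation.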
tocmo0nlord
856f9c284d Docs: Phase 1 change log + gate status
Document all post-revert Phase 1 changes (Whisper base->medium, lifespan LLM
warmup/pin keep_alive=-1, num_ctx 8192, call workflow, TTS digit/name spelling,
capture-and-defer dates, insurance never-suggest/guess, broadened symptom reason
capture, hang-up grace, office selection). Mark gate items: AVC-side termination,
AudioHeartbeat, zombie-free, JSON visibility = done; capacity gating, 10-call
consecutive run, and latency re-measure = still need live testing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 14:41:01 +00:00
tocmo0nlord
ba36ae6891 Log/surface the reason, pin LLM warm for latency, doc insurance rule
- Reason visibility: the reason WAS extracted ("disintegrated eyes") but only
  lived in the Odoo description note. Add it to the post-call log line and to
  the Odoo lead title so it's visible at a glance.
- Latency: split the timing — Whisper is ~0.1s, latency is LLM-side. The ~3s
  tail was cold model reloads after Ollama's keep-alive expired. server.py now
  warms + pins the model on startup (keep_alive=-1, ollama ps UNTIL=Forever),
  removing cold first-turn stalls. Whisper size left alone (not the bottleneck).
- CLAUDE.md: insurance rule (never suggest/guess the plan), latency note.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 04:24:10 +00:00
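
The warm-and-pin step from ba36ae6891 can be sketched with Ollama's HTTP API, where a generate request without a prompt loads the model and `keep_alive: -1` keeps it resident indefinitely. The helper names, the default base URL `http://localhost:11434`, and the choice of `urllib` are assumptions; the commit does not show how server.py issues the request. The model tag `activeblue-avc:latest` is taken from another commit in this log.

```python
import json
import urllib.request


def build_warmup_request(model: str,
                         base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a request that loads `model` and pins it in memory (keep_alive=-1)."""
    payload = {"model": model, "keep_alive": -1}  # no prompt: load only, generate nothing
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def warm_model(model: str, timeout_secs: float = 120.0) -> int:
    """Send the warm-up request and return the HTTP status (needs a running Ollama)."""
    with urllib.request.urlopen(build_warmup_request(model), timeout=timeout_secs) as resp:
        return resp.status


request = build_warmup_request("activeblue-avc:latest")
print(request.full_url)
print(request.data.decode("utf-8"))
```

Calling `warm_model` once during server startup would pay the load cost before the first call arrives; whether the pin took effect can then be checked with `ollama ps`, as the commit notes.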
tocmo0nlord
667cf87202 Docs: add Call Workflow section (ordered call script)
Document AVA's directed call script — reason first, location, caller info
(address by name), verify phone by readback near the end, wrap up with "anything
else?" — and the gated closing (Goodbye only after the anything-else question).
Note the 8B reliability ceiling on step ordering.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 03:44:31 +00:00
tocmo0nlord
0605025113 Set AGENT_NAME_SPOKEN=Eva (example) and document name/phone behaviour
- .env.example: add AGENT_NAME_SPOKEN=Eva.
- CLAUDE.md: note the agent-name respelling (AVA -> Eva, "EE-vuh"), that the
  caller-ID is injected pre-spelled (model mangles raw digits), and that the
  phone is confirmed near the END of the call, not led with.

(.env itself is gitignored; AGENT_NAME_SPOKEN=Eva set there and live.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 02:33:08 +00:00
tocmo0nlord
24d4efd7ed Docs: office-selection rule (confirm match, don't offer others)
Note in the Call Data Capture table that AVA confirms a matching office and
moves on rather than offering/comparing other offices — the fix for the
"I'm in Kendall" -> "Hollywood or Miami?" off-script behavior.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 01:55:56 +00:00
tocmo0nlord
46409bd51a Docs: SpokenKokoroTTSService number normalization
Document the TTS number-reading fix in the "already solved" section: phone
numbers, street numbers, and zips are spoken digit-by-digit (no "dash"/parens,
country code dropped); dates/times left natural. tts_normalize() holds the rules.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 04:20:43 +00:00
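
The phone-number part of the normalization documented in 46409bd51a might look something like the sketch below. It is not the repository's `tts_normalize()`: the regular expression, the helper names, and the comma-separated grouping are assumptions, and it handles only North American phone numbers, leaving street numbers, zips, dates, and times untouched.

```python
import re

_DIGIT_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]

# Optional +1 country code, then 3-3-4 digits with optional parens, spaces, dots, dashes.
_PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})(?!\d)"
)


def spell_digits(digits: str) -> str:
    """'305' -> 'three zero five'."""
    return " ".join(_DIGIT_WORDS[int(d)] for d in digits)


def normalize_phone_numbers(text: str) -> str:
    """Replace phone numbers with digit-by-digit words; the country code is dropped."""
    def replace(match: "re.Match[str]") -> str:
        return ", ".join(spell_digits(group) for group in match.groups())
    return _PHONE_RE.sub(replace, text)


print(normalize_phone_numbers("Call +1 (305) 555-0123 today"))
# -> Call three zero five, five five five, zero one two three today
```

Because the pattern requires exactly ten digits after the optional country code, text such as "3:30 on June 5" passes through unchanged.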
tocmo0nlord
b31f685d91 Raise model num_ctx to 8192 to fix mid-call silence
Long calls overflowed the 4096-token window mid-conversation, forcing Ollama to
truncate + re-evaluate the full context each turn = multi-second stalls / dead
air. Rebuilt activeblue-avc:latest with num_ctx 8192 (rollback tag
activeblue-avc:pre-ctx8k). Combined with removing the 45-day calendar injection,
this keeps long calls well under the window. Doc: context row, Modelfile
reference, and a root-cause note.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 03:53:41 +00:00
tocmo0nlord
08d9db4f09 Docs: dates are capture-and-defer (in-call computation removed)
Update the call-capture section to reflect the fix — AVA takes the day/time in
the caller's words and defers exact-date confirmation to staff; the 45-day
calendar injection and in-call date validation were removed after a real call
derailed and the 8B model proved unable to compute dates reliably. Post-call
resolved_date is best-effort/staff-verified only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 03:20:57 +00:00
tocmo0nlord
199d16c630 Document call data capture, date validation, and hang-up grace pause
- New "Call Data Capture & Date Validation" section: the six captured fields
  (full name, phone confirm/alternate, office, reason, insurance log-only,
  validated preferred date/time), how each is logged, and the per-call calendar
  injection that drives date pushback.
- EndCallProcessor note: HANGUP_DELAY_SECS grace pause; Phase 1 gate result.
- .env reference: add HANGUP_DELAY_SECS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 03:01:49 +00:00
tocmo0nlord
93620be9bb Update CLAUDE.md: Phase 1 keeps Whisper STT + Twilio Auth Token
Reframe Change 1/2/3 to record the actual decisions instead of the trialed
swaps: Deepgram and the Twilio Standard API Key were both evaluated and
reverted. Document why the API Key cannot replace the Auth Token (Twilio signs
webhooks with the Auth Token). Update the .env reference, Phase 1 checklist,
dependencies, and open items accordingly; gate zombie-check uses ps/pgrep
(bare process, not Docker).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 01:09:50 +00:00
tocmo0nlord
004ef3bdc0 Update CLAUDE.md: Gitea URLs + keep repo name consistent as avc-phone-ai
Adds the full Gitea repo URL to the infrastructure table and the monitoring
dashboard line, and keeps the repository-structure tree root as avc-phone-ai
to match the rest of the doc.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 23:57:57 +00:00
tocmo0nlord
204865b733 Fix repo name references to avc-phone-ai in CLAUDE.md
Header, structure tree, and footer pointed at avc-phone-agent; the actual
Gitea repo is avc-phone-ai. The avc-phone-agent-prod Twilio API key name is
left unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 22:46:48 +00:00
4bf72b9616 Upload files to "/"
2026-06-23 20:45:56 +00:00