Call-recording analysis proved the recurring silence after the phone question is
NOT a volume problem: the caller's reply was a full-energy sound right after the
(long, ~13s) phone question, but VAD never registered it (no "user started
speaking"), so the call waited until the caller repeated it. Depending on
catching a "yes" right after a long utterance is fragile (echo/gate timing).
Fix: stop requiring a "yes". AVA now states the number and invites a correction
only ("...; if that's not the best number, just let me know.") and flows on —
the caller only speaks to correct it. Updated the prompt step, the caller-ID
injection, and the deterministic EndCallProcessor line. Verified 4/4.
Docs: phone step, recording + watchdog entries, recording's post-gate limitation.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Stop debugging silence by guesswork: AudioBufferProcessor records every call to
recordings/<ts>_<callsid>.wav (caller=left, agent=right) so calls can be reviewed
with actual audio. (We had no audio before — that was the real gap; the earlier
"too quiet" explanation was unsupported.)
SilenceWatchdog: after the agent finishes, if the caller is silent for
SILENCE_REPROMPT_SECS (7s) it re-prompts ("are you still there?"); after
MAX_REPROMPTS it closes gracefully. This directly breaks the dead-silence
pattern (e.g. the 14s gap after the phone confirmation) instead of waiting.
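The watchdog behavior described above can be sketched as a small state machine. This is an illustration only, not the repository's SilenceWatchdog: the class name, method names, and the MAX_REPROMPTS value of 2 are assumptions, and the caller is expected to pass in the current time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SilenceWatchdogSketch:
    """Hypothetical sketch; names and max_reprompts are assumptions."""

    reprompt_secs: float = 7.0  # SILENCE_REPROMPT_SECS from the commit
    max_reprompts: int = 2      # assumed value for MAX_REPROMPTS
    _deadline: Optional[float] = None
    _reprompts: int = 0

    def agent_finished(self, now: float) -> None:
        # Start (or restart) the silence timer when the agent stops talking.
        self._deadline = now + self.reprompt_secs

    def caller_spoke(self) -> None:
        # Any caller speech clears the timer and the re-prompt count.
        self._deadline = None
        self._reprompts = 0

    def tick(self, now: float) -> Optional[str]:
        # Returns "reprompt", "close", or None if nothing is due yet.
        if self._deadline is None or now < self._deadline:
            return None
        if self._reprompts >= self.max_reprompts:
            self._deadline = None
            return "close"
        self._reprompts += 1
        self._deadline = now + self.reprompt_secs
        return "reprompt"
```

With these assumed values, a caller who never speaks gets two re-prompts and then a graceful close.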
Runtime-tested both. .gitignore already excludes recordings/.
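For reference, the recording layout (caller on the left channel, agent on the right) can be sketched with the standard library. This is not the AudioBufferProcessor code: the helper names, the 16-bit PCM input, the 8000 Hz sample rate, and the timestamp format are all assumptions.

```python
import io
import struct
import wave


def recording_path(ts: str, call_sid: str) -> str:
    # Mirrors recordings/<ts>_<callsid>.wav; the ts format is an assumption.
    return f"recordings/{ts}_{call_sid}.wav"


def _samples(pcm: bytes) -> list:
    # Decode little-endian signed 16-bit mono samples.
    count = len(pcm) // 2
    return list(struct.unpack(f"<{count}h", pcm[: count * 2]))


def interleave(caller_pcm: bytes, agent_pcm: bytes) -> bytes:
    # Left = caller, right = agent; the shorter side is padded with silence.
    left, right = _samples(caller_pcm), _samples(agent_pcm)
    n = max(len(left), len(right))
    left += [0] * (n - len(left))
    right += [0] * (n - len(right))
    frames = [s for pair in zip(left, right) for s in pair]
    return struct.pack(f"<{len(frames)}h", *frames)


def write_stereo_wav(fileobj, caller_pcm: bytes, agent_pcm: bytes,
                     sample_rate: int = 8000) -> None:
    # sample_rate is an assumed value, not taken from the repository.
    with wave.open(fileobj, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(interleave(caller_pcm, agent_pcm))
```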
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
After the phone confirmation a caller's "yes" wasn't picked up (silence) until
they repeated it louder. Logs: line was live and the half-duplex gate had
reopened, but VAD never fired for ~14s — the quick/quiet "yes" was below
threshold (min_volume 0.3, start_secs 0.2).
Now that HalfDuplexGate gates out the agent's echo while it speaks, VAD can be
sensitive without echo false-triggers (it only listens hard on the caller's
turn). Lowered min_volume 0.3->0.15, start_secs 0.2->0.1, and trimmed the echo
tail 0.5->0.25 so an answer right after the agent stops isn't dropped.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A caller's reply was generated but never heard: 0.65s after the agent started
speaking, the VAD fired "user started speaking" (NO transcript) and broadcast an
interruption that cancelled the agent's audio -> ~24s of silence until the caller
spoke again. Cause: the agent's own TTS echoes back over the phone line and the
always-on VAD interruption treats it as a barge-in. (PipelineParams has no
allow_interruptions in this pipecat build; setting it was a silent no-op.)
Fix: HalfDuplexGate before the VAD withholds inbound audio while the bot speaks
(+ECHO_TAIL_SECS, default 0.5s), so echo can't trigger a false barge-in.
The call is now half-duplex (no mid-utterance barge-in); set HALF_DUPLEX=false
to restore barge-in.
Runtime-tested the gate (pass idle / drop while speaking / drop in tail / resume).
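The four gate states tested above (pass idle / drop while speaking / drop in tail / resume) can be sketched as follows. This is a simplified stand-in, not the HalfDuplexGate frame processor itself: the class and method names are hypothetical, and timestamps are passed in explicitly.

```python
class HalfDuplexGateSketch:
    """Hypothetical decision logic only; the real gate handles audio frames."""

    def __init__(self, echo_tail_secs: float = 0.5, enabled: bool = True):
        self.echo_tail_secs = echo_tail_secs  # ECHO_TAIL_SECS
        self.enabled = enabled                # HALF_DUPLEX
        self._bot_speaking = False
        self._tail_until = 0.0

    def bot_started(self) -> None:
        self._bot_speaking = True

    def bot_stopped(self, now: float) -> None:
        # Keep dropping audio for a short tail so trailing echo is not heard.
        self._bot_speaking = False
        self._tail_until = now + self.echo_tail_secs

    def should_pass(self, now: float) -> bool:
        # True means inbound caller audio is forwarded to the VAD.
        if not self.enabled:
            return True
        if self._bot_speaking:
            return False
        return now >= self._tail_until
```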
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A caller gave their insurance; AVA replied with a bare acknowledgment ("staff
will verify your coverage") and stopped, with no follow-up question. Both sides
then waited -> dead air (pipeline idle, no GPU/LLM activity, matching flat
memory/wattage). Caller had to break the silence with "what questions do you
have?". Root cause: the one-sentence brevity rule made AVA end a booking turn on
a dead-end statement.
Fix: prompt now requires, until the booking is complete, that every turn end
with the next question — acknowledgment + next question in the same turn (e.g.
insurance ack -> immediately ask day/time). Verified 4/4. Documented in CLAUDE.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
EndCallProcessor now guarantees the callback number is confirmed on booking
calls. The 8B model reads it back only ~half the time, so if a closing is reached
on a booking call (booking keyword seen) without the agent having spoken the
number (phone_marker absent from its replies), the hang-up is suppressed and a
scripted confirmation line (caller-ID spelled out) is injected as a
TTSSpeakFrame first.
The agent's own readback satisfies the gate (no double-ask); info-only calls are
never asked for a number. Runtime-tested all four paths (inject / no-inject /
info-only / inject-then-end).
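The four paths can be summarized as one decision function. This is a sketch of the logic as described, not the EndCallProcessor code: the function name, argument names, and return labels are hypothetical.

```python
def closing_action(is_closing: bool, booking_seen: bool,
                   number_spoken: bool, already_injected: bool = False) -> str:
    """Hypothetical decision for what to do when the agent reaches a closing.

    booking_seen     -- a booking keyword appeared in the call
    number_spoken    -- the phone marker appeared in the agent's replies
    already_injected -- the scripted confirmation was already spoken once
    """
    if not is_closing:
        return "continue"
    if booking_seen and not number_spoken and not already_injected:
        # Suppress the hang-up and speak the scripted confirmation first.
        return "inject_confirmation"
    return "hang_up"
```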
CLAUDE.md: document the safety net, the "never claim a booking" rule, the direct
phone-confirm phrasing, and the insurance "never say we accept" rule.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fixes from a test call:
- Contradiction: AVA said "staff will confirm" then later "we've got your
appointment scheduled". Hardened the rule: never say booked/scheduled/set/
confirmed (even in the recap); it is always a REQUEST that staff confirm on
callback. The wrap-up now recaps as "I've noted your request...".
- Phone: it asked "may I read your number back?" then read it anyway. It now
states the number directly in one line ("I have your number as <number> - is
that best?"), with no permission ask, and must not skip the step.
- Insurance: stop saying "we accept/take <plan>" (it said "we accept All State",
which isn't even a listed plan) — just note it, staff verify.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Reason extraction missed symptom-style reasons: a caller said "I'm actually
blind" and the lead logged reason=None (it caught "disintegrated eyes" before
but not this). Broadened the extractor's reason rule to capture the eye
problem/symptom as the reason, not just visit types. Verified 3/3 -> "vision
loss / blindness".
- server.py: move the LLM warmup/pin (keep_alive=-1) from the deprecated
on_event("startup") to a lifespan handler — silences the FastAPI deprecation
warning; model still shows ollama ps UNTIL=Forever.
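A minimal sketch of the lifespan pattern, written without importing FastAPI so it runs on its own. The helper names are hypothetical, and `warm` stands in for whatever coroutine actually sends the payload to Ollama; only the keep_alive=-1 value and the model tag come from these commits.

```python
import asyncio
from contextlib import asynccontextmanager


def warmup_payload(model: str) -> dict:
    # keep_alive=-1 asks Ollama to keep the model loaded indefinitely.
    return {"model": model, "keep_alive": -1}


def make_lifespan(warm, model: str = "activeblue-avc:latest"):
    """Build a lifespan handler; `warm` is a hypothetical async sender."""

    @asynccontextmanager
    async def lifespan(app):
        # Runs once before the server starts accepting requests.
        await warm(warmup_payload(model))
        yield
        # Nothing to tear down in this sketch.

    return lifespan
```

The result would be passed as `FastAPI(lifespan=make_lifespan(...))` in place of the deprecated on_event("startup") hook.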
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Reason visibility: the reason WAS extracted ("disintegrated eyes") but only
lived in the Odoo description note. Add it to the post-call log line and to
the Odoo lead title so it's visible at a glance.
- Latency: split the timing — Whisper is ~0.1s, latency is LLM-side. The ~3s
tail was cold model reloads after Ollama's keep-alive expired. server.py now
warms + pins the model on startup (keep_alive=-1, ollama ps UNTIL=Forever),
removing cold first-turn stalls. Whisper size left alone (not the bottleneck).
- CLAUDE.md: insurance rule (never suggest/guess the plan), latency note.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A caller trailed off ("My insurance plan is...") and AVA filled in "CarePlus",
which got logged to the lead. Tightened the insurance rule: ask open-endedly,
do NOT read out/suggest plan names from the accepted list, capture only what the
caller says, never fill in/complete/guess the plan, and ask them to repeat if
unclear. Verified 4/4 on the trail-off case (asks to repeat, no fabricated plan).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Document AVA's directed call script — reason first, location, caller info
(address by name), verify phone by readback near the end, wrap up with "anything
else?" — and the gated closing (Goodbye only after the anything-else question).
Note the 8B reliability ceiling on step ordering.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace loose "gather these details" with a directed script so the call has
clear direction:
1. Reason first — what are they calling about
2. Location — city/area, confirm the matching office
3. Caller info — full name, then address them by name; insurance (log only),
preferred day/time
4. Verify phone near the end by reading it back
5. Wrap up — recap, then "Is there anything else I can help you with?"
Closing hardened: "Goodbye" (which ends the call) is gated behind the
anything-else question, never said in the same turn as confirming details.
Be warm but direct; one short turn at a time.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- .env.example: add AGENT_NAME_SPOKEN=Eva.
- CLAUDE.md: note the agent-name respelling (AVA -> Eva, "EE-vuh"), that the
caller-ID is injected pre-spelled (model mangles raw digits), and that the
phone is confirmed near the END of the call, not led with.
(.env itself is gitignored; AGENT_NAME_SPOKEN=Eva set there and live.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Phone: inject the caller-ID into the prompt already spelled digit-by-digit so
the model repeats clean words instead of mangling raw digits (it had emitted
"197-three five seven three..." -> Kokoro read "one hundred ninety-seven").
- Flow: stop leading with the phone number. Prompt now flows naturally and
saves the callback-number confirmation for the END; the caller-ID line says
not to recite it early. Verified 3/3 openings no longer recite the number.
- Name: Kokoro spelled all-caps "AVA" as "A-V-A". Respell to AGENT_NAME_SPOKEN
(default "Ava") in TTS only; logs/Odoo keep AGENT_NAME. Override e.g.
AGENT_NAME_SPOKEN=Eva for an "EE-vuh" sound.
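The pre-spelling and respelling steps above might look roughly like this. Both helpers are hypothetical illustrations, not the code in bot.py; how the real injection groups or punctuates the digits is not stated here.

```python
import re

_DIGIT_WORDS = ("zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine")


def spell_caller_id(raw: str) -> str:
    # Keep only ASCII digits and speak each one as a word.
    return " ".join(_DIGIT_WORDS[int(c)] for c in raw if c in "0123456789")


def respell_agent_name(text: str, agent_name: str, spoken: str) -> str:
    # Swap the written name for its spoken respelling, whole words only.
    return re.sub(rf"\b{re.escape(agent_name)}\b", spoken, text)
```

The respelling would apply only to text headed for TTS, so logs and Odoo keep AGENT_NAME.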
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Note in the Call Data Capture table that AVA confirms a matching office and
moves on rather than offering/comparing other offices — the fix for the
"I'm in Kendall" -> "Hollywood or Miami?" off-script behavior.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
When a caller named a city matching an office ("I'm in Kendall"), AVA confirmed
Kendall then asked them to pick between unrelated offices ("Hollywood or
Miami?"), going off script. Tightened the prompt: on a city that matches an
office, confirm THAT office and move on; never offer/compare other offices or
ask the caller to choose; name the nearest only if nothing matches. Verified
3/3 on the failing scenario.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
run_tts is called as run_tts(self, text, context_id); the override only accepted
(self, text), so every utterance raised "takes 2 positional arguments but 3
were given" and produced no audio — callers heard nothing on every call since
the number-normalization change. Added context_id and pass it through. Verified
the service now emits audio (118KB for a sample) with digits normalized.
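The failure mode can be reproduced with plain stand-in classes. These are not pipecat's classes (the real run_tts is asynchronous and yields frames); the names and the toy normalization below are assumptions used only to show the signature mismatch and the fix.

```python
class BaseTTSSketch:
    """Stand-in for the framework base class, not the real pipecat service."""

    def speak(self, text, context_id):
        # The framework side always passes both text and context_id.
        return self.run_tts(text, context_id)

    def run_tts(self, text, context_id):
        return f"audio[{context_id}]:{text}"


class BrokenOverride(BaseTTSSketch):
    def run_tts(self, text):  # old override: context_id missing
        return super().run_tts(text, None)


class FixedOverride(BaseTTSSketch):
    def run_tts(self, text, context_id):
        # Accept context_id and pass it through after normalizing the text.
        return super().run_tts(text.replace("-", " "), context_id)
```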
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Document the TTS number-reading fix in the "already solved" section: phone
numbers, street numbers, and zips are spoken digit-by-digit (no "dash"/parens,
country code dropped); dates/times left natural. tts_normalize() holds the rules.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Kokoro spoke "983-4969" as "nine hundred eighty-three dash forty-nine sixty-
nine". Added SpokenKokoroTTSService which normalizes text just before synthesis
(run_tts gets the full sentence): US phone patterns and 4-5 digit runs (street
numbers, zips) are spoken one digit at a time, country code dropped, no "dash"/
parens. Dates and times are left natural. Deterministic, so it's robust to
whatever the model emits.
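A rough sketch of this kind of normalization, under stated assumptions. It is not the repository's tts_normalize(): the regular expressions and function name are illustrative, and the guard that skips digit runs next to "/" or ":" is an assumed way of leaving dates and times alone.

```python
import re

_WORDS = ("zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine")

# 10-digit US number with optional +1/1 country code, parens, separators.
_PHONE10 = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})(?!\d)",
    re.ASCII,
)
# 7-digit local number such as 983-4969.
_PHONE7 = re.compile(r"(?<!\d)(\d{3})-(\d{4})(?!\d)", re.ASCII)
# Standalone 4-5 digit runs (street numbers, zips), not next to "/" or ":".
_RUN = re.compile(r"(?<![\d/:])(\d{4,5})(?![\d/:])", re.ASCII)


def _spell(digits: str) -> str:
    return " ".join(_WORDS[int(d)] for d in digits)


def tts_normalize_sketch(text: str) -> str:
    # Phones first (country code dropped), then remaining digit runs.
    text = _PHONE10.sub(lambda m: _spell("".join(m.groups())), text)
    text = _PHONE7.sub(lambda m: _spell("".join(m.groups())), text)
    return _RUN.sub(lambda m: _spell(m.group(1)), text)
```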
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Long calls overflowed the 4096-token window mid-conversation, forcing Ollama to
truncate + re-evaluate the full context each turn = multi-second stalls / dead
air. Rebuilt activeblue-avc:latest with num_ctx 8192 (rollback tag
activeblue-avc:pre-ctx8k). Combined with removing the 45-day calendar injection,
this keeps long calls well under the window. Doc: context row, Modelfile
reference, and a root-cause note.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Update the call-capture section to reflect the fix — AVA takes the day/time in
the caller's words and defers exact-date confirmation to staff; the 45-day
calendar injection and in-call date validation were removed after a real call
derailed and the 8B model proved unable to compute dates reliably. Post-call
resolved_date is best-effort/staff-verified only.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A real call derailed: AVA argued about today's date, parroted the canned date
example, hallucinated appointment availability, and rambled. Root cause was the
date-validation feature — the local 8B model computes appointment dates wrong
~5/5 in testing, so having it state/correct dates is a liability.
- DATES: capture & defer — AVA takes the day/time in the caller's own words,
never computes/states/corrects the calendar date, never argues about today;
staff confirm the exact date on callback. Removed the 45-day calendar
injection and _date_context()/datetime use.
- Hardened the no-availability rule (no "openings", no "check availability",
no "I'll book").
- Brevity: one short sentence per reply (two at most).
Post-call extractor still records a best-effort resolved date (staff-verified).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- New "Call Data Capture & Date Validation" section: the six captured fields
(full name, phone confirm/alternate, office, reason, insurance log-only,
validated preferred date/time), how each is logged, and the per-call calendar
injection that drives date pushback.
- EndCallProcessor note: HANGUP_DELAY_SECS grace pause; Phase 1 gate result.
- .env reference: add HANGUP_DELAY_SECS.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
In-call (system prompt + per-call calendar injection):
- Gather full name (prompt asks for last name if only first given).
- Confirm the caller-ID number; if declined, use the number the caller gives.
- Ask for and LOG insurance only — never promise/confirm/deny coverage or
treatment based on it; staff verify on callback.
- Validate the requested date against an injected 45-day calendar (recomputed
per call since the server is long-running). Push back on impossible/mismatched
dates, e.g. "Monday lands on the sixth — would you like that date?".
- AGENT_NAME=AVA; 4s grace pause before hang-up (HANGUP_DELAY_SECS).
Logging (post-call extraction -> Odoo):
- Extract full name, phone_confirmed, chosen callback (caller-ID or alternate),
insurance, reason, and preferred time annotated with a resolved YYYY-MM-DD
date (today's date is fed to the extractor).
- odoo_client: insurance row on the lead note (log only — staff verify).
.gitignore: ignore rotated avc_run.log* files.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reframe Change 1/2/3 to record the actual decisions instead of the trialed
swaps: Deepgram and the Twilio Standard API Key were both evaluated and
reverted. Document why the API Key cannot replace the Auth Token (Twilio signs
webhooks with the Auth Token). Update the .env reference, Phase 1 checklist,
dependencies, and open items accordingly; gate zombie-check uses ps/pgrep
(bare process, not Docker).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Deepgram and the Twilio Standard API Key were reverted per decision:
- bot.py: restore HintedWhisperSTTService (faster-whisper hotwords), default
model medium; remove DeepgramSTTService import + DEEPGRAM_API_KEY.
- server.py: restore TWILIO_AUTH_TOKEN for X-Twilio-Signature validation and
the serializer auto-hang-up. Twilio signs webhooks with the Auth Token, so
an API Key Secret cannot validate signatures.
- .env.example: back to TWILIO_AUTH_TOKEN + Whisper STT vars.
- .gitignore: ignore runtime *.log (avc_run.log).
OLLAMA_MODEL stays activeblue-avc:latest (the existing pulled tag).
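To illustrate why only the Auth Token works for webhook validation, here is a standard-library sketch of Twilio's documented signing scheme for form-encoded POSTs (HMAC-SHA1 over the URL plus the sorted parameters, base64-encoded). The function names and sample values are hypothetical; server.py may use Twilio's own validator instead.

```python
import base64
import hashlib
import hmac


def twilio_signature(auth_token: str, url: str, params: dict) -> str:
    # Append each POST parameter as key+value, sorted by key, to the URL.
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"),
                      payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_signature(secret: str, url: str, params: dict,
                       header_signature: str) -> bool:
    # Compare against the X-Twilio-Signature header in constant time.
    expected = twilio_signature(secret, url, params)
    return hmac.compare_digest(expected, header_signature)
```

Because the signature is keyed with the Auth Token, recomputing it with an API Key Secret produces a different value and validation fails.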
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds the full Gitea repo URL to the infrastructure table and the monitoring
dashboard line, and keeps the repository-structure tree root as avc-phone-ai
to match the rest of the doc.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Header, structure tree, and footer pointed at avc-phone-agent; the actual
Gitea repo is avc-phone-ai. The avc-phone-agent-prod Twilio API key name is
left unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>