AVC Phone Agent — Project Specification
Claude Code authoritative reference. All architecture, security, and build decisions live here.
Repo: git.activeblue.net/tocmo0nlord/avc-phone-ai
Last updated: 2026-06-23 | Active Blue LLC
Project Overview
Name: AVC Phone Agent
Owner: Active Blue LLC
Client: Advanced Vision Care (AVC), a multi-location ophthalmology/optometry practice (FL + TX)
Agent name: AVA (Advanced Vision Assistant)
Purpose: Automated AI phone agent that answers patient calls, books tentative appointments into Odoo CRM with call recordings and transcripts attached, and self-improves via Claude-powered transcript monitoring and a fine-tuning feedback loop.
Existing Codebase — What to Keep, What to Change
The previous build at /home/tocmo0nlord/avc-phone/ is a working foundation.
Do not rewrite what works. Apply only the changes documented in this section.
Files and their status
| File | Status | Action |
|---|---|---|
| bot.py | Keep with one change | Swap Whisper STT for Deepgram Nova-2 |
| server.py | Keep with one change | Swap Auth Token for API Key Secret |
| practice.py | Keep as-is | No changes |
| extract.py | Keep as-is | No changes |
| odoo_client.py | Keep as-is | Already uses API key auth correctly |
What is already solved — do not touch
- EndCallProcessor in bot.py: AVC-side call termination is fully implemented. It watches the LLM text stream for closing keywords ("goodbye"), waits for TTS to finish via BotStoppedSpeakingFrame, then pushes EndTaskFrame upstream. TwilioFrameSerializer with auto_hang_up drops the carrier leg. This is correct. Zero changes.
- Mulaw 8kHz ↔ 16kHz conversion: handled internally by TwilioFrameSerializer. PIPELINE_SAMPLE_RATE = 16000 and WIRE_SAMPLE_RATE = 8000 are already set correctly. No custom audio module needed.
- VAD tuned for telephony: confidence=0.5 and min_volume=0.3 are already loosened from desktop defaults. These settings directly address the repeat-yourself problem on the VAD side.
- Capacity gating: MAX_CONCURRENT_CALLS=2 with atomic slot reservation in server.py prevents GPU thrashing. Keep it.
- AudioHeartbeat: diagnostic processor that distinguishes VAD failure from transport stall. Keep it.
- Post-call extraction (extract.py): single JSON-mode completion after the call ends. Correctly uses format: json, uses the verified Twilio caller ID instead of trusting model output, and falls back to JSONL if Odoo is unreachable. Keep it.
- Odoo integration (odoo_client.py): already uses ODOO_API_KEY for XML-RPC auth, not a password. Correct pattern. No changes.
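The capacity gating described above comes down to a check-and-increment that cannot be interleaved between two incoming calls. A minimal sketch of that idea, assuming an asyncio server; `CallSlots`, `try_reserve`, and `release` are hypothetical names for illustration, not the identifiers used in server.py:

```python
import asyncio


class CallSlots:
    """Bounded counter for concurrent calls (hypothetical helper, not from server.py)."""

    def __init__(self, max_calls: int) -> None:
        self._max = max_calls
        self._active = 0
        self._lock = asyncio.Lock()

    async def try_reserve(self) -> bool:
        # Check and increment under one lock so two simultaneous webhooks
        # cannot both claim the last free slot.
        async with self._lock:
            if self._active >= self._max:
                return False
            self._active += 1
            return True

    async def release(self) -> None:
        # Called when a call ends; never lets the counter go negative.
        async with self._lock:
            self._active = max(0, self._active - 1)


async def demo() -> list[bool]:
    slots = CallSlots(max_calls=2)  # mirrors MAX_CONCURRENT_CALLS=2
    # Three calls arrive at once; only two should be admitted.
    return list(await asyncio.gather(*(slots.try_reserve() for _ in range(3))))


results = asyncio.run(demo())
```

Rejecting the third call up front, rather than queueing it, keeps GPU load bounded, which is the behaviour the existing gating is described as providing.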
Change 1 — Swap Whisper STT for Deepgram Nova-2 (bot.py)
Why: Whisper buffers audio chunks before transcribing — 1-3 seconds of buffering before the LLM sees any input. This is the primary cause of non-reply and the repeat-yourself problem. Deepgram Nova-2 via Pipecat's native transport delivers end-of-utterance events in under 300ms.
Remove from bot.py:
# Remove this import
from pipecat.services.whisper.stt import WhisperSTTService
# Remove these env vars
WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "base")
WHISPER_DEVICE = os.environ.get("WHISPER_DEVICE", "cuda")
WHISPER_COMPUTE = os.environ.get("WHISPER_COMPUTE", "float16")
WHISPER_HOTWORDS = os.environ.get("WHISPER_HOTWORDS", "...")
# Remove the entire HintedWhisperSTTService class
Add to bot.py:
# Add import
from pipecat.services.deepgram.stt import DeepgramSTTService
# Add env var
DEEPGRAM_API_KEY = os.environ.get("DEEPGRAM_API_KEY", "")
# Replace stt instantiation in run_agent()
stt = DeepgramSTTService(
    api_key=DEEPGRAM_API_KEY,
    settings=DeepgramSTTService.Settings(
        model="nova-2",
        language="en-US",
        smart_format=True,
        punctuate=True,
        interim_results=False,  # final transcripts only, avoids double-firing
        # Deepgram documents utterance_end_ms as requiring interim_results=true;
        # verify this pairing on a test call before relying on it.
        utterance_end_ms=1000,  # ms of silence before end-of-utterance fires
    ),
)
Note on Whisper: Remove from real-time pipeline only. Whisper large-v3 is retained
for post-call transcription in Phase 3 (recording/transcriber.py) where latency does
not matter and accuracy is more important than speed.
Change 2 — Swap Auth Token for API Key Secret (server.py)
Why: TWILIO_AUTH_TOKEN is the master credential for the entire Twilio account.
A leak compromises every Twilio integration. A Standard API Key is scoped to this
application and revocable independently.
Credential hierarchy:
Twilio Account SID (not secret on its own)
├── Auth Token (master — Twilio console only, rotate quarterly)
└── API Key: avc-phone-agent-prod (Standard scope)
├── TWILIO_API_KEY_SID: SK...
└── TWILIO_API_KEY_SECRET: (treat as a password)
Create the API Key:
- Twilio console → Account → API Keys → Create new Standard key
- Name it avc-phone-agent-prod
- Copy SID (SK...) and Secret; the Secret is shown once only
Changes in server.py:
Remove:
TWILIO_AUTH_TOKEN = os.environ.get("TWILIO_AUTH_TOKEN")
Add:
TWILIO_API_KEY_SID = os.environ.get("TWILIO_API_KEY_SID")
TWILIO_API_KEY_SECRET = os.environ.get("TWILIO_API_KEY_SECRET")
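In _twilio_signature_ok(), change the HMAC key. Twilio documents X-Twilio-Signature as an HMAC-SHA1 keyed with the account Auth Token, so confirm against a live webhook that a digest keyed with the API Key Secret actually matches before deleting the Auth Token: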
# Before
digest = hmac.new(TWILIO_AUTH_TOKEN.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
# After
digest = hmac.new(TWILIO_API_KEY_SECRET.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
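For context, the check that the digest feeds into can be sketched end to end. This is a minimal illustration of Twilio's documented scheme for form-encoded webhooks, not the code in server.py; `compute_signature` and `signature_ok` are hypothetical names, and the key, URL, and parameters in the demo are made-up values:

```python
import base64
import hashlib
import hmac


def compute_signature(key: str, url: str, params: dict[str, str]) -> str:
    """Build the X-Twilio-Signature value for a form-encoded webhook."""
    # Documented scheme: the full request URL, followed by each POST
    # parameter name and value, appended in alphabetical order of the names.
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(key.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def signature_ok(key: str, url: str, params: dict[str, str], header: str) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(compute_signature(key, url, params), header)


# Demo with made-up values: a signature built with one key validates
# under that key and fails under any other.
url = "https://avc-phone.activeblue.net/voice"
params = {"CallSid": "CA0000", "From": "+15555550100"}
header = compute_signature("example-key", url, params)
print(signature_ok("example-key", url, params, header))
print(signature_ok("other-key", url, params, header))
```

Because the result depends entirely on the key, the server must hold the same secret Twilio signed with; a mismatch shows up as every request being rejected.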
Update the guard condition:
# Before
if TWILIO_VALIDATE and TWILIO_AUTH_TOKEN:
# After
if TWILIO_VALIDATE and TWILIO_API_KEY_SECRET:
Update the warning log:
# Before
elif not TWILIO_AUTH_TOKEN:
logger.warning("/voice signature validation DISABLED (no TWILIO_AUTH_TOKEN set)")
# After
elif not TWILIO_API_KEY_SECRET:
logger.warning("/voice signature validation DISABLED (no TWILIO_API_KEY_SECRET set)")
In TwilioFrameSerializer instantiation:
# Before
serializer = TwilioFrameSerializer(
    stream_sid=stream_sid,
    call_sid=call_sid,
    account_sid=TWILIO_ACCOUNT_SID,
    auth_token=TWILIO_AUTH_TOKEN,
)
# After
serializer = TwilioFrameSerializer(
    stream_sid=stream_sid,
    call_sid=call_sid,
    account_sid=TWILIO_ACCOUNT_SID,
    auth_token=TWILIO_API_KEY_SECRET,
)
Key rotation procedure:
- Create new Standard API Key in Twilio console
- Update TWILIO_API_KEY_SID + TWILIO_API_KEY_SECRET in .env
- Restart the service (no rebuild needed)
- Verify one test call succeeds
- Revoke old key in Twilio console
Rotate on: any suspected leak, any team member departure, quarterly as routine.
Change 3 — Update .env
Remove:
TWILIO_AUTH_TOKEN=
Add:
TWILIO_API_KEY_SID=SK...
TWILIO_API_KEY_SECRET=
DEEPGRAM_API_KEY=
Full .env reference:
# Twilio — Auth Token lives in Twilio console only, never on this server
TWILIO_ACCOUNT_SID=AC...
TWILIO_API_KEY_SID=SK...
TWILIO_API_KEY_SECRET=
TWILIO_PHONE_NUMBER=+1...
# STT: Deepgram (real-time, in-call only)
DEEPGRAM_API_KEY=
DEEPGRAM_MODEL=nova-2
# LLM: Ollama
OLLAMA_URL=http://127.0.0.1:11434/v1
OLLAMA_MODEL=activeblue-avc:latest
LLM_PROVIDER=ollama
LLM_TEMPERATURE=0.3
LLM_MAX_TOKENS=160
# Anthropic (optional LLM swap + monitoring + synthetic data)
ANTHROPIC_API_KEY=
ANTHROPIC_MODEL=claude-sonnet-4-6
# TTS: Kokoro
KOKORO_VOICE=af_heart
KOKORO_MODEL_DIR=/home/tocmo0nlord/pipecat-run/models
# Odoo
ODOO_URL=https://avc.activeblue.net
ODOO_DB=avc
ODOO_USER=
ODOO_API_KEY=
ODOO_TARGET=crm
ODOO_STAGE_ID=
ODOO_TEAM_ID=
ODOO_USER_ID=
# Server
PUBLIC_HOST=avc-phone.activeblue.net
PORT=8200
BIND_HOST=127.0.0.1
MAX_CONCURRENT_CALLS=2
STREAM_TOKEN=
# Call behaviour
AGENT_NAME=AVA
ENABLE_TOOLS=
VAD_CONFIDENCE=0.5
VAD_MIN_VOLUME=0.3
VAD_START_SECS=0.2
VAD_STOP_SECS=0.5
# Monitoring (Phase 4)
MONITORING_ENABLED=true
MONITORING_SCHEDULE=0 2 * * *
# A/B model routing (Phase 5 only)
AB_SPLIT_PERCENT=0
AB_MODEL_B=
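Several values in the reference above are left blank, so a startup check that fails fast on missing settings can save a confusing first call. A minimal sketch; which variables count as mandatory is an assumption made here for illustration, and `missing_vars` and `check_environment` are hypothetical names:

```python
import os

# Names come from the .env reference; treating exactly these as mandatory
# is an assumption, not something the existing code is known to enforce.
REQUIRED_VARS = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_API_KEY_SID",
    "TWILIO_API_KEY_SECRET",
    "DEEPGRAM_API_KEY",
    "ODOO_URL",
    "ODOO_DB",
    "ODOO_API_KEY",
)


def missing_vars(env: dict[str, str]) -> list[str]:
    """Return required names that are absent or blank in the given mapping."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def check_environment() -> None:
    """Exit with a readable message if any required setting is unset."""
    missing = missing_vars(dict(os.environ))
    if missing:
        raise SystemExit("Missing required settings: " + ", ".join(missing))
```

Passing the environment in as a plain mapping keeps `missing_vars` testable without touching the real process environment.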
Model Configuration
Current production model: activeblue-avc:latest
| Property | Value | Notes |
|---|---|---|
| Base | llama3.1:8b-instruct-q4_K_M | Llama 3.1 8B, Q4_K_M quantization |
| ID | 366a6cc15bb7 | Rebuilt clean 2026-06-23 |
| Size | 4.9GB | Down from 8.7GB Q8_0 |
| VRAM usage | ~4.5GB | Leaves 11.5GB headroom on RTX 5080 |
| Context | 4096 tokens | Sufficient for any phone call |
| Temperature | 0.3 | Low, maximizes JSON schema compliance |
| Top-p | 0.9 | Standard |
| Adapter | None | 44-pair LoRA adapter discarded |
Modelfile (rebuild reference)
FROM llama3.1:8b-instruct-q4_K_M
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER num_ctx 4096
PARAMETER temperature 0.3
PARAMETER top_p 0.9
TEMPLATE "{{- range .Messages }}<|start_header_id|>{{ .Role }}<|end_header_id|>
{{ .Content }}<|eot_id|>
{{- end }}<|start_header_id|>assistant<|end_header_id|>
"
Why Q4_K_M not Q8_0
Q8_0 consumed ~8.5GB VRAM for weights alone. Under telephony load this caused inference latency spikes. Q4_K_M cuts weight VRAM to ~4.5GB with negligible quality difference at 8B scale.
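The sizes quoted above follow roughly from parameter count times bits per weight. A back-of-envelope sketch, assuming about 8.03 billion parameters and approximate average bit widths for each quantization; it covers weights only and ignores KV cache and runtime overhead:

```python
PARAMS = 8.03e9  # Llama 3.1 8B parameter count (approximate)

# Approximate average bits per weight; both figures are assumptions.
BITS_PER_WEIGHT = {
    "Q8_0": 8.5,     # 8-bit values plus one fp16 scale per 32-weight block
    "Q4_K_M": 4.85,  # rough average, since K-quants mix block types
}


def weight_gb(quant: str) -> float:
    """Estimated weight size in decimal gigabytes."""
    return PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9


for quant in BITS_PER_WEIGHT:
    print(f"{quant}: {weight_gb(quant):.1f} GB")
```

The estimates land near the figures in this section, which is a sanity check on the numbers rather than a measurement.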
Why no adapter
The 44-pair LoRA adapter was adding noise, not signal. The minimum viable dataset is 200+ pairs per intent category. The adapter will be rebuilt in Phase 5 with 500+ pairs in JSON output format.
Ollama inventory (current)
activeblue-avc:latest 366a6cc15bb7 4.9GB production
llama3.1:8b-instruct-q4_K_M 46e0c10c039e 4.9GB base
nomic-embed-text:latest 0a109f422b47 274MB embeddings
Phase 5 training note
Axolotl pulls from HuggingFace in safetensors format, not Ollama GGUF:
# Phase 5 only — do not run now
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct
# ~16GB on disk, separate from Ollama storage
Build Phases
Claude Code must not scaffold Phase N+1 until Phase N gate is marked complete.
Phase 1 — Reliable call loop
Goal: Every utterance gets a response. Zero silent failures. AVC hangs up — not the caller.
- Apply Change 1: swap Whisper for Deepgram in bot.py
- Apply Change 2: swap Auth Token for API Key Secret in server.py
- Apply Change 3: update .env
- Verify EndCallProcessor termination in Twilio call logs (AVC side, not caller)
- Verify AudioHeartbeat diagnostic logging active
- Verify MAX_CONCURRENT_CALLS capacity gating works
Gate — all five must pass:
- 10 consecutive test calls — zero silent non-responses
- Zero zombie pipeline instances after call ends (docker stats)
- Call termination from AVC side confirmed in Twilio call logs
- JSON parse failure rate visible in logs — measurable not invisible
- Response latency P95 under 3 seconds from STT end-of-utterance to first TTS audio
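The latency criterion in the gate can be checked mechanically once per-turn latencies are logged. A minimal sketch using the nearest-rank definition of the 95th percentile; the function names and the idea of collecting latencies in milliseconds are assumptions for illustration:

```python
def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile: the smallest sample with at least
    95% of all samples at or below it."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_gate_passes(samples_ms: list[float], limit_ms: float = 3000.0) -> bool:
    """True when P95 is under the limit (3 seconds per the Phase 1 gate)."""
    return p95(samples_ms) < limit_ms
```

With only 10 test calls the P95 is simply the slowest turn, so a larger sample gives a more meaningful figure.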
Phase 2 — Accuracy (RAG + validation)
- Populate rag/data/*.jsonl with real AVC data (human task, see RAG section)
- ChromaDB RAG retriever wired into pipeline
- Response validator: JSON schema + factual cross-check + PHI leak scan
- Keyword blocklist (uncertainty phrases → handoff)
- Intent classifier routing
- Turn counter: max 3 failed turns before forced handoff + termination
Gate: 20 manual test calls, zero hallucinations on AVC-specific facts
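The turn counter in the Phase 2 list is small enough to sketch. This version counts failed turns cumulatively over a call; whether the limit should apply to consecutive failures instead is not stated above, so that choice is an assumption, and `TurnCounter` is a hypothetical name:

```python
from dataclasses import dataclass

MAX_FAILED_TURNS = 3  # limit stated in the Phase 2 list


@dataclass
class TurnCounter:
    """Tracks failed turns within one call (hypothetical helper)."""

    failed: int = 0

    def record(self, turn_ok: bool) -> bool:
        """Record one turn; return True once the call should be handed off."""
        if not turn_ok:
            self.failed += 1
        return self.failed >= MAX_FAILED_TURNS


counter = TurnCounter()
outcomes = [counter.record(ok) for ok in (False, True, False, False)]
```

In the demo the handoff flag turns on at the fourth turn, which is the third failure.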
Phase 3 — Booking
- Real-time calendar availability check (odoo/calendar.py)
- Whisper large-v3 post-call transcription (recording/transcriber.py)
- Recording + transcript attached to Odoo lead chatter
- Staff review flow confirmed in Odoo
Gate: Staff receives, reviews, and confirms a lead end-to-end
Phase 4 — Monitoring
- Transcript index (recordings/index.jsonl)
- Claude monitoring job
- Dashboard: toggle, alert queue, one-click apply, playback, quality tagging
Gate: First monitoring run produces actionable suggestions
Phase 5 — Fine-tuning
- Pull HuggingFace base (see model section)
- Synthetic data generation via Claude API in JSON output format
- Real call exporter using staff quality tags
- Axolotl QLoRA on RTX 5080
- Model registry + versioning + A/B routing
Gate: New model outperforms baseline over 50+ calls
Repository Structure
avc-phone-ai/
├── CLAUDE.md ← this file
├── README.md
├── .env ← never committed
├── .env.example
├── .gitignore ← includes .env, recordings/, *.gguf
│
├── bot.py ← Pipecat pipeline (Phase 1 changes here)
├── server.py ← Twilio webhook server (Phase 1 changes here)
├── practice.py ← AVC facts + Odoo persistence
├── extract.py ← post-call appointment extraction
├── odoo_client.py ← Odoo XML-RPC client
│
├── rag/ ← Phase 2
│ ├── store.py
│ ├── loader.py
│ ├── retriever.py
│ └── data/
│ ├── avc_locations.jsonl
│ ├── avc_providers.jsonl
│ ├── avc_services.jsonl
│ ├── avc_hours.jsonl
│ ├── avc_insurance.jsonl
│ └── avc_faqs.jsonl
│
├── recording/ ← Phase 3
│ ├── transcriber.py ← Whisper large-v3 post-call only
│ └── storage.py
│
├── monitoring/ ← Phase 4
│ ├── monitor.py
│ ├── analyzer.py
│ ├── diff_engine.py
│ ├── scheduler.py
│ └── dashboard/
│ ├── app.py
│ └── static/
│
├── training/ ← Phase 5 stub
│ └── README.md
│
├── tests/
│ ├── test_bot.py
│ ├── test_server.py
│ ├── test_odoo_client.py
│ ├── test_extract.py
│ └── fixtures/
│ └── sample_transcripts.jsonl
│
├── scripts/
│ ├── deploy.sh
│ └── smoke_test.sh
│
├── avc-phone.service ← existing systemd unit
└── traefik-avc-phone.yml ← existing Traefik config
Infrastructure
| Component | Host | Address | Notes |
|---|---|---|---|
| Pipecat pipeline | miaai | 10.10.1.221 | Python async, systemd |
| Ollama LLM | miaai | http://127.0.0.1:11434/v1 | activeblue-avc:latest |
| ChromaDB (Phase 2) | miaai | http://10.10.1.221:8001 | Docker volume |
| Twilio webhook | miaai | https://avc-phone.activeblue.net | Traefik + Let's Encrypt |
| Monitoring dashboard | miaai | https://avc-monitor.activeblue.net | internal only |
| Odoo CRM | - | https://avc.activeblue.net | XML-RPC, db: avc |
| Recordings | miaai | /home/tocmo0nlord/avc-phone/recordings/ | local only |
| Gitea | - | https://git.activeblue.net/tocmo0nlord/avc-phone-ai | user: tocmo0nlord |
RAG Store (Phase 2)
Stack: ChromaDB + nomic-embed-text:latest (already in Ollama)
Collection: avc_knowledge
Retrieval: Top-3 chunks per query on caller's current turn only
JSONL record format
{
  "id": "hours-kendall-weekday",
  "text": "The Kendall location is open Monday through Friday 8:00 AM to 5:00 PM.",
  "tags": ["hours", "kendall"],
  "last_updated": "2026-06-23"
}
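In the actual .jsonl files each record sits on a single line; the layout above is expanded for readability. A minimal sketch of a loader-side check for that record format, where `parse_record` and the strictness of the type checks are assumptions rather than the planned rag/loader.py:

```python
import json

# Field names and types taken from the record format above.
REQUIRED_FIELDS = {"id": str, "text": str, "tags": list, "last_updated": str}


def parse_record(line: str) -> dict:
    """Parse one JSONL line and check it against the record format."""
    record = json.loads(line)
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), expected):
            raise ValueError(f"field {field!r} missing or not {expected.__name__}")
    return record


# json.dumps emits the record on one line, as a JSONL file would store it.
sample_line = json.dumps({
    "id": "hours-kendall-weekday",
    "text": "The Kendall location is open Monday through Friday 8:00 AM to 5:00 PM.",
    "tags": ["hours", "kendall"],
    "last_updated": "2026-06-23",
})
record = parse_record(sample_line)
```

Rejecting malformed records at load time keeps bad rows out of the avc_knowledge collection instead of surfacing later as retrieval misses.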
Data files — populated before Phase 2, not before Phase 1
| File | Content |
|---|---|
| avc_locations.jsonl | Address, phone, fax, parking per location |
| avc_providers.jsonl | Name, title, specialty, locations, languages |
| avc_services.jsonl | Exam types, procedures |
| avc_hours.jsonl | Hours per location, holiday closures, after-hours |
| avc_insurance.jsonl | Accepted plans per location |
| avc_faqs.jsonl | Approved Q&A pairs |
Note: practice.py already contains real AVC location and insurance data scraped
from advancedvisioncareflorida.com. Use it as the seed for the JSONL files rather
than starting from scratch.
Claude Monitoring (Phase 4)
What it analyzes
- Facts stated by AVA contradicting RAG store
- System prompt violations
- Calls that should have been handoffs
- High failed turn counts — model or prompt signal
- RAG gaps (AVA said "I don't have that" — should it be added?)
- Phrasing that caused caller confusion
Output schema
{
  "call_sid": "CA...",
  "severity": "high",
  "issue_type": "factual_error",
  "description": "AVA stated Kendall closes at 6pm. RAG store says 5pm.",
  "suggested_action": "rag_update",
  "suggested_change": {
    "file": "rag/data/avc_hours.jsonl",
    "record_id": "hours-kendall-weekday",
    "field": "text",
    "old": "...open until 6pm...",
    "new": "...open until 5pm..."
  }
}
suggested_action: rag_update | prompt_change | blocklist_add | flag_for_review
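Since the monitoring output is model-generated, it is worth validating each suggestion before it reaches the alert queue. A minimal sketch; the set of required keys and the rule that rag_update must carry a suggested_change are assumptions drawn from the example above, and `suggestion_errors` is a hypothetical name:

```python
ALLOWED_ACTIONS = {"rag_update", "prompt_change", "blocklist_add", "flag_for_review"}

# Keys present in the example output; treating all of them as mandatory
# is an assumption.
REQUIRED_KEYS = ("call_sid", "severity", "issue_type", "description", "suggested_action")


def suggestion_errors(suggestion: dict) -> list[str]:
    """Return a list of problems; an empty list means the suggestion is usable."""
    errors = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in suggestion]
    action = suggestion.get("suggested_action")
    if action is not None and action not in ALLOWED_ACTIONS:
        errors.append(f"unknown suggested_action: {action}")
    # Assumed rule: a RAG update is only applicable with a concrete change attached.
    if action == "rag_update" and "suggested_change" not in suggestion:
        errors.append("rag_update requires suggested_change")
    return errors
```

Returning a list of problems rather than raising lets the dashboard show every issue with a malformed suggestion at once.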
Dashboard
FastAPI + HTML/JS at https://avc-monitor.activeblue.net (internal only).
| Feature | Description |
|---|---|
| Enable/disable toggle | Pauses scheduler without redeployment |
| Alert queue | Suggestions sorted by severity |
| One-click apply | Applies change, commits via Gitea API to avc-phone-ai |
| Call playback | Audio + transcript side-by-side |
| Quality tagging | Staff tags calls from dashboard |
| Manual trigger | POST /monitor/run |
Fine-Tuning Pipeline (Phase 5 — stub)
Not scaffolded until Phase 4 is complete and monitoring has run for a minimum of two weeks. See training/README.md, populated at Phase 5 start.
- Synthetic data: Claude API generates Q&A in JSON output format (schema, not style)
- Real calls: staff-tagged "good" + corrected bad calls
- Target: 500+ pairs per intent before first Axolotl run
- QLoRA via Axolotl on RTX 5080, base: HuggingFace meta-llama/Llama-3.1-8B-Instruct
- Versioned Ollama models: activeblue-avc:vN
- A/B routing: promote when new version wins on booking + hallucination rate over 50+ calls
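The A/B routing step pairs with the AB_SPLIT_PERCENT and AB_MODEL_B settings in .env. A minimal sketch of one way to split traffic deterministically by hashing the call SID; this approach, the `pick_model` name, and the example model tag are assumptions, not a description of existing code:

```python
import hashlib


def pick_model(call_sid: str, split_percent: int, model_a: str, model_b: str) -> str:
    """Route a call to model B for roughly split_percent of call SIDs."""
    if not model_b or split_percent <= 0:
        return model_a
    # Hash the SID so a given call always lands in the same bucket (0 to 99).
    bucket = int(hashlib.sha256(call_sid.encode("utf-8")).hexdigest(), 16) % 100
    return model_b if bucket < split_percent else model_a


# With AB_SPLIT_PERCENT=0 (the current default) every call stays on model A.
choice = pick_model("CA0000", 0, "activeblue-avc:latest", "activeblue-avc:v2")
```

Hashing instead of random sampling means retries and log analysis for the same call SID always agree on which model handled it.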
HIPAA and Compliance
- AVA identifies as automated at call start, no exceptions
- No PHI in ChromaDB: practice information only
- Recordings on miaai only, no cloud storage
- Odoo API user: minimum permissions, not admin
- All endpoints HTTPS via Traefik
- .env never committed
Deploy Script (scripts/deploy.sh)
#!/bin/bash
set -e
cd /home/tocmo0nlord/avc-phone
git pull origin main
pip install -r requirements.txt --quiet
systemctl restart avc-phone
systemctl status avc-phone --no-pager
echo "[deploy] Done."
Development Conventions
- Python 3.13 (matches miaai miniconda environment)
- Async throughout (Pipecat is async-native)
- loguru for all logging (already in use, keep consistent)
- Structured log lines for all diagnostic events
- python-dotenv for local dev, env injection in prod
- Secrets never hardcoded
- Every module has if __name__ == "__main__": for isolated testing
Key Dependencies (current)
pipecat-ai==1.3.0 # installed at /opt/miniconda3
pipecat-ai[deepgram] # add for Phase 1 Deepgram swap
deepgram-sdk # add for Phase 1
kokoro-tts # already installed
ollama # already installed
scipy / numpy # already installed (pipecat deps)
chromadb # add for Phase 2
sentence-transformers # add for Phase 2
anthropic # for monitoring + optional LLM swap
openai-whisper # retained for post-call transcription only
fastapi / uvicorn # already installed
loguru # already installed
httpx # already installed
Open Items
- Create avc-phone-agent-prod Standard API Key in Twilio console
- Add TWILIO_API_KEY_SID + TWILIO_API_KEY_SECRET + DEEPGRAM_API_KEY to .env
- Confirm ODOO_STAGE_ID, ODOO_TEAM_ID, ODOO_USER_ID from live avc db
- Confirm AVA voice: af_heart is current default, confirm with AVC before go-live
- Populate rag/data/*.jsonl before Phase 2 (seed from practice.py data)
- Define Odoo confirmed appointment flow: lead → opportunity → calendar event
- Staff training on monitoring dashboard quality tagging
Active Blue LLC | git.activeblue.net/tocmo0nlord/avc-phone-ai