# AVC Phone Agent: Project Specification

> Claude Code authoritative reference. All architecture, security, and build decisions live here.
> Repo: `git.activeblue.net/tocmo0nlord/avc-phone-ai`
> Last updated: 2026-06-23 | Active Blue LLC

---

## Project Overview

**Name:** AVC Phone Agent
**Owner:** Active Blue LLC
**Client:** Advanced Vision Care (AVC), a multi-location ophthalmology/optometry practice (FL + TX)
**Agent name:** AVA (Advanced Vision Assistant)
**Purpose:** Automated AI phone agent that answers patient calls, books tentative appointments into Odoo CRM with call recordings and transcripts attached, and self-improves via Claude-powered transcript monitoring and a fine-tuning feedback loop.

---

## Existing Codebase: What to Keep, What to Change

The previous build at `/home/tocmo0nlord/avc-phone/` is a working foundation. **Do not rewrite what works.** Apply only the changes documented in this section.

### Files and their status

| File | Status | Action |
|------|--------|--------|
| `bot.py` | Keep with one change | Swap Whisper STT for Deepgram Nova-2 |
| `server.py` | Keep with one change | Swap Auth Token for API Key Secret |
| `practice.py` | Keep as-is | No changes |
| `extract.py` | Keep as-is | No changes |
| `odoo_client.py` | Keep as-is | Already uses API key auth correctly |

### What is already solved (do not touch)

**`EndCallProcessor` in `bot.py`:** AVC-side call termination is fully implemented. It watches the LLM text stream for closing keywords ("goodbye"), waits for TTS to finish via `BotStoppedSpeakingFrame`, then pushes `EndTaskFrame` upstream. `TwilioFrameSerializer` with `auto_hang_up` drops the carrier leg. This is correct. Zero changes.

**Mulaw 8kHz ↔ 16kHz conversion:** handled internally by `TwilioFrameSerializer`. `PIPELINE_SAMPLE_RATE = 16000` and `WIRE_SAMPLE_RATE = 8000` are already set correctly. No custom audio module needed.

**VAD tuned for telephony:** `confidence=0.5` and `min_volume=0.3` are already loosened from desktop defaults.
These settings directly address the repeat-yourself problem on the VAD side.

**Capacity gating:** `MAX_CONCURRENT_CALLS=2` with atomic slot reservation in `server.py` prevents GPU thrashing. Keep it.

**`AudioHeartbeat`:** diagnostic processor that distinguishes VAD failure from transport stall. Keep it.

**Post-call extraction (`extract.py`):** single JSON-mode completion after the call ends. It correctly uses `format: json`, uses the verified Twilio caller ID instead of trusting model output, and falls back to JSONL if Odoo is unreachable. Keep it.

**Odoo integration (`odoo_client.py`):** already uses `ODOO_API_KEY` for XML-RPC auth, not a password. Correct pattern. No changes.

---

## Change 1: Swap Whisper STT for Deepgram Nova-2 (`bot.py`)

**Why:** Whisper buffers audio chunks before transcribing, so 1-3 seconds pass before the LLM sees any input. This is the primary cause of non-reply and the repeat-yourself problem. Deepgram Nova-2 via Pipecat's native transport delivers end-of-utterance events in under 300ms.
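One way to check these latency figures on real calls is to log two timestamps per turn and compute a percentile over them. The sketch below is a minimal, hypothetical helper: the function names `turn_latencies_ms` and `p95` and the source of the timestamps are assumptions for illustration and are not part of `bot.py`. It uses the nearest-rank method, so the result is always an observed sample.

```python
def turn_latencies_ms(turns):
    """Convert (utterance_end_ts, first_tts_audio_ts) pairs in seconds to ms.

    The timestamp pairs are assumed to come from the pipeline's own logs.
    """
    return [(tts_ts - eou_ts) * 1000.0 for eou_ts, tts_ts in turns]


def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # ceil(0.95 * n) in integer arithmetic; 1-based nearest rank
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


if __name__ == "__main__":
    # Illustrative values only, not measurements from this system.
    turns = [(10.0, 10.25), (20.0, 21.5), (30.0, 30.5)]
    print(p95(turn_latencies_ms(turns)))
```

Feeding it the per-turn pairs from a batch of test calls gives a number that can be compared directly against a P95 latency target.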
**Remove from `bot.py`:**

```python
# Remove this import
from pipecat.services.whisper.stt import WhisperSTTService

# Remove these env vars
WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "base")
WHISPER_DEVICE = os.environ.get("WHISPER_DEVICE", "cuda")
WHISPER_COMPUTE = os.environ.get("WHISPER_COMPUTE", "float16")
WHISPER_HOTWORDS = os.environ.get("WHISPER_HOTWORDS", "...")

# Remove the entire HintedWhisperSTTService class
```

**Add to `bot.py`:**

```python
# Add import
from pipecat.services.deepgram.stt import DeepgramSTTService

# Add env var
DEEPGRAM_API_KEY = os.environ.get("DEEPGRAM_API_KEY", "")

# Replace stt instantiation in run_agent()
stt = DeepgramSTTService(
    api_key=DEEPGRAM_API_KEY,
    settings=DeepgramSTTService.Settings(
        model="nova-2",
        language="en-US",
        smart_format=True,
        punctuate=True,
        interim_results=False,   # final transcripts only; avoids double-firing
        utterance_end_ms=1000,   # ms of silence before end-of-utterance fires
        # Deepgram documents utterance_end_ms as requiring interim_results=true;
        # verify this pairing on a test call.
    ),
)
```

**Note on Whisper:** Remove it from the real-time pipeline only. Whisper large-v3 is retained for post-call transcription in Phase 3 (`recording/transcriber.py`), where latency does not matter and accuracy is more important than speed.

---

## Change 2: Swap Auth Token for API Key Secret (`server.py`)

**Why:** `TWILIO_AUTH_TOKEN` is the master credential for the entire Twilio account. A leak compromises every Twilio integration. A Standard API Key is scoped to this application and revocable independently.

**Credential hierarchy:**

```
Twilio Account SID (not secret on its own)
├── Auth Token (master; Twilio console only, rotate quarterly)
└── API Key: avc-phone-agent-prod (Standard scope)
    ├── TWILIO_API_KEY_SID: SK...
    └── TWILIO_API_KEY_SECRET: (treat as a password)
```

**Create the API Key:**

1. Twilio console → Account → API Keys → Create new Standard key
2. Name it `avc-phone-agent-prod`
3. Copy the SID (`SK...`) and Secret. The Secret is shown only once.

**Changes in `server.py`:**

Remove:

```python
TWILIO_AUTH_TOKEN = os.environ.get("TWILIO_AUTH_TOKEN")
```

Add:

```python
TWILIO_API_KEY_SID = os.environ.get("TWILIO_API_KEY_SID")
TWILIO_API_KEY_SECRET = os.environ.get("TWILIO_API_KEY_SECRET")
```

In `_twilio_signature_ok()`, change the HMAC key:

```python
# Before
digest = hmac.new(TWILIO_AUTH_TOKEN.encode(), payload.encode("utf-8"), hashlib.sha1).digest()

# After
digest = hmac.new(TWILIO_API_KEY_SECRET.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
```

Update the guard condition:

```python
# Before
if TWILIO_VALIDATE and TWILIO_AUTH_TOKEN:

# After
if TWILIO_VALIDATE and TWILIO_API_KEY_SECRET:
```

Update the warning log:

```python
# Before
elif not TWILIO_AUTH_TOKEN:
    logger.warning("/voice signature validation DISABLED (no TWILIO_AUTH_TOKEN set)")

# After
elif not TWILIO_API_KEY_SECRET:
    logger.warning("/voice signature validation DISABLED (no TWILIO_API_KEY_SECRET set)")
```

In the `TwilioFrameSerializer` instantiation:

```python
# Before
serializer = TwilioFrameSerializer(
    stream_sid=stream_sid,
    call_sid=call_sid,
    account_sid=TWILIO_ACCOUNT_SID,
    auth_token=TWILIO_AUTH_TOKEN,
)

# After
serializer = TwilioFrameSerializer(
    stream_sid=stream_sid,
    call_sid=call_sid,
    account_sid=TWILIO_ACCOUNT_SID,
    auth_token=TWILIO_API_KEY_SECRET,
)
```

**Note on validation:** Twilio documents `X-Twilio-Signature` as an HMAC-SHA1 keyed with the account Auth Token, and REST requests authenticated with an API key use the key SID (not the Account SID) as the username. Confirm on a test call that `/voice` signature validation and `auto_hang_up` both succeed with the API Key Secret before removing the Auth Token from the server.

**Key rotation procedure:**

1. Create a new Standard API Key in the Twilio console
2. Update `TWILIO_API_KEY_SID` + `TWILIO_API_KEY_SECRET` in `.env`
3. Restart the service (no rebuild needed)
4. Verify one test call succeeds
5. Revoke the old key in the Twilio console

Rotate on: any suspected leak, any team member departure, quarterly as routine.

---

## Change 3: Update `.env`

**Remove:**

```env
TWILIO_AUTH_TOKEN=
```

**Add:**

```env
TWILIO_API_KEY_SID=SK...
TWILIO_API_KEY_SECRET=
DEEPGRAM_API_KEY=
```

**Full `.env` reference:**

```env
# Twilio: Auth Token lives in Twilio console only, never on this server
TWILIO_ACCOUNT_SID=AC...
TWILIO_API_KEY_SID=SK...
TWILIO_API_KEY_SECRET=
TWILIO_PHONE_NUMBER=+1...

# STT: Deepgram (real-time, in-call only)
DEEPGRAM_API_KEY=
DEEPGRAM_MODEL=nova-2

# LLM: Ollama
OLLAMA_URL=http://127.0.0.1:11434/v1
OLLAMA_MODEL=activeblue-avc:latest
LLM_PROVIDER=ollama
LLM_TEMPERATURE=0.3
LLM_MAX_TOKENS=160

# Anthropic (optional LLM swap + monitoring + synthetic data)
ANTHROPIC_API_KEY=
ANTHROPIC_MODEL=claude-sonnet-4-6

# TTS: Kokoro
KOKORO_VOICE=af_heart
KOKORO_MODEL_DIR=/home/tocmo0nlord/pipecat-run/models

# Odoo
ODOO_URL=https://avc.activeblue.net
ODOO_DB=avc
ODOO_USER=
ODOO_API_KEY=
ODOO_TARGET=crm
ODOO_STAGE_ID=
ODOO_TEAM_ID=
ODOO_USER_ID=

# Server
PUBLIC_HOST=avc-phone.activeblue.net
PORT=8200
BIND_HOST=127.0.0.1
MAX_CONCURRENT_CALLS=2
STREAM_TOKEN=

# Call behaviour
AGENT_NAME=AVA
ENABLE_TOOLS=
VAD_CONFIDENCE=0.5
VAD_MIN_VOLUME=0.3
VAD_START_SECS=0.2
VAD_STOP_SECS=0.5

# Monitoring (Phase 4)
MONITORING_ENABLED=true
MONITORING_SCHEDULE=0 2 * * *

# A/B model routing (Phase 5 only)
AB_SPLIT_PERCENT=0
AB_MODEL_B=
```

---

## Model Configuration

### Current production model: `activeblue-avc:latest`

| Property | Value | Notes |
|----------|-------|-------|
| Base | `llama3.1:8b-instruct-q4_K_M` | Llama 3.1 8B, Q4_K_M quantization |
| ID | `366a6cc15bb7` | Rebuilt clean 2026-06-23 |
| Size | 4.9GB | Down from 8.7GB Q8_0 |
| VRAM usage | ~4.5GB | Leaves 11.5GB headroom on RTX 5080 |
| Context | 4096 tokens | Sufficient for any phone call |
| Temperature | 0.3 | Low; maximizes JSON schema compliance |
| Top-p | 0.9 | Standard |
| Adapter | None | 44-pair LoRA adapter discarded |

### Modelfile (rebuild reference)

```
FROM llama3.1:8b-instruct-q4_K_M
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER num_ctx 4096
PARAMETER temperature 0.3
PARAMETER top_p 0.9
TEMPLATE "{{- range .Messages }}<|start_header_id|>{{ .Role }}<|end_header_id|>

{{ .Content }}<|eot_id|>
{{- end }}<|start_header_id|>assistant<|end_header_id|>

"
```

### Why Q4_K_M not Q8_0

Q8_0 consumed ~8.5GB VRAM for weights alone. Under telephony load this caused inference latency spikes. Q4_K_M cuts weight VRAM to ~4.5GB with negligible quality difference at 8B scale.

### Why no adapter

The 44-pair LoRA adapter was adding noise, not signal. The minimum viable dataset is 200+ pairs per intent category. The adapter will be rebuilt correctly in Phase 5 with 500+ pairs in JSON output format.

### Ollama inventory (current)

```
activeblue-avc:latest          366a6cc15bb7   4.9GB   production
llama3.1:8b-instruct-q4_K_M    46e0c10c039e   4.9GB   base
nomic-embed-text:latest        0a109f422b47   274MB   embeddings
```

### Phase 5 training note

Axolotl pulls from HuggingFace in safetensors format, not Ollama GGUF:

```bash
# Phase 5 only. Do not run now.
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct
# ~16GB on disk, separate from Ollama storage
```

---

## Build Phases

Claude Code must not scaffold Phase N+1 until the Phase N gate is marked complete.

### Phase 1: Reliable call loop

**Goal:** Every utterance gets a response. Zero silent failures. AVC hangs up, not the caller.

- [ ] Apply Change 1: swap Whisper for Deepgram in `bot.py`
- [ ] Apply Change 2: swap Auth Token for API Key Secret in `server.py`
- [ ] Apply Change 3: update `.env`
- [ ] Verify `EndCallProcessor` termination in Twilio call logs (AVC side, not caller)
- [ ] Verify `AudioHeartbeat` diagnostic logging active
- [ ] Verify `MAX_CONCURRENT_CALLS` capacity gating works

**Gate (all five must pass):**

1. 10 consecutive test calls with zero silent non-responses
2. Zero zombie pipeline instances after call ends (`docker stats`)
3. Call termination from AVC side confirmed in Twilio call logs
4. JSON parse failure rate visible in logs (measurable, not invisible)
5. Response latency P95 under 3 seconds from STT end-of-utterance to first TTS audio

### Phase 2: Accuracy (RAG + validation)

- [ ] Populate `rag/data/*.jsonl` with real AVC data (human task; see RAG section)
- [ ] ChromaDB RAG retriever wired into pipeline
- [ ] Response validator: JSON schema + factual cross-check + PHI leak scan
- [ ] Keyword blocklist (uncertainty phrases → handoff)
- [ ] Intent classifier routing
- [ ] Turn counter: max 3 failed turns before forced handoff + termination

**Gate:** 20 manual test calls, zero hallucinations on AVC-specific facts

### Phase 3: Booking

- [ ] Real-time calendar availability check (`odoo/calendar.py`)
- [ ] Whisper large-v3 post-call transcription (`recording/transcriber.py`)
- [ ] Recording + transcript attached to Odoo lead chatter
- [ ] Staff review flow confirmed in Odoo

**Gate:** Staff receives, reviews, and confirms a lead end-to-end

### Phase 4: Monitoring

- [ ] Transcript index (`recordings/index.jsonl`)
- [ ] Claude monitoring job
- [ ] Dashboard: toggle, alert queue, one-click apply, playback, quality tagging

**Gate:** First monitoring run produces actionable suggestions

### Phase 5: Fine-tuning

- [ ] Pull HuggingFace base (see model section)
- [ ] Synthetic data generation via Claude API in JSON output format
- [ ] Real call exporter using staff quality tags
- [ ] Axolotl QLoRA on RTX 5080
- [ ] Model registry + versioning + A/B routing

**Gate:** New model outperforms baseline over 50+ calls

---

## Repository Structure

```
avc-phone-ai/
├── CLAUDE.md                  ← this file
├── README.md
├── .env                       ← never committed
├── .env.example
├── .gitignore                 ← includes .env, recordings/, *.gguf
│
├── bot.py                     ← Pipecat pipeline (Phase 1 changes here)
├── server.py                  ← Twilio webhook server (Phase 1 changes here)
├── practice.py                ← AVC facts + Odoo persistence
├── extract.py                 ← post-call appointment extraction
├── odoo_client.py             ← Odoo XML-RPC client
│
├── rag/                       ← Phase 2
│   ├── store.py
│   ├── loader.py
│   ├── retriever.py
│   └── data/
│       ├── avc_locations.jsonl
│       ├── avc_providers.jsonl
│       ├── avc_services.jsonl
│       ├── avc_hours.jsonl
│       ├── avc_insurance.jsonl
│       └── avc_faqs.jsonl
│
├── recording/                 ← Phase 3
│   ├── transcriber.py         ← Whisper large-v3 post-call only
│   └── storage.py
│
├── monitoring/                ← Phase 4
│   ├── monitor.py
│   ├── analyzer.py
│   ├── diff_engine.py
│   ├── scheduler.py
│   └── dashboard/
│       ├── app.py
│       └── static/
│
├── training/                  ← Phase 5 stub
│   └── README.md
│
├── tests/
│   ├── test_bot.py
│   ├── test_server.py
│   ├── test_odoo_client.py
│   ├── test_extract.py
│   └── fixtures/
│       └── sample_transcripts.jsonl
│
├── scripts/
│   ├── deploy.sh
│   └── smoke_test.sh
│
├── avc-phone.service          ← existing systemd unit
└── traefik-avc-phone.yml      ← existing Traefik config
```

---

## Infrastructure

| Component | Host | Address | Notes |
|-----------|------|---------|-------|
| Pipecat pipeline | `miaai` | `10.10.1.221` | Python async, systemd |
| Ollama LLM | `miaai` | `http://127.0.0.1:11434/v1` | `activeblue-avc:latest` |
| ChromaDB (Phase 2) | `miaai` | `http://10.10.1.221:8001` | Docker volume |
| Twilio webhook | `miaai` | `https://avc-phone.activeblue.net` | Traefik + Let's Encrypt |
| Monitoring dashboard | `miaai` | `https://avc-monitor.activeblue.net` | internal only |
| Odoo CRM | - | `https://avc.activeblue.net` | XML-RPC, db: `avc` |
| Recordings | `miaai` | `/home/tocmo0nlord/avc-phone/recordings/` | local only |
| Gitea | - | `https://git.activeblue.net/tocmo0nlord/avc-phone-ai` | user: `tocmo0nlord` |

---

## RAG Store (Phase 2)

**Stack:** ChromaDB + `nomic-embed-text:latest` (already in Ollama)
**Collection:** `avc_knowledge`
**Retrieval:** Top-3 chunks per query on caller's current turn only

### JSONL record format

```json
{
  "id": "hours-kendall-weekday",
  "text": "The Kendall location is open Monday through Friday 8:00 AM to 5:00 PM.",
  "tags": ["hours", "kendall"],
  "last_updated": "2026-06-23"
}
```

### Data files (populated before Phase 2, not before Phase 1)

| File | Content |
|------|---------|
| `avc_locations.jsonl` | Address, phone, fax, parking per location |
| `avc_providers.jsonl` | Name, title, specialty, locations, languages |
| `avc_services.jsonl` | Exam types, procedures |
| `avc_hours.jsonl` | Hours per location, holiday closures, after-hours |
| `avc_insurance.jsonl` | Accepted plans per location |
| `avc_faqs.jsonl` | Approved Q&A pairs |

**Note:** `practice.py` already contains real AVC location and insurance data scraped from `advancedvisioncareflorida.com`. Use it as the seed for the JSONL files rather than starting from scratch.

---

## Claude Monitoring (Phase 4)

### What it analyzes

- Facts stated by AVA contradicting RAG store
- System prompt violations
- Calls that should have been handoffs
- High failed turn counts (model or prompt signal)
- RAG gaps (AVA said "I don't have that": should it be added?)
- Phrasing that caused caller confusion

### Output schema

```json
{
  "call_sid": "CA...",
  "severity": "high",
  "issue_type": "factual_error",
  "description": "AVA stated Kendall closes at 6pm. RAG store says 5pm.",
  "suggested_action": "rag_update",
  "suggested_change": {
    "file": "rag/data/avc_hours.jsonl",
    "record_id": "hours-kendall-weekday",
    "field": "text",
    "old": "...open until 6pm...",
    "new": "...open until 5pm..."
  }
}
```

`suggested_action`: `rag_update` | `prompt_change` | `blocklist_add` | `flag_for_review`

### Dashboard

FastAPI + HTML/JS at `https://avc-monitor.activeblue.net` (internal only).

| Feature | Description |
|---------|-------------|
| Enable/disable toggle | Pauses scheduler without redeployment |
| Alert queue | Suggestions sorted by severity |
| One-click apply | Applies change, commits via Gitea API to `avc-phone-ai` |
| Call playback | Audio + transcript side-by-side |
| Quality tagging | Staff tags calls from dashboard |
| Manual trigger | `POST /monitor/run` |

---

## Fine-Tuning Pipeline (Phase 5 stub)

> Not scaffolded until Phase 4 is complete and monitoring has run for a minimum of two weeks.
> See `training/README.md`, populated at Phase 5 start.

- Synthetic data: Claude API generates Q&A in JSON output format (schema, not style)
- Real calls: staff-tagged `"good"` + corrected bad calls
- Target: 500+ pairs per intent before first Axolotl run
- QLoRA via Axolotl on RTX 5080, base: HuggingFace `meta-llama/Llama-3.1-8B-Instruct`
- Versioned Ollama models: `activeblue-avc:vN`
- A/B routing: promote when new version wins on booking + hallucination rate over 50+ calls

---

## HIPAA and Compliance

- AVA identifies as automated at call start, no exceptions
- No PHI in ChromaDB; practice information only
- Recordings on `miaai` only; no cloud storage
- Odoo API user: minimum permissions, not admin
- All endpoints HTTPS via Traefik
- `.env` never committed

---

## Deploy Script (`scripts/deploy.sh`)

```bash
#!/bin/bash
set -e
cd /home/tocmo0nlord/avc-phone
git pull origin main
pip install -r requirements.txt --quiet
systemctl restart avc-phone
systemctl status avc-phone --no-pager
echo "[deploy] Done."
```

---

## Development Conventions

- Python 3.13 (matches `miaai` miniconda environment)
- Async throughout (Pipecat is async-native)
- `loguru` for all logging (already in use; keep consistent)
- Structured log lines for all diagnostic events
- `python-dotenv` for local dev, env injection in prod
- Secrets never hardcoded
- Every module has `if __name__ == "__main__":` for isolated testing

---

## Key Dependencies (current)

```
pipecat-ai==1.3.0          # installed at /opt/miniconda3
pipecat-ai[deepgram]       # add for Phase 1 Deepgram swap
deepgram-sdk               # add for Phase 1
kokoro-tts                 # already installed
ollama                     # already installed
scipy / numpy              # already installed (pipecat deps)
chromadb                   # add for Phase 2
sentence-transformers      # add for Phase 2
anthropic                  # for monitoring + optional LLM swap
openai-whisper             # retained for post-call transcription only
fastapi / uvicorn          # already installed
loguru                     # already installed
httpx                      # already installed
```

---

## Open Items

- [ ] Create `avc-phone-agent-prod` Standard API Key in Twilio console
- [ ] Add `TWILIO_API_KEY_SID` + `TWILIO_API_KEY_SECRET` + `DEEPGRAM_API_KEY` to `.env`
- [ ] Confirm `ODOO_STAGE_ID`, `ODOO_TEAM_ID`, `ODOO_USER_ID` from live `avc` db
- [ ] Confirm AVA voice: `af_heart` is the current default; confirm with AVC before go-live
- [ ] Populate `rag/data/*.jsonl` before Phase 2 (seed from `practice.py` data)
- [ ] Define Odoo confirmed appointment flow: lead → opportunity → calendar event
- [ ] Staff training on monitoring dashboard quality tagging

---

*Active Blue LLC | git.activeblue.net/tocmo0nlord/avc-phone-ai*