# AVC Phone Agent — Project Specification

> Claude Code authoritative reference. All architecture, security, and build decisions live here.
> Repo: `git.activeblue.net/tocmo0nlord/avc-phone-ai`
> Last updated: 2026-06-25 | Active Blue LLC

---

## Project Overview

**Name:** AVC Phone Agent
**Owner:** Active Blue LLC
**Client:** Advanced Vision Care (AVC) — multi-location ophthalmology/optometry practice (FL + TX)
**Agent name:** AVA (Advanced Vision Assistant)
**Purpose:** Automated AI phone agent that answers patient calls, books tentative appointments into Odoo CRM with call recordings and transcripts attached, and self-improves via Claude-powered transcript monitoring and a fine-tuning feedback loop.

---

## Existing Codebase — What to Keep, What to Change

The previous build at `/home/tocmo0nlord/avc-phone/` is a working foundation. **Do not rewrite what works.** Apply only the changes documented in this section.

### Files and their status

| File | Status | Action |
|------|--------|--------|
| `bot.py` | Keep as-is | Whisper STT retained (real-time). Deepgram evaluated and rejected — see Change 1 |
| `server.py` | Keep as-is | Twilio Auth Token retained. API Key swap evaluated and rejected — see Change 2 |
| `practice.py` | Keep as-is | No changes |
| `extract.py` | Keep as-is | No changes |
| `odoo_client.py` | Keep as-is | Already uses API key auth correctly |

### What is already solved — do not touch

**`EndCallProcessor` in `bot.py`** — AVC-side call termination is fully implemented. Watches LLM text stream for closing keywords ("goodbye"), waits for TTS to finish via `BotStoppedSpeakingFrame`, pauses `HANGUP_DELAY_SECS` (default 4s) so the caller isn't clipped, then pushes `EndTaskFrame` upstream. `TwilioFrameSerializer` with `auto_hang_up` drops the carrier leg. Verified working in the Phase 1 gate (4/4 clean hang-ups).

**Mulaw 8kHz ↔ 16kHz conversion** — handled internally by `TwilioFrameSerializer`.
`PIPELINE_SAMPLE_RATE = 16000`, `WIRE_SAMPLE_RATE = 8000` are already set correctly. No custom audio module needed.

**VAD tuned for telephony** — `confidence=0.5`, `min_volume=0.3` already loosened from desktop defaults. These settings directly address the repeat-yourself problem on the VAD side.

**Capacity gating** — `MAX_CONCURRENT_CALLS=2` with atomic slot reservation in `server.py` prevents GPU thrashing. Keep it.

**`AudioHeartbeat`** — diagnostic processor that distinguishes VAD failure from transport stall. Keep it.

**Post-call extraction (`extract.py`)** — single JSON-mode completion after call ends. Correctly uses `format: json`, uses verified Twilio caller-ID instead of trusting model output, falls back to JSONL if Odoo is unreachable. Keep it.

**Odoo integration (`odoo_client.py`)** — already uses `ODOO_API_KEY` for XML-RPC auth, not password. Correct pattern. No changes.

---

## Change 1 — Real-time STT stays on Whisper (`bot.py`)

**Decision (2026-06-25): keep Whisper. Deepgram Nova-2 was evaluated and rejected.**

Deepgram Nova-2 was trialed to cut STT latency (Whisper buffers ~1-3s before the LLM sees input). The swap was applied and then reverted — the project stays on local faster-whisper. No external STT dependency, no per-minute STT cost, and no audio leaving the box (HIPAA posture). Latency is instead managed via VAD tuning and the `medium` model on the RTX 5080.

**Current `bot.py` STT (in place — do not change):**

```python
from pipecat.services.whisper.stt import WhisperSTTService

WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "medium")        # tiny|base|small|medium
WHISPER_DEVICE = os.environ.get("WHISPER_DEVICE", "cuda")        # cuda for the 5080
WHISPER_COMPUTE = os.environ.get("WHISPER_COMPUTE", "float16")
WHISPER_HOTWORDS = os.environ.get("WHISPER_HOTWORDS", "...")     # domain vocab bias

# HintedWhisperSTTService wraps WhisperSTTService to inject faster-whisper `hotwords`
# (office cities + optometry terms) per call.
# Instantiated in run_agent():
stt = HintedWhisperSTTService(
    settings=WhisperSTTService.Settings(model=WHISPER_MODEL),
    device=WHISPER_DEVICE,
    compute_type=WHISPER_COMPUTE,
    hotwords=WHISPER_HOTWORDS,
)
```

**Note:** Whisper large-v3 also serves post-call transcription in Phase 3 (`recording/transcriber.py`). If real-time latency proves unacceptable in the Phase 1 gate, revisit a streaming STT then — but do not reintroduce the dependency speculatively.

---

## Change 2 — Twilio webhook auth stays on the Auth Token (`server.py`)

**Decision (2026-06-25): keep `TWILIO_AUTH_TOKEN`. The API Key swap was evaluated and rejected.**

A Standard API Key (scoped, revocable) was trialed in place of the account Auth Token, but it **cannot do what this server needs**: Twilio signs inbound webhooks (`X-Twilio-Signature`) with the account **Auth Token** — an API Key Secret cannot validate that signature, so `TWILIO_VALIDATE=true` would reject every legitimate `POST /voice` (403). The `TwilioFrameSerializer` auto-hang-up also expects the account/Auth-Token credential pair. The swap was reverted.

**Credential model (in place):**

```
Twilio Account SID (not secret on its own)
└── Auth Token (TWILIO_AUTH_TOKEN — validates webhooks + REST/auto-hang-up)
```

Treat the Auth Token as a password: keep it only in `.env` (never committed), rotate on any suspected leak / team departure / quarterly. If finer-grained scoping is ever required, the correct design is a *hybrid* — Auth Token for `X-Twilio-Signature` validation, an API Key (SK SID + Secret) only for outbound REST — not a wholesale swap.

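
To make the constraint concrete, here is a minimal, standard-library-only sketch of how an `X-Twilio-Signature` check works for a form-encoded `POST`: the full webhook URL is concatenated with each POST parameter name and value (sorted by name), signed with HMAC-SHA1 keyed by the Auth Token, and Base64-encoded. The function names `compute_twilio_signature` and `signature_ok` are hypothetical and are not the `_twilio_signature_ok()` helper in `server.py`; the token, URL, and parameter values are placeholders.

```python
import base64
import hashlib
import hmac


def compute_twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    """Build the signature Twilio would send for a form-encoded POST webhook."""
    # Full URL followed by every POST field as name+value, sorted by field name.
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def signature_ok(auth_token: str, url: str, params: dict[str, str], header_value: str) -> bool:
    """Compare the expected signature to the header value in constant time."""
    expected = compute_twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected, header_value)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    token = "example-auth-token"
    url = "https://avc-phone.activeblue.net/voice"
    params = {"CallSid": "CA123", "From": "+15550100", "To": "+15550199"}

    header = compute_twilio_signature(token, url, params)  # what Twilio would send
    print(signature_ok(token, url, params, header))             # keyed by the Auth Token
    print(signature_ok("some-api-key-secret", url, params, header))  # keyed by anything else
```

Because the key is the Auth Token itself, a server holding only an API Key Secret computes a different digest and rejects every legitimate request, which is the failure described above.
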

**Current `server.py` (in place — do not change):**

```python
TWILIO_ACCOUNT_SID = os.environ.get("TWILIO_ACCOUNT_SID")
TWILIO_AUTH_TOKEN = os.environ.get("TWILIO_AUTH_TOKEN")

# _twilio_signature_ok(): HMAC-SHA1 keyed by the Auth Token (what Twilio signs with)
digest = hmac.new(TWILIO_AUTH_TOKEN.encode(), payload.encode("utf-8"), hashlib.sha1).digest()

# Validation gate + warning
if TWILIO_VALIDATE and TWILIO_AUTH_TOKEN:
    ...
elif not TWILIO_AUTH_TOKEN:
    logger.warning("/voice signature validation DISABLED (no TWILIO_AUTH_TOKEN set)")

# Serializer auto-hang-up uses the account SID + Auth Token pair
serializer = TwilioFrameSerializer(
    stream_sid=stream_sid,
    call_sid=call_sid,
    account_sid=TWILIO_ACCOUNT_SID,
    auth_token=TWILIO_AUTH_TOKEN,
)
```

**Auth Token rotation procedure:**

1. Generate a new primary Auth Token in the Twilio console (use the secondary-token flow)
2. Update `TWILIO_AUTH_TOKEN` in `.env`
3. Restart the service — no rebuild needed
4. Verify one test call succeeds (signature validation + auto-hang-up both rely on it)
5. Retire the old token in the Twilio console

Rotate on: any suspected leak, any team member departure, quarterly as routine.

---

## Change 3 — `.env`

No swap. `.env` keeps `TWILIO_AUTH_TOKEN` and the Whisper STT vars; there is **no** `TWILIO_API_KEY_*` or `DEEPGRAM_*` (those were trialed and removed with Changes 1/2).

**Full `.env` reference:**

```env
# Twilio — Auth Token validates webhooks + drives auto-hang-up. Never committed.
TWILIO_ACCOUNT_SID=AC...
TWILIO_AUTH_TOKEN=
TWILIO_PHONE_NUMBER=+1...
TWILIO_VALIDATE=true

# STT: Whisper (faster-whisper, real-time in-call; large-v3 also used post-call in Phase 3)
WHISPER_MODEL=medium
WHISPER_DEVICE=cuda
WHISPER_COMPUTE=float16

# LLM: Ollama
OLLAMA_URL=http://127.0.0.1:11434/v1
OLLAMA_MODEL=activeblue-avc:latest
LLM_PROVIDER=ollama
LLM_TEMPERATURE=0.3
LLM_MAX_TOKENS=160

# Anthropic (optional LLM swap + monitoring + synthetic data)
ANTHROPIC_API_KEY=
ANTHROPIC_MODEL=claude-sonnet-4-6

# TTS: Kokoro
KOKORO_VOICE=af_heart
KOKORO_MODEL_DIR=/home/tocmo0nlord/pipecat-run/models

# Odoo
ODOO_URL=https://avc.activeblue.net
ODOO_DB=avc
ODOO_USER=
ODOO_API_KEY=
ODOO_TARGET=crm
ODOO_STAGE_ID=
ODOO_TEAM_ID=
ODOO_USER_ID=

# Server
PUBLIC_HOST=avc-phone.activeblue.net
PORT=8200
BIND_HOST=127.0.0.1
MAX_CONCURRENT_CALLS=2
STREAM_TOKEN=

# Call behaviour
AGENT_NAME=AVA
HANGUP_DELAY_SECS=4.0   # grace pause after the goodbye before dropping the carrier leg
ENABLE_TOOLS=
VAD_CONFIDENCE=0.5
VAD_MIN_VOLUME=0.3
VAD_START_SECS=0.2
VAD_STOP_SECS=0.5

# Monitoring (Phase 4)
MONITORING_ENABLED=true
MONITORING_SCHEDULE=0 2 * * *

# A/B model routing (Phase 5 only)
AB_SPLIT_PERCENT=0
AB_MODEL_B=
```

---

## Call Data Capture & Date Validation

What AVA collects on a booking call and how it's logged. Driven by the system prompt (`bot.py`) plus a per-call calendar injection; persisted by the post-call extractor (`extract.py` → `practice.py` → Odoo lead).

### The six captured fields

| Field | In-call behavior | Logged as |
|-------|------------------|-----------|
| Full name | Asks for last name if only a first is given | `patient_name` / lead `contact_name` |
| Phone | Reads back the caller-ID number; if the caller declines, uses the number they give | `callback_number` (+ `phone_confirmed`) |
| Office / city | Asks city/area; never names an office unprompted | folded into `reason` prefix |
| Reason | Captured from the conversation | `reason` |
| Insurance | **Log only** — asks the plan, never promises/confirms/denies coverage or treatment (even a listed plan); staff verify on callback | `insurance` (note: "log only — staff to verify") |
| Preferred date & time | Validated against the calendar (below); confirmed before booking | `preferred_time` + resolved `YYYY-MM-DD` |

### Date validation

Each call's system message is injected with an **authoritative 45-day calendar** (today + each upcoming date with its weekday), recomputed per call since the server is long-running (`_date_context()` in `bot.py`). AVA must check any date the caller names against it. On an impossible or weekday/number-mismatched date it pushes back and offers the correct one, e.g.

> "Next month, Monday lands on the sixth — would you like to schedule that date?"

The extractor is also given today's date so it can resolve relative phrasing ("next month, Monday") to a concrete `YYYY-MM-DD`; the caller's own words are always kept too.

**Reliability note:** this is the local 8B model reasoning over injected facts — accurate in testing but not guaranteed turn-to-turn. A deterministic date-resolver tool would harden it, but in-call tools stay off for this model (it leaks JSON). Treat the resolved date as staff-verifiable, not authoritative. Overlaps Phase 2 validation work.


---

## Model Configuration

### Current production model: `activeblue-avc:latest`

| Property | Value | Notes |
|----------|-------|-------|
| Base | `llama3.1:8b-instruct-q4_K_M` | Llama 3.1 8B, Q4_K_M quantization |
| ID | `366a6cc15bb7` | Rebuilt clean 2026-06-23 |
| Size | 4.9GB | Down from 8.7GB Q8_0 |
| VRAM usage | ~4.5GB | Leaves 11.5GB headroom on RTX 5080 |
| Context | 4096 tokens | Sufficient for any phone call |
| Temperature | 0.3 | Low — maximizes JSON schema compliance |
| Top-p | 0.9 | Standard |
| Adapter | None | 44-pair LoRA adapter discarded |

### Modelfile (rebuild reference)

```
FROM llama3.1:8b-instruct-q4_K_M

PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER num_ctx 4096
PARAMETER temperature 0.3
PARAMETER top_p 0.9

TEMPLATE "{{- range .Messages }}<|start_header_id|>{{ .Role }}<|end_header_id|>

{{ .Content }}<|eot_id|>
{{- end }}<|start_header_id|>assistant<|end_header_id|>

"
```

### Why Q4_K_M not Q8_0

Q8_0 consumed ~8.5GB VRAM for weights alone. Under telephony load this caused inference latency spikes. Q4_K_M cuts weight VRAM to ~4.5GB with negligible quality difference at 8B scale.

### Why no adapter

44-pair LoRA adapter was adding noise not signal. Minimum viable dataset is 200+ pairs per intent category. Rebuilt correctly in Phase 5 with 500+ pairs in JSON output format.

### Ollama inventory (current)

```
activeblue-avc:latest          366a6cc15bb7   4.9GB   production
llama3.1:8b-instruct-q4_K_M    46e0c10c039e   4.9GB   base
nomic-embed-text:latest        0a109f422b47   274MB   embeddings
```

### Phase 5 training note

Axolotl pulls from HuggingFace in safetensors format, not Ollama GGUF:

```bash
# Phase 5 only — do not run now
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct
# ~16GB on disk, separate from Ollama storage
```

---

## Build Phases

Claude Code must not scaffold Phase N+1 until Phase N gate is marked complete.

### Phase 1 — Reliable call loop

**Goal:** Every utterance gets a response.
Zero silent failures. AVC hangs up — not the caller.

- [x] Change 1: STT — Deepgram evaluated, reverted; staying on Whisper (`medium`)
- [x] Change 2: Twilio auth — API Key evaluated, reverted; staying on Auth Token
- [x] Change 3: `.env` — Auth Token + Whisper vars; `OLLAMA_MODEL=activeblue-avc:latest`
- [ ] Verify `EndCallProcessor` termination in Twilio call logs (AVC side, not caller)
- [ ] Verify `AudioHeartbeat` diagnostic logging active
- [ ] Verify `MAX_CONCURRENT_CALLS` capacity gating works

**Gate — all five must pass:**

1. 10 consecutive test calls — zero silent non-responses
2. Zero zombie pipeline instances after call ends (`ps`/`pgrep` — service runs as a bare systemd/host process, not Docker)
3. Call termination from AVC side confirmed in Twilio call logs
4. JSON parse failure rate visible in logs — measurable not invisible
5. Response latency P95 under 3 seconds from STT end-of-utterance to first TTS audio

### Phase 2 — Accuracy (RAG + validation)

- [ ] Populate `rag/data/*.jsonl` with real AVC data (human task — see RAG section)
- [ ] ChromaDB RAG retriever wired into pipeline
- [ ] Response validator: JSON schema + factual cross-check + PHI leak scan
- [ ] Keyword blocklist (uncertainty phrases → handoff)
- [ ] Intent classifier routing
- [ ] Turn counter: max 3 failed turns before forced handoff + termination

**Gate:** 20 manual test calls, zero hallucinations on AVC-specific facts

### Phase 3 — Booking

- [ ] Real-time calendar availability check (`odoo/calendar.py`)
- [ ] Whisper large-v3 post-call transcription (`recording/transcriber.py`)
- [ ] Recording + transcript attached to Odoo lead chatter
- [ ] Staff review flow confirmed in Odoo

**Gate:** Staff receives, reviews, and confirms a lead end-to-end

### Phase 4 — Monitoring

- [ ] Transcript index (`recordings/index.jsonl`)
- [ ] Claude monitoring job
- [ ] Dashboard: toggle, alert queue, one-click apply, playback, quality tagging

**Gate:** First monitoring run produces actionable
suggestions

### Phase 5 — Fine-tuning

- [ ] Pull HuggingFace base (see model section)
- [ ] Synthetic data generation via Claude API in JSON output format
- [ ] Real call exporter using staff quality tags
- [ ] Axolotl QLoRA on RTX 5080
- [ ] Model registry + versioning + A/B routing

**Gate:** New model outperforms baseline over 50+ calls

---

## Repository Structure

```
avc-phone-ai/
├── CLAUDE.md                    ← this file
├── README.md
├── .env                         ← never committed
├── .env.example
├── .gitignore                   ← includes .env, recordings/, *.gguf
│
├── bot.py                       ← Pipecat pipeline (Phase 1 changes here)
├── server.py                    ← Twilio webhook server (Phase 1 changes here)
├── practice.py                  ← AVC facts + Odoo persistence
├── extract.py                   ← post-call appointment extraction
├── odoo_client.py               ← Odoo XML-RPC client
│
├── rag/                         ← Phase 2
│   ├── store.py
│   ├── loader.py
│   ├── retriever.py
│   └── data/
│       ├── avc_locations.jsonl
│       ├── avc_providers.jsonl
│       ├── avc_services.jsonl
│       ├── avc_hours.jsonl
│       ├── avc_insurance.jsonl
│       └── avc_faqs.jsonl
│
├── recording/                   ← Phase 3
│   ├── transcriber.py           ← Whisper large-v3 post-call only
│   └── storage.py
│
├── monitoring/                  ← Phase 4
│   ├── monitor.py
│   ├── analyzer.py
│   ├── diff_engine.py
│   ├── scheduler.py
│   └── dashboard/
│       ├── app.py
│       └── static/
│
├── training/                    ← Phase 5 stub
│   └── README.md
│
├── tests/
│   ├── test_bot.py
│   ├── test_server.py
│   ├── test_odoo_client.py
│   ├── test_extract.py
│   └── fixtures/
│       └── sample_transcripts.jsonl
│
├── scripts/
│   ├── deploy.sh
│   └── smoke_test.sh
│
├── avc-phone.service            ← existing systemd unit
└── traefik-avc-phone.yml        ← existing Traefik config
```

---

## Infrastructure

| Component | Host | Address | Notes |
|-----------|------|---------|-------|
| Pipecat pipeline | `miaai` | `10.10.1.221` | Python async, systemd |
| Ollama LLM | `miaai` | `http://127.0.0.1:11434/v1` | `activeblue-avc:latest` |
| ChromaDB (Phase 2) | `miaai` | `http://10.10.1.221:8001` | Docker volume |
| Twilio webhook | `miaai` | `https://avc-phone.activeblue.net` | Traefik + Let's Encrypt |
| Monitoring dashboard | `miaai` | `https://avc-monitor.activeblue.net` | internal only |
| Odoo CRM | — | `https://avc.activeblue.net` | XML-RPC, db: `avc` |
| Recordings | `miaai` | `/home/tocmo0nlord/avc-phone/recordings/` | local only |
| Gitea | — | `https://git.activeblue.net/tocmo0nlord/avc-phone-ai` | user: `tocmo0nlord` |

---

## RAG Store (Phase 2)

**Stack:** ChromaDB + `nomic-embed-text:latest` (already in Ollama)
**Collection:** `avc_knowledge`
**Retrieval:** Top-3 chunks per query on caller's current turn only

### JSONL record format

```json
{
  "id": "hours-kendall-weekday",
  "text": "The Kendall location is open Monday through Friday 8:00 AM to 5:00 PM.",
  "tags": ["hours", "kendall"],
  "last_updated": "2026-06-23"
}
```

### Data files — populated before Phase 2, not before Phase 1

| File | Content |
|------|---------|
| `avc_locations.jsonl` | Address, phone, fax, parking per location |
| `avc_providers.jsonl` | Name, title, specialty, locations, languages |
| `avc_services.jsonl` | Exam types, procedures |
| `avc_hours.jsonl` | Hours per location, holiday closures, after-hours |
| `avc_insurance.jsonl` | Accepted plans per location |
| `avc_faqs.jsonl` | Approved Q&A pairs |

**Note:** `practice.py` already contains real AVC location and insurance data scraped from `advancedvisioncareflorida.com`. Use it as the seed for the JSONL files rather than starting from scratch.

---

## Claude Monitoring (Phase 4)

### What it analyzes

- Facts stated by AVA contradicting RAG store
- System prompt violations
- Calls that should have been handoffs
- High failed turn counts — model or prompt signal
- RAG gaps (AVA said "I don't have that" — should it be added?)
- Phrasing that caused caller confusion

### Output schema

```json
{
  "call_sid": "CA...",
  "severity": "high",
  "issue_type": "factual_error",
  "description": "AVA stated Kendall closes at 6pm. RAG store says 5pm.",
  "suggested_action": "rag_update",
  "suggested_change": {
    "file": "rag/data/avc_hours.jsonl",
    "record_id": "hours-kendall-weekday",
    "field": "text",
    "old": "...open until 6pm...",
    "new": "...open until 5pm..."
  }
}
```

`suggested_action`: `rag_update` | `prompt_change` | `blocklist_add` | `flag_for_review`

### Dashboard

FastAPI + HTML/JS at `https://avc-monitor.activeblue.net` (internal only).

| Feature | Description |
|---------|-------------|
| Enable/disable toggle | Pauses scheduler without redeployment |
| Alert queue | Suggestions sorted by severity |
| One-click apply | Applies change, commits via Gitea API to `avc-phone-ai` |
| Call playback | Audio + transcript side-by-side |
| Quality tagging | Staff tags calls from dashboard |
| Manual trigger | `POST /monitor/run` |

---

## Fine-Tuning Pipeline (Phase 5 — stub)

> Not scaffolded until Phase 4 complete and monitoring has run minimum two weeks.
> See `training/README.md` — populated at Phase 5 start.

- Synthetic data: Claude API generates Q&A in JSON output format — schema not style
- Real calls: staff-tagged `"good"` + corrected bad calls
- Target: 500+ pairs per intent before first Axolotl run
- QLoRA via Axolotl on RTX 5080, base: HuggingFace `meta-llama/Llama-3.1-8B-Instruct`
- Versioned Ollama models: `activeblue-avc:vN`
- A/B routing: promote when new version wins on booking + hallucination rate over 50+ calls

---

## HIPAA and Compliance

- AVA identifies as automated at call start — no exceptions
- No PHI in ChromaDB — practice information only
- Recordings on `miaai` only — no cloud storage
- Odoo API user: minimum permissions, not admin
- All endpoints HTTPS via Traefik
- `.env` never committed

---

## Deploy Script (`scripts/deploy.sh`)

```bash
#!/bin/bash
set -e
cd /home/tocmo0nlord/avc-phone
git pull origin main
pip install -r requirements.txt --quiet
systemctl restart avc-phone
systemctl status avc-phone --no-pager
echo "[deploy] Done."
```

---

## Development Conventions

- Python 3.13 (matches `miaai` miniconda environment)
- Async throughout — Pipecat is async-native
- `loguru` for all logging — already in use, keep consistent
- Structured log lines for all diagnostic events
- `python-dotenv` for local dev, env injection in prod
- Secrets never hardcoded
- Every module has `if __name__ == "__main__":` for isolated testing

---

## Key Dependencies (current)

```
pipecat-ai==1.3.0        # installed at /opt/miniconda3
faster-whisper           # real-time STT (already installed in pipecat-run venv)
kokoro-tts               # already installed
ollama                   # already installed
scipy / numpy            # already installed (pipecat deps)
chromadb                 # add for Phase 2
sentence-transformers    # add for Phase 2
anthropic                # for monitoring + optional LLM swap
openai-whisper           # large-v3 for post-call transcription (Phase 3)
fastapi / uvicorn        # already installed
loguru                   # already installed
httpx                    # already installed
```

---

## Open Items

- [ ] Confirm `TWILIO_AUTH_TOKEN` in `.env` is current (rotate if leaked/stale)
- [ ] Confirm `ODOO_STAGE_ID`, `ODOO_TEAM_ID`, `ODOO_USER_ID` from live `avc` db
- [ ] Confirm AVA voice — `af_heart` is current default, confirm with AVC before go-live
- [ ] Populate `rag/data/*.jsonl` before Phase 2 (seed from `practice.py` data)
- [ ] Define Odoo confirmed appointment flow: lead → opportunity → calendar event
- [ ] Staff training on monitoring dashboard quality tagging

---

*Active Blue LLC | git.activeblue.net/tocmo0nlord/avc-phone-ai*