commit 4bf72b9616ed53e0e084ac58053a2ddf5717350a
Author: tocmo0nlord
Date:   Tue Jun 23 20:45:56 2026 +0000

    Upload files to "/"

diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..7bb63f8
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,651 @@
+# AVC Phone Agent — Project Specification
+> Claude Code authoritative reference. All architecture, security, and build decisions live here.
+> Repo: `git.activeblue.net/tocmo0nlord/avc-phone-agent`
+> Last updated: 2026-06-23 | Active Blue LLC
+
+---
+
+## Project Overview
+
+**Name:** AVC Phone Agent
+**Owner:** Active Blue LLC
+**Client:** Advanced Vision Care (AVC) — multi-location ophthalmology/optometry practice (FL + TX)
+**Agent name:** AVA (Advanced Vision Assistant)
+**Purpose:** Automated AI phone agent that answers patient calls, books tentative appointments
+into Odoo CRM with call recordings and transcripts attached, and self-improves via
+Claude-powered transcript monitoring and a fine-tuning feedback loop.
+
+---
+
+## Existing Codebase — What to Keep, What to Change
+
+The previous build at `/home/tocmo0nlord/avc-phone/` is a working foundation.
+**Do not rewrite what works.** Apply only the changes documented in this section.
+
+### Files and their status
+
+| File | Status | Action |
+|------|--------|--------|
+| `bot.py` | Keep with one change | Swap Whisper STT for Deepgram Nova-2 |
+| `server.py` | Keep with one change | Swap Auth Token for API Key Secret |
+| `practice.py` | Keep as-is | No changes |
+| `extract.py` | Keep as-is | No changes |
+| `odoo_client.py` | Keep as-is | Already uses API key auth correctly |
+
+### What is already solved — do not touch
+
+**`EndCallProcessor` in `bot.py`** — AVC-side call termination is fully implemented.
+Watches LLM text stream for closing keywords ("goodbye"), waits for TTS to finish via
+`BotStoppedSpeakingFrame`, then pushes `EndTaskFrame` upstream. `TwilioFrameSerializer`
+with `auto_hang_up` drops the carrier leg. This is correct. Zero changes.
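The termination sequence described for `EndCallProcessor` can be pictured as a two-step state machine: notice a closing keyword in the LLM text, then wait for the bot to stop speaking before ending the call. The following is a minimal sketch in plain Python, not the actual processor. `EndCallState`, its method names, and the keyword tuple are hypothetical stand-ins for the Pipecat frame handling in `bot.py`.

```python
from dataclasses import dataclass

# Hypothetical keyword list; the spec only gives "goodbye" as an example.
CLOSING_KEYWORDS = ("goodbye",)


@dataclass
class EndCallState:
    """Hypothetical stand-in for the decision logic, with no Pipecat imports."""

    text_so_far: str = ""
    closing_seen: bool = False
    should_end: bool = False

    def on_llm_text(self, chunk: str) -> None:
        # Accumulate streamed chunks so a keyword split across chunks still matches.
        self.text_so_far += chunk.lower()
        if any(word in self.text_so_far for word in CLOSING_KEYWORDS):
            self.closing_seen = True

    def on_bot_stopped_speaking(self) -> None:
        # End only after TTS has finished the closing line; then reset the
        # buffer so the next bot turn starts clean.
        if self.closing_seen:
            self.should_end = True
        self.text_so_far = ""


state = EndCallState()
state.on_llm_text("Good")
state.on_llm_text("bye, and thank you for calling.")
print(state.should_end)   # False: the closing line is still being spoken
state.on_bot_stopped_speaking()
print(state.should_end)   # True: safe to signal end of call
```

Separating "keyword seen" from "speech finished" is what keeps the caller from being cut off mid-sentence; in the real pipeline the second step corresponds to `BotStoppedSpeakingFrame`.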
+
+**Mulaw 8kHz ↔ 16kHz conversion** — handled internally by `TwilioFrameSerializer`.
+`PIPELINE_SAMPLE_RATE = 16000`, `WIRE_SAMPLE_RATE = 8000` are already set correctly.
+No custom audio module needed.
+
+**VAD tuned for telephony** — `confidence=0.5`, `min_volume=0.3` already loosened from
+desktop defaults. These settings directly address the repeat-yourself problem on the
+VAD side.
+
+**Capacity gating** — `MAX_CONCURRENT_CALLS=2` with atomic slot reservation in
+`server.py` prevents GPU thrashing. Keep it.
+
+**`AudioHeartbeat`** — diagnostic processor that distinguishes VAD failure from
+transport stall. Keep it.
+
+**Post-call extraction (`extract.py`)** — single JSON-mode completion after call ends.
+Correctly uses `format: json`, uses verified Twilio caller-ID instead of trusting model
+output, falls back to JSONL if Odoo is unreachable. Keep it.
+
+**Odoo integration (`odoo_client.py`)** — already uses `ODOO_API_KEY` for XML-RPC auth,
+not password. Correct pattern. No changes.
+
+---
+
+## Change 1 — Swap Whisper STT for Deepgram Nova-2 (`bot.py`)
+
+**Why:** Whisper buffers audio chunks before transcribing — 1-3 seconds of buffering
+before the LLM sees any input. This is the primary cause of non-reply and the
+repeat-yourself problem. Deepgram Nova-2 via Pipecat's native transport delivers
+end-of-utterance events in under 300ms.
+
+**Remove from `bot.py`:**
+```python
+# Remove this import
+from pipecat.services.whisper.stt import WhisperSTTService
+
+# Remove these env vars
+WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "base")
+WHISPER_DEVICE = os.environ.get("WHISPER_DEVICE", "cuda")
+WHISPER_COMPUTE = os.environ.get("WHISPER_COMPUTE", "float16")
+WHISPER_HOTWORDS = os.environ.get("WHISPER_HOTWORDS", "...")
+
+# Remove the entire HintedWhisperSTTService class
+```
+
+**Add to `bot.py`:**
+```python
+# Add import
+from pipecat.services.deepgram.stt import DeepgramSTTService
+
+# Add env var
+DEEPGRAM_API_KEY = os.environ.get("DEEPGRAM_API_KEY", "")
+
+# Replace stt instantiation in run_agent()
+stt = DeepgramSTTService(
+    api_key=DEEPGRAM_API_KEY,
+    settings=DeepgramSTTService.Settings(
+        model="nova-2",
+        language="en-US",
+        smart_format=True,
+        punctuate=True,
+        interim_results=False,   # final transcripts only — avoids double-firing
+        utterance_end_ms=1000,   # ms of silence before end-of-utterance fires
+        # Deepgram documents utterance_end_ms as requiring interim_results=True;
+        # verify this pairing before relying on end-of-utterance events.
+    )
+)
+```
+
+**Note on Whisper:** Remove from real-time pipeline only. Whisper large-v3 is retained
+for post-call transcription in Phase 3 (`recording/transcriber.py`) where latency does
+not matter and accuracy is more important than speed.
+
+---
+
+## Change 2 — Swap Auth Token for API Key Secret (`server.py`)
+
+**Why:** `TWILIO_AUTH_TOKEN` is the master credential for the entire Twilio account.
+A leak compromises every Twilio integration. A Standard API Key is scoped to this
+application and revocable independently.
+
+**Credential hierarchy:**
+```
+Twilio Account SID (not secret on its own)
+├── Auth Token (master — Twilio console only, rotate quarterly)
+└── API Key: avc-phone-agent-prod (Standard scope)
+    ├── TWILIO_API_KEY_SID: SK...
+    └── TWILIO_API_KEY_SECRET: (treat as a password)
+```
+
+**Create the API Key:**
+1. Twilio console → Account → API Keys → Create new Standard key
+2. Name it `avc-phone-agent-prod`
+3. 
Copy SID (`SK...`) and Secret — Secret is shown once only
+
+**Changes in `server.py`:**
+
+Remove:
+```python
+TWILIO_AUTH_TOKEN = os.environ.get("TWILIO_AUTH_TOKEN")
+```
+
+Add:
+```python
+TWILIO_API_KEY_SID = os.environ.get("TWILIO_API_KEY_SID")
+TWILIO_API_KEY_SECRET = os.environ.get("TWILIO_API_KEY_SECRET")
+```
+
+In `_twilio_signature_ok()`, change the HMAC key:
+```python
+# Before
+digest = hmac.new(TWILIO_AUTH_TOKEN.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
+
+# After
+digest = hmac.new(TWILIO_API_KEY_SECRET.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
+```
+
+**Verify before deleting the Auth Token:** Twilio computes `X-Twilio-Signature` with the
+account's primary Auth Token. Confirm on a test call with `TWILIO_VALIDATE` enabled that
+webhook requests still validate against the API Key Secret; if they do not, signature
+validation has to keep using the Auth Token.
+
+Update the guard condition:
+```python
+# Before
+if TWILIO_VALIDATE and TWILIO_AUTH_TOKEN:
+
+# After
+if TWILIO_VALIDATE and TWILIO_API_KEY_SECRET:
+```
+
+Update the warning log:
+```python
+# Before
+elif not TWILIO_AUTH_TOKEN:
+    logger.warning("/voice signature validation DISABLED (no TWILIO_AUTH_TOKEN set)")
+
+# After
+elif not TWILIO_API_KEY_SECRET:
+    logger.warning("/voice signature validation DISABLED (no TWILIO_API_KEY_SECRET set)")
+```
+
+In `TwilioFrameSerializer` instantiation:
+```python
+# Before
+serializer = TwilioFrameSerializer(
+    stream_sid=stream_sid,
+    call_sid=call_sid,
+    account_sid=TWILIO_ACCOUNT_SID,
+    auth_token=TWILIO_AUTH_TOKEN,
+)
+
+# After
+serializer = TwilioFrameSerializer(
+    stream_sid=stream_sid,
+    call_sid=call_sid,
+    account_sid=TWILIO_ACCOUNT_SID,
+    auth_token=TWILIO_API_KEY_SECRET,
+)
+```
+
+**Verify `auto_hang_up`:** Twilio REST requests authenticated with an API key use the
+API Key SID as the username. Confirm on a test call that the serializer still drops the
+carrier leg when `account_sid` is paired with the API Key Secret.
+
+**Key rotation procedure:**
+1. Create new Standard API Key in Twilio console
+2. Update `TWILIO_API_KEY_SID` + `TWILIO_API_KEY_SECRET` in `.env`
+3. Restart the service — no rebuild needed
+4. Verify one test call succeeds
+5. Revoke old key in Twilio console
+
+Rotate on: any suspected leak, any team member departure, quarterly as routine.
+
+---
+
+## Change 3 — Update `.env`
+
+**Remove:**
+```env
+TWILIO_AUTH_TOKEN=
+```
+
+**Add:**
+```env
+TWILIO_API_KEY_SID=SK...
+TWILIO_API_KEY_SECRET=
+DEEPGRAM_API_KEY=
+```
+
+**Full `.env` reference:**
+```env
+# Twilio — Auth Token lives in Twilio console only, never on this server
+TWILIO_ACCOUNT_SID=AC...
+TWILIO_API_KEY_SID=SK...
+TWILIO_API_KEY_SECRET=
+TWILIO_PHONE_NUMBER=+1...
+
+# STT: Deepgram (real-time, in-call only)
+DEEPGRAM_API_KEY=
+DEEPGRAM_MODEL=nova-2
+
+# LLM: Ollama
+OLLAMA_URL=http://127.0.0.1:11434/v1
+OLLAMA_MODEL=activeblue-avc:latest
+LLM_PROVIDER=ollama
+LLM_TEMPERATURE=0.3
+LLM_MAX_TOKENS=160
+
+# Anthropic (optional LLM swap + monitoring + synthetic data)
+ANTHROPIC_API_KEY=
+ANTHROPIC_MODEL=claude-sonnet-4-6
+
+# TTS: Kokoro
+KOKORO_VOICE=af_heart
+KOKORO_MODEL_DIR=/home/tocmo0nlord/pipecat-run/models
+
+# Odoo
+ODOO_URL=https://avc.activeblue.net
+ODOO_DB=avc
+ODOO_USER=
+ODOO_API_KEY=
+ODOO_TARGET=crm
+ODOO_STAGE_ID=
+ODOO_TEAM_ID=
+ODOO_USER_ID=
+
+# Server
+PUBLIC_HOST=avc-phone.activeblue.net
+PORT=8200
+BIND_HOST=127.0.0.1
+MAX_CONCURRENT_CALLS=2
+STREAM_TOKEN=
+
+# Call behaviour
+AGENT_NAME=AVA
+ENABLE_TOOLS=
+VAD_CONFIDENCE=0.5
+VAD_MIN_VOLUME=0.3
+VAD_START_SECS=0.2
+VAD_STOP_SECS=0.5
+
+# Monitoring (Phase 4)
+MONITORING_ENABLED=true
+MONITORING_SCHEDULE=0 2 * * *
+
+# A/B model routing (Phase 5 only)
+AB_SPLIT_PERCENT=0
+AB_MODEL_B=
+```
+
+---
+
+## Model Configuration
+
+### Current production model: `activeblue-avc:latest`
+
+| Property | Value | Notes |
+|----------|-------|-------|
+| Base | `llama3.1:8b-instruct-q4_K_M` | Llama 3.1 8B, Q4_K_M quantization |
+| ID | `366a6cc15bb7` | Rebuilt clean 2026-06-23 |
+| Size | 4.9GB | Down from 8.7GB Q8_0 |
+| VRAM usage | ~4.5GB | Leaves 11.5GB headroom on RTX 5080 |
+| Context | 4096 tokens | Sufficient for any phone call |
+| Temperature | 0.3 | Low — maximizes JSON schema compliance |
+| Top-p | 0.9 | Standard |
+| Adapter | None | 44-pair LoRA adapter discarded |
+
+### Modelfile (rebuild reference)
+
+```
+FROM llama3.1:8b-instruct-q4_K_M
+
+PARAMETER stop "<|start_header_id|>"
+PARAMETER stop 
"<|end_header_id|>"
+PARAMETER stop "<|eot_id|>"
+PARAMETER num_ctx 4096
+PARAMETER temperature 0.3
+PARAMETER top_p 0.9
+
+TEMPLATE "{{- range .Messages }}<|start_header_id|>{{ .Role }}<|end_header_id|>
+{{ .Content }}<|eot_id|>
+{{- end }}<|start_header_id|>assistant<|end_header_id|>
+"
+```
+
+### Why Q4_K_M not Q8_0
+
+Q8_0 consumed ~8.5GB VRAM for weights alone. Under telephony load this caused
+inference latency spikes. Q4_K_M cuts weight VRAM to ~4.5GB with negligible quality
+difference at 8B scale.
+
+### Why no adapter
+
+44-pair LoRA adapter was adding noise not signal. Minimum viable dataset is 200+ pairs
+per intent category. Rebuilt correctly in Phase 5 with 500+ pairs in JSON output format.
+
+### Ollama inventory (current)
+
+```
+activeblue-avc:latest          366a6cc15bb7   4.9GB   production
+llama3.1:8b-instruct-q4_K_M    46e0c10c039e   4.9GB   base
+nomic-embed-text:latest        0a109f422b47   274MB   embeddings
+```
+
+### Phase 5 training note
+
+Axolotl pulls from HuggingFace in safetensors format, not Ollama GGUF:
+```bash
+# Phase 5 only — do not run now
+huggingface-cli download meta-llama/Llama-3.1-8B-Instruct
+# ~16GB on disk, separate from Ollama storage
+```
+
+---
+
+## Build Phases
+
+Claude Code must not scaffold Phase N+1 until Phase N gate is marked complete.
+
+### Phase 1 — Reliable call loop
+
+**Goal:** Every utterance gets a response. Zero silent failures. AVC hangs up — not
+the caller.
+
+- [ ] Apply Change 1: swap Whisper for Deepgram in `bot.py`
+- [ ] Apply Change 2: swap Auth Token for API Key Secret in `server.py`
+- [ ] Apply Change 3: update `.env`
+- [ ] Verify `EndCallProcessor` termination in Twilio call logs (AVC side, not caller)
+- [ ] Verify `AudioHeartbeat` diagnostic logging active
+- [ ] Verify `MAX_CONCURRENT_CALLS` capacity gating works
+
+**Gate — all five must pass:**
+1. 10 consecutive test calls — zero silent non-responses
+2. Zero zombie pipeline instances after call ends (`docker stats`)
+3. 
Call termination from AVC side confirmed in Twilio call logs
+4. JSON parse failure rate visible in logs — measurable not invisible
+5. Response latency P95 under 3 seconds from STT end-of-utterance to first TTS audio
+
+### Phase 2 — Accuracy (RAG + validation)
+
+- [ ] Populate `rag/data/*.jsonl` with real AVC data (human task — see RAG section)
+- [ ] ChromaDB RAG retriever wired into pipeline
+- [ ] Response validator: JSON schema + factual cross-check + PHI leak scan
+- [ ] Keyword blocklist (uncertainty phrases → handoff)
+- [ ] Intent classifier routing
+- [ ] Turn counter: max 3 failed turns before forced handoff + termination
+
+**Gate:** 20 manual test calls, zero hallucinations on AVC-specific facts
+
+### Phase 3 — Booking
+
+- [ ] Real-time calendar availability check (`odoo/calendar.py`)
+- [ ] Whisper large-v3 post-call transcription (`recording/transcriber.py`)
+- [ ] Recording + transcript attached to Odoo lead chatter
+- [ ] Staff review flow confirmed in Odoo
+
+**Gate:** Staff receives, reviews, and confirms a lead end-to-end
+
+### Phase 4 — Monitoring
+
+- [ ] Transcript index (`recordings/index.jsonl`)
+- [ ] Claude monitoring job
+- [ ] Dashboard: toggle, alert queue, one-click apply, playback, quality tagging
+
+**Gate:** First monitoring run produces actionable suggestions
+
+### Phase 5 — Fine-tuning
+
+- [ ] Pull HuggingFace base (see model section)
+- [ ] Synthetic data generation via Claude API in JSON output format
+- [ ] Real call exporter using staff quality tags
+- [ ] Axolotl QLoRA on RTX 5080
+- [ ] Model registry + versioning + A/B routing
+
+**Gate:** New model outperforms baseline over 50+ calls
+
+---
+
+## Repository Structure
+
+```
+avc-phone-agent/
+├── CLAUDE.md                  ← this file
+├── README.md
+├── .env                       ← never committed
+├── .env.example
+├── .gitignore                 ← includes .env, recordings/, *.gguf
+│
+├── bot.py                     ← Pipecat pipeline (Phase 1 changes here)
+├── server.py                  ← Twilio webhook server (Phase 1 changes here)
+├── practice.py                ← 
AVC facts + Odoo persistence
+├── extract.py                 ← post-call appointment extraction
+├── odoo_client.py             ← Odoo XML-RPC client
+│
+├── rag/                       ← Phase 2
+│   ├── store.py
+│   ├── loader.py
+│   ├── retriever.py
+│   └── data/
+│       ├── avc_locations.jsonl
+│       ├── avc_providers.jsonl
+│       ├── avc_services.jsonl
+│       ├── avc_hours.jsonl
+│       ├── avc_insurance.jsonl
+│       └── avc_faqs.jsonl
+│
+├── recording/                 ← Phase 3
+│   ├── transcriber.py         ← Whisper large-v3 post-call only
+│   └── storage.py
+│
+├── monitoring/                ← Phase 4
+│   ├── monitor.py
+│   ├── analyzer.py
+│   ├── diff_engine.py
+│   ├── scheduler.py
+│   └── dashboard/
+│       ├── app.py
+│       └── static/
+│
+├── training/                  ← Phase 5 stub
+│   └── README.md
+│
+├── tests/
+│   ├── test_bot.py
+│   ├── test_server.py
+│   ├── test_odoo_client.py
+│   ├── test_extract.py
+│   └── fixtures/
+│       └── sample_transcripts.jsonl
+│
+├── scripts/
+│   ├── deploy.sh
+│   └── smoke_test.sh
+│
+├── avc-phone.service          ← existing systemd unit
+└── traefik-avc-phone.yml      ← existing Traefik config
+```
+
+---
+
+## Infrastructure
+
+| Component | Host | Address | Notes |
+|-----------|------|---------|-------|
+| Pipecat pipeline | `miaai` | `10.10.1.221` | Python async, systemd |
+| Ollama LLM | `miaai` | `http://127.0.0.1:11434/v1` | `activeblue-avc:latest` |
+| ChromaDB (Phase 2) | `miaai` | `http://10.10.1.221:8001` | Docker volume |
+| Twilio webhook | `miaai` | `https://avc-phone.activeblue.net` | Traefik + Let's Encrypt |
+| Monitoring dashboard | `miaai` | `https://avc-monitor.activeblue.net` | internal only |
+| Odoo CRM | — | `https://avc.activeblue.net` | XML-RPC, db: `avc` |
+| Recordings | `miaai` | `/home/tocmo0nlord/avc-phone/recordings/` | local only |
+| Gitea | — | `https://git.activeblue.net` | user: `tocmo0nlord` |
+
+---
+
+## RAG Store (Phase 2)
+
+**Stack:** ChromaDB + `nomic-embed-text:latest` (already in Ollama)
+**Collection:** `avc_knowledge`
+**Retrieval:** Top-3 chunks per query on caller's current turn only
+
+### JSONL record format
+
+```json
+{
+  "id": 
"hours-kendall-weekday",
+  "text": "The Kendall location is open Monday through Friday 8:00 AM to 5:00 PM.",
+  "tags": ["hours", "kendall"],
+  "last_updated": "2026-06-23"
+}
+```
+
+### Data files — populated before Phase 2, not before Phase 1
+
+| File | Content |
+|------|---------|
+| `avc_locations.jsonl` | Address, phone, fax, parking per location |
+| `avc_providers.jsonl` | Name, title, specialty, locations, languages |
+| `avc_services.jsonl` | Exam types, procedures |
+| `avc_hours.jsonl` | Hours per location, holiday closures, after-hours |
+| `avc_insurance.jsonl` | Accepted plans per location |
+| `avc_faqs.jsonl` | Approved Q&A pairs |
+
+**Note:** `practice.py` already contains real AVC location and insurance data scraped
+from `advancedvisioncareflorida.com`. Use it as the seed for the JSONL files rather
+than starting from scratch.
+
+---
+
+## Claude Monitoring (Phase 4)
+
+### What it analyzes
+
+- Facts stated by AVA contradicting RAG store
+- System prompt violations
+- Calls that should have been handoffs
+- High failed turn counts — model or prompt signal
+- RAG gaps (AVA said "I don't have that" — should it be added?)
+- Phrasing that caused caller confusion
+
+### Output schema
+
+```json
+{
+  "call_sid": "CA...",
+  "severity": "high",
+  "issue_type": "factual_error",
+  "description": "AVA stated Kendall closes at 6pm. RAG store says 5pm.",
+  "suggested_action": "rag_update",
+  "suggested_change": {
+    "file": "rag/data/avc_hours.jsonl",
+    "record_id": "hours-kendall-weekday",
+    "field": "text",
+    "old": "...open until 6pm...",
+    "new": "...open until 5pm..."
+  }
+}
+```
+
+`suggested_action`: `rag_update` | `prompt_change` | `blocklist_add` | `flag_for_review`
+
+### Dashboard
+
+FastAPI + HTML/JS at `https://avc-monitor.activeblue.net` (internal only).
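The `suggested_change` object in the output schema carries enough to patch a single record in a `rag/data/*.jsonl` file, which is what a one-click apply would need to do before committing. A minimal sketch, assuming one JSON object per line and that `old` must equal the field's current value. `apply_rag_update` is a hypothetical name, not an existing function in the repository, and the Gitea commit step is left out.

```python
import json


def apply_rag_update(lines, change):
    """Hypothetical helper: patch one JSONL record from a suggested_change dict.

    Returns (new_lines, applied). Assumes `old` must equal the current field
    value, so a stale suggestion changes nothing.
    """
    new_lines, applied = [], False
    for line in lines:
        record = json.loads(line)
        if (
            record.get("id") == change["record_id"]
            and record.get(change["field"]) == change["old"]
        ):
            record[change["field"]] = change["new"]
            applied = True
        new_lines.append(json.dumps(record, ensure_ascii=False))
    return new_lines, applied


# Illustrative data only; not real AVC hours.
lines = ['{"id": "hours-example", "text": "Open until 6pm."}']
change = {
    "record_id": "hours-example",
    "field": "text",
    "old": "Open until 6pm.",
    "new": "Open until 5pm.",
}
updated, applied = apply_rag_update(lines, change)
print(applied)
print(updated[0])
```

Requiring `old` to match before writing `new` is a design choice in this sketch: a suggestion generated against an outdated record is skipped rather than silently overwriting a newer edit.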
+
+| Feature | Description |
+|---------|-------------|
+| Enable/disable toggle | Pauses scheduler without redeployment |
+| Alert queue | Suggestions sorted by severity |
+| One-click apply | Applies change, commits via Gitea API |
+| Call playback | Audio + transcript side-by-side |
+| Quality tagging | Staff tags calls from dashboard |
+| Manual trigger | `POST /monitor/run` |
+
+---
+
+## Fine-Tuning Pipeline (Phase 5 — stub)
+
+> Not scaffolded until Phase 4 complete and monitoring has run minimum two weeks.
+> See `training/README.md` — populated at Phase 5 start.
+
+- Synthetic data: Claude API generates Q&A in JSON output format — schema not style
+- Real calls: staff-tagged `"good"` + corrected bad calls
+- Target: 500+ pairs per intent before first Axolotl run
+- QLoRA via Axolotl on RTX 5080, base: HuggingFace `meta-llama/Llama-3.1-8B-Instruct`
+- Versioned Ollama models: `activeblue-avc:vN`
+- A/B routing: promote when new version wins on booking + hallucination rate over 50+ calls
+
+---
+
+## HIPAA and Compliance
+
+- AVA identifies as automated at call start — no exceptions
+- No PHI in ChromaDB — practice information only
+- Recordings on `miaai` only — no cloud storage
+- Odoo API user: minimum permissions, not admin
+- All endpoints HTTPS via Traefik
+- `.env` never committed
+
+---
+
+## Deploy Script (`scripts/deploy.sh`)
+
+```bash
+#!/bin/bash
+set -e
+cd /home/tocmo0nlord/avc-phone
+git pull origin main
+pip install -r requirements.txt --quiet
+systemctl restart avc-phone
+systemctl status avc-phone --no-pager
+echo "[deploy] Done."
+```
+
+---
+
+## Development Conventions
+
+- Python 3.13 (matches `miaai` miniconda environment)
+- Async throughout — Pipecat is async-native
+- `loguru` for all logging — already in use, keep consistent
+- Structured log lines for all diagnostic events
+- `python-dotenv` for local dev, env injection in prod
+- Secrets never hardcoded
+- Every module has `if __name__ == "__main__":` for isolated testing
+
+---
+
+## Key Dependencies (current)
+
+```
+pipecat-ai==1.3.0          # installed at /opt/miniconda3
+pipecat-ai[deepgram]       # add for Phase 1 Deepgram swap
+deepgram-sdk               # add for Phase 1
+kokoro-tts                 # already installed
+ollama                     # already installed
+scipy / numpy              # already installed (pipecat deps)
+chromadb                   # add for Phase 2
+sentence-transformers      # add for Phase 2
+anthropic                  # for monitoring + optional LLM swap
+openai-whisper             # retained for post-call transcription only
+fastapi / uvicorn          # already installed
+loguru                     # already installed
+httpx                      # already installed
+```
+
+---
+
+## Open Items
+
+- [ ] Create `avc-phone-agent-prod` Standard API Key in Twilio console
+- [ ] Add `TWILIO_API_KEY_SID` + `TWILIO_API_KEY_SECRET` + `DEEPGRAM_API_KEY` to `.env`
+- [ ] Confirm `ODOO_STAGE_ID`, `ODOO_TEAM_ID`, `ODOO_USER_ID` from live `avc` db
+- [ ] Confirm AVA voice — `af_heart` is current default, confirm with AVC before go-live
+- [ ] Populate `rag/data/*.jsonl` before Phase 2 (seed from `practice.py` data)
+- [ ] Define Odoo confirmed appointment flow: lead → opportunity → calendar event
+- [ ] Staff training on monitoring dashboard quality tagging
+
+---
+
+*Active Blue LLC | git.activeblue.net/tocmo0nlord/avc-phone-agent*