AVC Phone Agent — Project Specification

Authoritative reference for Claude Code. All architecture, security, and build decisions live here.
Repo: git.activeblue.net/tocmo0nlord/avc-phone-agent
Last updated: 2026-06-23 | Active Blue LLC


Project Overview

Name: AVC Phone Agent
Owner: Active Blue LLC
Client: Advanced Vision Care (AVC), a multi-location ophthalmology/optometry practice (FL + TX)
Agent name: AVA (Advanced Vision Assistant)
Purpose: Automated AI phone agent that answers patient calls, books tentative appointments into Odoo CRM with call recordings and transcripts attached, and self-improves via Claude-powered transcript monitoring and a fine-tuning feedback loop.


Existing Codebase — What to Keep, What to Change

The previous build at /home/tocmo0nlord/avc-phone/ is a working foundation. Do not rewrite what works. Apply only the changes documented in this section.

Files and their status

| File | Status | Action |
|---|---|---|
| bot.py | Keep with one change | Swap Whisper STT for Deepgram Nova-2 |
| server.py | Keep with one change | Swap Auth Token for API Key Secret |
| practice.py | Keep as-is | No changes |
| extract.py | Keep as-is | No changes |
| odoo_client.py | Keep as-is | Already uses API key auth correctly |

What is already solved — do not touch

EndCallProcessor in bot.py — AVC-side call termination is fully implemented. Watches LLM text stream for closing keywords ("goodbye"), waits for TTS to finish via BotStoppedSpeakingFrame, then pushes EndTaskFrame upstream. TwilioFrameSerializer with auto_hang_up drops the carrier leg. This is correct. Zero changes.
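The keyword watch is the subtle part of EndCallProcessor. The LLM streams text in small chunks, so "goodbye" can arrive split across two of them. Below is a minimal pure-Python sketch of chunk-safe detection using a rolling tail buffer. The class name is hypothetical, and this is not the bot.py implementation:

```python
class ClosingKeywordDetector:
    """Hypothetical helper: spots closing keywords in a streamed LLM reply."""

    def __init__(self, keywords: tuple[str, ...] = ("goodbye",)) -> None:
        self.keywords = tuple(k.lower() for k in keywords)
        # Keep just enough trailing text to catch a keyword split across chunks,
        # but never a whole keyword, so one "goodbye" cannot fire twice.
        self._keep = max(len(k) for k in self.keywords) - 1
        self._tail = ""

    def feed(self, chunk: str) -> bool:
        """Return True if a keyword completes within this chunk."""
        text = self._tail + chunk.lower()
        hit = any(k in text for k in self.keywords)
        self._tail = text[-self._keep:] if self._keep else ""
        return hit

    def reset(self) -> None:
        self._tail = ""
```

A plain substring match also fires mid-sentence ("before we say goodbye..."). For that reason, keep the keyword list short and specific.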

Mulaw 8kHz ↔ 16kHz conversion — handled internally by TwilioFrameSerializer. PIPELINE_SAMPLE_RATE = 16000, WIRE_SAMPLE_RATE = 8000 are already set correctly. No custom audio module needed.

VAD tuned for telephony: confidence=0.5 and min_volume=0.3 are already loosened from desktop defaults. These settings directly address the repeat-yourself problem on the VAD side.

Capacity gating: MAX_CONCURRENT_CALLS=2 with atomic slot reservation in server.py prevents GPU thrashing. Keep it.
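Slot reservation can be atomic on a single asyncio event loop without a lock. Nothing can interleave between a check and an increment that have no await between them. Below is a minimal sketch of that idea. CallSlots and handle_call are hypothetical names, and server.py's actual implementation may differ:

```python
import asyncio


class CallSlots:
    """Hypothetical capacity gate for concurrent calls."""

    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.active = 0

    def try_reserve(self) -> bool:
        # No await between the check and the increment, so no other coroutine
        # on this event loop can run in between: the reservation is atomic.
        if self.active >= self.max_calls:
            return False
        self.active += 1
        return True

    def release(self) -> None:
        self.active = max(0, self.active - 1)


async def handle_call(slots: CallSlots, work: float) -> bool:
    if not slots.try_reserve():
        return False  # over capacity: reject instead of starting a pipeline
    try:
        await asyncio.sleep(work)  # stands in for running the Pipecat pipeline
        return True
    finally:
        slots.release()  # always free the slot, even if the pipeline crashes


async def main() -> list[bool]:
    slots = CallSlots(max_calls=2)
    return await asyncio.gather(*(handle_call(slots, 0.01) for _ in range(3)))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The release sits in `finally` because a slot leaked by a crashed pipeline would permanently shrink capacity.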

AudioHeartbeat — diagnostic processor that distinguishes VAD failure from transport stall. Keep it.

Post-call extraction (extract.py) — single JSON-mode completion after call ends. Correctly uses format: json, uses verified Twilio caller-ID instead of trusting model output, falls back to JSONL if Odoo is unreachable. Keep it.
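The caller-ID override and the JSONL fallback can be sketched as follows. The names persist_lead, the "phone" field, and the injected send callable are hypothetical. The sketch treats any OSError (connection refused, timeout, DNS failure) as "Odoo unreachable"; an XML-RPC Fault is not an OSError and would still propagate:

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def persist_lead(
    extracted: dict,
    caller_id: str,
    send: Callable[[dict], None],
    fallback: Path,
) -> str:
    """Push a lead to Odoo via `send`; append it to a JSONL file if Odoo is down."""
    lead = dict(extracted)
    # Twilio's verified caller ID wins over whatever number the model extracted.
    lead["phone"] = caller_id
    try:
        send(lead)
        return "odoo"
    except OSError:
        # One JSON object per line, so a replay job can re-send records later.
        fallback.parent.mkdir(parents=True, exist_ok=True)
        with fallback.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(lead) + "\n")
        return "jsonl"


if __name__ == "__main__":
    def odoo_down(lead: dict) -> None:
        raise ConnectionRefusedError("Odoo unreachable")

    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "leads.jsonl"
        print(persist_lead({"name": "Jane Doe", "phone": "555"}, "+13055550100", odoo_down, path))
        print(path.read_text(encoding="utf-8"))
```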

Odoo integration (odoo_client.py) — already uses ODOO_API_KEY for XML-RPC auth, not password. Correct pattern. No changes.
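For reference, here is the Odoo external-API pattern this relies on. The client authenticates once against /xmlrpc/2/common, then passes the API key in place of a password on every execute_kw call to /xmlrpc/2/object. This is a sketch only. create_lead and its injectable proxy parameter are hypothetical, and odoo_client.py may be structured differently. Lead values such as stage_id, team_id and user_id would come from ODOO_STAGE_ID, ODOO_TEAM_ID and ODOO_USER_ID:

```python
import xmlrpc.client
from typing import Any, Callable


def create_lead(
    url: str,
    db: str,
    user: str,
    api_key: str,
    vals: dict[str, Any],
    proxy: Callable[[str], Any] = xmlrpc.client.ServerProxy,
) -> int:
    """Create a crm.lead over Odoo's external XML-RPC API using an API key."""
    common = proxy(f"{url}/xmlrpc/2/common")
    # The API key goes where a password would; Odoo returns False if rejected.
    uid = common.authenticate(db, user, api_key, {})
    if not uid:
        raise PermissionError(f"Odoo rejected the API key for {user!r} on db {db!r}")
    models = proxy(f"{url}/xmlrpc/2/object")
    # Every call re-sends (db, uid, api_key); create returns the new record id.
    return models.execute_kw(db, uid, api_key, "crm.lead", "create", [vals])
```

The proxy parameter exists only so the call sequence can be tested without a live Odoo instance.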


Change 1 — Swap Whisper STT for Deepgram Nova-2 (bot.py)

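Why: Whisper transcribes only after an utterance has been buffered, which adds 1-3 seconds before the LLM sees any input. This is the primary cause of non-replies and of the repeat-yourself problem. Deepgram Nova-2, via Pipecat's DeepgramSTTService, transcribes a live WebSocket stream while the caller is still speaking. Finalized transcripts therefore reach the pipeline within a few hundred milliseconds of the end of speech (target: under 300 ms).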

Remove from bot.py:

# Remove this import
from pipecat.services.whisper.stt import WhisperSTTService

# Remove these env vars
WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "base")
WHISPER_DEVICE = os.environ.get("WHISPER_DEVICE", "cuda")
WHISPER_COMPUTE = os.environ.get("WHISPER_COMPUTE", "float16")
WHISPER_HOTWORDS = os.environ.get("WHISPER_HOTWORDS", "...")

# Remove the entire HintedWhisperSTTService class

Add to bot.py:

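# Add import
from pipecat.services.deepgram.stt import DeepgramSTTService

# Add env vars
DEEPGRAM_API_KEY = os.environ.get("DEEPGRAM_API_KEY", "")
DEEPGRAM_MODEL = os.environ.get("DEEPGRAM_MODEL", "nova-2")   # matches DEEPGRAM_MODEL in .env

# Replace stt instantiation in run_agent()
if not DEEPGRAM_API_KEY:
    raise RuntimeError("DEEPGRAM_API_KEY is not set")   # fail loudly, never a silent call
stt = DeepgramSTTService(
    api_key=DEEPGRAM_API_KEY,
    settings=DeepgramSTTService.Settings(
        model=DEEPGRAM_MODEL,
        language="en-US",
        smart_format=True,
        punctuate=True,
        interim_results=True,       # Deepgram requires interim results for utterance_end_ms;
                                    # interims arrive as separate frames, the LLM acts on finals
        utterance_end_ms=1000,      # UtteranceEnd after a 1000 ms gap following the last word
    ),
)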

Note on Whisper: remove it from the real-time pipeline only. Whisper large-v3 is retained for post-call transcription in Phase 3 (recording/transcriber.py), where latency does not matter and accuracy does.


Change 2 — Swap Auth Token for API Key Secret (server.py)

Why: TWILIO_AUTH_TOKEN is the master credential for the entire Twilio account. A leak compromises every Twilio integration. A Standard API Key is scoped to this application and revocable independently.
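For reference, here is Twilio's documented X-Twilio-Signature scheme for a form-encoded POST such as /voice, sketched with hypothetical function names. Take the full URL Twilio requested and append every POST parameter as name+value, sorted by name. Compute HMAC-SHA1 over that string with the account Auth Token, then base64-encode the result. Twilio computes this HMAC on its side with the Auth Token, so the server can only reproduce it with the same key:

```python
import base64
import hashlib
import hmac


def twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    """Compute X-Twilio-Signature for a form-encoded POST webhook."""
    # Full request URL (scheme, host, path, query), then each POST param as
    # name+value with no separators, sorted by name.
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(auth_token.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def signature_ok(auth_token: str, url: str, params: dict[str, str], header: str) -> bool:
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(twilio_signature(auth_token, url, params), header)
```

Two cases commonly cause false mismatches. Repeated parameter names need every value appended, which a plain dict cannot express. A reverse proxy that changes the scheme or host (Traefik terminating TLS, for example) makes the URL the app sees differ from the one Twilio signed.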

Credential hierarchy:

Twilio Account SID          (not secret on its own)
├── Auth Token              (master; on this server ONLY for webhook signatures and hang-up, rotate quarterly)
└── API Key: avc-phone-agent-prod   (Standard scope)
    ├── TWILIO_API_KEY_SID:    SK...
    └── TWILIO_API_KEY_SECRET: (treat as a password)

Which credential does what:

  • Outbound REST calls: API Key. The HTTP Basic-auth username is the Key SID (SK...) and the password is the Key Secret; the Account SID (AC...) stays in the URL path.
  • Inbound webhook validation (X-Twilio-Signature): Auth Token only. Twilio signs every webhook with the account Auth Token and offers no API-key alternative, so an HMAC keyed with the Key Secret never matches.

Create the API Key:

  1. Twilio console → Account → API Keys → Create new Standard key
  2. Name it avc-phone-agent-prod
  3. Copy SID (SK...) and Secret; the Secret is shown once only

Changes in server.py:

Keep:

TWILIO_AUTH_TOKEN = os.environ.get("TWILIO_AUTH_TOKEN")

Add:

TWILIO_API_KEY_SID = os.environ.get("TWILIO_API_KEY_SID")
TWILIO_API_KEY_SECRET = os.environ.get("TWILIO_API_KEY_SECRET")

Leave _twilio_signature_ok(), its guard condition (if TWILIO_VALIDATE and TWILIO_AUTH_TOKEN:) and its warning log keyed on TWILIO_AUTH_TOKEN. The HMAC key must stay the Auth Token:

# Correct: the same key Twilio used to sign the request
digest = hmac.new(TWILIO_AUTH_TOKEN.encode(), payload.encode("utf-8"), hashlib.sha1).digest()

# Wrong: never matches, so with TWILIO_VALIDATE on, every /voice request is rejected
# digest = hmac.new(TWILIO_API_KEY_SECRET.encode(), payload.encode("utf-8"), hashlib.sha1).digest()

In the TwilioFrameSerializer instantiation, the credentials authenticate the auto_hang_up REST request. Passing auth_token=TWILIO_API_KEY_SECRET alongside account_sid=TWILIO_ACCOUNT_SID produces the pair (AC..., Key Secret). Twilio rejects that pair with 401, so the AVC-side hang-up would fail. Check how the installed Pipecat version builds that request. Unless it accepts the Key SID as the Basic-auth username, keep the Auth Token here:

serializer = TwilioFrameSerializer(
    stream_sid=stream_sid,
    call_sid=call_sid,
    account_sid=TWILIO_ACCOUNT_SID,   # AC...: URL path and Basic-auth username
    auth_token=TWILIO_AUTH_TOKEN,     # must pair with the Account SID, not an API Key Secret
)

Key rotation procedure:

  1. Create new Standard API Key in Twilio console
  2. Update TWILIO_API_KEY_SID + TWILIO_API_KEY_SECRET in .env
  3. Restart the service; no rebuild needed
  4. Verify one test call succeeds
  5. Revoke old key in Twilio console

Rotate on: any suspected leak, any team member departure, quarterly as routine. Rotating the Auth Token follows the same steps with TWILIO_AUTH_TOKEN. Until .env is updated, signature validation rejects every call.


Change 3 — Update .env

Keep (webhook signature validation and hang-up):

TWILIO_AUTH_TOKEN=

Add:

TWILIO_API_KEY_SID=SK...
TWILIO_API_KEY_SECRET=
DEEPGRAM_API_KEY=

Full .env reference:

# Twilio. Auth Token is used here only for webhook signatures and the serializer hang-up
TWILIO_ACCOUNT_SID=AC...
TWILIO_AUTH_TOKEN=
TWILIO_API_KEY_SID=SK...
TWILIO_API_KEY_SECRET=
TWILIO_PHONE_NUMBER=+1...

# STT: Deepgram (real-time, in-call only)
DEEPGRAM_API_KEY=
DEEPGRAM_MODEL=nova-2

# LLM: Ollama
OLLAMA_URL=http://127.0.0.1:11434/v1
OLLAMA_MODEL=activeblue-avc:latest
LLM_PROVIDER=ollama
LLM_TEMPERATURE=0.3
LLM_MAX_TOKENS=160

# Anthropic (optional LLM swap + monitoring + synthetic data)
ANTHROPIC_API_KEY=
ANTHROPIC_MODEL=claude-sonnet-4-6

# TTS: Kokoro
KOKORO_VOICE=af_heart
KOKORO_MODEL_DIR=/home/tocmo0nlord/pipecat-run/models

# Odoo
ODOO_URL=https://avc.activeblue.net
ODOO_DB=avc
ODOO_USER=
ODOO_API_KEY=
ODOO_TARGET=crm
ODOO_STAGE_ID=
ODOO_TEAM_ID=
ODOO_USER_ID=

# Server
PUBLIC_HOST=avc-phone.activeblue.net
PORT=8200
BIND_HOST=127.0.0.1
MAX_CONCURRENT_CALLS=2
STREAM_TOKEN=

# Call behaviour
AGENT_NAME=AVA
ENABLE_TOOLS=
VAD_CONFIDENCE=0.5
VAD_MIN_VOLUME=0.3
VAD_START_SECS=0.2
VAD_STOP_SECS=0.5

# Monitoring (Phase 4)
MONITORING_ENABLED=true
MONITORING_SCHEDULE=0 2 * * *

# A/B model routing (Phase 5 only)
AB_SPLIT_PERCENT=0
AB_MODEL_B=

Model Configuration

Current production model: activeblue-avc:latest

| Property | Value | Notes |
|---|---|---|
| Base | llama3.1:8b-instruct-q4_K_M | Llama 3.1 8B, Q4_K_M quantization |
| ID | 366a6cc15bb7 | Rebuilt clean 2026-06-23 |
| Size | 4.9GB | Down from 8.7GB (Q8_0) |
| VRAM usage | ~4.5GB (weights) | Leaves ~11.5GB headroom on the 16GB RTX 5080 |
| Context | 4096 tokens | Sufficient for a typical phone call |
| Temperature | 0.3 | Low, to favor JSON schema compliance |
| Top-p | 0.9 | Standard |
| Adapter | None | 44-pair LoRA adapter discarded |

Modelfile (rebuild reference)

FROM llama3.1:8b-instruct-q4_K_M

PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER num_ctx 4096
PARAMETER temperature 0.3
PARAMETER top_p 0.9

# Llama 3.1 chat format: a blank line follows every <|end_header_id|>
TEMPLATE """{{- range .Messages }}<|start_header_id|>{{ .Role }}<|end_header_id|>

{{ .Content }}<|eot_id|>
{{- end }}<|start_header_id|>assistant<|end_header_id|>

"""

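Llama 3.1's chat format puts a blank line (\n\n) after every <|end_header_id|> and no separator after <|eot_id|>. Below is a plain-Python sketch of the prompt string the TEMPLATE should produce, useful when checking a template change by eye. render_llama31 is a hypothetical helper. The BOS token <|begin_of_text|> is assumed to be added at tokenization and is left out:

```python
def render_llama31(messages: list[dict[str, str]]) -> str:
    """Render chat messages the way a Llama 3.1 Ollama TEMPLATE should."""
    turns = "".join(
        f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        for m in messages
    )
    # Open an assistant turn; generation stops at <|eot_id|> (a PARAMETER stop token).
    return turns + "<|start_header_id|>assistant<|end_header_id|>\n\n"
```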
Why Q4_K_M not Q8_0

Q8_0 consumed ~8.5GB VRAM for weights alone. Under telephony load this caused inference latency spikes. Q4_K_M cuts weight VRAM to ~4.5GB with negligible quality difference at 8B scale.
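The VRAM figures follow from bits per weight. Below is a back-of-envelope check that assumes about 8.03B parameters and about 4.85 bits/weight for Q4_K_M (both approximations). It also computes the fp16 KV cache that num_ctx 4096 adds per sequence, using Llama 3.1 8B's shape of 32 layers and 8 KV heads of dimension 128:

```python
PARAMS = 8.03e9  # Llama 3.1 8B parameter count (approximate)


def weights_gb(bits_per_weight: float) -> float:
    """Approximate weight memory in GB (10**9 bytes) at a given bits/weight."""
    return PARAMS * bits_per_weight / 8 / 1e9


# Q8_0 stores blocks of 32 int8 weights plus one fp16 scale: 34 bytes per 32 weights.
Q8_0_BPW = 34 * 8 / 32  # = 8.5 bits/weight
# Q4_K_M mixes 4- and 6-bit k-quant blocks; ~4.85 bits/weight is an approximate average.
Q4_K_M_BPW = 4.85


def kv_cache_bytes(tokens: int, layers: int = 32, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """fp16 K and V caches for one sequence (the leading 2 is K plus V)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


print(f"Q8_0:   {weights_gb(Q8_0_BPW):.2f} GB")
print(f"Q4_K_M: {weights_gb(Q4_K_M_BPW):.2f} GB")
print(f"KV cache @ 4096 ctx: {kv_cache_bytes(4096) / 2**20:.0f} MiB")
```

Each loaded sequence therefore adds about 0.5GB of KV cache on top of the weights. If Ollama serves both calls in parallel (OLLAMA_NUM_PARALLEL), budget roughly one such cache per parallel slot.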

Why no adapter

The 44-pair LoRA adapter was adding noise, not signal: the minimum viable dataset is 200+ pairs per intent category. The adapter will be rebuilt in Phase 5 with 500+ pairs per intent in JSON output format.

Ollama inventory (current)

activeblue-avc:latest          366a6cc15bb7    4.9GB    production
llama3.1:8b-instruct-q4_K_M    46e0c10c039e    4.9GB    base
nomic-embed-text:latest        0a109f422b47    274MB    embeddings

Phase 5 training note

Axolotl pulls from HuggingFace in safetensors format, not Ollama GGUF:

# Phase 5 only — do not run now
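huggingface-cli login       # gated repo: accept the Llama 3.1 license on HuggingFace first
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct --exclude "original/*"
# ~16GB of safetensors, separate from Ollama storage; --exclude skips the duplicate original/ checkpoint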

Build Phases

Claude Code must not scaffold Phase N+1 until Phase N gate is marked complete.

Phase 1 — Reliable call loop

Goal: Every utterance gets a response. Zero silent failures. AVC hangs up — not the caller.

  • Apply Change 1: swap Whisper for Deepgram in bot.py
  • Apply Change 2: swap Auth Token for API Key Secret in server.py
  • Apply Change 3: update .env
  • Verify EndCallProcessor termination in Twilio call logs (AVC side, not caller)
  • Verify AudioHeartbeat diagnostic logging active
  • Verify MAX_CONCURRENT_CALLS capacity gating works

Gate — all five must pass:

  1. 10 consecutive test calls — zero silent non-responses
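  2. Zero zombie pipeline instances after call ends (the pipeline runs in-process under systemd, not in Docker: check that every reserved call slot is released and no pipeline task outlives its call)
  3. Call termination from AVC side confirmed in Twilio call logs
  4. JSON parse failures logged as structured events, so the failure rate is measurable rather than invisible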
  5. Response latency P95 under 3 seconds from STT end-of-utterance to first TTS audio
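Gates 4 and 5 can only be checked if the numbers come out of the logs. Below is a sketch that assumes two hypothetical structured log events: turn_latency, with latency_ms measured from STT end-of-utterance to first TTS audio, and json_parse, with an ok flag. It uses the nearest-rank definition of P95:

```python
def p95(samples: list[int]) -> int:
    """Nearest-rank 95th percentile; integer math avoids float rounding at the rank."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def gate_metrics(events: list[dict]) -> tuple[int, float]:
    """Return (P95 latency in ms, JSON parse failure rate) from log events."""
    latencies = [e["latency_ms"] for e in events if e["event"] == "turn_latency"]
    parses = [e for e in events if e["event"] == "json_parse"]
    failures = sum(1 for e in parses if not e["ok"])
    rate = failures / len(parses) if parses else 0.0
    return p95(latencies), rate
```

Gate 5 passes when the first value stays under 3000 ms across the 10 test calls.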

Phase 2 — Accuracy (RAG + validation)

  • Populate rag/data/*.jsonl with real AVC data (human task — see RAG section)
  • ChromaDB RAG retriever wired into pipeline
  • Response validator: JSON schema + factual cross-check + PHI leak scan
  • Keyword blocklist (uncertainty phrases → handoff)
  • Intent classifier routing
  • Turn counter: max 3 failed turns before forced handoff + termination
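The blocklist and the turn counter combine into one per-turn decision. Below is a sketch in which the phrase list, the class name, and the choice to count failed turns cumulatively (rather than consecutively) are all assumptions for illustration:

```python
UNCERTAINTY_PHRASES = (  # hypothetical seed list, not the real Phase 2 blocklist
    "i'm not sure",
    "i don't know",
    "i think it might",
)
MAX_FAILED_TURNS = 3


class HandoffPolicy:
    """Hypothetical Phase 2 guard: blocklist hits and failed turns force a handoff."""

    def __init__(self, max_failed: int = MAX_FAILED_TURNS) -> None:
        self.max_failed = max_failed
        self.failed_turns = 0

    def check(self, reply: str, turn_failed: bool) -> str:
        # Normalize curly apostrophes, which LLMs often emit, before matching.
        text = reply.lower().replace("\u2019", "'")
        if any(phrase in text for phrase in UNCERTAINTY_PHRASES):
            return "handoff"  # uncertain answer: route to staff instead of guessing
        if turn_failed:
            self.failed_turns += 1
        if self.failed_turns >= self.max_failed:
            return "handoff_and_end"  # forced handoff + termination
        return "continue"
```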

Gate: 20 manual test calls, zero hallucinations on AVC-specific facts

Phase 3 — Booking

  • Real-time calendar availability check (odoo/calendar.py)
  • Whisper large-v3 post-call transcription (recording/transcriber.py)
  • Recording + transcript attached to Odoo lead chatter
  • Staff review flow confirmed in Odoo

Gate: Staff receives, reviews, and confirms a lead end-to-end

Phase 4 — Monitoring

  • Transcript index (recordings/index.jsonl)
  • Claude monitoring job
  • Dashboard: toggle, alert queue, one-click apply, playback, quality tagging

Gate: First monitoring run produces actionable suggestions

Phase 5 — Fine-tuning

  • Pull HuggingFace base (see model section)
  • Synthetic data generation via Claude API in JSON output format
  • Real call exporter using staff quality tags
  • Axolotl QLoRA on RTX 5080
  • Model registry + versioning + A/B routing

Gate: New model outperforms baseline over 50+ calls


Repository Structure

avc-phone-agent/
├── CLAUDE.md                          ← this file
├── README.md
├── .env                               ← never committed
├── .env.example
├── .gitignore                         ← includes .env, recordings/, *.gguf
├── requirements.txt                   ← Python dependencies (installed by scripts/deploy.sh)
│
├── bot.py                             ← Pipecat pipeline (Phase 1 changes here)
├── server.py                          ← Twilio webhook server (Phase 1 changes here)
├── practice.py                        ← AVC facts + Odoo persistence
├── extract.py                         ← post-call appointment extraction
├── odoo_client.py                     ← Odoo XML-RPC client
│
├── rag/                               ← Phase 2
│   ├── store.py
│   ├── loader.py
│   ├── retriever.py
│   └── data/
│       ├── avc_locations.jsonl
│       ├── avc_providers.jsonl
│       ├── avc_services.jsonl
│       ├── avc_hours.jsonl
│       ├── avc_insurance.jsonl
│       └── avc_faqs.jsonl
│
├── recording/                         ← Phase 3
│   ├── transcriber.py                 ← Whisper large-v3 post-call only
│   └── storage.py
│
├── monitoring/                        ← Phase 4
│   ├── monitor.py
│   ├── analyzer.py
│   ├── diff_engine.py
│   ├── scheduler.py
│   └── dashboard/
│       ├── app.py
│       └── static/
│
├── training/                          ← Phase 5 stub
│   └── README.md
│
├── tests/
│   ├── test_bot.py
│   ├── test_server.py
│   ├── test_odoo_client.py
│   ├── test_extract.py
│   └── fixtures/
│       └── sample_transcripts.jsonl
│
├── scripts/
│   ├── deploy.sh
│   └── smoke_test.sh
│
├── avc-phone.service                  ← existing systemd unit
└── traefik-avc-phone.yml              ← existing Traefik config

Infrastructure

| Component | Host | Address | Notes |
|---|---|---|---|
| Pipecat pipeline | miaai | 10.10.1.221 | Python async, systemd |
| Ollama LLM | miaai | http://127.0.0.1:11434/v1 | activeblue-avc:latest |
| ChromaDB (Phase 2) | miaai | http://10.10.1.221:8001 | Docker volume |
| Twilio webhook | miaai | https://avc-phone.activeblue.net | Traefik + Let's Encrypt |
| Monitoring dashboard | miaai | https://avc-monitor.activeblue.net | internal only |
| Odoo CRM | | https://avc.activeblue.net | XML-RPC, db: avc |
| Recordings | miaai | /home/tocmo0nlord/avc-phone/recordings/ | local only |
| Gitea | | https://git.activeblue.net | user: tocmo0nlord |

RAG Store (Phase 2)

Stack: ChromaDB + nomic-embed-text:latest (already in Ollama)
Collection: avc_knowledge
Retrieval: top-3 chunks per query; the query is the caller's current turn only

JSONL record format

{
  "id": "hours-kendall-weekday",
  "text": "The Kendall location is open Monday through Friday 8:00 AM to 5:00 PM.",
  "tags": ["hours", "kendall"],
  "last_updated": "2026-06-23"
}
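Records are worth validating before they reach ChromaDB, because a malformed line silently degrades retrieval. Below is a sketch of a loader check for the record format above. The function name is hypothetical. The return value is shaped for Chroma's collection.add(ids=..., documents=..., metadatas=...). Flattening tags into a string assumes Chroma accepts only scalar metadata values; re-check that against the installed chromadb version:

```python
import json
from collections.abc import Iterable
from datetime import date

REQUIRED_FIELDS = {"id": str, "text": str, "tags": list, "last_updated": str}


def load_jsonl_records(lines: Iterable[str]):
    """Validate rag/data/*.jsonl lines; return (ids, documents, metadatas)."""
    ids: list[str] = []
    documents: list[str] = []
    metadatas: list[dict[str, str]] = []
    seen: set[str] = set()
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        for field, kind in REQUIRED_FIELDS.items():
            if not isinstance(record.get(field), kind):
                raise ValueError(f"line {lineno}: {field!r} missing or not {kind.__name__}")
        date.fromisoformat(record["last_updated"])  # raises ValueError on a bad date
        if record["id"] in seen:
            raise ValueError(f"line {lineno}: duplicate id {record['id']!r}")
        seen.add(record["id"])
        ids.append(record["id"])
        documents.append(record["text"])
        # Scalar-only metadata: tags flattened to a comma-separated string.
        metadatas.append({"tags": ",".join(record["tags"]), "last_updated": record["last_updated"]})
    return ids, documents, metadatas
```

Duplicate ids are rejected up front because the monitoring output schema refers to records by record_id.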

Data files — populated before Phase 2, not before Phase 1

| File | Content |
|---|---|
| avc_locations.jsonl | Address, phone, fax, parking per location |
| avc_providers.jsonl | Name, title, specialty, locations, languages |
| avc_services.jsonl | Exam types, procedures |
| avc_hours.jsonl | Hours per location, holiday closures, after-hours |
| avc_insurance.jsonl | Accepted plans per location |
| avc_faqs.jsonl | Approved Q&A pairs |

Note: practice.py already contains real AVC location and insurance data scraped from advancedvisioncareflorida.com. Use it as the seed for the JSONL files rather than starting from scratch.


Claude Monitoring (Phase 4)

What it analyzes

  • Facts stated by AVA contradicting RAG store
  • System prompt violations
  • Calls that should have been handoffs
  • High failed turn counts — model or prompt signal
  • RAG gaps (AVA said "I don't have that" — should it be added?)
  • Phrasing that caused caller confusion

Output schema

{
  "call_sid": "CA...",
  "severity": "high",
  "issue_type": "factual_error",
  "description": "AVA stated Kendall closes at 6pm. RAG store says 5pm.",
  "suggested_action": "rag_update",
  "suggested_change": {
    "file": "rag/data/avc_hours.jsonl",
    "record_id": "hours-kendall-weekday",
    "field": "text",
    "old": "...open until 6pm...",
    "new": "...open until 5pm..."
  }
}

suggested_action: rag_update | prompt_change | blocklist_add | flag_for_review
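Claude's output should be treated as untrusted input before it reaches the alert queue or one-click apply. Below is a sketch of validation and severity ordering for the schema above. The medium and low levels and the function names are assumptions; only "high" appears in the example:

```python
ALLOWED_ACTIONS = {"rag_update", "prompt_change", "blocklist_add", "flag_for_review"}
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}  # assumed levels
REQUIRED_KEYS = ("call_sid", "severity", "issue_type", "description", "suggested_action")


def validate_suggestion(s: dict) -> dict:
    missing = [k for k in REQUIRED_KEYS if k not in s]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if s["suggested_action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown suggested_action {s['suggested_action']!r}")
    if s["severity"] not in SEVERITY_ORDER:
        raise ValueError(f"unknown severity {s['severity']!r}")
    return s


def alert_queue(suggestions: list[dict]) -> tuple[list[dict], list[tuple[dict, str]]]:
    """Return (valid suggestions sorted most severe first, rejected with reasons)."""
    queue, rejected = [], []
    for s in suggestions:
        try:
            queue.append(validate_suggestion(s))
        except ValueError as exc:
            rejected.append((s, str(exc)))  # surface these; never drop silently
    queue.sort(key=lambda s: SEVERITY_ORDER[s["severity"]])  # stable within a level
    return queue, rejected
```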

Dashboard

FastAPI + HTML/JS at https://avc-monitor.activeblue.net (internal only).

| Feature | Description |
|---|---|
| Enable/disable toggle | Pauses scheduler without redeployment |
| Alert queue | Suggestions sorted by severity |
| One-click apply | Applies change, commits via Gitea API |
| Call playback | Audio + transcript side-by-side |
| Quality tagging | Staff tags calls from dashboard |
| Manual trigger | POST /monitor/run |

Fine-Tuning Pipeline (Phase 5 — stub)

Not scaffolded until Phase 4 is complete and monitoring has run for at least two weeks. See training/README.md (populated at Phase 5 start).

  • Synthetic data: Claude API generates Q&A pairs in the JSON output format, so training targets the output schema rather than conversational style
  • Real calls: staff-tagged "good" + corrected bad calls
  • Target: 500+ pairs per intent before first Axolotl run
  • QLoRA via Axolotl on RTX 5080, base: HuggingFace meta-llama/Llama-3.1-8B-Instruct
  • Versioned Ollama models: activeblue-avc:vN
  • A/B routing: promote when new version wins on booking + hallucination rate over 50+ calls

HIPAA and Compliance

  • AVA identifies as automated at call start — no exceptions
  • No PHI in ChromaDB — practice information only
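  • Recordings on miaai only; no cloud storage
  • Signed BAAs in place before go-live with every third party that receives call audio or transcripts (Twilio, Deepgram for live STT, Anthropic for monitoring)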
  • Odoo API user: minimum permissions, not admin
  • All endpoints HTTPS via Traefik
  • .env never committed

Deploy Script (scripts/deploy.sh)

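#!/usr/bin/env bash
# Deploy the latest main to miaai and restart the service.
set -euo pipefail
cd /home/tocmo0nlord/avc-phone
git pull --ff-only origin main                      # refuse to deploy a diverged checkout
python -m pip install -r requirements.txt --quiet   # assumes python is the miniconda interpreter the unit runs
systemctl restart avc-phone
sleep 3                                             # give the unit time to fail if it is going to
systemctl is-active --quiet avc-phone || { systemctl status avc-phone --no-pager; exit 1; }
echo "[deploy] Done."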

Development Conventions

  • Python 3.13 (matches miaai miniconda environment)
  • Async throughout — Pipecat is async-native
  • loguru for all logging — already in use, keep consistent
  • Structured log lines for all diagnostic events
  • python-dotenv for local dev, env injection in prod
  • Secrets never hardcoded
  • Every module has if __name__ == "__main__": for isolated testing

Key Dependencies (current)

pipecat-ai[deepgram]==1.3.0 # installed at /opt/miniconda3; [deepgram] extra added for Phase 1
deepgram-sdk                # pulled in by the [deepgram] extra; listed for visibility
kokoro-tts                  # already installed
ollama                      # already installed
scipy / numpy               # already installed (pipecat deps)
chromadb                    # add for Phase 2
sentence-transformers        # add for Phase 2
anthropic                   # for monitoring + optional LLM swap
openai-whisper              # retained for post-call transcription only
fastapi / uvicorn           # already installed
loguru                      # already installed
httpx                       # already installed

Open Items

  • Create avc-phone-agent-prod Standard API Key in Twilio console
  • Add TWILIO_API_KEY_SID + TWILIO_API_KEY_SECRET + DEEPGRAM_API_KEY to .env
  • Confirm ODOO_STAGE_ID, ODOO_TEAM_ID, ODOO_USER_ID from live avc db
  • Confirm AVA voice — af_heart is current default, confirm with AVC before go-live
  • Populate rag/data/*.jsonl before Phase 2 (seed from practice.py data)
  • Define Odoo confirmed appointment flow: lead → opportunity → calendar event
  • Staff training on monitoring dashboard quality tagging

Active Blue LLC | git.activeblue.net/tocmo0nlord/avc-phone-agent