diff --git a/CLAUDE.md b/CLAUDE.md
index 98a5181..dd85706 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,7 +1,7 @@
 # AVC Phone Agent — Project Specification
 > Claude Code authoritative reference. All architecture, security, and build decisions live here.
 > Repo: `git.activeblue.net/tocmo0nlord/avc-phone-ai`
-> Last updated: 2026-06-23 | Active Blue LLC
+> Last updated: 2026-06-25 | Active Blue LLC

 ---

@@ -26,8 +26,8 @@ The previous build at `/home/tocmo0nlord/avc-phone/` is a working foundation.

 | File | Status | Action |
 |------|--------|--------|
-| `bot.py` | Keep with one change | Swap Whisper STT for Deepgram Nova-2 |
-| `server.py` | Keep with one change | Swap Auth Token for API Key Secret |
+| `bot.py` | Keep as-is | Whisper STT retained (real-time). Deepgram evaluated and rejected — see Change 1 |
+| `server.py` | Keep as-is | Twilio Auth Token retained. API Key swap evaluated and rejected — see Change 2 |
 | `practice.py` | Keep as-is | No changes |
 | `extract.py` | Keep as-is | No changes |
 | `odoo_client.py` | Keep as-is | Already uses API key auth correctly |
@@ -62,172 +62,115 @@ not password. Correct pattern. No changes.

 ---

-## Change 1 — Swap Whisper STT for Deepgram Nova-2 (`bot.py`)
+## Change 1 — Real-time STT stays on Whisper (`bot.py`)

-**Why:** Whisper buffers audio chunks before transcribing — 1-3 seconds of buffering
-before the LLM sees any input. This is the primary cause of non-reply and the
-repeat-yourself problem. Deepgram Nova-2 via Pipecat's native transport delivers
-end-of-utterance events in under 300ms.
+**Decision (2026-06-25): keep Whisper. Deepgram Nova-2 was evaluated and rejected.**

-**Remove from `bot.py`:**
+Deepgram Nova-2 was trialed to cut STT latency (Whisper buffers ~1-3s before the LLM
+sees input). The swap was applied and then reverted — the project stays on local
+faster-whisper. No external STT dependency, no per-minute STT cost, and no audio
+leaving the box (HIPAA posture).
+Latency is instead managed via VAD tuning and the `medium` model on the RTX 5080.
+
+**Current `bot.py` STT (in place — do not change):**
 ```python
-# Remove this import
 from pipecat.services.whisper.stt import WhisperSTTService

-# Remove these env vars
-WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "base")
-WHISPER_DEVICE = os.environ.get("WHISPER_DEVICE", "cuda")
+WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "medium") # tiny|base|small|medium
+WHISPER_DEVICE = os.environ.get("WHISPER_DEVICE", "cuda") # cuda for the 5080
 WHISPER_COMPUTE = os.environ.get("WHISPER_COMPUTE", "float16")
-WHISPER_HOTWORDS = os.environ.get("WHISPER_HOTWORDS", "...")
+WHISPER_HOTWORDS = os.environ.get("WHISPER_HOTWORDS", "...") # domain vocab bias

-# Remove the entire HintedWhisperSTTService class
-```
-
-**Add to `bot.py`:**
-```python
-# Add import
-from pipecat.services.deepgram.stt import DeepgramSTTService
-
-# Add env var
-DEEPGRAM_API_KEY = os.environ.get("DEEPGRAM_API_KEY", "")
-
-# Replace stt instantiation in run_agent()
-stt = DeepgramSTTService(
-    api_key=DEEPGRAM_API_KEY,
-    settings=DeepgramSTTService.Settings(
-        model="nova-2",
-        language="en-US",
-        smart_format=True,
-        punctuate=True,
-        interim_results=False, # final transcripts only — avoids double-firing
-        utterance_end_ms=1000, # ms of silence before end-of-utterance fires
-    )
+# HintedWhisperSTTService wraps WhisperSTTService to inject faster-whisper `hotwords`
+# (office cities + optometry terms) per call. Instantiated in run_agent():
+stt = HintedWhisperSTTService(
+    settings=WhisperSTTService.Settings(model=WHISPER_MODEL),
+    device=WHISPER_DEVICE,
+    compute_type=WHISPER_COMPUTE,
+    hotwords=WHISPER_HOTWORDS,
 )
 ```

-**Note on Whisper:** Remove from real-time pipeline only. Whisper large-v3 is retained
-for post-call transcription in Phase 3 (`recording/transcriber.py`) where latency does
-not matter and accuracy is more important than speed.
+**Note:** Whisper large-v3 also serves post-call transcription in Phase 3
+(`recording/transcriber.py`). If real-time latency proves unacceptable in the Phase 1
+gate, revisit a streaming STT then — but do not reintroduce the dependency speculatively.

 ---

-## Change 2 — Swap Auth Token for API Key Secret (`server.py`)
+## Change 2 — Twilio webhook auth stays on the Auth Token (`server.py`)

-**Why:** `TWILIO_AUTH_TOKEN` is the master credential for the entire Twilio account.
-A leak compromises every Twilio integration. A Standard API Key is scoped to this
-application and revocable independently.
+**Decision (2026-06-25): keep `TWILIO_AUTH_TOKEN`. The API Key swap was evaluated and rejected.**

-**Credential hierarchy:**
+A Standard API Key (scoped, revocable) was trialed in place of the account Auth Token,
+but it **cannot do what this server needs**: Twilio signs inbound webhooks
+(`X-Twilio-Signature`) with the account **Auth Token** — an API Key Secret cannot validate
+that signature, so `TWILIO_VALIDATE=true` would reject every legitimate `POST /voice`
+(403). The `TwilioFrameSerializer` auto-hang-up also expects the account/Auth-Token
+credential pair. The swap was reverted.
+
+**Credential model (in place):**
 ```
 Twilio Account SID (not secret on its own)
-├── Auth Token (master — Twilio console only, rotate quarterly)
-└── API Key: avc-phone-agent-prod (Standard scope)
-    ├── TWILIO_API_KEY_SID: SK...
-    └── TWILIO_API_KEY_SECRET: (treat as a password)
+└── Auth Token (TWILIO_AUTH_TOKEN — validates webhooks + REST/auto-hang-up)
 ```

-**Create the API Key:**
-1. Twilio console → Account → API Keys → Create new Standard key
-2. Name it `avc-phone-agent-prod`
-3. Copy SID (`SK...`) and Secret — Secret is shown once only
+Treat the Auth Token as a password: keep it only in `.env` (never committed), rotate on
+any suspected leak / team departure / quarterly.
+If finer-grained scoping is ever required, the correct design is a *hybrid* — Auth Token for `X-Twilio-Signature`
+validation, an API Key (SK SID + Secret) only for outbound REST — not a wholesale swap.

-**Changes in `server.py`:**
+**Current `server.py` (in place — do not change):**

-Remove:
 ```python
+TWILIO_ACCOUNT_SID = os.environ.get("TWILIO_ACCOUNT_SID")
 TWILIO_AUTH_TOKEN = os.environ.get("TWILIO_AUTH_TOKEN")
-```

-Add:
-```python
-TWILIO_API_KEY_SID = os.environ.get("TWILIO_API_KEY_SID")
-TWILIO_API_KEY_SECRET = os.environ.get("TWILIO_API_KEY_SECRET")
-```
-
-In `_twilio_signature_ok()`, change the HMAC key:
-```python
-# Before
+# _twilio_signature_ok(): HMAC-SHA1 keyed by the Auth Token (what Twilio signs with)
 digest = hmac.new(TWILIO_AUTH_TOKEN.encode(), payload.encode("utf-8"), hashlib.sha1).digest()

-# After
-digest = hmac.new(TWILIO_API_KEY_SECRET.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
-```
-
-Update the guard condition:
-```python
-# Before
+# Validation gate + warning
 if TWILIO_VALIDATE and TWILIO_AUTH_TOKEN:
-
-# After
-if TWILIO_VALIDATE and TWILIO_API_KEY_SECRET:
-```
-
-Update the warning log:
-```python
-# Before
+    ...
 elif not TWILIO_AUTH_TOKEN:
     logger.warning("/voice signature validation DISABLED (no TWILIO_AUTH_TOKEN set)")

-# After
-elif not TWILIO_API_KEY_SECRET:
-    logger.warning("/voice signature validation DISABLED (no TWILIO_API_KEY_SECRET set)")
-```
-
-In `TwilioFrameSerializer` instantiation:
-```python
-# Before
+# Serializer auto-hang-up uses the account SID + Auth Token pair
 serializer = TwilioFrameSerializer(
     stream_sid=stream_sid,
     call_sid=call_sid,
     account_sid=TWILIO_ACCOUNT_SID,
     auth_token=TWILIO_AUTH_TOKEN,
 )
-
-# After
-serializer = TwilioFrameSerializer(
-    stream_sid=stream_sid,
-    call_sid=call_sid,
-    account_sid=TWILIO_ACCOUNT_SID,
-    auth_token=TWILIO_API_KEY_SECRET,
-)
 ```

-**Key rotation procedure:**
-1. Create new Standard API Key in Twilio console
-2. Update `TWILIO_API_KEY_SID` + `TWILIO_API_KEY_SECRET` in `.env`
+**Auth Token rotation procedure:**
+1. Generate a new primary Auth Token in the Twilio console (use the secondary-token flow)
+2. Update `TWILIO_AUTH_TOKEN` in `.env`
 3. Restart the service — no rebuild needed
-4. Verify one test call succeeds
-5. Revoke old key in Twilio console
+4. Verify one test call succeeds (signature validation + auto-hang-up both rely on it)
+5. Retire the old token in the Twilio console

 Rotate on: any suspected leak, any team member departure, quarterly as routine.

 ---

-## Change 3 — Update `.env`
+## Change 3 — `.env`

-**Remove:**
-```env
-TWILIO_AUTH_TOKEN=
-```
-
-**Add:**
-```env
-TWILIO_API_KEY_SID=SK...
-TWILIO_API_KEY_SECRET=
-DEEPGRAM_API_KEY=
-```
+No swap. `.env` keeps `TWILIO_AUTH_TOKEN` and the Whisper STT vars; there is **no**
+`TWILIO_API_KEY_*` or `DEEPGRAM_*` (those were trialed and removed with Changes 1/2).

 **Full `.env` reference:**
 ```env
-# Twilio — Auth Token lives in Twilio console only, never on this server
+# Twilio — Auth Token validates webhooks + drives auto-hang-up. Never committed.
 TWILIO_ACCOUNT_SID=AC...
-TWILIO_API_KEY_SID=SK...
-TWILIO_API_KEY_SECRET=
+TWILIO_AUTH_TOKEN=
 TWILIO_PHONE_NUMBER=+1...
+TWILIO_VALIDATE=true

-# STT: Deepgram (real-time, in-call only)
-DEEPGRAM_API_KEY=
-DEEPGRAM_MODEL=nova-2
+# STT: Whisper (faster-whisper, real-time in-call; large-v3 also used post-call in Phase 3)
+WHISPER_MODEL=medium
+WHISPER_DEVICE=cuda
+WHISPER_COMPUTE=float16

 # LLM: Ollama
 OLLAMA_URL=http://127.0.0.1:11434/v1
@@ -352,16 +295,17 @@ Claude Code must not scaffold Phase N+1 until Phase N gate is marked complete.

 **Goal:** Every utterance gets a response. Zero silent failures. AVC hangs up — not the caller.

-- [ ] Apply Change 1: swap Whisper for Deepgram in `bot.py`
-- [ ] Apply Change 2: swap Auth Token for API Key Secret in `server.py`
-- [ ] Apply Change 3: update `.env`
+- [x] Change 1: STT — Deepgram evaluated, reverted; staying on Whisper (`medium`)
+- [x] Change 2: Twilio auth — API Key evaluated, reverted; staying on Auth Token
+- [x] Change 3: `.env` — Auth Token + Whisper vars; `OLLAMA_MODEL=activeblue-avc:latest`
 - [ ] Verify `EndCallProcessor` termination in Twilio call logs (AVC side, not caller)
 - [ ] Verify `AudioHeartbeat` diagnostic logging active
 - [ ] Verify `MAX_CONCURRENT_CALLS` capacity gating works

 **Gate — all five must pass:**
 1. 10 consecutive test calls — zero silent non-responses
-2. Zero zombie pipeline instances after call ends (`docker stats`)
+2. Zero zombie pipeline instances after call ends (`ps`/`pgrep` — service runs as a bare
+   systemd/host process, not Docker)
 3. Call termination from AVC side confirmed in Twilio call logs
 4. JSON parse failure rate visible in logs — measurable not invisible
 5. Response latency P95 under 3 seconds from STT end-of-utterance to first TTS audio
@@ -620,15 +564,14 @@ echo "[deploy] Done."

 ```
 pipecat-ai==1.3.0 # installed at /opt/miniconda3
-pipecat-ai[deepgram] # add for Phase 1 Deepgram swap
-deepgram-sdk # add for Phase 1
+faster-whisper # real-time STT (already installed in pipecat-run venv)
 kokoro-tts # already installed
 ollama # already installed
 scipy / numpy # already installed (pipecat deps)
 chromadb # add for Phase 2
 sentence-transformers # add for Phase 2
 anthropic # for monitoring + optional LLM swap
-openai-whisper # retained for post-call transcription only
+openai-whisper # large-v3 for post-call transcription (Phase 3)
 fastapi / uvicorn # already installed
 loguru # already installed
 httpx # already installed
@@ -638,8 +581,7 @@ httpx # already installed

 ## Open Items

-- [ ] Create `avc-phone-agent-prod` Standard API Key in Twilio console
-- [ ] Add `TWILIO_API_KEY_SID` + `TWILIO_API_KEY_SECRET` + `DEEPGRAM_API_KEY` to `.env`
+- [ ] Confirm `TWILIO_AUTH_TOKEN` in `.env` is current (rotate if leaked/stale)
 - [ ] Confirm `ODOO_STAGE_ID`, `ODOO_TEAM_ID`, `ODOO_USER_ID` from live `avc` db
 - [ ] Confirm AVA voice — `af_heart` is current default, confirm with AVC before go-live
 - [ ] Populate `rag/data/*.jsonl` before Phase 2 (seed from `practice.py` data)
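
Note on Change 2: the reason an API Key Secret cannot stand in for the Auth Token can be illustrated with a minimal sketch of the webhook check. This is an illustration under assumptions, not the contents of `server.py`: the helper names `compute_twilio_signature` and `signature_ok`, the example URL, the form fields, and both credential strings are hypothetical. It follows Twilio's documented scheme for form-encoded POSTs (full request URL, then each parameter name and value sorted by name, signed with HMAC-SHA1 keyed by the Auth Token, then Base64-encoded).

```python
import base64
import hashlib
import hmac


def compute_twilio_signature(key: str, url: str, params: dict[str, str]) -> str:
    """Build the Base64 HMAC-SHA1 value expected in X-Twilio-Signature."""
    # Full request URL followed by each POST field name and value, sorted by name.
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(key.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def signature_ok(key: str, url: str, params: dict[str, str], header_value: str) -> bool:
    """Compare the expected signature with the received header in constant time."""
    expected = compute_twilio_signature(key, url, params)
    return hmac.compare_digest(expected, header_value)


# Hypothetical values for illustration only.
url = "https://example.invalid/voice"
params = {"CallSid": "CA123", "From": "+15550100", "To": "+15550199"}
auth_token = "example-auth-token"
api_key_secret = "example-api-key-secret"

# Stand-in for the header Twilio sends: signed with the account Auth Token.
header = compute_twilio_signature(auth_token, url, params)

print(signature_ok(auth_token, url, params, header))      # server keyed by Auth Token
print(signature_ok(api_key_secret, url, params, header))  # server keyed by API Key Secret
```

Changing only the key passed to `signature_ok` turns a valid request into a rejection, which is consistent with the 403 behaviour described for the trialed swap.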