Update CLAUDE.md: Phase 1 keeps Whisper STT + Twilio Auth Token

Reframe Change 1/2/3 to record the actual decisions instead of the trialed
swaps: Deepgram and the Twilio Standard API Key were both evaluated and
reverted. Document why the API Key cannot replace the Auth Token (Twilio signs
webhooks with the Auth Token). Update the .env reference, Phase 1 checklist,
dependencies, and open items accordingly. Switch the Phase 1 gate's zombie check
to ps/pgrep (the service runs as a bare process, not in Docker).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tocmo0nlord
2026-06-25 01:09:50 +00:00
parent 5ed641255c
commit 93620be9bb

CLAUDE.md (196 lines changed)

@@ -1,7 +1,7 @@
 # AVC Phone Agent — Project Specification
 > Claude Code authoritative reference. All architecture, security, and build decisions live here.
 > Repo: `git.activeblue.net/tocmo0nlord/avc-phone-ai`
-> Last updated: 2026-06-23 | Active Blue LLC
+> Last updated: 2026-06-25 | Active Blue LLC
 ---
@@ -26,8 +26,8 @@ The previous build at `/home/tocmo0nlord/avc-phone/` is a working foundation.
 | File | Status | Action |
 |------|--------|--------|
-| `bot.py` | Keep with one change | Swap Whisper STT for Deepgram Nova-2 |
-| `server.py` | Keep with one change | Swap Auth Token for API Key Secret |
+| `bot.py` | Keep as-is | Whisper STT retained (real-time). Deepgram evaluated and rejected — see Change 1 |
+| `server.py` | Keep as-is | Twilio Auth Token retained. API Key swap evaluated and rejected — see Change 2 |
 | `practice.py` | Keep as-is | No changes |
 | `extract.py` | Keep as-is | No changes |
 | `odoo_client.py` | Keep as-is | Already uses API key auth correctly |
@@ -62,172 +62,115 @@ not password. Correct pattern. No changes.
 ---
-## Change 1 — Swap Whisper STT for Deepgram Nova-2 (`bot.py`)
+## Change 1 — Real-time STT stays on Whisper (`bot.py`)
-**Why:** Whisper buffers audio chunks before transcribing — 1-3 seconds of buffering
-before the LLM sees any input. This is the primary cause of non-reply and the
-repeat-yourself problem. Deepgram Nova-2 via Pipecat's native transport delivers
-end-of-utterance events in under 300ms.
+**Decision (2026-06-25): keep Whisper. Deepgram Nova-2 was evaluated and rejected.**
-**Remove from `bot.py`:**
+Deepgram Nova-2 was trialed to cut STT latency (Whisper buffers ~1-3s before the LLM
+sees input). The swap was applied and then reverted — the project stays on local
+faster-whisper. No external STT dependency, no per-minute STT cost, and no audio
+leaving the box (HIPAA posture). Latency is instead managed via VAD tuning and the
+`medium` model on the RTX 5080.
+**Current `bot.py` STT (in place — do not change):**
 ```python
-# Remove this import
 from pipecat.services.whisper.stt import WhisperSTTService
-# Remove these env vars
-WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "base")
-WHISPER_DEVICE = os.environ.get("WHISPER_DEVICE", "cuda")
+WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "medium") # tiny|base|small|medium
+WHISPER_DEVICE = os.environ.get("WHISPER_DEVICE", "cuda") # cuda for the 5080
 WHISPER_COMPUTE = os.environ.get("WHISPER_COMPUTE", "float16")
-WHISPER_HOTWORDS = os.environ.get("WHISPER_HOTWORDS", "...")
+WHISPER_HOTWORDS = os.environ.get("WHISPER_HOTWORDS", "...") # domain vocab bias
-# Remove the entire HintedWhisperSTTService class
-```
-**Add to `bot.py`:**
-```python
-# Add import
-from pipecat.services.deepgram.stt import DeepgramSTTService
-# Add env var
-DEEPGRAM_API_KEY = os.environ.get("DEEPGRAM_API_KEY", "")
-# Replace stt instantiation in run_agent()
-stt = DeepgramSTTService(
-    api_key=DEEPGRAM_API_KEY,
-    settings=DeepgramSTTService.Settings(
-        model="nova-2",
-        language="en-US",
-        smart_format=True,
-        punctuate=True,
-        interim_results=False, # final transcripts only — avoids double-firing
-        utterance_end_ms=1000, # ms of silence before end-of-utterance fires
-    )
+# HintedWhisperSTTService wraps WhisperSTTService to inject faster-whisper `hotwords`
+# (office cities + optometry terms) per call. Instantiated in run_agent():
+stt = HintedWhisperSTTService(
+    settings=WhisperSTTService.Settings(model=WHISPER_MODEL),
+    device=WHISPER_DEVICE,
+    compute_type=WHISPER_COMPUTE,
+    hotwords=WHISPER_HOTWORDS,
 )
 ```
-**Note on Whisper:** Remove from real-time pipeline only. Whisper large-v3 is retained
-for post-call transcription in Phase 3 (`recording/transcriber.py`) where latency does
-not matter and accuracy is more important than speed.
+**Note:** Whisper large-v3 also serves post-call transcription in Phase 3
+(`recording/transcriber.py`). If real-time latency proves unacceptable in the Phase 1
+gate, revisit a streaming STT then — but do not reintroduce the dependency speculatively.
 ---
-## Change 2 — Swap Auth Token for API Key Secret (`server.py`)
+## Change 2 — Twilio webhook auth stays on the Auth Token (`server.py`)
-**Why:** `TWILIO_AUTH_TOKEN` is the master credential for the entire Twilio account.
-A leak compromises every Twilio integration. A Standard API Key is scoped to this
-application and revocable independently.
+**Decision (2026-06-25): keep `TWILIO_AUTH_TOKEN`. The API Key swap was evaluated and rejected.**
-**Credential hierarchy:**
+A Standard API Key (scoped, revocable) was trialed in place of the account Auth Token,
+but it **cannot do what this server needs**: Twilio signs inbound webhooks
+(`X-Twilio-Signature`) with the account **Auth Token** — an API Key Secret cannot validate
+that signature, so `TWILIO_VALIDATE=true` would reject every legitimate `POST /voice`
+(403). The `TwilioFrameSerializer` auto-hang-up also expects the account/Auth-Token
+credential pair. The swap was reverted.
+**Credential model (in place):**
 ```
 Twilio Account SID (not secret on its own)
-── Auth Token (master — Twilio console only, rotate quarterly)
-└── API Key: avc-phone-agent-prod (Standard scope)
-├── TWILIO_API_KEY_SID: SK...
-└── TWILIO_API_KEY_SECRET: (treat as a password)
+── Auth Token (TWILIO_AUTH_TOKEN — validates webhooks + REST/auto-hang-up)
 ```
-**Create the API Key:**
-1. Twilio console → Account → API Keys → Create new Standard key
-2. Name it `avc-phone-agent-prod`
-3. Copy SID (`SK...`) and Secret — Secret is shown once only
+Treat the Auth Token as a password: keep it only in `.env` (never committed), rotate on
+any suspected leak / team departure / quarterly. If finer-grained scoping is ever
+required, the correct design is a *hybrid* — Auth Token for `X-Twilio-Signature`
+validation, an API Key (SK SID + Secret) only for outbound REST — not a wholesale swap.
-**Changes in `server.py`:**
+**Current `server.py` (in place — do not change):**
-Remove:
 ```python
 TWILIO_ACCOUNT_SID = os.environ.get("TWILIO_ACCOUNT_SID")
 TWILIO_AUTH_TOKEN = os.environ.get("TWILIO_AUTH_TOKEN")
-```
-Add:
-```python
-TWILIO_API_KEY_SID = os.environ.get("TWILIO_API_KEY_SID")
-TWILIO_API_KEY_SECRET = os.environ.get("TWILIO_API_KEY_SECRET")
-```
-In `_twilio_signature_ok()`, change the HMAC key:
-```python
-# Before
+# _twilio_signature_ok(): HMAC-SHA1 keyed by the Auth Token (what Twilio signs with)
 digest = hmac.new(TWILIO_AUTH_TOKEN.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
-# After
-digest = hmac.new(TWILIO_API_KEY_SECRET.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
-```
-Update the guard condition:
-```python
-# Before
+# Validation gate + warning
 if TWILIO_VALIDATE and TWILIO_AUTH_TOKEN:
-# After
-if TWILIO_VALIDATE and TWILIO_API_KEY_SECRET:
-```
-Update the warning log:
-```python
-# Before
+...
 elif not TWILIO_AUTH_TOKEN:
     logger.warning("/voice signature validation DISABLED (no TWILIO_AUTH_TOKEN set)")
-# After
-elif not TWILIO_API_KEY_SECRET:
-    logger.warning("/voice signature validation DISABLED (no TWILIO_API_KEY_SECRET set)")
-```
-In `TwilioFrameSerializer` instantiation:
-```python
-# Before
+# Serializer auto-hang-up uses the account SID + Auth Token pair
 serializer = TwilioFrameSerializer(
     stream_sid=stream_sid,
     call_sid=call_sid,
     account_sid=TWILIO_ACCOUNT_SID,
     auth_token=TWILIO_AUTH_TOKEN,
 )
-# After
-serializer = TwilioFrameSerializer(
-    stream_sid=stream_sid,
-    call_sid=call_sid,
-    account_sid=TWILIO_ACCOUNT_SID,
-    auth_token=TWILIO_API_KEY_SECRET,
-)
 ```
-**Key rotation procedure:**
-1. Create new Standard API Key in Twilio console
-2. Update `TWILIO_API_KEY_SID` + `TWILIO_API_KEY_SECRET` in `.env`
+**Auth Token rotation procedure:**
+1. Generate a new primary Auth Token in the Twilio console (use the secondary-token flow)
+2. Update `TWILIO_AUTH_TOKEN` in `.env`
 3. Restart the service — no rebuild needed
-4. Verify one test call succeeds
-5. Revoke old key in Twilio console
+4. Verify one test call succeeds (signature validation + auto-hang-up both rely on it)
+5. Retire the old token in the Twilio console
 Rotate on: any suspected leak, any team member departure, quarterly as routine.
 ---
-## Change 3 — Update `.env`
+## Change 3 — `.env`
-**Remove:**
-```env
-TWILIO_AUTH_TOKEN=
-```
-**Add:**
-```env
-TWILIO_API_KEY_SID=SK...
-TWILIO_API_KEY_SECRET=
-DEEPGRAM_API_KEY=
-```
+No swap. `.env` keeps `TWILIO_AUTH_TOKEN` and the Whisper STT vars; there is **no**
+`TWILIO_API_KEY_*` or `DEEPGRAM_*` (those were trialed and removed with Changes 1/2).
 **Full `.env` reference:**
 ```env
-# Twilio — Auth Token lives in Twilio console only, never on this server
+# Twilio — Auth Token validates webhooks + drives auto-hang-up. Never committed.
 TWILIO_ACCOUNT_SID=AC...
-TWILIO_API_KEY_SID=SK...
-TWILIO_API_KEY_SECRET=
+TWILIO_AUTH_TOKEN=
 TWILIO_PHONE_NUMBER=+1...
 TWILIO_VALIDATE=true
-# STT: Deepgram (real-time, in-call only)
-DEEPGRAM_API_KEY=
-DEEPGRAM_MODEL=nova-2
+# STT: Whisper (faster-whisper, real-time in-call; large-v3 also used post-call in Phase 3)
+WHISPER_MODEL=medium
+WHISPER_DEVICE=cuda
+WHISPER_COMPUTE=float16
 # LLM: Ollama
 OLLAMA_URL=http://127.0.0.1:11434/v1
@@ -352,16 +295,17 @@ Claude Code must not scaffold Phase N+1 until Phase N gate is marked complete.
 **Goal:** Every utterance gets a response. Zero silent failures. AVC hangs up — not
 the caller.
-- [ ] Apply Change 1: swap Whisper for Deepgram in `bot.py`
-- [ ] Apply Change 2: swap Auth Token for API Key Secret in `server.py`
-- [ ] Apply Change 3: update `.env`
+- [x] Change 1: STT — Deepgram evaluated, reverted; staying on Whisper (`medium`)
+- [x] Change 2: Twilio auth — API Key evaluated, reverted; staying on Auth Token
+- [x] Change 3: `.env` — Auth Token + Whisper vars; `OLLAMA_MODEL=activeblue-avc:latest`
 - [ ] Verify `EndCallProcessor` termination in Twilio call logs (AVC side, not caller)
 - [ ] Verify `AudioHeartbeat` diagnostic logging active
 - [ ] Verify `MAX_CONCURRENT_CALLS` capacity gating works
 **Gate — all five must pass:**
 1. 10 consecutive test calls — zero silent non-responses
-2. Zero zombie pipeline instances after call ends (`docker stats`)
+2. Zero zombie pipeline instances after call ends (`ps`/`pgrep` — service runs as a bare
+   systemd/host process, not Docker)
 3. Call termination from AVC side confirmed in Twilio call logs
 4. JSON parse failure rate visible in logs — measurable not invisible
 5. Response latency P95 under 3 seconds from STT end-of-utterance to first TTS audio
@@ -620,15 +564,14 @@ echo "[deploy] Done."
 ```
 pipecat-ai==1.3.0 # installed at /opt/miniconda3
-pipecat-ai[deepgram] # add for Phase 1 Deepgram swap
-deepgram-sdk # add for Phase 1
+faster-whisper # real-time STT (already installed in pipecat-run venv)
 kokoro-tts # already installed
 ollama # already installed
 scipy / numpy # already installed (pipecat deps)
 chromadb # add for Phase 2
 sentence-transformers # add for Phase 2
 anthropic # for monitoring + optional LLM swap
-openai-whisper # retained for post-call transcription only
+openai-whisper # large-v3 for post-call transcription (Phase 3)
 fastapi / uvicorn # already installed
 loguru # already installed
 httpx # already installed
@@ -638,8 +581,7 @@ httpx # already installed
 ## Open Items
-- [ ] Create `avc-phone-agent-prod` Standard API Key in Twilio console
-- [ ] Add `TWILIO_API_KEY_SID` + `TWILIO_API_KEY_SECRET` + `DEEPGRAM_API_KEY` to `.env`
+- [ ] Confirm `TWILIO_AUTH_TOKEN` in `.env` is current (rotate if leaked/stale)
 - [ ] Confirm `ODOO_STAGE_ID`, `ODOO_TEAM_ID`, `ODOO_USER_ID` from live `avc` db
 - [ ] Confirm AVA voice — `af_heart` is current default, confirm with AVC before go-live
 - [ ] Populate `rag/data/*.jsonl` before Phase 2 (seed from `practice.py` data)
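Change 2's rejection of the API Key swap follows directly from how `X-Twilio-Signature` is computed. The sketch below follows Twilio's documented scheme for a form-encoded `POST` webhook: the full request URL, then each POST parameter appended as name+value in case-sensitive sorted order, HMAC-SHA1 keyed by the account Auth Token, base64-encoded. `twilio_signature` and `signature_ok` are hypothetical helpers for illustration, not the project's `_twilio_signature_ok()`, and the sketch ignores repeated parameter names.

```python
import base64
import hashlib
import hmac


def twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    """Compute the X-Twilio-Signature value for a form-encoded POST webhook.

    Payload = full URL + name+value of every POST param, names sorted
    case-sensitively, no delimiters. Keyed by the Auth Token (hypothetical helper).
    """
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(auth_token.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def signature_ok(auth_token: str, url: str, params: dict[str, str], header: str) -> bool:
    """Constant-time comparison of the expected signature with the received header."""
    expected = twilio_signature(auth_token, url, params)
    # Compare as UTF-8 bytes so a non-ASCII header value cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), header.encode())
```

Signing the same payload with any other key (an API Key Secret, for example) yields a different digest, so `signature_ok` returns `False` for every genuine request; that is the 403 failure Change 2 describes. The official `twilio` Python package ships `RequestValidator`, which performs this check and is likely preferable to hand-rolled code.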
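The *hybrid* design mentioned in Change 2 (Auth Token for signature validation, API Key only for outbound REST) is not what the project runs today. If it is ever adopted, the REST side could look roughly like this sketch, which only builds the request that ends a call (`Status=completed` on the Calls resource) using HTTP Basic auth with the API Key SID and Secret. `build_hangup_request` is a hypothetical helper; it is not how `TwilioFrameSerializer` hangs up, and it sends nothing.

```python
import base64
import urllib.parse
import urllib.request

TWILIO_API_BASE = "https://api.twilio.com/2010-04-01"


def build_hangup_request(account_sid: str, call_sid: str,
                         api_key_sid: str, api_key_secret: str) -> urllib.request.Request:
    """Build (without sending) a REST request that ends an in-progress call.

    Hypothetical hybrid design: outbound REST authenticates with the API Key
    SID + Secret as HTTP Basic credentials; the Auth Token would still be
    required, but only for X-Twilio-Signature validation.
    """
    url = f"{TWILIO_API_BASE}/Accounts/{account_sid}/Calls/{call_sid}.json"
    body = urllib.parse.urlencode({"Status": "completed"}).encode()
    credentials = base64.b64encode(f"{api_key_sid}:{api_key_secret}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Passing the result to `urllib.request.urlopen(req)` would send it. Because webhook validation still needs the Auth Token, this remains a hybrid rather than a swap.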
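Change 3 states that `.env` carries `TWILIO_AUTH_TOKEN` and no `TWILIO_API_KEY_*` or `DEEPGRAM_*` keys. A start-up check along these lines could catch a half-reverted environment. `audit_env`, the `REQUIRED` tuple (limited here to the two Twilio values `server.py` reads), and the stale-prefix list are assumptions for illustration, not existing project code.

```python
from collections.abc import Mapping

# Assumption: the two Twilio values server.py reads; extend as needed.
REQUIRED = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")
# Prefixes left over from the reverted Deepgram / API Key trials (Changes 1/2).
STALE_PREFIXES = ("TWILIO_API_KEY_", "DEEPGRAM_")


def audit_env(env: Mapping[str, str]) -> tuple[list[str], list[str]]:
    """Return (required keys that are missing or empty, stale keys still present)."""
    missing = [key for key in REQUIRED if not env.get(key)]
    stale = sorted(key for key in env if key.startswith(STALE_PREFIXES))
    return missing, stale
```

At start-up this would be called as `audit_env(os.environ)`. If the hybrid design is adopted later, `TWILIO_API_KEY_` has to come off the stale list.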
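Gate item 5 needs a concrete definition of P95 over a small sample. This sketch uses the nearest-rank method and assumes per-turn latencies (STT end-of-utterance to first TTS audio, in seconds) have already been extracted from the logs; the helper names are hypothetical. With fewer than 20 samples, nearest-rank P95 is simply the slowest sample, so one slow turn fails the gate.

```python
from typing import Sequence


def p95_nearest_rank(samples: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method (no interpolation)."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def latency_gate_passes(samples: Sequence[float], limit_s: float = 3.0) -> bool:
    """Gate item 5: P95 must be strictly under the limit."""
    return p95_nearest_rank(samples) < limit_s
```

Integer arithmetic for the rank avoids floating-point surprises in `ceil(0.95 * n)`.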
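Gate item 2 now uses `ps`/`pgrep` instead of `docker stats`. A sketch of that check, assuming each call pipeline is visible as a process whose command line matches a pattern you supply (the pattern is a placeholder; this document does not name the process). `pgrep -f` matches against the full command line and exits with status 1 when nothing matches, which the sketch treats as an empty result. All helper names are hypothetical.

```python
import subprocess


def parse_pgrep(output: str) -> list[int]:
    """Extract PIDs from pgrep output (PID first on each line, optional command after)."""
    return [int(line.split()[0]) for line in output.splitlines() if line.strip()]


def matching_pids(pattern: str) -> list[int]:
    """PIDs whose full command line matches `pattern`, via `pgrep -f`."""
    result = subprocess.run(["pgrep", "-f", pattern], capture_output=True, text=True)
    if result.returncode == 1:  # no process matched
        return []
    if result.returncode != 0:  # 2 = syntax error, 3 = fatal error
        raise RuntimeError(result.stderr.strip() or f"pgrep exited {result.returncode}")
    return parse_pgrep(result.stdout)


def leftover_pids(baseline: list[int], after: list[int]) -> list[int]:
    """PIDs present after hang-up that were not running before the test call."""
    return sorted(set(after) - set(baseline))
```

Take `matching_pids(...)` before a test call and again after hang-up; anything `leftover_pids` reports is a candidate zombie. If pipelines run as tasks inside a single service process rather than as separate processes, this check cannot see them.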