Update CLAUDE.md: Phase 1 keeps Whisper STT + Twilio Auth Token

Reframe Change 1/2/3 to record the actual decisions instead of the trialed swaps: Deepgram and the Twilio Standard API Key were both evaluated and reverted. Document why the API Key cannot replace the Auth Token (Twilio signs webhooks with the Auth Token). Update the .env reference, Phase 1 checklist, dependencies, and open items accordingly; gate zombie-check uses ps/pgrep (bare process, not Docker).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
196
CLAUDE.md
@@ -1,7 +1,7 @@
 # AVC Phone Agent — Project Specification
 > Claude Code authoritative reference. All architecture, security, and build decisions live here.
 > Repo: `git.activeblue.net/tocmo0nlord/avc-phone-ai`
-> Last updated: 2026-06-23 | Active Blue LLC
+> Last updated: 2026-06-25 | Active Blue LLC
 
 ---
 
@@ -26,8 +26,8 @@ The previous build at `/home/tocmo0nlord/avc-phone/` is a working foundation.
 
 | File | Status | Action |
 |------|--------|--------|
-| `bot.py` | Keep with one change | Swap Whisper STT for Deepgram Nova-2 |
-| `server.py` | Keep with one change | Swap Auth Token for API Key Secret |
+| `bot.py` | Keep as-is | Whisper STT retained (real-time). Deepgram evaluated and rejected — see Change 1 |
+| `server.py` | Keep as-is | Twilio Auth Token retained. API Key swap evaluated and rejected — see Change 2 |
 | `practice.py` | Keep as-is | No changes |
 | `extract.py` | Keep as-is | No changes |
 | `odoo_client.py` | Keep as-is | Already uses API key auth correctly |
@@ -62,172 +62,115 @@ not password. Correct pattern. No changes.
 
 ---
 
-## Change 1 — Swap Whisper STT for Deepgram Nova-2 (`bot.py`)
+## Change 1 — Real-time STT stays on Whisper (`bot.py`)
 
-**Why:** Whisper buffers audio chunks before transcribing — 1-3 seconds of buffering
-before the LLM sees any input. This is the primary cause of non-reply and the
-repeat-yourself problem. Deepgram Nova-2 via Pipecat's native transport delivers
-end-of-utterance events in under 300ms.
+**Decision (2026-06-25): keep Whisper. Deepgram Nova-2 was evaluated and rejected.**
 
-**Remove from `bot.py`:**
+Deepgram Nova-2 was trialed to cut STT latency (Whisper buffers ~1-3s before the LLM
+sees input). The swap was applied and then reverted — the project stays on local
+faster-whisper. No external STT dependency, no per-minute STT cost, and no audio
+leaving the box (HIPAA posture). Latency is instead managed via VAD tuning and the
+`medium` model on the RTX 5080.
+
+**Current `bot.py` STT (in place — do not change):**
 ```python
-# Remove this import
 from pipecat.services.whisper.stt import WhisperSTTService
 
-# Remove these env vars
-WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "base")
-WHISPER_DEVICE = os.environ.get("WHISPER_DEVICE", "cuda")
+WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "medium") # tiny|base|small|medium
+WHISPER_DEVICE = os.environ.get("WHISPER_DEVICE", "cuda") # cuda for the 5080
 WHISPER_COMPUTE = os.environ.get("WHISPER_COMPUTE", "float16")
-WHISPER_HOTWORDS = os.environ.get("WHISPER_HOTWORDS", "...")
+WHISPER_HOTWORDS = os.environ.get("WHISPER_HOTWORDS", "...") # domain vocab bias
 
-# Remove the entire HintedWhisperSTTService class
-```
-
-**Add to `bot.py`:**
-```python
-# Add import
-from pipecat.services.deepgram.stt import DeepgramSTTService
-
-# Add env var
-DEEPGRAM_API_KEY = os.environ.get("DEEPGRAM_API_KEY", "")
-
-# Replace stt instantiation in run_agent()
-stt = DeepgramSTTService(
-    api_key=DEEPGRAM_API_KEY,
-    settings=DeepgramSTTService.Settings(
-        model="nova-2",
-        language="en-US",
-        smart_format=True,
-        punctuate=True,
-        interim_results=False, # final transcripts only — avoids double-firing
-        utterance_end_ms=1000, # ms of silence before end-of-utterance fires
-    )
+# HintedWhisperSTTService wraps WhisperSTTService to inject faster-whisper `hotwords`
+# (office cities + optometry terms) per call. Instantiated in run_agent():
+stt = HintedWhisperSTTService(
+    settings=WhisperSTTService.Settings(model=WHISPER_MODEL),
+    device=WHISPER_DEVICE,
+    compute_type=WHISPER_COMPUTE,
+    hotwords=WHISPER_HOTWORDS,
 )
 ```
 
-**Note on Whisper:** Remove from real-time pipeline only. Whisper large-v3 is retained
-for post-call transcription in Phase 3 (`recording/transcriber.py`) where latency does
-not matter and accuracy is more important than speed.
+**Note:** Whisper large-v3 also serves post-call transcription in Phase 3
+(`recording/transcriber.py`). If real-time latency proves unacceptable in the Phase 1
+gate, revisit a streaming STT then — but do not reintroduce the dependency speculatively.
 
 ---
 
-## Change 2 — Swap Auth Token for API Key Secret (`server.py`)
+## Change 2 — Twilio webhook auth stays on the Auth Token (`server.py`)
 
-**Why:** `TWILIO_AUTH_TOKEN` is the master credential for the entire Twilio account.
-A leak compromises every Twilio integration. A Standard API Key is scoped to this
-application and revocable independently.
+**Decision (2026-06-25): keep `TWILIO_AUTH_TOKEN`. The API Key swap was evaluated and rejected.**
 
-**Credential hierarchy:**
+A Standard API Key (scoped, revocable) was trialed in place of the account Auth Token,
+but it **cannot do what this server needs**: Twilio signs inbound webhooks
+(`X-Twilio-Signature`) with the account **Auth Token** — an API Key Secret cannot validate
+that signature, so `TWILIO_VALIDATE=true` would reject every legitimate `POST /voice`
+(403). The `TwilioFrameSerializer` auto-hang-up also expects the account/Auth-Token
+credential pair. The swap was reverted.
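
To make the failure mode concrete, here is a minimal sketch of the Twilio request-signing scheme for form-encoded webhooks: HMAC-SHA1 over the full request URL followed by each POST parameter name and value in sorted order, Base64-encoded, keyed by the Auth Token. The function names, URL, parameters, and credentials below are illustrative assumptions and are not taken from `server.py`; the only point is that a signature produced with the Auth Token does not verify under any other key, such as an API Key Secret.

```python
import base64
import hashlib
import hmac


def twilio_signature(key: str, url: str, params: dict[str, str]) -> str:
    """Compute the X-Twilio-Signature value for a form-encoded webhook."""
    # Append each POST parameter name and value, sorted by name, to the URL.
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(key.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def signature_ok(key: str, url: str, params: dict[str, str], header_value: str) -> bool:
    """Constant-time comparison against the header value that was received."""
    return hmac.compare_digest(twilio_signature(key, url, params), header_value)


# Hypothetical values, for illustration only.
AUTH_TOKEN = "example-auth-token"
API_KEY_SECRET = "example-api-key-secret"
URL = "https://example.invalid/voice"
PARAMS = {"CallSid": "CA123", "From": "+15550001111", "To": "+15550002222"}

# Twilio signs with the Auth Token, so only the Auth Token can verify it.
sent_by_twilio = twilio_signature(AUTH_TOKEN, URL, PARAMS)

print(signature_ok(AUTH_TOKEN, URL, PARAMS, sent_by_twilio))      # validates
print(signature_ok(API_KEY_SECRET, URL, PARAMS, sent_by_twilio))  # does not validate
```

This is why swapping the HMAC key to `TWILIO_API_KEY_SECRET` turns every genuine webhook into a 403 when `TWILIO_VALIDATE=true`.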
+
+**Credential model (in place):**
 ```
 Twilio Account SID (not secret on its own)
-├── Auth Token (master — Twilio console only, rotate quarterly)
-└── API Key: avc-phone-agent-prod (Standard scope)
-    ├── TWILIO_API_KEY_SID: SK...
-    └── TWILIO_API_KEY_SECRET: (treat as a password)
+└── Auth Token (TWILIO_AUTH_TOKEN — validates webhooks + REST/auto-hang-up)
 ```
 
-**Create the API Key:**
-1. Twilio console → Account → API Keys → Create new Standard key
-2. Name it `avc-phone-agent-prod`
-3. Copy SID (`SK...`) and Secret — Secret is shown once only
+Treat the Auth Token as a password: keep it only in `.env` (never committed), rotate on
+any suspected leak / team departure / quarterly. If finer-grained scoping is ever
+required, the correct design is a *hybrid* — Auth Token for `X-Twilio-Signature`
+validation, an API Key (SK SID + Secret) only for outbound REST — not a wholesale swap.
 
-**Changes in `server.py`:**
+**Current `server.py` (in place — do not change):**
 
-Remove:
 ```python
 TWILIO_ACCOUNT_SID = os.environ.get("TWILIO_ACCOUNT_SID")
 TWILIO_AUTH_TOKEN = os.environ.get("TWILIO_AUTH_TOKEN")
-```
 
-Add:
-```python
-TWILIO_API_KEY_SID = os.environ.get("TWILIO_API_KEY_SID")
-TWILIO_API_KEY_SECRET = os.environ.get("TWILIO_API_KEY_SECRET")
-```
-
-In `_twilio_signature_ok()`, change the HMAC key:
-```python
-# Before
+# _twilio_signature_ok(): HMAC-SHA1 keyed by the Auth Token (what Twilio signs with)
 digest = hmac.new(TWILIO_AUTH_TOKEN.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
 
-# After
-digest = hmac.new(TWILIO_API_KEY_SECRET.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
-```
-
-Update the guard condition:
-```python
-# Before
+# Validation gate + warning
 if TWILIO_VALIDATE and TWILIO_AUTH_TOKEN:
-
-# After
-if TWILIO_VALIDATE and TWILIO_API_KEY_SECRET:
-```
-
-Update the warning log:
-```python
-# Before
+    ...
 elif not TWILIO_AUTH_TOKEN:
     logger.warning("/voice signature validation DISABLED (no TWILIO_AUTH_TOKEN set)")
 
-# After
-elif not TWILIO_API_KEY_SECRET:
-    logger.warning("/voice signature validation DISABLED (no TWILIO_API_KEY_SECRET set)")
-```
-
-In `TwilioFrameSerializer` instantiation:
-```python
-# Before
+# Serializer auto-hang-up uses the account SID + Auth Token pair
 serializer = TwilioFrameSerializer(
     stream_sid=stream_sid,
     call_sid=call_sid,
     account_sid=TWILIO_ACCOUNT_SID,
     auth_token=TWILIO_AUTH_TOKEN,
 )
-
-# After
-serializer = TwilioFrameSerializer(
-    stream_sid=stream_sid,
-    call_sid=call_sid,
-    account_sid=TWILIO_ACCOUNT_SID,
-    auth_token=TWILIO_API_KEY_SECRET,
-)
 ```
 
-**Key rotation procedure:**
-1. Create new Standard API Key in Twilio console
-2. Update `TWILIO_API_KEY_SID` + `TWILIO_API_KEY_SECRET` in `.env`
+**Auth Token rotation procedure:**
+1. Generate a new primary Auth Token in the Twilio console (use the secondary-token flow)
+2. Update `TWILIO_AUTH_TOKEN` in `.env`
 3. Restart the service — no rebuild needed
-4. Verify one test call succeeds
-5. Revoke old key in Twilio console
+4. Verify one test call succeeds (signature validation + auto-hang-up both rely on it)
+5. Retire the old token in the Twilio console
 
 Rotate on: any suspected leak, any team member departure, quarterly as routine.
 
 ---
 
-## Change 3 — Update `.env`
+## Change 3 — `.env`
 
-**Remove:**
-```env
-TWILIO_AUTH_TOKEN=
-```
-
-**Add:**
-```env
-TWILIO_API_KEY_SID=SK...
-TWILIO_API_KEY_SECRET=
-DEEPGRAM_API_KEY=
-```
+No swap. `.env` keeps `TWILIO_AUTH_TOKEN` and the Whisper STT vars; there is **no**
+`TWILIO_API_KEY_*` or `DEEPGRAM_*` (those were trialed and removed with Changes 1/2).
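
One way to keep this rule from regressing is a startup check on the environment. The sketch below is an assumption, not existing project code: `check_env`, `REQUIRED`, and `REJECTED_PREFIXES` are hypothetical names. Only the two Twilio credentials are treated as required, because the Whisper variables shown for `bot.py` fall back to defaults.

```python
import os

# Credentials that must be present (hypothetical constant names).
REQUIRED = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")
# Prefixes of the variables that were trialed and then removed.
REJECTED_PREFIXES = ("TWILIO_API_KEY_", "DEEPGRAM_")


def check_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the env matches the decision."""
    problems = [f"missing {name}" for name in REQUIRED if not env.get(name)]
    problems += [
        f"unexpected {name} (trialed and removed)"
        for name in sorted(env)
        if name.startswith(REJECTED_PREFIXES)
    ]
    return problems


if __name__ == "__main__":
    for problem in check_env(dict(os.environ)):
        print(problem)
```

Returning a list rather than raising on the first problem lets a deploy script report every stale variable in one pass.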
 
 **Full `.env` reference:**
 ```env
-# Twilio — Auth Token lives in Twilio console only, never on this server
+# Twilio — Auth Token validates webhooks + drives auto-hang-up. Never committed.
 TWILIO_ACCOUNT_SID=AC...
-TWILIO_API_KEY_SID=SK...
-TWILIO_API_KEY_SECRET=
+TWILIO_AUTH_TOKEN=
 TWILIO_PHONE_NUMBER=+1...
 TWILIO_VALIDATE=true
 
-# STT: Deepgram (real-time, in-call only)
-DEEPGRAM_API_KEY=
-DEEPGRAM_MODEL=nova-2
+# STT: Whisper (faster-whisper, real-time in-call; large-v3 also used post-call in Phase 3)
+WHISPER_MODEL=medium
+WHISPER_DEVICE=cuda
+WHISPER_COMPUTE=float16
 
 # LLM: Ollama
 OLLAMA_URL=http://127.0.0.1:11434/v1
@@ -352,16 +295,17 @@ Claude Code must not scaffold Phase N+1 until Phase N gate is marked complete.
 **Goal:** Every utterance gets a response. Zero silent failures. AVC hangs up — not
 the caller.
 
-- [ ] Apply Change 1: swap Whisper for Deepgram in `bot.py`
-- [ ] Apply Change 2: swap Auth Token for API Key Secret in `server.py`
-- [ ] Apply Change 3: update `.env`
+- [x] Change 1: STT — Deepgram evaluated, reverted; staying on Whisper (`medium`)
+- [x] Change 2: Twilio auth — API Key evaluated, reverted; staying on Auth Token
+- [x] Change 3: `.env` — Auth Token + Whisper vars; `OLLAMA_MODEL=activeblue-avc:latest`
 - [ ] Verify `EndCallProcessor` termination in Twilio call logs (AVC side, not caller)
 - [ ] Verify `AudioHeartbeat` diagnostic logging active
 - [ ] Verify `MAX_CONCURRENT_CALLS` capacity gating works
 
 **Gate — all five must pass:**
 1. 10 consecutive test calls — zero silent non-responses
-2. Zero zombie pipeline instances after call ends (`docker stats`)
+2. Zero zombie pipeline instances after call ends (`ps`/`pgrep` — service runs as a bare
+   systemd/host process, not Docker)
 3. Call termination from AVC side confirmed in Twilio call logs
 4. JSON parse failure rate visible in logs — measurable not invisible
 5. Response latency P95 under 3 seconds from STT end-of-utterance to first TTS audio
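
Gate item 5 needs a concrete definition of P95 before it can be checked. The following is a minimal sketch using the nearest-rank method, assuming the per-response latencies (seconds from STT end-of-utterance to first TTS audio) have already been collected from logs. `percentile_nearest_rank` and `gate_5_passes` are hypothetical helpers, and the spec does not say which percentile method to use, so treat the choice as an assumption.

```python
def percentile_nearest_rank(samples: list[float], percent: int) -> float:
    """Nearest-rank percentile: smallest sample with at least `percent`% of samples <= it."""
    if not samples:
        raise ValueError("no latency samples collected")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in 1..100")
    ordered = sorted(samples)
    rank = (percent * len(ordered) + 99) // 100  # integer ceil(percent * n / 100)
    return ordered[rank - 1]


def gate_5_passes(latencies_s: list[float], limit_s: float = 3.0) -> bool:
    """True when P95 latency is strictly under the limit ("under 3 seconds")."""
    return percentile_nearest_rank(latencies_s, 95) < limit_s


# Hypothetical measurements: 19 fast responses and one slow outlier.
latencies = [1.2] * 19 + [4.5]
print(percentile_nearest_rank(latencies, 95))
print(gate_5_passes(latencies))
```

With 20 samples or fewer, a single outlier can decide the result: at exactly 10 samples the nearest-rank P95 is the slowest response, so the gate is only meaningful once enough responses have been measured.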
@@ -620,15 +564,14 @@ echo "[deploy] Done."
 
 ```
 pipecat-ai==1.3.0 # installed at /opt/miniconda3
-pipecat-ai[deepgram] # add for Phase 1 Deepgram swap
-deepgram-sdk # add for Phase 1
+faster-whisper # real-time STT (already installed in pipecat-run venv)
 kokoro-tts # already installed
 ollama # already installed
 scipy / numpy # already installed (pipecat deps)
 chromadb # add for Phase 2
 sentence-transformers # add for Phase 2
 anthropic # for monitoring + optional LLM swap
-openai-whisper # retained for post-call transcription only
+openai-whisper # large-v3 for post-call transcription (Phase 3)
 fastapi / uvicorn # already installed
 loguru # already installed
 httpx # already installed
@@ -638,8 +581,7 @@ httpx # already installed
 
 ## Open Items
 
-- [ ] Create `avc-phone-agent-prod` Standard API Key in Twilio console
-- [ ] Add `TWILIO_API_KEY_SID` + `TWILIO_API_KEY_SECRET` + `DEEPGRAM_API_KEY` to `.env`
+- [ ] Confirm `TWILIO_AUTH_TOKEN` in `.env` is current (rotate if leaked/stale)
 - [ ] Confirm `ODOO_STAGE_ID`, `ODOO_TEAM_ID`, `ODOO_USER_ID` from live `avc` db
 - [ ] Confirm AVA voice — `af_heart` is current default, confirm with AVC before go-live
 - [ ] Populate `rag/data/*.jsonl` before Phase 2 (seed from `practice.py` data)