Update CLAUDE.md: Phase 1 keeps Whisper STT + Twilio Auth Token
Reframe Change 1/2/3 to record the actual decisions instead of the trialed swaps: Deepgram and the Twilio Standard API Key were both evaluated and reverted. Document why the API Key cannot replace the Auth Token (Twilio signs webhooks with the Auth Token). Update the .env reference, Phase 1 checklist, dependencies, and open items accordingly; gate zombie-check uses ps/pgrep (bare process, not Docker).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
196
CLAUDE.md
@@ -1,7 +1,7 @@
|
|||||||
# AVC Phone Agent — Project Specification
|
# AVC Phone Agent — Project Specification
|
||||||
> Claude Code authoritative reference. All architecture, security, and build decisions live here.
|
> Claude Code authoritative reference. All architecture, security, and build decisions live here.
|
||||||
> Repo: `git.activeblue.net/tocmo0nlord/avc-phone-ai`
|
> Repo: `git.activeblue.net/tocmo0nlord/avc-phone-ai`
|
||||||
> Last updated: 2026-06-23 | Active Blue LLC
|
> Last updated: 2026-06-25 | Active Blue LLC
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -26,8 +26,8 @@ The previous build at `/home/tocmo0nlord/avc-phone/` is a working foundation.
|
|||||||
|
|
||||||
| File | Status | Action |
|
| File | Status | Action |
|
||||||
|------|--------|--------|
|
|------|--------|--------|
|
||||||
| `bot.py` | Keep with one change | Swap Whisper STT for Deepgram Nova-2 |
|
| `bot.py` | Keep as-is | Whisper STT retained (real-time). Deepgram evaluated and rejected — see Change 1 |
|
||||||
| `server.py` | Keep with one change | Swap Auth Token for API Key Secret |
|
| `server.py` | Keep as-is | Twilio Auth Token retained. API Key swap evaluated and rejected — see Change 2 |
|
||||||
| `practice.py` | Keep as-is | No changes |
|
| `practice.py` | Keep as-is | No changes |
|
||||||
| `extract.py` | Keep as-is | No changes |
|
| `extract.py` | Keep as-is | No changes |
|
||||||
| `odoo_client.py` | Keep as-is | Already uses API key auth correctly |
|
| `odoo_client.py` | Keep as-is | Already uses API key auth correctly |
|
||||||
@@ -62,172 +62,115 @@ not password. Correct pattern. No changes.
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Change 1 — Swap Whisper STT for Deepgram Nova-2 (`bot.py`)
|
## Change 1 — Real-time STT stays on Whisper (`bot.py`)
|
||||||
|
|
||||||
**Why:** Whisper buffers audio chunks before transcribing — 1-3 seconds of buffering
|
**Decision (2026-06-25): keep Whisper. Deepgram Nova-2 was evaluated and rejected.**
|
||||||
before the LLM sees any input. This is the primary cause of non-reply and the
|
|
||||||
repeat-yourself problem. Deepgram Nova-2 via Pipecat's native transport delivers
|
|
||||||
end-of-utterance events in under 300ms.
|
|
||||||
|
|
||||||
**Remove from `bot.py`:**
|
Deepgram Nova-2 was trialed to cut STT latency (Whisper buffers ~1-3s before the LLM
|
||||||
|
sees input). The swap was applied and then reverted — the project stays on local
|
||||||
|
faster-whisper. No external STT dependency, no per-minute STT cost, and no audio
|
||||||
|
leaving the box (HIPAA posture). Latency is instead managed via VAD tuning and the
|
||||||
|
`medium` model on the RTX 5080.
|
||||||
|
|
||||||
|
**Current `bot.py` STT (in place — do not change):**
|
||||||
```python
|
```python
|
||||||
# Remove this import
|
|
||||||
from pipecat.services.whisper.stt import WhisperSTTService
|
from pipecat.services.whisper.stt import WhisperSTTService
|
||||||
|
|
||||||
# Remove these env vars
|
WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "medium") # tiny|base|small|medium
|
||||||
WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "base")
|
WHISPER_DEVICE = os.environ.get("WHISPER_DEVICE", "cuda") # cuda for the 5080
|
||||||
WHISPER_DEVICE = os.environ.get("WHISPER_DEVICE", "cuda")
|
|
||||||
WHISPER_COMPUTE = os.environ.get("WHISPER_COMPUTE", "float16")
|
WHISPER_COMPUTE = os.environ.get("WHISPER_COMPUTE", "float16")
|
||||||
WHISPER_HOTWORDS = os.environ.get("WHISPER_HOTWORDS", "...")
|
WHISPER_HOTWORDS = os.environ.get("WHISPER_HOTWORDS", "...") # domain vocab bias
|
||||||
|
|
||||||
# Remove the entire HintedWhisperSTTService class
|
# HintedWhisperSTTService wraps WhisperSTTService to inject faster-whisper `hotwords`
|
||||||
```
|
# (office cities + optometry terms) per call. Instantiated in run_agent():
|
||||||
|
stt = HintedWhisperSTTService(
|
||||||
**Add to `bot.py`:**
|
settings=WhisperSTTService.Settings(model=WHISPER_MODEL),
|
||||||
```python
|
device=WHISPER_DEVICE,
|
||||||
# Add import
|
compute_type=WHISPER_COMPUTE,
|
||||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
hotwords=WHISPER_HOTWORDS,
|
||||||
|
|
||||||
# Add env var
|
|
||||||
DEEPGRAM_API_KEY = os.environ.get("DEEPGRAM_API_KEY", "")
|
|
||||||
|
|
||||||
# Replace stt instantiation in run_agent()
|
|
||||||
stt = DeepgramSTTService(
|
|
||||||
api_key=DEEPGRAM_API_KEY,
|
|
||||||
settings=DeepgramSTTService.Settings(
|
|
||||||
model="nova-2",
|
|
||||||
language="en-US",
|
|
||||||
smart_format=True,
|
|
||||||
punctuate=True,
|
|
||||||
interim_results=False, # final transcripts only — avoids double-firing
|
|
||||||
utterance_end_ms=1000, # ms of silence before end-of-utterance fires
|
|
||||||
)
|
|
||||||
)
|
)
|
||||||
```
|
```
|
||||||
|
|
||||||
**Note on Whisper:** Remove from real-time pipeline only. Whisper large-v3 is retained
|
**Note:** Whisper large-v3 also serves post-call transcription in Phase 3
|
||||||
for post-call transcription in Phase 3 (`recording/transcriber.py`) where latency does
|
(`recording/transcriber.py`). If real-time latency proves unacceptable in the Phase 1
|
||||||
not matter and accuracy is more important than speed.
|
gate, revisit a streaming STT then — but do not reintroduce the dependency speculatively.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Change 2 — Swap Auth Token for API Key Secret (`server.py`)
|
## Change 2 — Twilio webhook auth stays on the Auth Token (`server.py`)
|
||||||
|
|
||||||
**Why:** `TWILIO_AUTH_TOKEN` is the master credential for the entire Twilio account.
|
**Decision (2026-06-25): keep `TWILIO_AUTH_TOKEN`. The API Key swap was evaluated and rejected.**
|
||||||
A leak compromises every Twilio integration. A Standard API Key is scoped to this
|
|
||||||
application and revocable independently.
|
|
||||||
|
|
||||||
**Credential hierarchy:**
|
A Standard API Key (scoped, revocable) was trialed in place of the account Auth Token,
|
||||||
|
but it **cannot do what this server needs**: Twilio signs inbound webhooks
|
||||||
|
(`X-Twilio-Signature`) with the account **Auth Token** — an API Key Secret cannot validate
|
||||||
|
that signature, so `TWILIO_VALIDATE=true` would reject every legitimate `POST /voice`
|
||||||
|
(403). The `TwilioFrameSerializer` auto-hang-up also expects the account/Auth-Token
|
||||||
|
credential pair. The swap was reverted.
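
The reason the API Key Secret fails can be sketched in a few lines. This is a minimal illustration of Twilio's documented form-POST signing scheme (HMAC-SHA1 over the request URL plus the sorted POST fields, base64-encoded), not the project's `_twilio_signature_ok()`. The function names, URL, and credential strings are placeholders assumed for the example.

```python
import base64
import hashlib
import hmac


def compute_twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    """Return the base64 HMAC-SHA1 value Twilio sends in X-Twilio-Signature."""
    # Twilio signs the full request URL followed by every POST field name and
    # value, concatenated without separators, with fields sorted by name.
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def signature_ok(key: str, url: str, params: dict[str, str], header_value: str) -> bool:
    """Check a received signature against the one derived from `key`."""
    expected = compute_twilio_signature(key, url, params)
    # Constant-time comparison avoids leaking how much of the value matched.
    return hmac.compare_digest(expected, header_value)


if __name__ == "__main__":
    # Placeholder values, assumed for illustration only.
    url = "https://example.invalid/voice"
    form = {"CallSid": "CA123", "From": "+15550100", "To": "+15550101"}
    header = compute_twilio_signature("auth-token-placeholder", url, form)

    print(signature_ok("auth-token-placeholder", url, form, header))
    print(signature_ok("api-key-secret-placeholder", url, form, header))
```

Validating with the same key that produced the signature succeeds. Validating with any other secret, such as an API Key Secret, fails, which is the 403 behaviour described above.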
|
||||||
|
|
||||||
|
**Credential model (in place):**
|
||||||
```
|
```
|
||||||
Twilio Account SID (not secret on its own)
|
Twilio Account SID (not secret on its own)
|
||||||
├── Auth Token (master — Twilio console only, rotate quarterly)
|
└── Auth Token (TWILIO_AUTH_TOKEN — validates webhooks + REST/auto-hang-up)
|
||||||
└── API Key: avc-phone-agent-prod (Standard scope)
|
|
||||||
├── TWILIO_API_KEY_SID: SK...
|
|
||||||
└── TWILIO_API_KEY_SECRET: (treat as a password)
|
|
||||||
```
|
```
|
||||||
|
|
||||||
**Create the API Key:**
|
Treat the Auth Token as a password: keep it only in `.env` (never committed), rotate on
|
||||||
1. Twilio console → Account → API Keys → Create new Standard key
|
any suspected leak / team departure / quarterly. If finer-grained scoping is ever
|
||||||
2. Name it `avc-phone-agent-prod`
|
required, the correct design is a *hybrid* — Auth Token for `X-Twilio-Signature`
|
||||||
3. Copy SID (`SK...`) and Secret — Secret is shown once only
|
validation, an API Key (SK SID + Secret) only for outbound REST — not a wholesale swap.
|
||||||
|
|
||||||
**Changes in `server.py`:**
|
**Current `server.py` (in place — do not change):**
|
||||||
|
|
||||||
Remove:
|
|
||||||
```python
|
```python
|
||||||
|
TWILIO_ACCOUNT_SID = os.environ.get("TWILIO_ACCOUNT_SID")
|
||||||
TWILIO_AUTH_TOKEN = os.environ.get("TWILIO_AUTH_TOKEN")
|
TWILIO_AUTH_TOKEN = os.environ.get("TWILIO_AUTH_TOKEN")
|
||||||
```
|
|
||||||
|
|
||||||
Add:
|
# _twilio_signature_ok(): HMAC-SHA1 keyed by the Auth Token (what Twilio signs with)
|
||||||
```python
|
|
||||||
TWILIO_API_KEY_SID = os.environ.get("TWILIO_API_KEY_SID")
|
|
||||||
TWILIO_API_KEY_SECRET = os.environ.get("TWILIO_API_KEY_SECRET")
|
|
||||||
```
|
|
||||||
|
|
||||||
In `_twilio_signature_ok()`, change the HMAC key:
|
|
||||||
```python
|
|
||||||
# Before
|
|
||||||
digest = hmac.new(TWILIO_AUTH_TOKEN.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
|
digest = hmac.new(TWILIO_AUTH_TOKEN.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
|
||||||
|
|
||||||
# After
|
# Validation gate + warning
|
||||||
digest = hmac.new(TWILIO_API_KEY_SECRET.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
|
|
||||||
```
|
|
||||||
|
|
||||||
Update the guard condition:
|
|
||||||
```python
|
|
||||||
# Before
|
|
||||||
if TWILIO_VALIDATE and TWILIO_AUTH_TOKEN:
|
if TWILIO_VALIDATE and TWILIO_AUTH_TOKEN:
|
||||||
|
...
|
||||||
# After
|
|
||||||
if TWILIO_VALIDATE and TWILIO_API_KEY_SECRET:
|
|
||||||
```
|
|
||||||
|
|
||||||
Update the warning log:
|
|
||||||
```python
|
|
||||||
# Before
|
|
||||||
elif not TWILIO_AUTH_TOKEN:
|
elif not TWILIO_AUTH_TOKEN:
|
||||||
logger.warning("/voice signature validation DISABLED (no TWILIO_AUTH_TOKEN set)")
|
logger.warning("/voice signature validation DISABLED (no TWILIO_AUTH_TOKEN set)")
|
||||||
|
|
||||||
# After
|
# Serializer auto-hang-up uses the account SID + Auth Token pair
|
||||||
elif not TWILIO_API_KEY_SECRET:
|
|
||||||
logger.warning("/voice signature validation DISABLED (no TWILIO_API_KEY_SECRET set)")
|
|
||||||
```
|
|
||||||
|
|
||||||
In `TwilioFrameSerializer` instantiation:
|
|
||||||
```python
|
|
||||||
# Before
|
|
||||||
serializer = TwilioFrameSerializer(
|
serializer = TwilioFrameSerializer(
|
||||||
stream_sid=stream_sid,
|
stream_sid=stream_sid,
|
||||||
call_sid=call_sid,
|
call_sid=call_sid,
|
||||||
account_sid=TWILIO_ACCOUNT_SID,
|
account_sid=TWILIO_ACCOUNT_SID,
|
||||||
auth_token=TWILIO_AUTH_TOKEN,
|
auth_token=TWILIO_AUTH_TOKEN,
|
||||||
)
|
)
|
||||||
|
|
||||||
# After
|
|
||||||
serializer = TwilioFrameSerializer(
|
|
||||||
stream_sid=stream_sid,
|
|
||||||
call_sid=call_sid,
|
|
||||||
account_sid=TWILIO_ACCOUNT_SID,
|
|
||||||
auth_token=TWILIO_API_KEY_SECRET,
|
|
||||||
)
|
|
||||||
```
|
```
|
||||||
|
|
||||||
**Key rotation procedure:**
|
**Auth Token rotation procedure:**
|
||||||
1. Create new Standard API Key in Twilio console
|
1. Generate a new primary Auth Token in the Twilio console (use the secondary-token flow)
|
||||||
2. Update `TWILIO_API_KEY_SID` + `TWILIO_API_KEY_SECRET` in `.env`
|
2. Update `TWILIO_AUTH_TOKEN` in `.env`
|
||||||
3. Restart the service — no rebuild needed
|
3. Restart the service — no rebuild needed
|
||||||
4. Verify one test call succeeds
|
4. Verify one test call succeeds (signature validation + auto-hang-up both rely on it)
|
||||||
5. Revoke old key in Twilio console
|
5. Retire the old token in the Twilio console
|
||||||
|
|
||||||
Rotate on: any suspected leak, any team member departure, quarterly as routine.
|
Rotate on: any suspected leak, any team member departure, quarterly as routine.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Change 3 — Update `.env`
|
## Change 3 — `.env`
|
||||||
|
|
||||||
**Remove:**
|
No swap. `.env` keeps `TWILIO_AUTH_TOKEN` and the Whisper STT vars; there is **no**
|
||||||
```env
|
`TWILIO_API_KEY_*` or `DEEPGRAM_*` (those were trialed and removed with Changes 1/2).
|
||||||
TWILIO_AUTH_TOKEN=
|
|
||||||
```
|
|
||||||
|
|
||||||
**Add:**
|
|
||||||
```env
|
|
||||||
TWILIO_API_KEY_SID=SK...
|
|
||||||
TWILIO_API_KEY_SECRET=
|
|
||||||
DEEPGRAM_API_KEY=
|
|
||||||
```
|
|
||||||
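
A startup check can catch leftovers from the reverted trials before the service takes a call. The sketch below is a hypothetical helper, not part of `server.py`. The name `check_env`, the choice of required keys, and the message wording are assumptions.

```python
from collections.abc import Mapping

# Assumption: these two keys must be non-empty for webhook validation and
# auto-hang-up. The Whisper vars are left out because bot.py supplies defaults.
REQUIRED_KEYS = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")

# Prefixes that belonged to the reverted Deepgram and API Key trials.
REVERTED_PREFIXES = ("TWILIO_API_KEY_", "DEEPGRAM_")


def check_env(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems found in the environment (empty if clean)."""
    problems = []
    for key in REQUIRED_KEYS:
        if not env.get(key, "").strip():
            problems.append(f"missing or empty: {key}")
    for key in sorted(env):
        if key.startswith(REVERTED_PREFIXES):
            problems.append(f"leftover from reverted trial: {key}")
    return problems


if __name__ == "__main__":
    import os

    for problem in check_env(os.environ):
        print(f"[env] {problem}")
```

Returning a list rather than raising lets the caller decide whether a leftover variable is a warning or a hard stop.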
|
|
||||||
**Full `.env` reference:**
|
**Full `.env` reference:**
|
||||||
```env
|
```env
|
||||||
# Twilio — Auth Token lives in Twilio console only, never on this server
|
# Twilio — Auth Token validates webhooks + drives auto-hang-up. Never committed.
|
||||||
TWILIO_ACCOUNT_SID=AC...
|
TWILIO_ACCOUNT_SID=AC...
|
||||||
TWILIO_API_KEY_SID=SK...
|
TWILIO_AUTH_TOKEN=
|
||||||
TWILIO_API_KEY_SECRET=
|
|
||||||
TWILIO_PHONE_NUMBER=+1...
|
TWILIO_PHONE_NUMBER=+1...
|
||||||
|
TWILIO_VALIDATE=true
|
||||||
|
|
||||||
# STT: Deepgram (real-time, in-call only)
|
# STT: Whisper (faster-whisper, real-time in-call; large-v3 also used post-call in Phase 3)
|
||||||
DEEPGRAM_API_KEY=
|
WHISPER_MODEL=medium
|
||||||
DEEPGRAM_MODEL=nova-2
|
WHISPER_DEVICE=cuda
|
||||||
|
WHISPER_COMPUTE=float16
|
||||||
|
|
||||||
# LLM: Ollama
|
# LLM: Ollama
|
||||||
OLLAMA_URL=http://127.0.0.1:11434/v1
|
OLLAMA_URL=http://127.0.0.1:11434/v1
|
||||||
@@ -352,16 +295,17 @@ Claude Code must not scaffold Phase N+1 until Phase N gate is marked complete.
|
|||||||
**Goal:** Every utterance gets a response. Zero silent failures. AVC hangs up — not
|
**Goal:** Every utterance gets a response. Zero silent failures. AVC hangs up — not
|
||||||
the caller.
|
the caller.
|
||||||
|
|
||||||
- [ ] Apply Change 1: swap Whisper for Deepgram in `bot.py`
|
- [x] Change 1: STT — Deepgram evaluated, reverted; staying on Whisper (`medium`)
|
||||||
- [ ] Apply Change 2: swap Auth Token for API Key Secret in `server.py`
|
- [x] Change 2: Twilio auth — API Key evaluated, reverted; staying on Auth Token
|
||||||
- [ ] Apply Change 3: update `.env`
|
- [x] Change 3: `.env` — Auth Token + Whisper vars; `OLLAMA_MODEL=activeblue-avc:latest`
|
||||||
- [ ] Verify `EndCallProcessor` termination in Twilio call logs (AVC side, not caller)
|
- [ ] Verify `EndCallProcessor` termination in Twilio call logs (AVC side, not caller)
|
||||||
- [ ] Verify `AudioHeartbeat` diagnostic logging active
|
- [ ] Verify `AudioHeartbeat` diagnostic logging active
|
||||||
- [ ] Verify `MAX_CONCURRENT_CALLS` capacity gating works
|
- [ ] Verify `MAX_CONCURRENT_CALLS` capacity gating works
|
||||||
|
|
||||||
**Gate — all five must pass:**
|
**Gate — all five must pass:**
|
||||||
1. 10 consecutive test calls — zero silent non-responses
|
1. 10 consecutive test calls — zero silent non-responses
|
||||||
2. Zero zombie pipeline instances after call ends (`docker stats`)
|
2. Zero zombie pipeline instances after call ends (`ps`/`pgrep` — service runs as a bare
|
||||||
|
systemd/host process, not Docker)
|
||||||
3. Call termination from AVC side confirmed in Twilio call logs
|
3. Call termination from AVC side confirmed in Twilio call logs
|
||||||
4. JSON parse failure rate visible in logs — measurable not invisible
|
4. JSON parse failure rate visible in logs — measurable not invisible
|
||||||
5. Response latency P95 under 3 seconds from STT end-of-utterance to first TTS audio
|
5. Response latency P95 under 3 seconds from STT end-of-utterance to first TTS audio
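
Gate items 2 and 5 can be checked mechanically. The sketch below is one possible way to do it, under assumptions: latency samples are already collected in seconds (from STT end-of-utterance to first TTS audio), P95 uses the nearest-rank method, and the pipeline process can be identified by a command-line pattern passed to `pgrep -f`. The function names and the pattern are hypothetical.

```python
import math
import subprocess


def p95(samples_s: list[float]) -> float:
    """Nearest-rank 95th percentile of latency samples, in seconds."""
    if not samples_s:
        raise ValueError("no latency samples")
    ordered = sorted(samples_s)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def latency_gate_passes(samples_s: list[float], limit_s: float = 3.0) -> bool:
    """Gate item 5: P95 must be strictly under the limit."""
    return p95(samples_s) < limit_s


def count_matching_processes(pattern: str) -> int:
    """Gate item 2: count host processes whose command line matches `pattern`.

    Uses `pgrep -f`, which prints one PID per line and exits 1 with no output
    when nothing matches. The pattern itself is an assumption about how the
    pipeline process is named on the host.
    """
    result = subprocess.run(
        ["pgrep", "-f", pattern], capture_output=True, text=True, check=False
    )
    return len(result.stdout.split())
```

With only 10 test calls the nearest-rank P95 is the slowest call, so a larger sample gives a more meaningful figure. If the pattern also appears in the checking script's own command line, that process is counted too, so the pattern should be specific to the pipeline.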
|
||||||
@@ -620,15 +564,14 @@ echo "[deploy] Done."
|
|||||||
|
|
||||||
```
|
```
|
||||||
pipecat-ai==1.3.0 # installed at /opt/miniconda3
|
pipecat-ai==1.3.0 # installed at /opt/miniconda3
|
||||||
pipecat-ai[deepgram] # add for Phase 1 Deepgram swap
|
faster-whisper # real-time STT (already installed in pipecat-run venv)
|
||||||
deepgram-sdk # add for Phase 1
|
|
||||||
kokoro-tts # already installed
|
kokoro-tts # already installed
|
||||||
ollama # already installed
|
ollama # already installed
|
||||||
scipy / numpy # already installed (pipecat deps)
|
scipy / numpy # already installed (pipecat deps)
|
||||||
chromadb # add for Phase 2
|
chromadb # add for Phase 2
|
||||||
sentence-transformers # add for Phase 2
|
sentence-transformers # add for Phase 2
|
||||||
anthropic # for monitoring + optional LLM swap
|
anthropic # for monitoring + optional LLM swap
|
||||||
openai-whisper # retained for post-call transcription only
|
openai-whisper # large-v3 for post-call transcription (Phase 3)
|
||||||
fastapi / uvicorn # already installed
|
fastapi / uvicorn # already installed
|
||||||
loguru # already installed
|
loguru # already installed
|
||||||
httpx # already installed
|
httpx # already installed
|
||||||
@@ -638,8 +581,7 @@ httpx # already installed
|
|||||||
|
|
||||||
## Open Items
|
## Open Items
|
||||||
|
|
||||||
- [ ] Create `avc-phone-agent-prod` Standard API Key in Twilio console
|
- [ ] Confirm `TWILIO_AUTH_TOKEN` in `.env` is current (rotate if leaked/stale)
|
||||||
- [ ] Add `TWILIO_API_KEY_SID` + `TWILIO_API_KEY_SECRET` + `DEEPGRAM_API_KEY` to `.env`
|
|
||||||
- [ ] Confirm `ODOO_STAGE_ID`, `ODOO_TEAM_ID`, `ODOO_USER_ID` from live `avc` db
|
- [ ] Confirm `ODOO_STAGE_ID`, `ODOO_TEAM_ID`, `ODOO_USER_ID` from live `avc` db
|
||||||
- [ ] Confirm AVA voice — `af_heart` is current default, confirm with AVC before go-live
|
- [ ] Confirm AVA voice — `af_heart` is current default, confirm with AVC before go-live
|
||||||
- [ ] Populate `rag/data/*.jsonl` before Phase 2 (seed from `practice.py` data)
|
- [ ] Populate `rag/data/*.jsonl` before Phase 2 (seed from `practice.py` data)
|
||||||
|
|||||||