diff --git a/CLAUDE.md b/CLAUDE.md
index 98a5181..dd85706 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,7 +1,7 @@
 # AVC Phone Agent — Project Specification
 > Claude Code authoritative reference. All architecture, security, and build decisions live here.
 > Repo: `git.activeblue.net/tocmo0nlord/avc-phone-ai`
-> Last updated: 2026-06-23 | Active Blue LLC
+> Last updated: 2026-06-25 | Active Blue LLC
 
 ---
 
@@ -26,8 +26,8 @@ The previous build at `/home/tocmo0nlord/avc-phone/` is a working foundation.
 
 | File | Status | Action |
 |------|--------|--------|
-| `bot.py` | Keep with one change | Swap Whisper STT for Deepgram Nova-2 |
-| `server.py` | Keep with one change | Swap Auth Token for API Key Secret |
+| `bot.py` | Keep as-is | Real-time STT stays on Whisper; Deepgram evaluated and rejected (see Change 1) |
+| `server.py` | Keep as-is | Twilio Auth Token retained; API Key swap evaluated and rejected (see Change 2) |
 | `practice.py` | Keep as-is | No changes |
 | `extract.py` | Keep as-is | No changes |
 | `odoo_client.py` | Keep as-is | Already uses API key auth correctly |
@@ -62,172 +62,115 @@ not password. Correct pattern. No changes.
 
 ---
 
-## Change 1 — Swap Whisper STT for Deepgram Nova-2 (`bot.py`)
+## Change 1 — Real-time STT stays on Whisper (`bot.py`)
 
-**Why:** Whisper buffers audio chunks before transcribing — 1-3 seconds of buffering
-before the LLM sees any input. This is the primary cause of non-reply and the
-repeat-yourself problem. Deepgram Nova-2 via Pipecat's native transport delivers
-end-of-utterance events in under 300ms.
+**Decision (2026-06-25): keep Whisper. Deepgram Nova-2 was evaluated and rejected.**
 
-**Remove from `bot.py`:**
+Deepgram Nova-2 was trialed to cut STT latency (Whisper buffers roughly 1-3 s of audio
+before the LLM sees any input). The swap was applied and then reverted; the project stays
+on local faster-whisper. Staying local means no external STT dependency, no per-minute STT
+cost, and no audio leaving the box (HIPAA posture). Latency is managed instead through VAD
+tuning and by running the `medium` model on the RTX 5080.
+
+**Current `bot.py` STT (in place — do not change):**
 ```python
-# Remove this import
 from pipecat.services.whisper.stt import WhisperSTTService
 
-# Remove these env vars
-WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "base")
-WHISPER_DEVICE = os.environ.get("WHISPER_DEVICE", "cuda")
+WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "medium")   # tiny|base|small|medium
+WHISPER_DEVICE = os.environ.get("WHISPER_DEVICE", "cuda")   # cuda for the 5080
 WHISPER_COMPUTE = os.environ.get("WHISPER_COMPUTE", "float16")
-WHISPER_HOTWORDS = os.environ.get("WHISPER_HOTWORDS", "...")
+WHISPER_HOTWORDS = os.environ.get("WHISPER_HOTWORDS", "...")  # domain vocab bias
 
-# Remove the entire HintedWhisperSTTService class
-```
-
-**Add to `bot.py`:**
-```python
-# Add import
-from pipecat.services.deepgram.stt import DeepgramSTTService
-
-# Add env var
-DEEPGRAM_API_KEY = os.environ.get("DEEPGRAM_API_KEY", "")
-
-# Replace stt instantiation in run_agent()
-stt = DeepgramSTTService(
-    api_key=DEEPGRAM_API_KEY,
-    settings=DeepgramSTTService.Settings(
-        model="nova-2",
-        language="en-US",
-        smart_format=True,
-        punctuate=True,
-        interim_results=False,      # final transcripts only — avoids double-firing
-        utterance_end_ms=1000,      # ms of silence before end-of-utterance fires
-    )
+# HintedWhisperSTTService wraps WhisperSTTService to inject faster-whisper `hotwords`
+# (office cities + optometry terms) per call. Instantiated in run_agent():
+stt = HintedWhisperSTTService(
+    settings=WhisperSTTService.Settings(model=WHISPER_MODEL),
+    device=WHISPER_DEVICE,
+    compute_type=WHISPER_COMPUTE,
+    hotwords=WHISPER_HOTWORDS,
 )
 ```
 
-**Note on Whisper:** Remove from real-time pipeline only. Whisper large-v3 is retained
-for post-call transcription in Phase 3 (`recording/transcriber.py`) where latency does
-not matter and accuracy is more important than speed.
+**Note:** Whisper large-v3 also serves post-call transcription in Phase 3
+(`recording/transcriber.py`). If real-time latency proves unacceptable in the Phase 1
+gate, revisit a streaming STT then — but do not reintroduce the dependency speculatively.
 
 ---
 
-## Change 2 — Swap Auth Token for API Key Secret (`server.py`)
+## Change 2 — Twilio webhook auth stays on the Auth Token (`server.py`)
 
-**Why:** `TWILIO_AUTH_TOKEN` is the master credential for the entire Twilio account.
-A leak compromises every Twilio integration. A Standard API Key is scoped to this
-application and revocable independently.
+**Decision (2026-06-25): keep `TWILIO_AUTH_TOKEN`. The API Key swap was evaluated and rejected.**
 
-**Credential hierarchy:**
+A Standard API Key (scoped, revocable) was trialed in place of the account Auth Token,
+but it **cannot do what this server needs**. Twilio signs inbound webhooks
+(`X-Twilio-Signature`) with the account **Auth Token**; an API Key Secret cannot validate
+that signature, so `TWILIO_VALIDATE=true` would reject every legitimate `POST /voice`
+with a 403. The `TwilioFrameSerializer` auto-hang-up also expects the Account SID + Auth
+Token credential pair. The swap was reverted.
+
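+A minimal, standard-library sketch of the check Twilio documents for `X-Twilio-Signature`
+(HMAC-SHA1 over the full webhook URL followed by each POST parameter name and value,
+sorted by name, then base64-encoded). `twilio_signature` and `signature_ok` are
+hypothetical names, not the real `_twilio_signature_ok()`; the point is that a signature
+produced with the Auth Token fails to verify under any other key, which is exactly what
+broke the API Key trial.
+
+```python
+import base64
+import hashlib
+import hmac
+
+
+def twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
+    """Compute the X-Twilio-Signature value for a form-encoded POST webhook."""
+    # Full request URL, then every POST param as name+value with no separators,
+    # in case-sensitive (code point) order of the parameter name.
+    payload = url + "".join(name + params[name] for name in sorted(params))
+    digest = hmac.new(auth_token.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
+    return base64.b64encode(digest).decode("ascii")
+
+
+def signature_ok(auth_token: str, url: str, params: dict[str, str], header: str) -> bool:
+    """Compare against the header Twilio sent, in constant time."""
+    expected = twilio_signature(auth_token, url, params)
+    return hmac.compare_digest(expected.encode(), header.encode())
+
+
+if __name__ == "__main__":
+    url = "https://example.invalid/voice"  # hypothetical webhook URL
+    params = {"CallSid": "CA123", "From": "+15550100", "To": "+15550199"}
+    sent = twilio_signature("auth-token", url, params)  # what Twilio would put in the header
+    print(signature_ok("auth-token", url, params, sent))      # True
+    print(signature_ok("api-key-secret", url, params, sent))  # False
+```
+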
+**Credential model (in place):**
 ```
 Twilio Account SID          (not secret on its own)
-├── Auth Token              (master — Twilio console only, rotate quarterly)
-└── API Key: avc-phone-agent-prod   (Standard scope)
-    ├── TWILIO_API_KEY_SID:    SK...
-    └── TWILIO_API_KEY_SECRET: (treat as a password)
+└── Auth Token              (TWILIO_AUTH_TOKEN — validates webhooks + REST/auto-hang-up)
 ```
 
-**Create the API Key:**
-1. Twilio console → Account → API Keys → Create new Standard key
-2. Name it `avc-phone-agent-prod`
-3. Copy SID (`SK...`) and Secret — Secret is shown once only
+Treat the Auth Token as a password: keep it only in `.env` (never committed) and rotate it
+on the schedule below. If finer-grained scoping is ever required, the correct design is a
+*hybrid*: the Auth Token for `X-Twilio-Signature` validation and an API Key (SK SID +
+Secret) only for outbound REST calls. Do not do a wholesale swap.
 
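+If the hybrid is ever adopted, only the Basic-auth pair on outbound REST calls changes.
+A standard-library sketch, assuming Twilio's documented Calls resource (`POST
+/2010-04-01/Accounts/{AccountSid}/Calls/{CallSid}.json` with `Status=completed` ends a
+live call). `build_hangup_request` is hypothetical and does not describe how
+`TwilioFrameSerializer` is implemented internally; it builds the request without sending it.
+
+```python
+import base64
+import urllib.parse
+import urllib.request
+
+
+def build_hangup_request(
+    account_sid: str, call_sid: str, username: str, password: str
+) -> urllib.request.Request:
+    """Build (but do not send) the REST request that ends an in-progress call."""
+    url = f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/Calls/{call_sid}.json"
+    body = urllib.parse.urlencode({"Status": "completed"}).encode("ascii")
+    basic = base64.b64encode(f"{username}:{password}".encode()).decode("ascii")
+    return urllib.request.Request(
+        url,
+        data=body,
+        method="POST",
+        headers={
+            "Authorization": f"Basic {basic}",
+            "Content-Type": "application/x-www-form-urlencoded",
+        },
+    )
+
+
+# Today: Basic auth as Account SID + Auth Token.
+current = build_hangup_request("AC...", "CA...", "AC...", "<auth token>")
+# Hybrid: same URL (still under the Account SID), Basic auth as API Key SID + Secret.
+hybrid = build_hangup_request("AC...", "CA...", "SK...", "<api key secret>")
+```
+
+Webhook validation is untouched in both cases: it still keys on the Auth Token.
+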
-**Changes in `server.py`:**
+**Current `server.py` (in place — do not change):**
 
-Remove:
 ```python
+TWILIO_ACCOUNT_SID = os.environ.get("TWILIO_ACCOUNT_SID")
 TWILIO_AUTH_TOKEN = os.environ.get("TWILIO_AUTH_TOKEN")
-```
 
-Add:
-```python
-TWILIO_API_KEY_SID = os.environ.get("TWILIO_API_KEY_SID")
-TWILIO_API_KEY_SECRET = os.environ.get("TWILIO_API_KEY_SECRET")
-```
-
-In `_twilio_signature_ok()`, change the HMAC key:
-```python
-# Before
+# _twilio_signature_ok(): HMAC-SHA1 keyed by the Auth Token (what Twilio signs with)
 digest = hmac.new(TWILIO_AUTH_TOKEN.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
 
-# After
-digest = hmac.new(TWILIO_API_KEY_SECRET.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
-```
-
-Update the guard condition:
-```python
-# Before
+# Validation gate + warning
 if TWILIO_VALIDATE and TWILIO_AUTH_TOKEN:
-
-# After
-if TWILIO_VALIDATE and TWILIO_API_KEY_SECRET:
-```
-
-Update the warning log:
-```python
-# Before
+    ...
 elif not TWILIO_AUTH_TOKEN:
     logger.warning("/voice signature validation DISABLED (no TWILIO_AUTH_TOKEN set)")
 
-# After
-elif not TWILIO_API_KEY_SECRET:
-    logger.warning("/voice signature validation DISABLED (no TWILIO_API_KEY_SECRET set)")
-```
-
-In `TwilioFrameSerializer` instantiation:
-```python
-# Before
+# Serializer auto-hang-up uses the account SID + Auth Token pair
 serializer = TwilioFrameSerializer(
     stream_sid=stream_sid,
     call_sid=call_sid,
     account_sid=TWILIO_ACCOUNT_SID,
     auth_token=TWILIO_AUTH_TOKEN,
 )
-
-# After
-serializer = TwilioFrameSerializer(
-    stream_sid=stream_sid,
-    call_sid=call_sid,
-    account_sid=TWILIO_ACCOUNT_SID,
-    auth_token=TWILIO_API_KEY_SECRET,
-)
 ```
 
-**Key rotation procedure:**
-1. Create new Standard API Key in Twilio console
-2. Update `TWILIO_API_KEY_SID` + `TWILIO_API_KEY_SECRET` in `.env`
+**Auth Token rotation procedure:**
+1. Generate a new primary Auth Token in the Twilio console (use the secondary-token flow)
+2. Update `TWILIO_AUTH_TOKEN` in `.env`
 3. Restart the service — no rebuild needed
-4. Verify one test call succeeds
-5. Revoke old key in Twilio console
+4. Verify one test call succeeds (signature validation + auto-hang-up both rely on it)
+5. Retire the old token in the Twilio console
 
 Rotate on: any suspected leak, any team member departure, quarterly as routine.
 
 ---
 
-## Change 3 — Update `.env`
+## Change 3 — `.env`
 
-**Remove:**
-```env
-TWILIO_AUTH_TOKEN=
-```
-
-**Add:**
-```env
-TWILIO_API_KEY_SID=SK...
-TWILIO_API_KEY_SECRET=
-DEEPGRAM_API_KEY=
-```
+No swap. `.env` keeps `TWILIO_AUTH_TOKEN` and the Whisper STT vars; there are **no**
+`TWILIO_API_KEY_*` or `DEEPGRAM_*` vars (trialed, then removed when Changes 1 and 2 were reverted).
 
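+A startup check can catch a stale `.env` before the first call. A minimal sketch; the
+`REQUIRED` and `RETIRED_PREFIXES` lists are assumptions drawn from the reference below,
+and `env_problems` is hypothetical (today `server.py` only logs a warning when
+`TWILIO_AUTH_TOKEN` is missing).
+
+```python
+import os
+from collections.abc import Mapping
+
+# Assumed from the .env reference in this spec.
+REQUIRED = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN", "TWILIO_PHONE_NUMBER")
+# Leftovers from the reverted Deepgram and API Key trials.
+RETIRED_PREFIXES = ("TWILIO_API_KEY_", "DEEPGRAM_")
+
+
+def env_problems(env: Mapping[str, str]) -> list[str]:
+    """Return readable problems; an empty list means .env matches this spec."""
+    problems = [f"missing or empty: {key}" for key in REQUIRED if not env.get(key)]
+    problems += [
+        f"retired var still set: {key}"
+        for key in sorted(env)
+        if key.startswith(RETIRED_PREFIXES)
+    ]
+    return problems
+
+
+if __name__ == "__main__":
+    for problem in env_problems(os.environ):
+        print("[env]", problem)
+```
+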
 **Full `.env` reference:**
 ```env
-# Twilio — Auth Token lives in Twilio console only, never on this server
+# Twilio — Auth Token validates webhooks + drives auto-hang-up. Never committed.
 TWILIO_ACCOUNT_SID=AC...
-TWILIO_API_KEY_SID=SK...
-TWILIO_API_KEY_SECRET=
+TWILIO_AUTH_TOKEN=
 TWILIO_PHONE_NUMBER=+1...
+TWILIO_VALIDATE=true
 
-# STT: Deepgram (real-time, in-call only)
-DEEPGRAM_API_KEY=
-DEEPGRAM_MODEL=nova-2
+# STT: Whisper (faster-whisper, real-time in-call; large-v3 also used post-call in Phase 3)
+WHISPER_MODEL=medium
+WHISPER_DEVICE=cuda
+WHISPER_COMPUTE=float16
 
 # LLM: Ollama
 OLLAMA_URL=http://127.0.0.1:11434/v1
@@ -352,16 +295,17 @@ Claude Code must not scaffold Phase N+1 until Phase N gate is marked complete.
 **Goal:** Every utterance gets a response. Zero silent failures. AVC hangs up — not
 the caller.
 
-- [ ] Apply Change 1: swap Whisper for Deepgram in `bot.py`
-- [ ] Apply Change 2: swap Auth Token for API Key Secret in `server.py`
-- [ ] Apply Change 3: update `.env`
+- [x] Change 1: STT — Deepgram evaluated, reverted; staying on Whisper (`medium`)
+- [x] Change 2: Twilio auth — API Key evaluated, reverted; staying on Auth Token
+- [x] Change 3: `.env` — Auth Token + Whisper vars; `OLLAMA_MODEL=activeblue-avc:latest`
 - [ ] Verify `EndCallProcessor` termination in Twilio call logs (AVC side, not caller)
 - [ ] Verify `AudioHeartbeat` diagnostic logging active
 - [ ] Verify `MAX_CONCURRENT_CALLS` capacity gating works
 
 **Gate — all five must pass:**
 1. 10 consecutive test calls — zero silent non-responses
-2. Zero zombie pipeline instances after call ends (`docker stats`)
+2. Zero zombie pipeline instances after each call ends (check with `ps`/`pgrep`; the
+   service runs as a bare systemd/host process, not in Docker)
 3. Call termination from AVC side confirmed in Twilio call logs
 4. JSON parse failure rate visible in logs — measurable not invisible
 5. Response latency P95 under 3 seconds from STT end-of-utterance to first TTS audio
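+
+Gates 4 and 5 need numbers rather than impressions. A minimal sketch of the arithmetic,
+assuming per-turn latencies (STT end-of-utterance to first TTS audio, in seconds) and JSON
+parse attempt/failure counts have already been pulled from the logs. The function names are
+hypothetical, and nearest-rank is only one of several common P95 definitions.
+
+```python
+def p95(latencies_s: list[float]) -> float:
+    """Nearest-rank 95th percentile of the samples."""
+    if not latencies_s:
+        raise ValueError("no latency samples")
+    ordered = sorted(latencies_s)
+    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math, 1-based
+    return ordered[rank - 1]
+
+
+def gate_5_passes(latencies_s: list[float], limit_s: float = 3.0) -> bool:
+    """Gate 5: P95 must be under the limit (3 s in this spec)."""
+    return p95(latencies_s) < limit_s
+
+
+def parse_failure_rate(attempts: int, failures: int) -> float:
+    """Gate 4: fraction of JSON parse attempts that failed (0.0 when there were none)."""
+    return failures / attempts if attempts else 0.0
+
+
+if __name__ == "__main__":
+    turns = [0.5] * 19 + [4.0]  # 20 illustrative samples, one slow outlier
+    print(p95(turns), gate_5_passes(turns))  # 0.5 True
+    print(parse_failure_rate(200, 3))        # 0.015
+```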
@@ -620,15 +564,14 @@ echo "[deploy] Done."
 
 ```
 pipecat-ai==1.3.0           # installed at /opt/miniconda3
-pipecat-ai[deepgram]        # add for Phase 1 Deepgram swap
-deepgram-sdk                # add for Phase 1
+faster-whisper              # real-time STT (already installed in pipecat-run venv)
 kokoro-tts                  # already installed
 ollama                      # already installed
 scipy / numpy               # already installed (pipecat deps)
 chromadb                    # add for Phase 2
 sentence-transformers        # add for Phase 2
 anthropic                   # for monitoring + optional LLM swap
-openai-whisper              # retained for post-call transcription only
+openai-whisper              # large-v3 for post-call transcription (Phase 3)
 fastapi / uvicorn           # already installed
 loguru                      # already installed
 httpx                       # already installed
@@ -638,8 +581,7 @@ httpx                       # already installed
 
 ## Open Items
 
-- [ ] Create `avc-phone-agent-prod` Standard API Key in Twilio console
-- [ ] Add `TWILIO_API_KEY_SID` + `TWILIO_API_KEY_SECRET` + `DEEPGRAM_API_KEY` to `.env`
+- [ ] Confirm `TWILIO_AUTH_TOKEN` in `.env` is current (rotate if leaked/stale)
 - [ ] Confirm `ODOO_STAGE_ID`, `ODOO_TEAM_ID`, `ODOO_USER_ID` from live `avc` db
 - [ ] Confirm AVA voice — `af_heart` is current default, confirm with AVC before go-live
 - [ ] Populate `rag/data/*.jsonl` before Phase 2 (seed from `practice.py` data)