
tocmo0nlord 93620be9bb Update CLAUDE.md: Phase 1 keeps Whisper STT + Twilio Auth Token

Reframe Change 1/2/3 to record the actual decisions instead of the trialed
swaps: Deepgram and the Twilio Standard API Key were both evaluated and
reverted. Document why the API Key cannot replace the Auth Token (Twilio signs
webhooks with the Auth Token). Update the .env reference, Phase 1 checklist,
dependencies, and open items accordingly; gate zombie-check uses ps/pgrep
(bare process, not Docker).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-25 01:09:50 +00:00


AVC Phone Agent — inbound optometry line (Pipecat + Twilio, fully local)

A real phone number that callers dial; the agent answers in voice, handles hours / location / insurance / services questions, and captures appointment requests for staff callback. All AI runs locally on this box:

```
caller ─▶ Twilio ─▶ wss (nginx TLS) ─▶ server.py ─▶ Pipecat pipeline:
          Twilio Media Stream (8kHz µ-law)
              │
              ▼
   Silero VAD ─▶ Whisper STT (GPU) ─▶ activeblue-avc (Ollama) ─▶ Kokoro TTS ─▶ back to caller
```

Inbound only. No cloud STT/TTS — audio stays on the machine except the Twilio carrier leg.
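Pipecat presumably does this conversion inside the pipeline; the sketch below only illustrates what "8kHz µ-law" means on the wire. It assumes Twilio's `media` event shape (`{"event": "media", "media": {"payload": <base64>}}`) and decodes each G.711 µ-law byte to a signed 16-bit PCM sample. The function names are hypothetical, not taken from bot.py.

```python
import base64
import json
from typing import List


def ulaw_byte_to_pcm16(u: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit linear sample."""
    u = ~u & 0xFF                       # mu-law bytes are transmitted bit-inverted
    sign = u & 0x80                     # top bit: sign
    exponent = (u >> 4) & 0x07          # next 3 bits: segment (exponent)
    mantissa = u & 0x0F                 # low 4 bits: step within the segment
    magnitude = ((mantissa << 3) + 0x84) << exponent  # 0x84 = 132, the mu-law bias
    magnitude -= 0x84
    return -magnitude if sign else magnitude


def decode_media_message(raw: str) -> List[int]:
    """Turn one Twilio Media Stream JSON message into PCM16 samples.

    Assumes the 'media' event shape {"event": "media", "media": {"payload": <base64>}}.
    Other events (e.g. "start", "stop") carry no audio and yield [].
    """
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return []
    ulaw = base64.b64decode(msg["media"]["payload"])
    return [ulaw_byte_to_pcm16(b) for b in ulaw]
```

At 8 kHz each byte is one 125 µs sample, so a 160-byte payload is 20 ms of audio.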

Files

| File | Role |
|---|---|
| `server.py` | FastAPI: `POST /voice` (TwiML) + `WS /ws` (Twilio Media Stream) |
| `bot.py` | The per-call Pipecat pipeline (VAD→STT→LLM→TTS) + tool wiring |
| `practice.py` | AVC business facts (PLACEHOLDERS: edit before go-live) + appointment-capture tool |
| `odoo_client.py` | Writes captured requests into Odoo (CRM lead by default) via XML-RPC |
| `run.sh` | Launcher (reuses pipecat-run venv + sets CUDA lib path) |
| `avc-phone.service` | systemd unit (install on this box) |
| `deploy/setup-tls.sh` | One-shot: Let's Encrypt cert + nginx vhost install (run as root) |
| `deploy/nginx-*.conf` | nginx TLS reverse-proxy vhost + WebSocket-upgrade map |
| `traefik-avc-phone.yml` | Unused alternative (kept for a future multi-host/Traefik setup) |
| `.env.example` | Copy to `.env`, fill Twilio creds + public host + Odoo creds |
| `appointment_requests.jsonl` | Local fallback, only used if Odoo is unreachable/disabled |

What's done vs. what YOU must supply

Working / verified locally:

- Pipeline assembles; all services construct (smoke-tested).
- GPU Whisper fixed: installed the CUDA 12 cuBLAS + cuDNN wheels into the venv, and run.sh sets LD_LIBRARY_PATH so faster-whisper finds them. Verified transcription on the GPU.
- The local model activeblue-avc:latest is the brain, with a Kokoro voice and the appointment tool.
- Odoo appointment integration wired and verified against prod db1: a captured request creates a crm.lead (callback to-do) via XML-RPC, using the same API key the activeblue-agent service uses. Verified create→read→delete (no residue left in db1). If Odoo is unreachable or the creds are blank, it falls back to appointment_requests.jsonl and still confirms to the caller, so a request is never lost.
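A minimal sketch of that create-or-fall-back behaviour, assuming Odoo's standard external XML-RPC endpoints (`/xmlrpc/2/common` for `authenticate`, `/xmlrpc/2/object` for `execute_kw`). The function name, the `name`/`description` field mapping and the request dict shape are hypothetical and not necessarily what odoo_client.py does; a `calendar.event` target would also need start/stop fields.

```python
import json
import time
import xmlrpc.client
from pathlib import Path


def save_appointment_request(req: dict, *, url: str, db: str, user: str, api_key: str,
                             model: str = "crm.lead",
                             fallback_path: str = "appointment_requests.jsonl") -> str:
    """Record a captured appointment request; returns where it went ("odoo" or "jsonl").

    Odoo is tried only when both credentials are set. Any failure (network,
    auth, validation) drops through to the local JSONL file, so the request
    is never lost and the caller can still be told it was taken.
    """
    if user and api_key:
        try:
            common = xmlrpc.client.ServerProxy(f"{url}/xmlrpc/2/common")
            uid = common.authenticate(db, user, api_key, {})
            if uid:  # falsy uid means the login/API key was rejected
                models = xmlrpc.client.ServerProxy(f"{url}/xmlrpc/2/object")
                vals = {  # hypothetical field mapping for a crm.lead callback
                    "name": f"Callback request: {req.get('name', 'unknown caller')}",
                    "description": json.dumps(req, ensure_ascii=False),
                }
                models.execute_kw(db, uid, api_key, model, "create", [vals])
                return "odoo"
        except Exception:  # deliberately broad: a failed write must not lose the request
            pass
    # Fallback: append one JSON object per line.
    with Path(fallback_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps({"ts": time.time(), **req}, ensure_ascii=False) + "\n")
    return "jsonl"
```

Returning where the request landed lets the bot log it while giving the caller the same confirmation either way.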

You must supply (can't be done from this box):

- Twilio account + a Voice phone number.
- Port-forward 443 (and 80) from your router to this box, and run deploy/setup-tls.sh for the nginx TLS reverse proxy (Twilio needs real TLS on 443 for the wss stream).
- Real AVC facts in practice.py (hours, address, insurance, services, phone).
- Odoo creds in .env (ODOO_USER + ODOO_API_KEY) to enable lead creation. Set ODOO_DB (db1 for prod) and ODOO_TARGET (CRM lead or calendar event). Leave the creds blank to disable Odoo and log to JSONL only.

Setup

Config

```
cd /home/tocmo0nlord/avc-phone
cp .env.example .env        # fill PUBLIC_HOST, TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, STREAM_TOKEN (+ Odoo creds)
$EDITOR practice.py         # replace PLACEHOLDER hours/address/insurance/services
```

Run it

```
./run.sh                    # listens plain HTTP on :8200 (nginx terminates TLS)
curl localhost:8200/health  # {"status":"ok",...}
```

TLS reverse proxy (nginx, on this box). No Traefik: voip.activeblue.net points at your WAN IP (66.23.239.222), which NATs to this box (10.10.1.221). nginx is already installed and only serves the default site, so we add a vhost for the domain. Twilio's wss media stream needs real TLS on 443, so:
- Forward 443 (and 80) on your router → 10.10.1.221. (80 is for the Let's Encrypt challenge + the http→https redirect; 443 is the actual traffic.)
- Run the one-shot setup (gets a Let's Encrypt cert, installs the vhost + ws map, reloads nginx):
```
sudo bash deploy/setup-tls.sh
```
  It uses deploy/nginx-voip.activeblue.net.conf (proxies 443 → 127.0.0.1:8200, forwards the /ws upgrade, 1-hour stream timeout) and deploy/nginx-ws-upgrade.conf.
- Verify publicly: curl https://voip.activeblue.net/health.
Twilio number config (console.twilio.com → your number → Voice):
- A call comes in → Webhook → https://voip.activeblue.net/voice → HTTP POST.
- Save. That's it: the TwiML we return tells Twilio to open the Media Stream to wss://voip.activeblue.net/ws, with the STREAM_TOKEN embedded in that URL (see Security below).
Call the number. You should hear the greeting and be able to talk to it.
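For reference, the kind of TwiML `/voice` hands back can be sketched with the standard library. This assumes Twilio's `<Connect><Stream>` verb for a bidirectional Media Stream and `<Say>` + `<Hangup/>` for the busy path described under the concurrency cap. `voice_twiml` and the default busy text are hypothetical; `stream_url` stands for whatever server.py builds from PUBLIC_HOST and STREAM_TOKEN, and how the token is embedded is not shown here.

```python
from xml.etree import ElementTree as ET

DEFAULT_BUSY = "All of our lines are busy right now. Please call back in a few minutes."  # hypothetical text


def voice_twiml(stream_url: str, busy: bool = False, busy_message: str = DEFAULT_BUSY) -> str:
    """Build the TwiML body that POST /voice could return.

    Normal case: <Connect><Stream url=...> asks Twilio to open a bidirectional
    Media Stream to our wss endpoint. Busy case (concurrency cap reached):
    speak the message with <Say>, then <Hangup/>, before any GPU work starts.
    """
    response = ET.Element("Response")
    if busy:
        ET.SubElement(response, "Say").text = busy_message
        ET.SubElement(response, "Hangup")
    else:
        connect = ET.SubElement(response, "Connect")
        ET.SubElement(connect, "Stream", url=stream_url)  # ElementTree escapes the attribute
    return '<?xml version="1.0" encoding="UTF-8"?>' + ET.tostring(response, encoding="unicode")
```

Building the XML with ElementTree rather than string formatting keeps characters such as `&` in the URL or message correctly escaped.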

Security (built in)

Webhook signature validation: POST /voice verifies Twilio's X-Twilio-Signature (HMAC-SHA1 over the public URL + sorted POST params, keyed by TWILIO_AUTH_TOKEN). Enforced automatically whenever TWILIO_AUTH_TOKEN is set. Verified against Twilio's published reference vector. Unsigned/forged requests get 403. Set TWILIO_VALIDATE=false only for local testing.
- The signed URL must match exactly, so PUBLIC_HOST must equal the host on the number's webhook (https://$PUBLIC_HOST/voice). If nginx (or anything else in front of the app) rewrites the host or path, signatures fail.
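The signature check above can be sketched with the standard library alone. `url` must be the exact public webhook URL (https://$PUBLIC_HOST/voice), which is why a host or path rewrite in front of the app breaks validation. The function names are hypothetical; Twilio's Python helper library also ships a `RequestValidator`, which server.py may use instead. Repeated parameter names are not handled in this sketch.

```python
import base64
import hashlib
import hmac
from typing import Mapping, Optional


def twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Expected X-Twilio-Signature for a form-encoded POST webhook.

    The full public URL, followed by every POST parameter as name+value
    (sorted by name, no separators), HMAC-SHA1'd with the Auth Token and
    base64-encoded.
    """
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(auth_token.encode(), payload.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def signature_ok(auth_token: str, url: str, params: Mapping[str, str],
                 header: Optional[str]) -> bool:
    """True only if the header matches; compared in constant time."""
    expected = twilio_signature(auth_token, url, params).encode()
    return hmac.compare_digest(expected, (header or "").encode())
```

A request that fails `signature_ok` gets the 403.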
Media-stream gate: /ws can't carry a usable Twilio signature, so it's gated by a shared STREAM_TOKEN embedded in the wss URL we hand Twilio. Bad/missing token → socket closed. Set a stable, URL-safe STREAM_TOKEN in .env (e.g. openssl rand -hex 24; base64 output can contain + and /, which are awkward inside a URL).
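A minimal sketch of the token gate, assuming the token has already been pulled out of the wss URL (where exactly it sits in the URL is not shown). Both helpers are hypothetical. `secrets.token_urlsafe` produces a URL-safe value with no `+`, `/` or `=`, and `hmac.compare_digest` avoids leaking how much of a guessed token matched.

```python
import hmac
import secrets
from typing import Optional


def new_stream_token() -> str:
    # 32 random bytes as URL-safe base64 (alphabet A-Z a-z 0-9 - _, no padding).
    return secrets.token_urlsafe(32)


def stream_token_ok(presented: Optional[str], expected: str) -> bool:
    """Gate for /ws: accept only the exact STREAM_TOKEN, compared in constant time."""
    if not expected or not presented:
        return False  # refuse when the token is missing on either side
    return hmac.compare_digest(presented.encode(), expected.encode())
```

When the check fails, the handler closes the socket before any pipeline is built.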

Run it as a service (systemd)

A unit is provided: avc-phone.service (runs as your user, Restart=always, ordered after ollama.service). Install (needs sudo — paste these in a ! shell or a terminal):

```
sudo cp /home/tocmo0nlord/avc-phone/avc-phone.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now avc-phone.service
systemctl status avc-phone.service          # check it's running
journalctl -u avc-phone.service -f          # follow logs
```

Restart after editing .env or practice.py: sudo systemctl restart avc-phone.service. (No-sudo alternative: a systemctl --user unit + loginctl enable-linger tocmo0nlord — ask and I'll convert it.)

Concurrency cap (built in)

MAX_CONCURRENT_CALLS (default 2) bounds simultaneous live calls. The count tracks active /ws pipelines (the real GPU consumers); when the cap is reached, /voice speaks BUSY_MESSAGE and hangs up before any GPU work, so in-progress calls are never degraded. A hard reservation at /ws covers the rare race where two calls both pass the /voice check before either stream has opened. /health reports active_calls/max_calls for monitoring. Tune the cap to your GPU headroom.
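A minimal sketch of that two-stage cap, assuming a single server process; the class and method names are hypothetical. `has_room()` is the non-reserving check at /voice, `try_acquire()` is the hard reservation at /ws, and `release()` must run when the pipeline ends (a try/finally around the call).

```python
import threading


class CallSlots:
    """Caps simultaneous live calls (MAX_CONCURRENT_CALLS)."""

    def __init__(self, max_calls: int = 2):
        self.max_calls = max_calls
        self.active = 0                  # live /ws pipelines
        self._lock = threading.Lock()

    def has_room(self) -> bool:
        # /voice pre-check: reserves nothing, just decides stream vs. busy TwiML.
        with self._lock:
            return self.active < self.max_calls

    def try_acquire(self) -> bool:
        # /ws hard reservation: settles the race if two calls passed has_room().
        with self._lock:
            if self.active >= self.max_calls:
                return False
            self.active += 1
            return True

    def release(self) -> None:
        # Call exactly once per successful try_acquire(), e.g. in a finally block.
        with self._lock:
            self.active = max(0, self.active - 1)

    def health(self) -> dict:
        # Shape mirrors the active_calls/max_calls fields /health reports.
        with self._lock:
            return {"active_calls": self.active, "max_calls": self.max_calls}
```

The lock is only strictly needed if slots are touched from more than one thread; inside a single asyncio event loop with no `await` between check and increment the counter is already safe, but the lock costs little.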

Known limits / next steps

- Per-call Whisper load: each call currently constructs its own Whisper model on the GPU. That is fine within the cap; a future optimization is sharing one warm Whisper instance across calls to cut memory and first-utterance latency.
- Latency: the first call after start pays one-time model loads (Whisper/Kokoro/Ollama), so keep the process warm. Set WHISPER_MODEL=tiny if you need faster STT.
- Function-calling reliability: activeblue-avc is an 8B fine-tune, so tool calling may need prompt tuning. If it's flaky, we can fall back to a deterministic slot-filling flow for appointment capture.
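One way to share a single warm Whisper instance across calls, as suggested in the first item above, is a lazily loaded, process-wide singleton. This is a sketch: `SharedModel` is a hypothetical helper, and how a shared model would be handed to the pipeline's STT service depends on that service's constructor, which is not shown here.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class SharedModel(Generic[T]):
    """Load an expensive model once and hand the same instance to every call."""

    def __init__(self, loader: Callable[[], T]):
        self._loader = loader
        self._instance: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        if self._instance is None:              # fast path once the model is warm
            with self._lock:
                if self._instance is None:      # double-checked: only one thread loads
                    self._instance = self._loader()
        return self._instance
```

With faster-whisper the loader could be, for example, `lambda: WhisperModel(model_size, device="cuda", compute_type="float16")`; calling `get()` once at startup moves the load out of the first call. Whether concurrent `transcribe()` calls on one instance actually run in parallel depends on the model's worker settings (faster-whisper exposes a `num_workers` option for this).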