# AVC Phone Agent: inbound optometry line (Pipecat + Twilio, fully local)

A real phone number that callers dial; the agent answers in voice, handles hours / location / insurance / services questions, and **captures appointment requests** for staff callback. All AI runs **locally on this box**:

```
caller ─▶ Twilio ─▶ wss (nginx TLS) ─▶ server.py ─▶ Pipecat pipeline:

  Twilio Media Stream (8kHz µ-law)
        │
        ▼
  Silero VAD ─▶ Whisper STT (GPU) ─▶ activeblue-avc (Ollama) ─▶ Kokoro TTS ─▶ back to caller
```

Inbound only. No cloud STT/TTS: audio stays on the machine except the Twilio carrier leg.

## Files

| File | Role |
|---|---|
| `server.py` | FastAPI: `POST /voice` (TwiML) + `WS /ws` (Twilio Media Stream) |
| `bot.py` | The per-call Pipecat pipeline (VAD→STT→LLM→TTS) + tool wiring |
| `practice.py` | **AVC business facts (PLACEHOLDERS: edit before go-live)** + appointment-capture tool |
| `odoo_client.py` | Writes captured requests into Odoo (CRM lead by default) via XML-RPC |
| `run.sh` | Launcher (reuses pipecat-run venv + sets CUDA lib path) |
| `avc-phone.service` | systemd unit (install on this box) |
| `deploy/setup-tls.sh` | One-shot: Let's Encrypt cert + nginx vhost install (run as root) |
| `deploy/nginx-*.conf` | nginx TLS reverse-proxy vhost + WebSocket-upgrade map |
| `traefik-avc-phone.yml` | Unused alternative (kept for a future multi-host/Traefik setup) |
| `.env.example` | Copy to `.env`, fill Twilio creds + public host + Odoo creds |
| `appointment_requests.jsonl` | Local fallback, only used if Odoo is unreachable/disabled |

## What's done vs. what YOU must supply

**Working / verified locally:**

- Pipeline assembles; all services construct (smoke-tested).
- GPU Whisper fixed: installed CUDA 12 `cublas`+`cudnn` wheels into the venv; `run.sh` sets `LD_LIBRARY_PATH` so faster-whisper finds them. Verified transcription on GPU.
- Local model `activeblue-avc:latest` is the LLM; Kokoro provides the voice; the appointment tool is wired in.
- **Odoo appointment integration wired + verified** against prod `db1`: a captured request creates a `crm.lead` (callback to-do) via XML-RPC using the same API key the `activeblue-agent` service uses. Verified create→read→delete (no residue left in `db1`). If Odoo is unreachable or creds are blank, it falls back to `appointment_requests.jsonl` and still confirms to the caller, so a request is never lost.

**You must supply (can't be done from this box):**

1. **Twilio account + a Voice phone number.**
2. **Port-forward 443** (and 80) from your router to this box, and run `deploy/setup-tls.sh` for the nginx TLS reverse proxy (Twilio needs real TLS on 443 for the `wss` stream).
3. **Real AVC facts** in `practice.py` (hours, address, insurance, services, phone).
4. **Odoo creds in `.env`** (`ODOO_USER` + `ODOO_API_KEY`) to enable lead creation. Set `ODOO_DB` (`db1` for prod) and `ODOO_TARGET` (`crm` lead, or `calendar` event). Leave creds blank to disable Odoo and log to JSONL only.

## Setup

1. **Config**
   ```bash
   cd /home/tocmo0nlord/avc-phone
   cp .env.example .env          # fill PUBLIC_HOST, TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN
   $EDITOR practice.py           # replace PLACEHOLDER hours/address/insurance/services
   ```
2. **Run it**
   ```bash
   ./run.sh                      # listens plain HTTP on :8200 (nginx terminates TLS)
   curl localhost:8200/health    # {"status":"ok",...}
   ```
3. **TLS reverse proxy (nginx, on this box).** No Traefik: `voip.activeblue.net` points at your WAN IP (`66.23.239.222`), which NATs to this box (`10.10.1.221`). nginx is already installed and only serving the default site, so we add a vhost for the domain. **Twilio's `wss` media stream needs real TLS on 443**, so:
   - **Forward `443` (and `80`) on your router → `10.10.1.221`.** (80 is for the Let's Encrypt challenge + the http→https redirect; 443 is the actual traffic.)
   - Run the one-shot setup (gets a Let's Encrypt cert, installs the vhost + ws map, reloads nginx):
     ```bash
     sudo bash deploy/setup-tls.sh
     ```
     It uses `deploy/nginx-voip.activeblue.net.conf` (proxies 443 → `127.0.0.1:8200`, forwards the `/ws` upgrade, 1-hour stream timeout) and `deploy/nginx-ws-upgrade.conf`.
   - Verify publicly: `curl https://voip.activeblue.net/health`.
4. **Twilio number config** (console.twilio.com → your number → Voice):
   - **A call comes in** → Webhook → `https://voip.activeblue.net/voice` → HTTP **POST**.
   - Save. That's it: the TwiML we return tells Twilio to open the Media Stream to `wss://voip.activeblue.net/ws`.
5. **Call the number.** You should hear the greeting and be able to talk to it.

## Security (built in)

- **Webhook signature validation:** `POST /voice` verifies Twilio's `X-Twilio-Signature` (HMAC-SHA1 over the public URL + sorted POST params, keyed by `TWILIO_AUTH_TOKEN`). Enforced automatically whenever `TWILIO_AUTH_TOKEN` is set. Verified against Twilio's published reference vector. Unsigned/forged requests get `403`. Set `TWILIO_VALIDATE=false` only for local testing.
  - The signed URL must match exactly, so **`PUBLIC_HOST` must equal the host on the number's webhook** (`https://$PUBLIC_HOST/voice`). If the reverse proxy (nginx here) rewrites host/path, signatures fail.
- **Media-stream gate:** `/ws` can't carry a usable Twilio signature, so it's gated by a shared `STREAM_TOKEN` embedded in the wss URL we hand Twilio. Bad/missing token → socket closed. Set a stable `STREAM_TOKEN` in `.env` (`openssl rand -base64 24`).

## Run it as a service (systemd)

A unit is provided: `avc-phone.service` (runs as your user, `Restart=always`, ordered after `ollama.service`).
Install (needs sudo; paste these in a `!` shell or a terminal):

```bash
sudo cp /home/tocmo0nlord/avc-phone/avc-phone.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now avc-phone.service
systemctl status avc-phone.service     # check it's running
journalctl -u avc-phone.service -f     # follow logs
```

Restart after editing `.env` or `practice.py`: `sudo systemctl restart avc-phone.service`.

(No-sudo alternative: a `systemctl --user` unit + `loginctl enable-linger tocmo0nlord`; ask and I'll convert it.)

## Concurrency cap (built in)

`MAX_CONCURRENT_CALLS` (default **2**) bounds simultaneous live calls. The count tracks active `/ws` pipelines (the real GPU consumers); when full, `/voice` speaks `BUSY_MESSAGE` and hangs up **before any GPU work**, so in-progress calls are never degraded. A hard reservation at `/ws` covers the rare race. `/health` reports `active_calls`/`max_calls` for monitoring. Tune the cap to your GPU headroom.

## Known limits / next steps

- **Per-call Whisper load:** each call currently constructs its own Whisper model on the GPU. Fine within the cap; a future optimization is sharing one warm Whisper instance across calls to cut memory + first-utterance latency.
- **Latency:** the first call after start pays one-time model loads (Whisper/Kokoro/Ollama). Keep the process warm. Set `WHISPER_MODEL=tiny` if you need faster STT.
- **Function-calling reliability:** `activeblue-avc` is an 8B fine-tune; tool-calling may need prompt tuning. If it's flaky, we can fall back to a deterministic slot-filling flow for appointment capture.
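
For reference, the two checks described under "Security (built in)" can be sketched with the standard library alone. This is a minimal illustration, not the code in `server.py`: the function names `compute_twilio_signature`, `is_valid_twilio_request`, and `stream_token_ok` are hypothetical, and the sketch assumes a form-encoded POST whose parameters are already parsed into a flat dict.

```python
import base64
import hashlib
import hmac
from typing import Dict, Optional


def compute_twilio_signature(auth_token: str, url: str, params: Dict[str, str]) -> str:
    """Hypothetical helper: HMAC-SHA1 over URL + sorted POST params, base64-encoded."""
    # Append each parameter as key immediately followed by value, in key order.
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: Dict[str, str], header_signature: str
) -> bool:
    """Compare the X-Twilio-Signature header value against our own computation."""
    expected = compute_twilio_signature(auth_token, url, params)
    # Constant-time comparison on bytes, so odd header contents cannot raise.
    return hmac.compare_digest(
        expected.encode("utf-8"), header_signature.encode("utf-8")
    )


def stream_token_ok(expected_token: str, presented_token: Optional[str]) -> bool:
    """Hypothetical /ws gate: reject a missing, empty, or mismatched token."""
    if not expected_token or not presented_token:
        return False
    return hmac.compare_digest(
        expected_token.encode("utf-8"), presented_token.encode("utf-8")
    )
```

The `url` argument has to be the exact public URL configured on the number (`https://$PUBLIC_HOST/voice`), which is why a host or path rewrite in the proxy breaks validation. Twilio's own Python helper library ships a `RequestValidator` class that performs the same webhook check, if pulling in that dependency is preferable to hand-rolled code.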
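
The two-stage cap described under "Concurrency cap (built in)" (an early check at `/voice`, then a hard reservation at `/ws`) might look roughly like the sketch below. The `CallLimiter` class and its method names are assumptions made for illustration; the real bookkeeping in `server.py` may be structured differently.

```python
import threading


class CallLimiter:
    """Hypothetical helper that bounds the number of live /ws pipelines."""

    def __init__(self, max_calls: int = 2) -> None:
        self.max_calls = max_calls
        self.active_calls = 0
        self._lock = threading.Lock()

    def has_capacity(self) -> bool:
        """Advisory check for /voice: cheap, but two callers can both pass it."""
        with self._lock:
            return self.active_calls < self.max_calls

    def try_reserve(self) -> bool:
        """Hard reservation for /ws: check and increment in one atomic step."""
        with self._lock:
            if self.active_calls >= self.max_calls:
                return False
            self.active_calls += 1
            return True

    def release(self) -> None:
        """Free a slot when a call's pipeline ends (never drops below zero)."""
        with self._lock:
            self.active_calls = max(0, self.active_calls - 1)

    def health(self) -> dict:
        """Shape of the counters a /health endpoint could report."""
        with self._lock:
            return {
                "status": "ok",
                "active_calls": self.active_calls,
                "max_calls": self.max_calls,
            }
```

In this sketch `/voice` would call `has_capacity()` and answer with the busy message when it returns `False`, while `/ws` would call `try_reserve()` before building the pipeline and `release()` in a `finally` block. The second step matters because two calls can pass the advisory check at the same moment; only the atomic reservation decides who actually gets the slot.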