# AVC Phone Agent — inbound optometry line (Pipecat + Twilio, fully local)

A real phone number that callers dial; the agent answers in voice, handles hours /
location / insurance / services questions, and **captures appointment requests** for
staff callback. All AI runs **locally on this box**:

```
caller ─▶ Twilio ─▶ wss (nginx TLS) ─▶ server.py ─▶ Pipecat pipeline:
          Twilio Media Stream (8kHz µ-law)
              │
              ▼
   Silero VAD ─▶ Whisper STT (GPU) ─▶ activeblue-avc (Ollama) ─▶ Kokoro TTS ─▶ back to caller
```

Inbound only. No cloud STT/TTS — audio stays on the machine except the Twilio carrier leg.

## Files
| File | Role |
|---|---|
| `server.py` | FastAPI: `POST /voice` (TwiML) + `WS /ws` (Twilio Media Stream) |
| `bot.py` | The per-call Pipecat pipeline (VAD→STT→LLM→TTS) + tool wiring |
| `practice.py` | **AVC business facts (PLACEHOLDERS — edit before go-live)** + appointment-capture tool |
| `odoo_client.py` | Writes captured requests into Odoo (CRM lead by default) via XML-RPC |
| `run.sh` | Launcher (reuses pipecat-run venv + sets CUDA lib path) |
| `avc-phone.service` | systemd unit (install on this box) |
| `deploy/setup-tls.sh` | One-shot: Let's Encrypt cert + nginx vhost install (run as root) |
| `deploy/nginx-*.conf` | nginx TLS reverse-proxy vhost + WebSocket-upgrade map |
| `traefik-avc-phone.yml` | Unused alternative (kept for a future multi-host/Traefik setup) |
| `.env.example` | Copy to `.env`, fill Twilio creds + public host + Odoo creds |
| `appointment_requests.jsonl` | Local fallback — only used if Odoo is unreachable/disabled |

## What's done vs. what YOU must supply

**Working / verified locally:**
- Pipeline assembles; all services construct (smoke-tested).
- GPU Whisper fixed — installed CUDA12 `cublas`+`cudnn` wheels into the venv; `run.sh`
  sets `LD_LIBRARY_PATH` so faster-whisper finds them. Verified transcribe on GPU.
- Local model `activeblue-avc:latest` is the brain; Kokoro voice; appointment tool.
- **Odoo appointment integration wired + verified** against prod `db1`: a captured
  request creates a `crm.lead` (callback to-do) via XML-RPC using the same API key the
  `activeblue-agent` service uses. Verified create→read→delete (no residue left in db1).
  If Odoo is unreachable or creds are blank, it falls back to `appointment_requests.jsonl`
  and still confirms to the caller — a request is never lost.
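
The Odoo-first, JSONL-fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `odoo_client.py`: the function names `make_odoo_creator` and `capture_request`, the shape of the request dict, and the mapping onto `crm.lead` fields are all assumptions. The XML-RPC endpoints and `execute_kw` call follow Odoo's standard external API.

```python
import json
import time
import xmlrpc.client
from typing import Callable, Optional


def make_odoo_creator(url: str, db: str, user: str, api_key: str) -> Optional[Callable[[dict], int]]:
    """Return a function that creates a crm.lead, or None when creds are blank."""
    if not (user and api_key):
        return None  # blank creds: Odoo disabled, JSONL only

    def create(req: dict) -> int:
        common = xmlrpc.client.ServerProxy(f"{url}/xmlrpc/2/common")
        uid = common.authenticate(db, user, api_key, {})
        if not uid:
            raise PermissionError("Odoo authentication failed")
        models = xmlrpc.client.ServerProxy(f"{url}/xmlrpc/2/object")
        # Field mapping is an assumption; adjust to what staff want on the lead.
        return models.execute_kw(db, uid, api_key, "crm.lead", "create", [{
            "name": f"Callback: {req['name']}",
            "phone": req["phone"],
            "description": req.get("reason", ""),
        }])

    return create


def capture_request(req: dict, create: Optional[Callable[[dict], int]], fallback_path: str) -> str:
    """Try Odoo first; on any failure append the request to the JSONL file."""
    if create is not None:
        try:
            create(req)
            return "odoo"
        except Exception:
            pass  # unreachable or rejected: fall through to the local file
    with open(fallback_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps({"ts": time.time(), **req}) + "\n")
    return "jsonl"
```

Either return value means the request was stored somewhere, so the caller can be given the same confirmation in both cases.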

**You must supply (can't be done from this box):**
1. **Twilio account + a Voice phone number.**
2. **Port-forward 443** (and 80) from your router to this box, and run `deploy/setup-tls.sh`
   for the nginx TLS reverse proxy (Twilio needs real TLS on 443 for the `wss` stream).
3. **Real AVC facts** in `practice.py` (hours, address, insurance, services, phone).
4. **Odoo creds in `.env`** (`ODOO_USER` + `ODOO_API_KEY`) to enable lead creation.
   Set `ODOO_DB` (`db1` for prod) and `ODOO_TARGET` (`crm` lead, or `calendar` event).
   Leave creds blank to disable Odoo and log to JSONL only.

## Setup

1. **Config**
   ```bash
   cd /home/tocmo0nlord/avc-phone
   cp .env.example .env        # fill PUBLIC_HOST, TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN
   $EDITOR practice.py         # replace PLACEHOLDER hours/address/insurance/services
   ```

2. **Run it**
   ```bash
   ./run.sh                    # listens plain HTTP on :8200 (nginx terminates TLS)
   curl localhost:8200/health  # {"status":"ok",...}
   ```

3. **TLS reverse proxy (nginx, on this box).** No Traefik — `voip.activeblue.net` points
   at your WAN IP (`66.23.239.222`) which NATs to this box (`10.10.1.221`). nginx is
   already installed and only serving the default site, so we add a vhost for the domain.
   **Twilio's `wss` media stream needs real TLS on 443**, so:
   - **Forward `443` (and `80`) on your router → `10.10.1.221`.** (80 is for the
     Let's Encrypt challenge + the http→https redirect; 443 is the actual traffic.)
   - Run the one-shot setup (gets a Let's Encrypt cert, installs the vhost + ws map,
     reloads nginx):
     ```bash
     sudo bash deploy/setup-tls.sh
     ```
     It uses `deploy/nginx-voip.activeblue.net.conf` (proxies 443 → `127.0.0.1:8200`,
     forwards the `/ws` upgrade, 1-hour stream timeout) and `deploy/nginx-ws-upgrade.conf`.
   - Verify publicly: `curl https://voip.activeblue.net/health`.

4. **Twilio number config** (console.twilio.com → your number → Voice):
   - **A call comes in** → Webhook → `https://voip.activeblue.net/voice` → HTTP **POST**.
   - Save. That's it — the TwiML we return tells Twilio to open the Media Stream to
     `wss://voip.activeblue.net/ws`.

5. **Call the number.** You should hear the greeting and be able to talk to it.
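
For reference, the TwiML that step 4 relies on is a `<Connect><Stream>` response pointing Twilio at the WebSocket endpoint. A minimal sketch of building it is below. The helper name `stream_twiml` is hypothetical and the real `server.py` may build the response differently, for example by also including the `STREAM_TOKEN` in the URL.

```python
from xml.sax.saxutils import quoteattr


def stream_twiml(public_host: str, ws_path: str = "/ws") -> str:
    """Build TwiML that tells Twilio to open a bidirectional Media Stream."""
    url = f"wss://{public_host}{ws_path}"
    # quoteattr() escapes the value and adds the surrounding quotes.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f"<Response><Connect><Stream url={quoteattr(url)} /></Connect></Response>"
    )


if __name__ == "__main__":
    print(stream_twiml("voip.activeblue.net"))
```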

## Security (built in)
- **Webhook signature validation:** `POST /voice` verifies Twilio's `X-Twilio-Signature`
  (HMAC-SHA1 over the public URL + sorted POST params, keyed by `TWILIO_AUTH_TOKEN`).
  Enforced automatically whenever `TWILIO_AUTH_TOKEN` is set. Verified against Twilio's
  published reference vector. Unsigned/forged requests get `403`. Set `TWILIO_VALIDATE=false`
  only for local testing.
  - The signed URL must match exactly, so **`PUBLIC_HOST` must equal the host on the number's
    webhook** (`https://$PUBLIC_HOST/voice`). If the nginx reverse proxy rewrites the host or
    path, signatures fail.
- **Media-stream gate:** `/ws` can't carry a usable Twilio signature, so it's gated by a
  shared `STREAM_TOKEN` embedded in the wss URL we hand Twilio. Bad/missing token → socket
  closed. Set a stable `STREAM_TOKEN` in `.env` (`openssl rand -base64 24`).
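
As an illustration of the two checks above, here is a minimal sketch using only the standard library. It follows the scheme as described (HMAC-SHA1 over the URL followed by each POST key and value in sorted key order, base64-encoded), but the function names are hypothetical and this is not necessarily how `server.py` implements it. How the presented stream token is extracted from the WebSocket URL is left out as well.

```python
import base64
import hashlib
import hmac
from typing import Mapping, Optional


def twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Compute the expected X-Twilio-Signature for a form-encoded POST."""
    # Exact public URL, then each key immediately followed by its value,
    # ordered by key, with no separators.
    payload = url + "".join(k + params[k] for k in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_request(auth_token: str, url: str, params: Mapping[str, str],
                     header_signature: Optional[str]) -> bool:
    """Constant-time comparison against the header; missing header fails."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode("utf-8"), (header_signature or "").encode("utf-8"))


def stream_token_ok(expected: str, presented: Optional[str]) -> bool:
    """Gate for /ws: reject when no token is configured or it does not match."""
    if not expected:
        return False
    return hmac.compare_digest(expected.encode("utf-8"), (presented or "").encode("utf-8"))
```

A handler would respond `403` when `is_valid_request` returns `False`, and close the socket when `stream_token_ok` returns `False`.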

## Run it as a service (systemd)
A unit is provided: `avc-phone.service` (runs as your user, `Restart=always`, ordered
after `ollama.service`). Install (needs sudo):
```bash
sudo cp /home/tocmo0nlord/avc-phone/avc-phone.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now avc-phone.service
systemctl status avc-phone.service          # check it's running
journalctl -u avc-phone.service -f          # follow logs
```
Restart after editing `.env` or `practice.py`: `sudo systemctl restart avc-phone.service`.
(No-sudo alternative: convert this to a `systemctl --user` unit and enable lingering with
`loginctl enable-linger tocmo0nlord`.)

## Concurrency cap (built in)
`MAX_CONCURRENT_CALLS` (default **2**) bounds simultaneous live calls. The count tracks
active `/ws` pipelines (the real GPU consumers); when full, `/voice` speaks `BUSY_MESSAGE`
and hangs up **before any GPU work**, so in-progress calls are never degraded. A hard
reservation at `/ws` covers the rare race. `/health` reports `active_calls`/`max_calls`
for monitoring. Tune the cap to your GPU headroom.
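
The soft check at `/voice` plus hard reservation at `/ws` can be pictured with a small counter like the one below. This is a sketch under assumptions: the class name `CallSlots` and its methods are hypothetical, and the real server may track slots differently.

```python
import threading


class CallSlots:
    """Bounded counter of live call pipelines."""

    def __init__(self, max_calls: int = 2) -> None:
        self.max_calls = max_calls
        self._active = 0
        self._lock = threading.Lock()

    def has_room(self) -> bool:
        """Soft check for /voice: decide whether to speak the busy message."""
        with self._lock:
            return self._active < self.max_calls

    def try_reserve(self) -> bool:
        """Hard reservation for /ws: check and increment atomically."""
        with self._lock:
            if self._active >= self.max_calls:
                return False
            self._active += 1
            return True

    def release(self) -> None:
        """Call when a pipeline ends, e.g. in a finally block."""
        with self._lock:
            if self._active > 0:
                self._active -= 1

    def health(self) -> dict:
        """Payload shape for /health, using the field names mentioned above."""
        with self._lock:
            return {"status": "ok", "active_calls": self._active, "max_calls": self.max_calls}
```

The race this covers: two calls can both pass `has_room()` at `/voice` while one slot is free, but only one of them will get `True` from `try_reserve()` when its stream connects.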

## Known limits / next steps
- **Per-call Whisper load:** each call currently constructs its own Whisper model on the
  GPU. Fine within the cap; a future optimization is sharing one warm Whisper instance
  across calls to cut memory + first-utterance latency.
- **Latency:** the first call after start pays one-time model loads (Whisper/Kokoro/Ollama).
  Keep the process warm. Set `WHISPER_MODEL=tiny` if you need faster STT.
- **Function-calling reliability:** `activeblue-avc` is an 8B fine-tune; tool-calling
  may need prompt tuning. If it's flaky, we can fall back to a deterministic slot-filling
  flow for appointment capture.
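
To make the slot-filling fallback concrete, a deterministic flow could be as small as the sketch below. It is not implemented in this repo. The slot names, the prompts, and the `next_prompt` helper are all illustrative assumptions.

```python
from typing import Optional

# Ordered slots and the question asked while each one is still missing.
SLOTS = [
    ("name", "May I have your name?"),
    ("phone", "What is the best number to call you back on?"),
    ("preferred_time", "When would you like to come in?"),
]


def next_prompt(filled: dict) -> Optional[str]:
    """Return the question for the first empty slot, or None when all are filled."""
    for slot, question in SLOTS:
        if not filled.get(slot):
            return question
    return None
```

The bot would ask `next_prompt(filled)` after each caller turn, store the answer in `filled`, and hand the completed dict to the appointment-capture step once it returns `None`, with no dependence on the model emitting a tool call.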