Initial commit: avc-phone-ai codebase + CLAUDE.md

tocmo0nlord
2026-06-23 22:38:22 +00:00
parent 4bf72b9616
commit c3c719b77e
16 changed files with 1491 additions and 0 deletions

62
.env.example Normal file

@@ -0,0 +1,62 @@
# Copy to .env and fill in. run.sh auto-loads it.
# ── Public ingress (Twilio dials this back) ──────────────────────────────────
# Public hostname; nginx terminates TLS here and proxies to the app. Must match the
# Twilio webhook host (Twilio signs https://PUBLIC_HOST/voice).
PUBLIC_HOST=voip.activeblue.net
PORT=8200
# App bind address. Default 127.0.0.1 (nginx proxies in locally) — not exposed on LAN.
BIND_HOST=127.0.0.1
# ── Twilio ───────────────────────────────────────────────────────────────────
# From console.twilio.com. Account SID + a Standard API Key (scoped to this app,
# revocable independently). The Auth Token stays in the Twilio console only — never on
# this server. Create the key under Account → API Keys → Create Standard key, name it
# avc-phone-agent-prod; the Secret is shown once. Used to auto-hang-up the carrier leg
# and validate inbound webhook signatures.
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_API_KEY_SID=SKxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_API_KEY_SECRET=your_api_key_secret_here
# Inbound webhook signature validation is ON whenever TWILIO_API_KEY_SECRET is set.
# Set to false only for local testing without real Twilio requests.
TWILIO_VALIDATE=true
# Shared secret embedded in the Media Stream wss URL to gate /ws. Set a stable random
# value (e.g. `openssl rand -base64 24`); if blank, one is generated per process start.
STREAM_TOKEN=
# ── Odoo appointment integration ─────────────────────────────────────────────
# Leave ODOO_USER/ODOO_API_KEY blank to disable Odoo and log requests to JSONL only.
# Same creds the activeblue-agent container uses (docker inspect activeblue-agent).
# Verified working against db1 with ODOO_TARGET=crm.
ODOO_URL=http://localhost:8069
ODOO_DB=db1
ODOO_USER=mr.garcia09@gmail.com
ODOO_API_KEY=
# crm = callback lead (recommended) | calendar = tentative event
ODOO_TARGET=crm
# ── Capacity ─────────────────────────────────────────────────────────────────
# Max simultaneous calls (each uses GPU; Ollama serializes generation). Over-cap
# callers hear BUSY_MESSAGE and are hung up. Tune to your GPU headroom (2-3 typical).
MAX_CONCURRENT_CALLS=2
# BUSY_MESSAGE=Thank you for calling Advanced Vision Care. All of our lines are busy right now. Please call back in a few minutes. Goodbye.
# ── Models (defaults are fine) ───────────────────────────────────────────────
OLLAMA_MODEL=llama3.1:8b
OLLAMA_URL=http://127.0.0.1:11434/v1
# LLM provider: ollama (local, default) | anthropic (Claude API). Flip to A/B test Claude.
LLM_PROVIDER=ollama
ANTHROPIC_API_KEY=
# Default is the most capable model; for low-latency phone voice prefer claude-haiku-4-5
# (fastest) or claude-sonnet-4-6 (balance).
ANTHROPIC_MODEL=claude-opus-4-8
# ── STT: Deepgram (real-time, in-call only) ──────────────────────────────────
# Nova-2 delivers end-of-utterance in <300ms (vs Whisper's 1-3s buffering). Key from
# console.deepgram.com. Model is fixed to nova-2 in code; DEEPGRAM_MODEL is informational.
DEEPGRAM_API_KEY=
DEEPGRAM_MODEL=nova-2
# Whisper is retained for POST-CALL transcription only (Phase 3), not the live pipeline.
WHISPER_MODEL=base
WHISPER_DEVICE=cuda
WHISPER_COMPUTE=float16
KOKORO_VOICE=af_heart
KOKORO_MODEL_DIR=/home/tocmo0nlord/pipecat-run/models

19
.gitignore vendored Normal file

@@ -0,0 +1,19 @@
# Secrets — never commit
.env
# Recordings (local only, may contain PHI)
recordings/
# Model weights
*.gguf
# Python
__pycache__/
*.py[cod]
*.egg-info/
.venv/
venv/
# OS / editor
.DS_Store
*.swp

132
README.md Normal file

@@ -0,0 +1,132 @@
# AVC Phone Agent: inbound optometry line (Pipecat + Twilio, local LLM and TTS)
A real phone number that callers dial; the agent answers in voice, handles hours /
location / insurance / services questions, and **captures appointment requests** for
staff callback. By default the LLM and TTS run **locally on this box**; live
speech-to-text uses Deepgram:
```
caller ─▶ Twilio ─▶ wss (nginx TLS) ─▶ server.py ─▶ Pipecat pipeline:
    Twilio Media Stream (8 kHz µ-law)
    Silero VAD ─▶ Deepgram STT (cloud) ─▶ activeblue-avc (Ollama) ─▶ Kokoro TTS ─▶ back to caller
```
Inbound only. With the default `LLM_PROVIDER=ollama`, caller audio leaves the machine only
on the Twilio carrier leg and the Deepgram STT stream.
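For readers unfamiliar with the wire format, here is a minimal sketch of decoding one Twilio media payload: base64 text carrying G.711 µ-law bytes at 8 kHz. It is illustrative only. In this project `TwilioFrameSerializer` handles decoding and resampling, and `ulaw_to_pcm16` / `decode_media_payload` are hypothetical helper names, not part of the codebase.

```python
import base64


def ulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit PCM sample."""
    u = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
    sign = u & 0x80               # top bit: 1 means negative
    exponent = (u >> 4) & 0x07    # 3-bit segment number
    mantissa = u & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_media_payload(payload_b64: str) -> list[int]:
    """Turn the base64 `payload` string of a media message into PCM samples."""
    return [ulaw_to_pcm16(b) for b in base64.b64decode(payload_b64)]
```

Each decoded sample covers 1/8000 of a second, so a 160-byte payload corresponds to 20 ms of audio.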
## Files
| File | Role |
|---|---|
| `server.py` | FastAPI: `POST /voice` (TwiML) + `WS /ws` (Twilio Media Stream) |
| `bot.py` | The per-call Pipecat pipeline (VAD→STT→LLM→TTS) + tool wiring |
| `practice.py` | **AVC business facts (PLACEHOLDERS — edit before go-live)** + appointment-capture tool |
| `odoo_client.py` | Writes captured requests into Odoo (CRM lead by default) via XML-RPC |
| `run.sh` | Launcher (reuses pipecat-run venv + sets CUDA lib path) |
| `avc-phone.service` | systemd unit (install on this box) |
| `deploy/setup-tls.sh` | One-shot: Let's Encrypt cert + nginx vhost install (run as root) |
| `deploy/nginx-*.conf` | nginx TLS reverse-proxy vhost + WebSocket-upgrade map |
| `traefik-avc-phone.yml` | Unused alternative (kept for a future multi-host/Traefik setup) |
| `.env.example` | Copy to `.env`, fill Twilio creds + public host + Odoo creds |
| `appointment_requests.jsonl` | Local fallback — only used if Odoo is unreachable/disabled |
## What's done vs. what YOU must supply
**Working / verified locally:**
- Pipeline assembles; all services construct (smoke-tested).
- GPU Whisper fixed — installed CUDA12 `cublas`+`cudnn` wheels into the venv; `run.sh`
sets `LD_LIBRARY_PATH` so faster-whisper finds them. Verified transcribe on GPU.
- Local model `activeblue-avc:latest` is the brain; Kokoro voice; appointment tool.
- **Odoo appointment integration wired + verified** against prod `db1`: a captured
request creates a `crm.lead` (callback to-do) via XML-RPC using the same API key the
`activeblue-agent` service uses. Verified create→read→delete (no residue left in db1).
If Odoo is unreachable or creds are blank, it falls back to `appointment_requests.jsonl`
and still confirms to the caller — a request is never lost.
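The fallback behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `practice.py` or `odoo_client.py`: `save_request`, the `create_lead` callable, and the return strings are hypothetical names chosen for the example.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def save_request(
    record: dict,
    create_lead: Optional[Callable[[dict], int]],
    fallback_path: Path,
) -> str:
    """Try the CRM first; append to a JSONL file if it is disabled or fails."""
    if create_lead is not None:
        try:
            lead_id = create_lead(record)  # e.g. an XML-RPC create call
            return f"odoo:{lead_id}"
        except Exception:
            pass  # fall through to the local file so the request is not lost
    with fallback_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return "jsonl"
```

Passing `create_lead=None` models the "creds left blank" case, so the same function covers both a disabled and an unreachable Odoo.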
**You must supply (can't be done from this box):**
1. **Twilio account + a Voice phone number.**
2. **Port-forward 443** (and 80) from your router to this box, and run `deploy/setup-tls.sh`
for the nginx TLS reverse proxy (Twilio needs real TLS on 443 for the `wss` stream).
3. **Real AVC facts** in `practice.py` (hours, address, insurance, services, phone).
4. **Odoo creds in `.env`** (`ODOO_USER` + `ODOO_API_KEY`) to enable lead creation.
Set `ODOO_DB` (`db1` for prod) and `ODOO_TARGET` (`crm` lead, or `calendar` event).
Leave creds blank to disable Odoo and log to JSONL only.
## Setup
1. **Config**
```bash
cd /home/tocmo0nlord/avc-phone
cp .env.example .env # fill PUBLIC_HOST, TWILIO_ACCOUNT_SID, TWILIO_API_KEY_SID, TWILIO_API_KEY_SECRET
$EDITOR practice.py # replace PLACEHOLDER hours/address/insurance/services
```
2. **Run it**
```bash
./run.sh # listens plain HTTP on :8200 (nginx terminates TLS)
curl localhost:8200/health # {"status":"ok",...}
```
3. **TLS reverse proxy (nginx, on this box).** No Traefik — `voip.activeblue.net` points
at your WAN IP (`66.23.239.222`) which NATs to this box (`10.10.1.221`). nginx is
already installed and only serving the default site, so we add a vhost for the domain.
**Twilio's `wss` media stream needs real TLS on 443**, so:
- **Forward `443` (and `80`) on your router → `10.10.1.221`.** (80 is for the
Let's Encrypt challenge + the http→https redirect; 443 is the actual traffic.)
- Run the one-shot setup (gets a Let's Encrypt cert, installs the vhost + ws map,
reloads nginx):
```bash
sudo bash deploy/setup-tls.sh
```
It uses `deploy/nginx-voip.activeblue.net.conf` (proxies 443 → `127.0.0.1:8200`,
forwards the `/ws` upgrade, 1-hour stream timeout) and `deploy/nginx-ws-upgrade.conf`.
- Verify publicly: `curl https://voip.activeblue.net/health`.
4. **Twilio number config** (console.twilio.com → your number → Voice):
- **A call comes in** → Webhook → `https://voip.activeblue.net/voice` → HTTP **POST**.
- Save. That's it — the TwiML we return tells Twilio to open the Media Stream to
`wss://voip.activeblue.net/ws`.
5. **Call the number.** You should hear the greeting and be able to talk to it.
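To make step 4 concrete, here is a rough sketch of the kind of TwiML a `/voice` handler could return. It assumes the stream token travels as a path segment, since Twilio's documentation states that `<Stream>` URLs do not support query-string parameters. `build_twiml` and the `/ws/<token>` layout are assumptions for illustration; the actual `server.py` may build the URL differently.

```python
import xml.etree.ElementTree as ET


def build_twiml(public_host: str, stream_token: str) -> str:
    """Return TwiML that tells Twilio to open a Media Stream back to this host."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # Hypothetical URL layout: the shared token rides in the path.
    ET.SubElement(connect, "Stream", url=f"wss://{public_host}/ws/{stream_token}")
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body
```

The handler would send this string back with an XML content type; Twilio then connects to the `wss` URL and starts streaming call audio.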
## Security (built in)
- **Webhook signature validation:** `POST /voice` verifies Twilio's `X-Twilio-Signature`
(HMAC-SHA1 over the public URL + sorted POST params, keyed by `TWILIO_AUTH_TOKEN`).
Enforced automatically whenever `TWILIO_AUTH_TOKEN` is set. Verified against Twilio's
published reference vector. Unsigned/forged requests get `403`. Set `TWILIO_VALIDATE=false`
only for local testing.
- The signed URL must match exactly, so **`PUBLIC_HOST` must equal the host on the number's
webhook** (`https://$PUBLIC_HOST/voice`). If nginx rewrites the host or path, signatures fail.
- **Media-stream gate:** `/ws` can't carry a usable Twilio signature, so it's gated by a
shared `STREAM_TOKEN` embedded in the wss URL we hand Twilio. Bad/missing token → socket
closed. Set a stable `STREAM_TOKEN` in `.env` (`openssl rand -base64 24`).
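The two checks above can be sketched as follows. This is a minimal illustration of the scheme as described (HMAC-SHA1 over the URL plus the sorted POST parameters, base64-encoded), not the code in `server.py`. The function names are hypothetical, and `signing_secret` stands for whichever secret the server is configured with: this README names `TWILIO_AUTH_TOKEN`, while `.env.example` ships `TWILIO_API_KEY_SECRET`.

```python
import base64
import hashlib
import hmac
from typing import Optional


def twilio_signature(signing_secret: str, url: str, params: dict[str, str]) -> str:
    """Sign the full URL followed by each POST key+value, in sorted key order."""
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(
        signing_secret.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_request(
    signing_secret: str, url: str, params: dict[str, str], header_signature: str
) -> bool:
    """Compare against the X-Twilio-Signature header in constant time."""
    expected = twilio_signature(signing_secret, url, params)
    return hmac.compare_digest(expected.encode("utf-8"), header_signature.encode("utf-8"))


def stream_token_ok(expected_token: str, presented_token: Optional[str]) -> bool:
    """Gate for /ws: reject an empty configuration and any mismatched token."""
    if not expected_token or not presented_token:
        return False
    return hmac.compare_digest(expected_token.encode("utf-8"), presented_token.encode("utf-8"))
```

Because the URL is part of the signed payload, a handler would pass `https://$PUBLIC_HOST/voice` rather than the internal `127.0.0.1:8200` address it sees behind the proxy.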
## Run it as a service (systemd)
A unit is provided: `avc-phone.service` (runs as your user, `Restart=always`, ordered
after `ollama.service`). Install (needs sudo):
```bash
sudo cp /home/tocmo0nlord/avc-phone/avc-phone.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now avc-phone.service
systemctl status avc-phone.service # check it's running
journalctl -u avc-phone.service -f # follow logs
```
Restart after editing `.env` or `practice.py`: `sudo systemctl restart avc-phone.service`.
(No-sudo alternative: a `systemctl --user` unit plus `loginctl enable-linger tocmo0nlord`.)
## Concurrency cap (built in)
`MAX_CONCURRENT_CALLS` (default **2**) bounds simultaneous live calls. The count tracks
active `/ws` pipelines (the real GPU consumers); when full, `/voice` speaks `BUSY_MESSAGE`
and hangs up **before any GPU work**, so in-progress calls are never degraded. A hard
reservation at `/ws` covers the rare race. `/health` reports `active_calls`/`max_calls`
for monitoring. Tune the cap to your GPU headroom.
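A cap like this can be sketched as a small counter with an atomic reserve step. The class below is a hypothetical illustration (`CallSlots` and its method names are not taken from `server.py`): `has_room` is the cheap check a `/voice` handler could make before answering, and `try_reserve` is the hard reservation a `/ws` handler would take before starting a pipeline.

```python
import threading


class CallSlots:
    """Bounded counter of live calls."""

    def __init__(self, max_calls: int):
        self._max = max_calls
        self._active = 0
        self._lock = threading.Lock()

    def has_room(self) -> bool:
        """Advisory check: is there currently a free slot?"""
        with self._lock:
            return self._active < self._max

    def try_reserve(self) -> bool:
        """Atomically take a slot; False means the cap is already reached."""
        with self._lock:
            if self._active >= self._max:
                return False
            self._active += 1
            return True

    def release(self) -> None:
        """Give a slot back when the call's pipeline ends."""
        with self._lock:
            self._active = max(0, self._active - 1)

    def snapshot(self) -> dict:
        """Numbers a /health endpoint could report."""
        with self._lock:
            return {"active_calls": self._active, "max_calls": self._max}
```

Two callers can both pass `has_room` when one slot is left; only one of them will succeed at `try_reserve`, which is the race the hard reservation covers.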
## Known limits / next steps
- **Per-call Whisper load:** each call currently constructs its own Whisper model on the
GPU. Fine within the cap; a future optimization is sharing one warm Whisper instance
across calls to cut memory + first-utterance latency.
- **Latency:** first call after start pays one-time model loads (Whisper/Kokoro/Ollama).
Keep the process warm. Tune `WHISPER_MODEL=tiny` if you need faster STT.
- **Function-calling reliability:** `activeblue-avc` is an 8B fine-tune; tool-calling
may need prompt tuning. If it's flaky, we can fall back to a deterministic slot-filling
flow for appointment capture.
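As a rough idea of what a deterministic slot-filling fallback could look like, the sketch below tracks the four details the agent needs for a booking (office or city, reason, preferred time, name) and returns the next question to ask. The slot names mirror the tool schema in `bot.py`; `next_prompt` and the question wording are hypothetical.

```python
from typing import Optional

# Order in which missing details are requested.
REQUIRED_SLOTS = ("location", "reason", "preferred_time", "patient_name")

PROMPTS = {
    "location": "Which city or area is most convenient for you?",
    "reason": "What is the reason for your visit?",
    "preferred_time": "What day and time would you prefer?",
    "patient_name": "May I have your name, please?",
}


def next_prompt(slots: dict[str, str]) -> Optional[str]:
    """Return the question for the first unfilled slot, or None when complete."""
    for name in REQUIRED_SLOTS:
        if not (slots.get(name) or "").strip():
            return PROMPTS[name]
    return None
```

When `next_prompt` returns `None`, the flow has everything it needs and can read the details back for confirmation instead of relying on the model to decide when to call the tool.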

34
avc-phone.service Normal file

@@ -0,0 +1,34 @@
[Unit]
Description=AVC optometry phone agent (Pipecat + Twilio, local GPU)
Documentation=file:///home/tocmo0nlord/avc-phone/README.md
# Needs the network up and Ollama serving the activeblue-avc model. Docker (Odoo) is
# only a soft dependency — the agent falls back to JSONL if Odoo is down — so it is
# ordered after but not required.
Wants=network-online.target
After=network-online.target ollama.service docker.service
# Restart rate limit (these are [Unit]-level directives).
StartLimitIntervalSec=300
StartLimitBurst=5
[Service]
Type=simple
User=tocmo0nlord
Group=tocmo0nlord
WorkingDirectory=/home/tocmo0nlord/avc-phone
# run.sh loads .env, sets LD_LIBRARY_PATH for the CUDA libs, and execs the server.
ExecStart=/home/tocmo0nlord/avc-phone/run.sh
# A phone line should always come back up.
Restart=always
RestartSec=5
# Give model loads time before a failed start counts against the restart limit.
TimeoutStartSec=120
# Light hardening. Home is left writable on purpose: the app reads the shared venv +
# Kokoro models + HF Whisper cache under /home and writes the JSONL fallback, so a
# read-only home would risk breaking model-cache locks. /usr,/etc stay read-only.
NoNewPrivileges=true
ProtectSystem=full
PrivateTmp=true
[Install]
WantedBy=multi-user.target

382
bot.py Normal file

@@ -0,0 +1,382 @@
#!/usr/bin/env python3
"""AVC optometry phone agent — the Pipecat pipeline for a single inbound call.
Same VAD -> STT -> LLM -> TTS loop as pipecat-run/bot.py, but the ends are swapped
for telephony: audio arrives/leaves as 8 kHz mu-law over a Twilio Media Stream
(WebSocket), decoded by TwilioFrameSerializer. STT is Deepgram Nova-2 (cloud); the LLM is
the local `activeblue-avc` fine-tune via Ollama by default; TTS is local Kokoro.
This module just builds + runs the pipeline for one connected call. server.py owns
the FastAPI/TwiML/WebSocket side and calls run_call() once per call.
"""
import os
import re
import time
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import (
BotStoppedSpeakingFrame,
EndFrame,
EndTaskFrame,
Frame,
InputAudioRawFrame,
LLMFullResponseEndFrame,
LLMTextFrame,
TTSSpeakFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.audio.vad_processor import VADProcessor
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.serializers.twilio import TwilioFrameSerializer
from pipecat.services.anthropic.llm import AnthropicLLMService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.kokoro.tts import KokoroTTSService
from pipecat.services.ollama.llm import OLLamaLLMService
from pipecat.transports.websocket.fastapi import (
FastAPIWebsocketParams,
FastAPIWebsocketTransport,
)
from practice import practice_summary
# ── Config (env-overridable) ─────────────────────────────────────────────────
HERE = os.path.dirname(os.path.abspath(__file__))
# Reuse the Kokoro model files already downloaded by the pipecat-run project.
MODEL_DIR = os.environ.get("KOKORO_MODEL_DIR", "/home/tocmo0nlord/pipecat-run/models")
OLLAMA_MODEL = os.environ.get("OLLAMA_MODEL", "activeblue-avc:latest")
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://127.0.0.1:11434/v1")
# Swappable LLM provider: "ollama" (local) or "anthropic" (Claude API). Same universal
# LLMContext drives both — only the service construction differs (see build_llm_service).
LLM_PROVIDER = os.environ.get("LLM_PROVIDER", "ollama").lower()
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY", "")
# Defaults to the most capable model. For low-latency PHONE voice, set ANTHROPIC_MODEL to
# claude-haiku-4-5 (fastest) or claude-sonnet-4-6 (balance) — see notes in build_llm_service.
ANTHROPIC_MODEL = os.environ.get("ANTHROPIC_MODEL", "claude-opus-4-8")
# In-call function-calling: AUTO by provider — ON for Claude (reliable tool calls → real-time
# Odoo booking), OFF for local Ollama (llama3.1:8b over-calls / leaks JSON). An explicit
# ENABLE_TOOLS env overrides the auto choice either way.
_enable_tools_env = os.environ.get("ENABLE_TOOLS")
ENABLE_TOOLS = (
_enable_tools_env.lower() in ("1", "true", "yes")
if _enable_tools_env is not None
else (LLM_PROVIDER == "anthropic")
)
LLM_TEMPERATURE = float(os.environ.get("LLM_TEMPERATURE", "0.3"))
LLM_MAX_TOKENS = int(os.environ.get("LLM_MAX_TOKENS", "160"))
KOKORO_VOICE = os.environ.get("KOKORO_VOICE", "af_heart")
# Real-time STT is Deepgram Nova-2: end-of-utterance events in <300ms (vs Whisper's
# 1-3s of chunk buffering, the main cause of non-reply / repeat-yourself). Whisper
# large-v3 is retained for post-call transcription only (Phase 3).
DEEPGRAM_API_KEY = os.environ.get("DEEPGRAM_API_KEY", "")
# Twilio sends 8 kHz mu-law on the wire — we run the PIPELINE at 16 kHz and let
# TwilioFrameSerializer resample to/from the 8 kHz wire. (Silero VAD, Deepgram, and
# Kokoro are all happy at 16 kHz.)
WIRE_SAMPLE_RATE = 8000 # Twilio mu-law on the wire (serializer handles this)
PIPELINE_SAMPLE_RATE = 16000 # internal rate the VAD, Deepgram, and Kokoro stages run at
# VAD tuning. Defaults (confidence 0.7 / min_volume 0.6) are desktop-mic values that can
# miss short/quiet 8 kHz telephony utterances like "yes" — loosen them for the phone.
VAD_CONFIDENCE = float(os.environ.get("VAD_CONFIDENCE", "0.5"))
VAD_MIN_VOLUME = float(os.environ.get("VAD_MIN_VOLUME", "0.3"))
VAD_START_SECS = float(os.environ.get("VAD_START_SECS", "0.2"))
VAD_STOP_SECS = float(os.environ.get("VAD_STOP_SECS", "0.5"))
# Agent persona name — purely for warmth; change/remove freely.
AGENT_NAME = os.environ.get("AGENT_NAME", "Sofia")
SYSTEM_PROMPT = (
f"You are {AGENT_NAME}, a warm, friendly receptionist for Advanced Vision Care, an "
"optometry practice with eight offices in South Florida. You are on a real phone call, so "
"talk like a helpful human being: natural, relaxed, and genuinely conversational — usually "
"just one short sentence at a time. Speak in English. Say numbers, dates, and times as "
"words a person would say.\n\n"
"Your job is to answer callers' questions and to take appointment requests. To book a "
"visit you need four things: which office or city, the reason for the visit, a preferred "
"day and time, and their name. Gather these naturally as the conversation flows — don't "
"interrogate, and never ask for something the caller already told you (people often give "
"their name or reason in their first sentence). You already have their number from caller "
"ID, so never ask for a phone number. When you have the details, repeat them back in one "
"warm sentence to confirm, and let them know a staff member will call to finalize the time.\n\n"
"Stay truthful and within your limits:\n"
"- Use ONLY the facts below for addresses, phone numbers, insurance, and services. Never "
"make any of these up.\n"
"- To find the right office, ask what CITY or AREA is most convenient for the caller. Do "
"NOT suggest or name a specific office yourself — you don't know where they are. Only after "
"they tell you their area, name the matching office; and only list locations if they ask "
"what offices exist.\n"
"- You cannot see a calendar, so never say a time is open or available — take the time as "
"a request that staff will confirm.\n"
"- Insurance: only confirm a plan that is in the list below. For any plan that is not "
"listed (UnitedHealthcare, Aetna, Cigna, and so on), don't say yes or no — say our staff "
"will verify their coverage.\n"
"- Hours are not published — say they vary by office and staff will confirm; never give "
"specific hours.\n"
"- You don't give medical advice and can't transfer calls. If the caller mentions an eye "
"problem, just note it as the reason and say a staff member or doctor will follow up.\n"
"- If you're not sure you heard something, simply ask them to repeat it.\n"
"- When the caller is all set, give a brief, warm closing that ends with the word "
"'Goodbye' — that ends the call, so only say it when you truly mean to.\n\n"
"PRACTICE FACTS:\n" + practice_summary()
)
def _build_tools() -> ToolsSchema:
    # Only the booking action is a tool. Practice facts already live in the system prompt,
    # so no get_practice_info tool (avoids needless calls/latency). callback_number is NOT
    # required: we have the caller-ID and inject it in the handler.
    return ToolsSchema(
        standard_tools=[
            FunctionSchema(
                name="record_appointment_request",
                description=(
                    "Record the caller's appointment request once you have their name and at "
                    "least the office/city and reason. Call this when the caller wants to book "
                    "a visit; staff will call back to confirm the exact time."
                ),
                properties={
                    "patient_name": {"type": "string", "description": "Caller's full name"},
                    "location": {"type": "string", "description": "Which office/city the caller wants, e.g. Hialeah, Kendall, Tamarac"},
                    "reason": {"type": "string", "description": "Reason for the visit, e.g. annual exam, broken glasses, eye pain"},
                    "preferred_time": {"type": "string", "description": "Preferred day/time in the caller's words, if given"},
                },
                required=["patient_name"],
            ),
        ]
    )
class EndCallProcessor(FrameProcessor):
    """Lets Sofia hang up. MUST sit between the LLM and the TTS: there it sees her reply
    text (LLMTextFrame, flowing downstream) AND the upstream copy of BotStoppedSpeakingFrame
    the output transport emits. It accumulates each reply; if the finished reply contains a
    closing ('goodbye'/'adiós'), it waits until she's done speaking, then pushes EndTaskFrame
    upstream; the task ends and TwilioFrameSerializer (auto_hang_up) drops the call."""

    _CLOSINGS = ("goodbye", "good-bye", "good bye", "adiós", "adios", "hasta luego")

    def __init__(self):
        super().__init__()
        self._buf = ""
        self._should_end = False

    @classmethod
    def _is_closing(cls, text: str) -> bool:
        t = (text or "").lower()
        return any(c in t for c in cls._CLOSINGS)

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, LLMTextFrame):
            self._buf += frame.text
        elif isinstance(frame, LLMFullResponseEndFrame):
            if self._is_closing(self._buf):
                self._should_end = True
                logger.info("Sofia signalled closing -- will hang up after she finishes speaking")
            self._buf = ""
        elif isinstance(frame, BotStoppedSpeakingFrame) and self._should_end:
            self._should_end = False
            logger.info("Sofia closed the call -- ending task / hanging up")
            await self.push_frame(EndTaskFrame(), FrameDirection.UPSTREAM)
        await self.push_frame(frame, direction)
class AudioHeartbeat(FrameProcessor):
    """Diagnostic: logs how many inbound audio frames arrive every ~5s. If this keeps
    ticking but VAD never fires, the issue is VAD/threshold; if it drops to 0 after a
    turn, inbound audio stalled at the transport. Cheap, leave it on while stabilizing."""

    def __init__(self):
        super().__init__()
        self._n = 0
        self._t = time.time()

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, InputAudioRawFrame):
            self._n += 1
            now = time.time()
            if now - self._t >= 5:
                logger.info(f"[audio-in] {self._n} frames in last {now - self._t:.0f}s")
                self._n = 0
                self._t = now
        await self.push_frame(frame, direction)
def build_llm_service():
    """Build the LLM service for the selected provider. The universal LLMContext +
    aggregators work with either, so only this construction differs (true A/B swap)."""
    if LLM_PROVIDER == "anthropic":
        if not ANTHROPIC_API_KEY:
            raise RuntimeError("LLM_PROVIDER=anthropic but ANTHROPIC_API_KEY is not set")
        logger.info(f"LLM provider: anthropic ({ANTHROPIC_MODEL})")
        # NOTE: Opus 4.8/4.7 reject temperature/top_p/top_k (HTTP 400), so we omit them;
        # this keeps the default Opus model working. For low-latency phone voice, prefer
        # claude-haiku-4-5 (fastest) or claude-sonnet-4-6 over Opus. enable_prompt_caching
        # caches the system prompt + growing conversation prefix (helps multi-turn cost/latency).
        return AnthropicLLMService(
            api_key=ANTHROPIC_API_KEY,
            settings=AnthropicLLMService.Settings(
                model=ANTHROPIC_MODEL,
                enable_prompt_caching=True,
                max_tokens=LLM_MAX_TOKENS,
            ),
        )
    logger.info(f"LLM provider: ollama ({OLLAMA_MODEL})")
    return OLLamaLLMService(
        settings=OLLamaLLMService.Settings(
            model=OLLAMA_MODEL,
            temperature=LLM_TEMPERATURE,
            max_tokens=LLM_MAX_TOKENS,
        ),
        base_url=OLLAMA_URL,
    )
async def run_agent(transport, caller_number=None, call_sid=None, do_capture=True):
    """Build + run the AVC voice agent on a given transport. Shared by the phone path
    (Twilio Media Stream) and the browser path (WebRTC): same prompt, model, voice, and
    booking/hang-up logic; only the transport differs. do_capture writes the post-call
    appointment to Odoo (on for phone; off for browser testing so it doesn't make cards)."""
    stt = DeepgramSTTService(
        api_key=DEEPGRAM_API_KEY,
        settings=DeepgramSTTService.Settings(
            model="nova-2",
            language="en-US",
            smart_format=True,
            punctuate=True,
            interim_results=False,  # final transcripts only; avoids double-firing
            utterance_end_ms=1000,  # ms of silence before end-of-utterance fires
        ),
    )
    llm = build_llm_service()
    # In-call booking tool: only registered when ENABLE_TOOLS is on (auto: Claude yes,
    # local Ollama no, since llama3.1:8b over-calls/leaks). The handler is a closure so it
    # can stamp the verified caller-ID + call_sid onto the lead (the model never supplies a
    # phone number; we don't ask for one). With tools on, this writes the Odoo lead IN-CALL,
    # so the post-call extraction is skipped below to avoid a duplicate.
    if ENABLE_TOOLS:
        async def _record_appointment(params):
            args = params.arguments or {}
            if do_capture:
                from practice import persist_appointment
                persist_appointment({
                    "call_sid": call_sid,
                    "patient_name": args.get("patient_name"),
                    "callback_number": caller_number,  # verified caller-ID, not model-supplied
                    "location": args.get("location"),
                    "reason": args.get("reason"),
                    "preferred_time": args.get("preferred_time"),
                    "source": "in_call_tool",
                })
            else:
                logger.info(f"[capture off] would record appointment: {args.get('patient_name')} / {args.get('location')}")
            await params.result_callback(
                {"status": "recorded", "message": "Recorded. Staff will call to confirm the time."}
            )
        llm.register_function("record_appointment_request", _record_appointment)
    tts = KokoroTTSService(
        model_path=os.path.join(MODEL_DIR, "kokoro-v1.0.onnx"),
        voices_path=os.path.join(MODEL_DIR, "voices-v1.0.bin"),
        settings=KokoroTTSService.Settings(voice=KOKORO_VOICE),
    )
    vad = VADProcessor(vad_analyzer=SileroVADAnalyzer(params=VADParams(
        confidence=VAD_CONFIDENCE,
        start_secs=VAD_START_SECS,
        stop_secs=VAD_STOP_SECS,
        min_volume=VAD_MIN_VOLUME,
    )))
    heartbeat = AudioHeartbeat()
    context_kwargs = {"messages": [{"role": "system", "content": SYSTEM_PROMPT}]}
    if ENABLE_TOOLS:
        context_kwargs["tools"] = _build_tools()
    context = LLMContext(**context_kwargs)
    agg = LLMContextAggregatorPair(context)
    endcall = EndCallProcessor()
    pipeline = Pipeline(
        [
            transport.input(),
            heartbeat,
            vad,
            stt,
            agg.user(),
            llm,
            endcall,
            tts,
            transport.output(),
            agg.assistant(),
        ]
    )
    task = PipelineTask(
        pipeline,
        params=PipelineParams(
            audio_in_sample_rate=PIPELINE_SAMPLE_RATE,
            audio_out_sample_rate=PIPELINE_SAMPLE_RATE,
            allow_interruptions=True,
        ),
    )

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info("Client connected -- greeting")
        await task.queue_frames(
            [TTSSpeakFrame(
                f"Thank you for calling Advanced Vision Care, this is {AGENT_NAME}. "
                "How can I help you today?"
            )]
        )

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        logger.info("Client disconnected -- ending task")
        await task.queue_frame(EndFrame())

    runner = PipelineRunner(handle_sigint=False)
    await runner.run(task)
    # Call is over. Post-call extraction is the capture path ONLY when in-call tools are
    # off (local Ollama). With tools on (Claude), the booking was already written in-call,
    # so skip extraction to avoid a duplicate lead.
    if do_capture and not ENABLE_TOOLS:
        try:
            from extract import extract_and_record
            await extract_and_record(
                context.messages, OLLAMA_URL, OLLAMA_MODEL,
                call_sid=call_sid, caller_number=caller_number,
            )
        except Exception:
            logger.exception("Post-call appointment extraction failed")
async def run_call(websocket, serializer: TwilioFrameSerializer, caller_number=None, call_sid=None):
    """Phone entrypoint: wrap the Twilio Media Stream in a transport, run the shared agent."""
    transport = FastAPIWebsocketTransport(
        websocket=websocket,
        params=FastAPIWebsocketParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            audio_in_sample_rate=PIPELINE_SAMPLE_RATE,
            audio_out_sample_rate=PIPELINE_SAMPLE_RATE,
            add_wav_header=False,
            serializer=serializer,
        ),
    )
    await run_agent(transport, caller_number=caller_number, call_sid=call_sid, do_capture=True)

63
bot_web.py Normal file

@@ -0,0 +1,63 @@
#!/usr/bin/env python3
"""Browser entrypoint for the SAME Sofia agent — for fast iteration without phoning in.
Reuses bot.run_agent (identical prompt, model, voice, booking + hang-up logic) but over a
browser WebRTC transport via Pipecat's dev runner. Serves a mic UI at http://localhost:7860.
Caveat: this path is 16 kHz WebRTC, NOT 8 kHz telephony — great for testing conversation /
prompt / voice / hang-up, but it does NOT reproduce phone-specific audio (µ-law, clipping,
VAD-on-8kHz). Appointment capture to Odoo is OFF here (do_capture=False) so browser tests
don't create CRM cards.
Run:
./run_web.sh # then open http://localhost:7860, click Connect, allow mic, talk
"""
from pipecat.runner.utils import create_transport
from pipecat.transports.base_transport import TransportParams
from bot import run_agent
import os
# Browser tests don't write to Odoo by default (keeps test bookings out of the CRM). Set
# WEB_ALLOW_CAPTURE=true for a one-off test that actually creates the lead.
WEB_ALLOW_CAPTURE = os.environ.get("WEB_ALLOW_CAPTURE", "false").lower() in ("1", "true", "yes")
async def bot(runner_args):
    """Called by pipecat.runner.run.main for each browser connection."""
    transport = await create_transport(
        runner_args,
        {"webrtc": lambda: TransportParams(audio_in_enabled=True, audio_out_enabled=True)},
    )
    await run_agent(transport, do_capture=WEB_ALLOW_CAPTURE)
if __name__ == "__main__":
    # `os` is already imported at module level.
    import sys
    import uvicorn
    # Serve on the LAN over HTTPS so the mic works from another machine (browsers require
    # a secure context off-localhost). Reuses the self-signed certs (valid for 10.10.1.221).
    CERT = os.environ.get("WEB_CERT", "/home/tocmo0nlord/pipecat-run/certs/cert.pem")
    KEY = os.environ.get("WEB_KEY", "/home/tocmo0nlord/pipecat-run/certs/key.pem")
    if os.path.exists(CERT) and os.path.exists(KEY):
        _orig_run = uvicorn.run

        def _run_with_tls(app, *args, **kwargs):
            kwargs.setdefault("ssl_certfile", CERT)
            kwargs.setdefault("ssl_keyfile", KEY)
            return _orig_run(app, *args, **kwargs)

        uvicorn.run = _run_with_tls
        if "--host" not in sys.argv:
            sys.argv += ["--host", "0.0.0.0"]
        print("Browser UI: https://10.10.1.221:7860 (accept the self-signed cert once)")
    from pipecat.runner.run import main
    main()

58
deploy/nginx-voip.activeblue.net.conf Normal file

@@ -0,0 +1,58 @@
# nginx reverse proxy for the AVC phone agent.
# Terminates TLS for voip.activeblue.net and proxies to the app on 127.0.0.1:8200,
# forwarding the /ws WebSocket (Twilio Media Stream) with long timeouts.
#
# Install:
# sudo cp deploy/nginx-voip.activeblue.net.conf /etc/nginx/sites-available/voip.activeblue.net
# sudo cp deploy/nginx-ws-upgrade.conf /etc/nginx/conf.d/ws-upgrade.conf
# sudo ln -s /etc/nginx/sites-available/voip.activeblue.net /etc/nginx/sites-enabled/
# sudo nginx -t && sudo systemctl reload nginx
# (Get the cert FIRST — see README — or the 443 block fails to load.)
# ── HTTP :80 — ACME challenge + redirect everything else to HTTPS ─────────────
server {
    listen 80;
    listen [::]:80;
    server_name voip.activeblue.net;
    # Let's Encrypt HTTP-01 webroot challenge (served, not redirected).
    location /.well-known/acme-challenge/ {
        root /var/www/html;
    }
    location / {
        return 301 https://$host$request_uri;
    }
}
# ── HTTPS :443 — TLS termination + proxy to the app ──────────────────────────
server {
    listen 443 ssl;
    listen [::]:443 ssl;
    http2 on;
    server_name voip.activeblue.net;
    ssl_certificate /etc/letsencrypt/live/voip.activeblue.net/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/voip.activeblue.net/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;
    # Twilio Media Streams hold the WebSocket open for the whole call; allow it.
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
    location / {
        proxy_pass http://127.0.0.1:8200;
        proxy_http_version 1.1;
        # WebSocket upgrade (for /ws). $connection_upgrade comes from ws-upgrade.conf.
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}


@@ -0,0 +1,8 @@
# Maps the Connection header for WebSocket proxying. Goes in /etc/nginx/conf.d/ so it
# lives in the http{} context (a `map` can't go inside a server/location block).
# When a request carries `Upgrade: websocket`, send `Connection: upgrade`; otherwise
# `Connection: close`. Used by the voip.activeblue.net vhost for the /ws media stream.
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

43
deploy/setup-tls.sh Executable file

@@ -0,0 +1,43 @@
#!/usr/bin/env bash
# One-shot TLS + nginx reverse-proxy setup for the AVC phone agent.
# RUN AS ROOT: sudo bash deploy/setup-tls.sh
#
# Prerequisites (must be true BEFORE running):
# - DNS: voip.activeblue.net -> your WAN IP (done: 66.23.239.222)
# - Router forwards external 80 AND 443 -> this box (10.10.1.221)
# - nginx running with its default :80 site (used to answer the ACME challenge)
#
# What it does: installs certbot, gets a Let's Encrypt cert via the webroot challenge
# (served by the existing default :80 site), installs the vhost + ws-upgrade map, then
# tests and reloads nginx. Idempotent-ish; safe to re-run.
set -euo pipefail
DOMAIN="voip.activeblue.net"
EMAIL="mr.garcia09@gmail.com"
APP_DIR="/home/tocmo0nlord/avc-phone"
WEBROOT="/var/www/html"
if [ "$(id -u)" -ne 0 ]; then echo "Run as root (sudo)."; exit 1; fi
echo "==> 1/4 install certbot"
if ! command -v certbot >/dev/null 2>&1; then
    apt-get update && apt-get install -y certbot
fi
echo "==> 2/4 obtain certificate for $DOMAIN (webroot challenge)"
mkdir -p "$WEBROOT/.well-known/acme-challenge"
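# --deploy-hook is saved with the certificate's renewal settings, so nginx is reloaded
# after each successful renewal and picks up the new cert.
certbot certonly --webroot -w "$WEBROOT" -d "$DOMAIN" \
    --non-interactive --agree-tos -m "$EMAIL" --keep-until-expiring \
    --deploy-hook "systemctl reload nginx"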
echo "==> 3/4 install nginx vhost + ws-upgrade map"
cp "$APP_DIR/deploy/nginx-ws-upgrade.conf" /etc/nginx/conf.d/ws-upgrade.conf
cp "$APP_DIR/deploy/nginx-voip.activeblue.net.conf" /etc/nginx/sites-available/voip.activeblue.net
ln -sf /etc/nginx/sites-available/voip.activeblue.net /etc/nginx/sites-enabled/voip.activeblue.net
echo "==> 4/4 test + reload nginx"
nginx -t
systemctl reload nginx
echo
echo "Done. Verify: curl https://$DOMAIN/health"
echo "Cert auto-renews via the certbot systemd timer; nginx reload on renew is handled by certbot's deploy hook."

101
extract.py Normal file

@@ -0,0 +1,101 @@
"""Post-call appointment extraction.
Instead of unreliable in-call tool-calling (which made llama3.1:8b speak raw JSON),
we let the agent gather appointment details conversationally, then run ONE structured
extraction over the finished transcript and write it to Odoo. Reliable because it's a
single JSON-mode completion, not mid-conversation tool emission.
"""
import json
import re
import httpx
from loguru import logger
from practice import persist_appointment
_EXTRACT_INSTRUCTIONS = (
    "You are reviewing a phone-call transcript between a caller and the receptionist "
    "for an optometry practice. Extract any APPOINTMENT REQUEST the caller made.\n"
    "Respond with ONLY a JSON object with these keys:\n"
    '  "wants_appointment": boolean — true only if the caller asked to book/schedule a visit\n'
    '  "patient_name": string or null\n'
    '  "callback_number": string or null (digits the caller gave to be called back)\n'
    '  "location": string or null (which office/city)\n'
    '  "reason": string or null (e.g. eye exam, broken glasses)\n'
    '  "preferred_time": string or null (day/time in the caller\'s words)\n'
    "Use null for anything not clearly stated. Do not invent values."
)
async def extract_and_record(messages, ollama_url, model, call_sid=None, caller_number=None):
    """Extract an appointment from the transcript and persist it. Returns the record
    dict if one was saved, else None."""
    # Build a plain transcript from the conversation (skip the system prompt).
    turns = [
        f"{m['role']}: {m['content']}"
        for m in messages
        if m.get("role") in ("user", "assistant") and isinstance(m.get("content"), str) and m["content"].strip()
    ]
    if not any(m.get("role") == "user" for m in messages):
        return None  # nobody said anything
    transcript = "\n".join(turns)

    base = ollama_url.rstrip("/")
    if base.endswith("/v1"):
        base = base[:-3]
    try:
        async with httpx.AsyncClient(timeout=30) as client:
            r = await client.post(
                f"{base}/api/chat",
                json={
                    "model": model,
                    "format": "json",
                    "stream": False,
                    "options": {"temperature": 0},
                    "messages": [
                        {"role": "system", "content": _EXTRACT_INSTRUCTIONS},
                        {"role": "user", "content": f"Transcript:\n{transcript}"},
                    ],
                },
            )
            r.raise_for_status()
            data = json.loads(r.json()["message"]["content"])
        # JSON mode guarantees valid JSON, not a JSON object: reject lists, strings, etc.
        if not isinstance(data, dict):
            raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    except Exception:
        logger.exception("Appointment extraction call failed")
        return None

    if not data.get("wants_appointment"):
        logger.info("Post-call extraction: no appointment requested")
        return None

    # Don't create near-empty cards from quick hang-ups: require at least a name or a
    # reason. A bare location + caller-ID isn't enough to be worth a worklist card.
    # str() guards against the model returning a non-string value (e.g. a bare number).
    name = str(data.get("patient_name") or "").strip()
    reason_raw = str(data.get("reason") or "").strip()
    if not name and not reason_raw:
        logger.info("Post-call extraction: appointment intent but no name/reason captured — skipping card")
        return None

    # Prefer the verified Twilio caller-ID over a number pulled from the transcript —
    # the model sometimes invents/echoes a phone number. Keep a genuinely different
    # spoken number as a note for staff.
    spoken = str(data.get("callback_number") or "").strip()
    callback = caller_number or spoken or None
    reason = data.get("reason")
    # Compare the last 10 digits so a caller-ID of +1XXXXXXXXXX matches the same number
    # spoken without the country code (assumes US/NANP numbers).
    if spoken and caller_number and re.sub(r"\D", "", spoken)[-10:] != re.sub(r"\D", "", caller_number)[-10:]:
        reason = f"{reason or ''} (caller mentioned alternate number: {spoken})".strip()

    record = {
        "call_sid": call_sid,
        "patient_name": data.get("patient_name"),
        "callback_number": callback,
        "location": data.get("location"),
        "reason": reason,
        "preferred_time": data.get("preferred_time"),
        "source": "post_call_extraction",
    }
    where = persist_appointment(record)
    logger.info(f"Post-call appointment saved ({where}): {record['patient_name']} / {record['location']}")
    return record
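
The caller-ID preference in extract_and_record can be tried in isolation. The following is a minimal sketch, not code from this commit: the helper name `reconcile_callback` is hypothetical, and comparing only the last 10 digits is an assumption that fits US numbers.

```python
import re


def reconcile_callback(caller_number, spoken, reason):
    """Hypothetical helper: prefer the Twilio caller-ID and keep a differing
    spoken number as a note appended to the reason."""
    spoken = str(spoken or "").strip()
    callback = caller_number or spoken or None

    def tail(number):
        # Digits only, last 10, so "+1 (954) 555-0100" equals "954-555-0100".
        return re.sub(r"\D", "", number)[-10:]

    if spoken and caller_number and tail(spoken) != tail(caller_number):
        reason = f"{reason or ''} (caller mentioned alternate number: {spoken})".strip()
    return callback, reason


print(reconcile_callback("+19545550100", "305-555-0199", "eye exam"))
```

When the spoken number matches the caller-ID the reason is returned unchanged, and when Twilio supplied no caller-ID the spoken number becomes the callback.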

106
odoo_client.py Normal file

@@ -0,0 +1,106 @@
"""Minimal Odoo XML-RPC client for the phone agent.
Creates an appointment *request* in Odoo from a captured call. A request is NOT a
confirmed booking — staff call the patient back to finalize — so by default we write
a CRM lead (a clean "to-do" that doesn't occupy a real calendar slot). Set
ODOO_TARGET=calendar to instead drop a tentative event on the calendar.
Auth + target are all env-driven (see .env.example). Connection is lazy and every
failure is swallowed by the caller's fallback, so a flaky Odoo never drops a request.
"""
import os
import xmlrpc.client
from datetime import datetime, timedelta
from html import escape
ODOO_URL = os.environ.get("ODOO_URL", "http://localhost:8069")
ODOO_DB = os.environ.get("ODOO_DB", "db1")
ODOO_USER = os.environ.get("ODOO_USER", "")
ODOO_API_KEY = os.environ.get("ODOO_API_KEY", "")
ODOO_TARGET = os.environ.get("ODOO_TARGET", "crm").lower() # "crm" | "calendar"
# Pipeline placement for crm target. If ODOO_STAGE_ID is set, the request is created as a
# staged opportunity (shows up in the CRM pipeline as a worklist) instead of a bare lead.
ODOO_STAGE_ID = int(os.environ["ODOO_STAGE_ID"]) if os.environ.get("ODOO_STAGE_ID") else None
ODOO_TEAM_ID = int(os.environ["ODOO_TEAM_ID"]) if os.environ.get("ODOO_TEAM_ID") else None
ODOO_USER_ID = int(os.environ["ODOO_USER_ID"]) if os.environ.get("ODOO_USER_ID") else None
class OdooError(RuntimeError):
    pass


def _connect():
    if not (ODOO_USER and ODOO_API_KEY):
        raise OdooError("ODOO_USER / ODOO_API_KEY not set")
    common = xmlrpc.client.ServerProxy(f"{ODOO_URL}/xmlrpc/2/common")
    uid = common.authenticate(ODOO_DB, ODOO_USER, ODOO_API_KEY, {})
    if not uid:
        raise OdooError("Odoo authentication failed (check db/user/key)")
    models = xmlrpc.client.ServerProxy(f"{ODOO_URL}/xmlrpc/2/object")
    return uid, models


def _exec(uid, models, model, method, *args, **kw):
    return models.execute_kw(ODOO_DB, uid, ODOO_API_KEY, model, method, list(args), kw)


def _find_or_create_partner(uid, models, name, phone):
    """Return a res.partner id, matching on phone first, else creating one."""
    domain = []
    if phone:
        domain = ["|", ["phone", "=", phone], ["mobile", "=", phone]]
    if domain:
        hit = _exec(uid, models, "res.partner", "search", domain, limit=1)
        if hit:
            return hit[0]
    vals = {"name": name or "Phone caller", "phone": phone or False, "company_type": "person"}
    return _exec(uid, models, "res.partner", "create", vals)
def create_appointment_request(patient_name, callback_number, reason, preferred_time, call_sid=None):
    """Create the request in Odoo. Returns (model, record_id) or raises OdooError."""
    uid, models = _connect()
    summary = f"📞 Phone appt request — {patient_name or 'caller'}"
    # description is an Odoo HTML field — build with <br/> so it renders in the UI.
    rows = [
        ("Name", patient_name),
        ("Callback", callback_number),
        ("Reason", reason),
        ("Preferred time (patient's words)", preferred_time),
        ("Twilio call SID", call_sid),
    ]
    note = "<p><b>Captured by the AVC phone agent</b> (UNCONFIRMED — call patient to finalize).</p><p>" + \
        "<br/>".join(f"{escape(k)}: {escape(str(v)) if v else ''}" for k, v in rows) + "</p>"

    if ODOO_TARGET == "calendar":
        partner_id = _find_or_create_partner(uid, models, patient_name, callback_number)
        # Tentative 30-min placeholder tomorrow at 09:00 UTC. Odoo stores datetimes in UTC,
        # so the event displays at the viewer's local equivalent (early morning in US
        # Eastern), not at 9:00 local. The real time is set on callback.
        start = (datetime.utcnow() + timedelta(days=1)).replace(hour=9, minute=0, second=0, microsecond=0)
        vals = {
            "name": summary,
            "start": start.strftime("%Y-%m-%d %H:%M:%S"),
            "stop": (start + timedelta(minutes=30)).strftime("%Y-%m-%d %H:%M:%S"),
            "description": note,
            "partner_ids": [(4, partner_id)],
        }
        rec = _exec(uid, models, "calendar.event", "create", vals)
        return ("calendar.event", rec)

    # CRM target. With a stage configured, create a staged opportunity (lands in the
    # pipeline as a worklist staff act on); otherwise a plain lead.
    vals = {
        "name": summary,
        "contact_name": patient_name or False,
        "phone": callback_number or False,
        "description": note,
        "type": "opportunity" if ODOO_STAGE_ID else "lead",
    }
    if ODOO_STAGE_ID:
        vals["stage_id"] = ODOO_STAGE_ID
    if ODOO_TEAM_ID:
        vals["team_id"] = ODOO_TEAM_ID
    if ODOO_USER_ID:
        vals["user_id"] = ODOO_USER_ID
    rec = _exec(uid, models, "crm.lead", "create", vals)
    return ("crm.lead", rec)
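
Because the description is an HTML field, every caller-supplied value is escaped before it is joined with <br/>. A minimal stand-alone sketch of that step follows; the function name `build_note_body` is hypothetical and it omits the fixed header paragraph.

```python
from html import escape


def build_note_body(rows):
    """Hypothetical helper: render (label, value) pairs as one escaped HTML
    paragraph, one 'label: value' line per row. Missing values render empty."""
    lines = (f"{escape(k)}: {escape(str(v)) if v else ''}" for k, v in rows)
    return "<p>" + "<br/>".join(lines) + "</p>"


print(build_note_body([("Name", "Ana <Test>"), ("Reason", None)]))
```

Escaping at this point means a transcript that happens to contain markup is shown to staff as text instead of being rendered.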

155
practice.py Normal file

@@ -0,0 +1,155 @@
"""Advanced Vision Care practice facts + the phone agent's tools.
Facts sourced from advancedvisioncareflorida.com (8 locations across Broward,
Miami-Dade, Palm Beach). NOTE: the website does NOT publish office hours, so we do
NOT assert hours — the agent must offer to have staff confirm them instead of
inventing them. Fill in PRACTICE_FACTS["hours"] if/when you have them.
"""
import json
import os
import re
from datetime import datetime, timezone
from loguru import logger
# ─────────────────────────────────────────────────────────────────────────────
# Real facts from advancedvisioncareflorida.com
# ─────────────────────────────────────────────────────────────────────────────
LOCATIONS = [
    # Broward County
    {"city": "Hollywood / Fort Lauderdale", "address": "2873 Stirling Rd, Fort Lauderdale, FL 33312", "phone": "(954) 983-4969"},
    {"city": "Tamarac", "address": "5865 N University Dr, Tamarac, FL 33321", "phone": "(954) 720-2720"},
    {"city": "Pembroke Pines", "address": "246 S Flamingo Rd, Pembroke Pines, FL 33027", "phone": "(954) 443-1230"},
    {"city": "Lauderdale Lakes", "address": "3682 W Oakland Park Blvd, Lauderdale Lakes, FL 33311", "phone": "(954) 730-8087"},
    # Miami-Dade County
    {"city": "Hialeah", "address": "1770 W 32nd Pl, Hialeah, FL 33012", "phone": "(305) 885-4477"},
    {"city": "Kendall", "address": "11605 N Kendall Dr, Miami, FL 33176", "phone": "(305) 982-8927"},
    {"city": "Miami Gardens", "address": "4771 NW 183rd St, Miami Gardens, FL 33055", "phone": "(305) 390-2467"},
    # Palm Beach County
    {"city": "Boca Raton", "address": "21673 State Road 7, Boca Raton, FL 33428", "phone": "(561) 470-2310"},
]

PRACTICE_FACTS = {
    "name": "Advanced Vision Care",
    "locations": LOCATIONS,
    "insurance": [
        "CarePlus", "Doctors Health", "Florida Blue Medicare", "Optum", "Spectera",
        "Sunshine Health", "VSP", "WellCare",
    ],
    "services": (
        "routine and medical eye exams, contact lens exams, pediatric eye exams, "
        "and LASIK consultations"
    ),
    # Website does not publish hours — leave None so the agent won't invent them.
    "hours": None,
}
REQUESTS_LOG = os.path.join(os.path.dirname(os.path.abspath(__file__)), "appointment_requests.jsonl")
# Expand street abbreviations so the TTS speaks "North Kendall Drive", not "N … D-R".
_ABBREV = {
    "NW": "Northwest", "NE": "Northeast", "SW": "Southwest", "SE": "Southeast",
    "N": "North", "S": "South", "E": "East", "W": "West",
    "Dr": "Drive", "Rd": "Road", "Blvd": "Boulevard", "St": "Street",
    "Ave": "Avenue", "Pl": "Place", "Ln": "Lane", "Ct": "Court", "Hwy": "Highway",
    "FL": "Florida",
}
def _spoken_address(addr: str) -> str:
    """Expand directional + street-type abbreviations for natural speech."""
    return re.sub(
        r"\b(" + "|".join(re.escape(k) for k in _ABBREV) + r")\b",
        lambda m: _ABBREV[m.group(1)],
        addr,
    )


def practice_summary() -> str:
    """Compact facts block for the system prompt."""
    f = PRACTICE_FACTS
    # Separate address and phone explicitly so they don't run together in the prompt.
    loc_lines = "\n".join(
        f" - {l['city']}: {_spoken_address(l['address'])}, phone {l['phone']}" for l in f["locations"]
    )
    hours = f["hours"] or (
        "NOT published — do not state specific hours; offer to have the office confirm."
    )
    return (
        f"Practice name: {f['name']}\n"
        f"Locations ({len(f['locations'])} offices across South Florida):\n{loc_lines}\n"
        f"Insurance accepted (these EXACT plans only): {', '.join(f['insurance'])}.\n"
        f"Services: {f['services']}\n"
        f"Hours: {hours}\n"
    )


def _find_location(name: str):
    """Loose match a caller's city/location text to a known office."""
    # Strip first: a whitespace-only string would otherwise match every city name.
    n = (name or "").strip().lower()
    if not n:
        return None
    for l in LOCATIONS:
        if n in l["city"].lower() or l["city"].lower() in n:
            return l
    return None
# ─── Tools (used when ENABLE_TOOLS=true and the model supports tool-calling) ──
def persist_appointment(record: dict) -> str:
    """Write an appointment request to Odoo (a crm.lead) if configured, else to the
    JSONL fallback so a request is never lost. Returns where it landed. Used by both
    the post-call extraction and the (optional) in-call tool."""
    record.setdefault("ts", datetime.now(timezone.utc).isoformat())
    if os.environ.get("ODOO_USER") and os.environ.get("ODOO_API_KEY"):
        try:
            from odoo_client import create_appointment_request
            model, rec_id = create_appointment_request(
                patient_name=record.get("patient_name"),
                callback_number=record.get("callback_number"),
                reason=f"[{record.get('location') or 'location TBD'}] {record.get('reason') or ''}".strip(),
                preferred_time=record.get("preferred_time"),
                call_sid=record.get("call_sid"),
            )
            logger.info(f"Appointment -> Odoo {model} id={rec_id}: {record.get('patient_name')}")
            return f"odoo:{model}:{rec_id}"
        except Exception as e:
            logger.warning(f"Odoo write failed ({e!r}); falling back to local log")
            record["odoo_error"] = repr(e)
    with open(REQUESTS_LOG, "a") as fh:
        fh.write(json.dumps(record) + "\n")
    logger.info(f"Appointment -> JSONL: {record.get('patient_name')}")
    return "jsonl"
async def record_appointment_request(params):
    """In-call tool path (only used when ENABLE_TOOLS=true). Wraps persist_appointment."""
    args = params.arguments or {}
    persist_appointment({
        "call_sid": getattr(params, "call_sid", None),
        "patient_name": args.get("patient_name"),
        "callback_number": args.get("callback_number"),
        "location": args.get("location"),
        "reason": args.get("reason"),
        "preferred_time": args.get("preferred_time"),
        "source": "in_call_tool",
    })
    await params.result_callback(
        {"status": "captured", "message": "Got it — our staff will call you back to confirm the time."}
    )


async def get_practice_info(params):
    """Return practice facts (optionally narrowed to one location) for accurate answers."""
    args = params.arguments or {}
    loc = _find_location(args.get("location", ""))
    result = {
        "name": PRACTICE_FACTS["name"],
        "insurance": PRACTICE_FACTS["insurance"],
        "services": PRACTICE_FACTS["services"],
        # Use real hours once PRACTICE_FACTS["hours"] is filled in; until then, defer to staff.
        "hours": PRACTICE_FACTS["hours"] or "not published — offer to have the office confirm",
    }
    result["location"] = loc if loc else PRACTICE_FACTS["locations"]
    await params.result_callback(result)
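
The abbreviation expansion in _spoken_address is a single whole-word regex substitution. A minimal stand-alone sketch with a reduced table follows; the names `ABBREV` and `spoken_address` are illustrative copies, not imports from practice.py.

```python
import re

# Reduced copy of the abbreviation table, for illustration only.
ABBREV = {"N": "North", "Dr": "Drive", "FL": "Florida"}


def spoken_address(addr):
    """Replace each whole-word abbreviation with its spoken form (case-sensitive)."""
    pattern = r"\b(" + "|".join(re.escape(k) for k in ABBREV) + r")\b"
    return re.sub(pattern, lambda m: ABBREV[m.group(1)], addr)


print(spoken_address("11605 N Kendall Dr, Miami, FL 33176"))
# prints: 11605 North Kendall Drive, Miami, Florida 33176
```

The \b anchors keep the single letter "N" from matching inside words such as "Kendall", and the match is case-sensitive, so lowercase text is left alone.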

19
run.sh Executable file

@@ -0,0 +1,19 @@
#!/usr/bin/env bash
# Launch the AVC phone agent. Reuses the pipecat-run venv (has pipecat-ai 1.3.0,
# faster-whisper, kokoro-onnx, fastapi) and the CUDA12 libs we installed into it.
set -euo pipefail
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
VENV="/home/tocmo0nlord/pipecat-run/.venv"
# faster-whisper (ctranslate2) needs the CUDA12 cublas + cudnn shared libs at runtime.
NV="$VENV/lib/python3.13/site-packages/nvidia"
export LD_LIBRARY_PATH="$NV/cublas/lib:$NV/cudnn/lib:${LD_LIBRARY_PATH:-}"
# Load .env if present (Twilio creds, PUBLIC_HOST, model overrides).
if [ -f "$HERE/.env" ]; then
    set -a; . "$HERE/.env"; set +a
fi
cd "$HERE"
exec "$VENV/bin/python" server.py

20
run_web.sh Executable file

@@ -0,0 +1,20 @@
#!/usr/bin/env bash
# Launch the browser (WebRTC) version of Sofia for fast iteration — same brain as the
# phone agent, served at http://localhost:7860. Mirrors run.sh (venv + CUDA libs + .env).
set -euo pipefail
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
VENV="/home/tocmo0nlord/pipecat-run/.venv"
# faster-whisper (ctranslate2) needs the CUDA12 cublas + cudnn shared libs at runtime.
NV="$VENV/lib/python3.13/site-packages/nvidia"
export LD_LIBRARY_PATH="$NV/cublas/lib:$NV/cudnn/lib:${LD_LIBRARY_PATH:-}"
# Load .env (model, voice, VAD, etc.) — same config the phone agent uses.
if [ -f "$HERE/.env" ]; then
    set -a; . "$HERE/.env"; set +a
fi
cd "$HERE"
echo "Sofia (browser) starting — open http://localhost:7860 once it's up."
exec "$VENV/bin/python" bot_web.py

218
server.py Normal file

@@ -0,0 +1,218 @@
#!/usr/bin/env python3
"""Twilio-facing web server for the AVC phone agent.

Two endpoints, both reached by Twilio through the public reverse proxy (nginx or Traefik):
  POST /voice -> returns TwiML telling Twilio to open a bidirectional Media Stream
                 back to wss://<PUBLIC_HOST>/ws, with the stream token passed as a
                 <Parameter> (Twilio does not preserve a query string on the wss URL).
  WS   /ws    -> Twilio's Media Stream. We read the opening 'start' event for the SIDs,
                 check the stream token in its customParameters, then hand the socket
                 to the pipeline.

Security:
  - POST /voice is authenticated with Twilio's X-Twilio-Signature (HMAC-SHA1 over the
    public URL + sorted POST params, keyed by the API Key Secret). Enforced whenever
    TWILIO_API_KEY_SECRET is set; set TWILIO_VALIDATE=false to bypass for local testing.
  - WS /ws can't carry an X-Twilio-Signature usefully, so we gate it with a shared
    STREAM_TOKEN that /voice hands Twilio in the TwiML as a <Parameter>.

Inbound only. Run behind a TLS-terminating reverse proxy (nginx via deploy/, or Traefik
via traefik-avc-phone.yml); this app listens plain HTTP on $PORT. See README for the
Twilio number + proxy wiring.
"""
import asyncio
import base64
import hashlib
import hmac
import json
import os
import secrets
from fastapi import FastAPI, Request, WebSocket
from fastapi.responses import HTMLResponse
from loguru import logger
from bot import run_call
from pipecat.serializers.twilio import TwilioFrameSerializer
# Public hostname Twilio dials back into (the reverse proxy's domain), e.g. phone.example.com
PUBLIC_HOST = os.environ.get("PUBLIC_HOST", "CHANGE-ME.example.com")
PORT = int(os.environ.get("PORT", "8200"))
# Bind localhost by default: nginx terminates TLS and proxies in from 127.0.0.1, so the
# app needn't be exposed on the LAN. Set BIND_HOST=0.0.0.0 only if a remote proxy needs it.
BIND_HOST = os.environ.get("BIND_HOST", "127.0.0.1")
# Twilio REST creds — let the serializer auto-hang-up the carrier leg on EndFrame,
# and validate inbound webhook signatures.
TWILIO_ACCOUNT_SID = os.environ.get("TWILIO_ACCOUNT_SID")
# Standard API Key (scoped to this app, revocable independently) instead of the account
# master Auth Token. The Secret is used both for HMAC webhook-signature validation and as
# the serializer credential for auto-hang-up.
TWILIO_API_KEY_SID = os.environ.get("TWILIO_API_KEY_SID")
TWILIO_API_KEY_SECRET = os.environ.get("TWILIO_API_KEY_SECRET")
# Signature validation is ON by default when the API key secret exists; explicit opt-out.
TWILIO_VALIDATE = os.environ.get("TWILIO_VALIDATE", "true").lower() not in ("false", "0", "no")
# Shared secret embedded in the Media Stream wss URL to gate /ws. Auto-generated if
# unset (fine for a single process), but set it in .env for stability across restarts.
STREAM_TOKEN = os.environ.get("STREAM_TOKEN") or secrets.token_urlsafe(24)
# Max simultaneous live calls. Each call holds an Ollama context on the 16GB GPU and
# Ollama serializes generation, so cap this to protect call quality.
# Over-cap callers hear BUSY_MESSAGE and are hung up — existing calls are never degraded.
MAX_CONCURRENT_CALLS = int(os.environ.get("MAX_CONCURRENT_CALLS", "2"))
BUSY_MESSAGE = os.environ.get(
    "BUSY_MESSAGE",
    "Thank you for calling Advanced Vision Care. All of our lines are busy right now. "
    "Please call back in a few minutes. Goodbye.",
)
app = FastAPI()
# Live count of active /ws pipelines (the real GPU consumers), guarded by a lock.
_active_calls = 0
_active_lock = asyncio.Lock()
async def _reserve_call_slot() -> bool:
    """Atomically take a call slot. Returns False if at capacity."""
    global _active_calls
    async with _active_lock:
        if _active_calls >= MAX_CONCURRENT_CALLS:
            return False
        _active_calls += 1
        return True


async def _release_call_slot():
    global _active_calls
    async with _active_lock:
        _active_calls = max(0, _active_calls - 1)
def _twilio_signature_ok(url: str, params: dict, header_sig: str) -> bool:
    """Recompute Twilio's request signature and compare in constant time.

    Algorithm (Twilio docs): take the full public URL, append each POST param as
    key+value sorted by key, HMAC-SHA1 with the signing secret, base64-encode.
    This app keys the HMAC with the API Key Secret. Twilio's documentation names the
    account Auth Token as the signing key, so confirm against a real webhook that
    signatures computed with the API Key Secret match before relying on this check.
    """
    if not (TWILIO_API_KEY_SECRET and header_sig):
        return False
    payload = url + "".join(f"{k}{params[k]}" for k in sorted(params))
    digest = hmac.new(TWILIO_API_KEY_SECRET.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
    expected = base64.b64encode(digest).decode()
    return hmac.compare_digest(expected, header_sig)
@app.get("/health")
async def health():
    return {
        "status": "ok",
        "public_host": PUBLIC_HOST,
        "validate": TWILIO_VALIDATE and bool(TWILIO_API_KEY_SECRET),
        "active_calls": _active_calls,
        "max_calls": MAX_CONCURRENT_CALLS,
    }
@app.post("/voice")
async def voice(request: Request):
    """TwiML: connect the call to our Media Stream WebSocket (bidirectional)."""
    # Local stdlib import: values interpolated into TwiML below must be XML-escaped.
    from html import escape

    form = dict(await request.form())
    if TWILIO_VALIDATE and TWILIO_API_KEY_SECRET:
        # Validate against the PUBLIC url Twilio actually signed, not the internal one.
        public_url = f"https://{PUBLIC_HOST}/voice"
        sig = request.headers.get("X-Twilio-Signature", "")
        if not _twilio_signature_ok(public_url, form, sig):
            logger.warning("Rejected /voice: bad or missing X-Twilio-Signature")
            return HTMLResponse(status_code=403, content="forbidden")
    elif not TWILIO_API_KEY_SECRET:
        logger.warning("/voice signature validation DISABLED (no TWILIO_API_KEY_SECRET set)")
    caller = form.get("From", "")  # caller-ID; passed through for appointment callback

    # Capacity gate: if all slots are busy, speak the busy message and hang up here —
    # before any GPU work — so in-progress calls are never degraded. (A reservation is
    # taken at /ws, tied to the socket lifecycle; this is the live read of that count.)
    if _active_calls >= MAX_CONCURRENT_CALLS:
        logger.info(f"At capacity ({_active_calls}/{MAX_CONCURRENT_CALLS}) — returning busy")
        busy = f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>{escape(BUSY_MESSAGE)}</Say>
  <Hangup/>
</Response>"""
        return HTMLResponse(content=busy, media_type="application/xml")

    # NOTE: <Connect><Stream> is bidirectional (agent can speak back). <Start><Stream>
    # would be one-way (listen only) — do not use that here.
    # Token passed as a <Parameter> (Twilio does NOT preserve a query string on the
    # wss URL); it arrives in the /ws 'start' message's customParameters.
    twiml = f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://{PUBLIC_HOST}/ws">
      <Parameter name="token" value="{escape(STREAM_TOKEN)}" />
      <Parameter name="caller" value="{escape(str(caller))}" />
    </Stream>
  </Connect>
</Response>"""
    return HTMLResponse(content=twiml, media_type="application/xml")
@app.websocket("/ws")
async def media_stream(websocket: WebSocket):
    # The stream token rides in the TwiML <Parameter>, which Twilio delivers inside the
    # 'start' message's customParameters — so we must accept the socket to read it, then
    # validate + capacity-gate before doing any real work.
    await websocket.accept()
    call_sid = None
    reserved = False
    try:
        # Twilio sends a 'connected' frame, then a 'start' frame with SIDs + params.
        msgs = websocket.iter_text()
        await msgs.__anext__()  # 'connected'
        start = json.loads(await msgs.__anext__())  # 'start'
        start_data = start["start"]
        token = (start_data.get("customParameters") or {}).get("token")
        if token != STREAM_TOKEN:
            logger.warning("Rejected /ws: bad or missing stream token")
            await websocket.close(code=1008)  # policy violation
            return

        # Capacity gate (hard safety net for the /voice→/ws race).
        if not await _reserve_call_slot():
            logger.warning(f"/ws over capacity ({_active_calls}/{MAX_CONCURRENT_CALLS}) — closing")
            await websocket.close(code=1013)  # try again later
            return
        reserved = True

        stream_sid = start_data["streamSid"]
        call_sid = start_data["callSid"]
        caller_number = (start_data.get("customParameters") or {}).get("caller") or None
        logger.info(
            f"Media stream start: call={call_sid} stream={stream_sid} caller={caller_number} "
            f"({_active_calls}/{MAX_CONCURRENT_CALLS} slots in use)"
        )
        serializer = TwilioFrameSerializer(
            stream_sid=stream_sid,
            call_sid=call_sid,
            account_sid=TWILIO_ACCOUNT_SID,
            auth_token=TWILIO_API_KEY_SECRET,
        )
        await run_call(websocket, serializer, caller_number=caller_number, call_sid=call_sid)
    except Exception:
        logger.exception("Call pipeline error")
    finally:
        if reserved:
            await _release_call_slot()
        logger.info(f"Call ended: {call_sid} ({_active_calls}/{MAX_CONCURRENT_CALLS} slots in use)")
if __name__ == "__main__":
    import uvicorn

    logger.info(f"AVC phone agent on {BIND_HOST}:{PORT} | public={PUBLIC_HOST} | "
                f"sig_validation={'on' if (TWILIO_VALIDATE and TWILIO_API_KEY_SECRET) else 'OFF'}")
    uvicorn.run(app, host=BIND_HOST, port=PORT)
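
To exercise POST /voice without a real Twilio request, a test client has to send an X-Twilio-Signature that _twilio_signature_ok will accept. The following is a minimal sketch of the client side, mirroring the scheme described in that function; the name `twilio_signature`, the URL, and the secret are placeholders, not values from this deployment.

```python
import base64
import hashlib
import hmac


def twilio_signature(url, params, secret):
    """Hypothetical test helper: URL + key/value pairs sorted by key,
    HMAC-SHA1 with the shared secret, base64-encoded."""
    payload = url + "".join(f"{k}{params[k]}" for k in sorted(params))
    digest = hmac.new(secret.encode(), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Placeholder values: use the server's PUBLIC_HOST and its configured secret.
form = {"From": "+19545550100", "CallSid": "CA123"}
header = twilio_signature("https://phone.example.com/voice", form, "placeholder-secret")
print("X-Twilio-Signature:", header)
```

The URL must be the public https://PUBLIC_HOST/voice form, since that is the string the server validates against, and the form fields sent must be exactly the ones that were signed.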

71
traefik-avc-phone.yml Normal file

@@ -0,0 +1,71 @@
# Traefik DYNAMIC configuration for the AVC phone agent.
# File-provider snippet — Traefik does NOT run on the GPU box (miaai), so this routes
# your existing Traefik host to the phone service over the network.
#
# ── Install ──────────────────────────────────────────────────────────────────
# Drop this into the directory your Traefik watches as a file provider, e.g.
#   traefik.yml (static):
#     providers:
#       file:
#         directory: /etc/traefik/dynamic
#         watch: true
#   then:  cp traefik-avc-phone.yml /etc/traefik/dynamic/   (Traefik hot-reloads it)
#
# ── BEFORE IT WORKS, set these 4 things to match YOUR Traefik ────────────────
# 1) HOST -> the real domain (must equal PUBLIC_HOST in the app's .env,
# because Twilio's signature is computed over https://<HOST>/voice)
# 2) entryPoints -> your HTTPS entrypoint name (commonly `websecure`; could be `https`)
# 3) certResolver -> your ACME resolver name (commonly `le`/`letsencrypt`/`myresolver`)
# 4) the service url -> reachable address of the GPU box from the Traefik host
# (LAN 10.10.1.221:8200; swap for the NetBird overlay IP if that's
# the path Traefik uses to reach miaai)
#
# WebSockets (the /ws Twilio Media Stream): Traefik forwards the Upgrade handshake
# automatically — no special middleware needed. One router/service covers both the
# /voice HTTPS POST and the /ws WSS upgrade because they share host + backend.
http:
  routers:
    avc-phone:
      rule: "Host(`phone.example.com`)" # 1) <-- your domain
      entryPoints:
        - websecure # 2) <-- your HTTPS entrypoint
      service: avc-phone
      tls:
        certResolver: le # 3) <-- your ACME resolver

    # Optional: redirect plain HTTP -> HTTPS for this host. Omit if your Traefik does
    # this globally already.
    avc-phone-http:
      rule: "Host(`phone.example.com`)" # 1) <-- your domain
      entryPoints:
        - web # your HTTP (:80) entrypoint name
      middlewares:
        - avc-phone-https-redirect
      service: avc-phone # never reached (redirected first)

  services:
    avc-phone:
      loadBalancer:
        passHostHeader: true
        servers:
          - url: "http://10.10.1.221:8200" # 4) <-- GPU box (miaai) : app port
        # Twilio media streams are long-lived; keep generous timeouts on the
        # entrypoint (static config) — see note at bottom.

  middlewares:
    avc-phone-https-redirect:
      redirectScheme:
        scheme: https
        permanent: true
# ── One static-config reminder (NOT part of this file) ───────────────────────
# Twilio Media Streams hold the WebSocket open for the whole call. If your HTTPS
# entrypoint has a short respondingTimeout/idleTimeout, long calls get cut. In the
# STATIC traefik config, ensure the entrypoint allows long-lived streams, e.g.:
#   entryPoints:
#     websecure:
#       address: ":443"
#       transport:
#         respondingTimeouts:
#           idleTimeout: 3600s