# AVC Phone Agent — Project Specification

> Claude Code authoritative reference. All architecture, security, and build decisions live here.
> Repo: `git.activeblue.net/tocmo0nlord/avc-phone-ai`
> Last updated: 2026-06-25 | Active Blue LLC

---

## Project Overview

**Name:** AVC Phone Agent
**Owner:** Active Blue LLC
**Client:** Advanced Vision Care (AVC) — multi-location ophthalmology/optometry practice (FL + TX)
**Agent name:** AVA (Advanced Vision Assistant)
**Purpose:** Automated AI phone agent that answers patient calls, books tentative appointments
into Odoo CRM with call recordings and transcripts attached, and self-improves via
Claude-powered transcript monitoring and a fine-tuning feedback loop.

---

## Existing Codebase — What to Keep, What to Change

The previous build at `/home/tocmo0nlord/avc-phone/` is a working foundation.
**Do not rewrite what works.** Apply only the changes documented in this section.

### Files and their status

| File | Status | Action |
|------|--------|--------|
| `bot.py` | Keep as-is | Whisper STT retained (real-time). Deepgram evaluated and rejected — see Change 1 |
| `server.py` | Keep as-is | Twilio Auth Token retained. API Key swap evaluated and rejected — see Change 2 |
| `practice.py` | Keep as-is | No changes |
| `extract.py` | Keep as-is | No changes |
| `odoo_client.py` | Keep as-is | Already uses API key auth correctly |

### What is already solved — do not touch

**`EndCallProcessor` in `bot.py`** — AVC-side call termination is fully implemented.
Watches LLM text stream for closing keywords ("goodbye"), waits for TTS to finish via
`BotStoppedSpeakingFrame`, then pushes `EndTaskFrame` upstream. `TwilioFrameSerializer`
with `auto_hang_up` drops the carrier leg. This is correct. Zero changes.

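The termination sequence described above can be illustrated without Pipecat as a two-step gate. This is only a sketch for readers: `EndCallGate`, its method names, and the keyword tuple are hypothetical stand-ins, not the code in `bot.py`, which operates on Pipecat frames.

```python
from dataclasses import dataclass


@dataclass
class EndCallGate:
    """Hypothetical stand-in for the EndCallProcessor logic (not the bot.py code)."""

    closing_keywords: tuple[str, ...] = ("goodbye",)
    closing_seen: bool = False

    def on_llm_text(self, text: str) -> None:
        # Arm the gate once the LLM output contains a closing keyword.
        lowered = text.lower()
        if any(keyword in lowered for keyword in self.closing_keywords):
            self.closing_seen = True

    def on_bot_stopped_speaking(self) -> bool:
        # True only after a farewell has been generated and TTS has finished,
        # which is the point where the real processor pushes EndTaskFrame.
        return self.closing_seen


gate = EndCallGate()
gate.on_llm_text("Your appointment request is noted.")
assert gate.on_bot_stopped_speaking() is False  # no closing keyword yet
gate.on_llm_text("Thank you for calling. Goodbye!")
assert gate.on_bot_stopped_speaking() is True  # farewell spoken, safe to end
```

Waiting for the "stopped speaking" event before ending is what keeps the farewell from being cut off mid-sentence.
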
**Mulaw 8kHz ↔ 16kHz conversion** — handled internally by `TwilioFrameSerializer`.
`PIPELINE_SAMPLE_RATE = 16000`, `WIRE_SAMPLE_RATE = 8000` are already set correctly.
No custom audio module needed.

**VAD tuned for telephony** — `confidence=0.5`, `min_volume=0.3` already loosened from
desktop defaults. These settings directly address the repeat-yourself problem on the
VAD side.

**Capacity gating** — `MAX_CONCURRENT_CALLS=2` with atomic slot reservation in
`server.py` prevents GPU thrashing. Keep it.

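For readers unfamiliar with the pattern, atomic slot reservation can be sketched as below. This is a minimal illustration assuming an asyncio server; `CallSlots` and its methods are hypothetical names, not the implementation in `server.py`.

```python
import asyncio


class CallSlots:
    """Hypothetical slot counter; names are illustrative, not from server.py."""

    def __init__(self, max_calls: int) -> None:
        self._max_calls = max_calls
        self._active = 0
        self._lock = asyncio.Lock()

    async def try_reserve(self) -> bool:
        # Check and increment under one lock so two simultaneous webhooks
        # cannot both claim the last free slot.
        async with self._lock:
            if self._active >= self._max_calls:
                return False
            self._active += 1
            return True

    async def release(self) -> None:
        async with self._lock:
            self._active = max(0, self._active - 1)


async def demo() -> list[bool]:
    slots = CallSlots(max_calls=2)
    # Three calls arrive at once; only two reservations may succeed.
    results = await asyncio.gather(*(slots.try_reserve() for _ in range(3)))
    return list(results)


results = asyncio.run(demo())
assert results.count(True) == 2
```

A rejected reservation would typically translate into a busy response to the caller rather than a third pipeline competing for the GPU.
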
**`AudioHeartbeat`** — diagnostic processor that distinguishes VAD failure from
transport stall. Keep it.

**Post-call extraction (`extract.py`)** — single JSON-mode completion after call ends.
Correctly uses `format: json`, uses verified Twilio caller-ID instead of trusting model
output, falls back to JSONL if Odoo is unreachable. Keep it.

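The two safeguards in the extraction step (trusting the carrier's caller ID over the model, and falling back to JSONL) can be sketched as follows. The function names, the `phone` field, and the choice to catch `OSError` are assumptions made for illustration; `extract.py` may structure this differently.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def build_lead(extracted: dict, twilio_caller_id: str) -> dict:
    # Overwrite whatever phone number the model produced with the
    # carrier-verified caller ID.
    return {**extracted, "phone": twilio_caller_id}


def persist_lead(lead: dict, push_to_odoo: Callable[[dict], int], fallback_path: Path) -> str:
    """Try Odoo first; if the connection fails, append the lead to a JSONL file."""
    try:
        push_to_odoo(lead)
        return "odoo"
    except OSError as exc:  # assumption: network failures surface as OSError
        record = {**lead, "odoo_error": str(exc)}
        with fallback_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
        return "jsonl"


def unreachable_odoo(lead: dict) -> int:
    raise ConnectionRefusedError("Odoo unreachable")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pending_leads.jsonl"
    lead = build_lead({"name": "Test Caller", "phone": "555-0000"}, "+13055550100")
    assert persist_lead(lead, unreachable_odoo, path) == "jsonl"
    saved = json.loads(path.read_text(encoding="utf-8").splitlines()[0])
    assert saved["phone"] == "+13055550100"
```

Appending one JSON object per line keeps the fallback file replayable once Odoo is reachable again.
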
**Odoo integration (`odoo_client.py`)** — already uses `ODOO_API_KEY` for XML-RPC auth,
not password. Correct pattern. No changes.

---

## Change 1 — Real-time STT stays on Whisper (`bot.py`)

**Decision (2026-06-25): keep Whisper. Deepgram Nova-2 was evaluated and rejected.**

Deepgram Nova-2 was trialed to cut STT latency (Whisper buffers ~1-3s before the LLM
sees input). The swap was applied and then reverted — the project stays on local
faster-whisper. No external STT dependency, no per-minute STT cost, and no audio
leaving the box (HIPAA posture). Latency is instead managed via VAD tuning and the
`medium` model on the RTX 5080.

**Current `bot.py` STT (in place — do not change):**
```python
import os

from pipecat.services.whisper.stt import WhisperSTTService

WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "medium")  # tiny|base|small|medium
WHISPER_DEVICE = os.environ.get("WHISPER_DEVICE", "cuda")  # cuda for the 5080
WHISPER_COMPUTE = os.environ.get("WHISPER_COMPUTE", "float16")
WHISPER_HOTWORDS = os.environ.get("WHISPER_HOTWORDS", "...")  # domain vocab bias

# HintedWhisperSTTService wraps WhisperSTTService to inject faster-whisper `hotwords`
# (office cities + optometry terms) per call. Instantiated in run_agent():
stt = HintedWhisperSTTService(
    settings=WhisperSTTService.Settings(model=WHISPER_MODEL),
    device=WHISPER_DEVICE,
    compute_type=WHISPER_COMPUTE,
    hotwords=WHISPER_HOTWORDS,
)
```

**Note:** Whisper large-v3 also serves post-call transcription in Phase 3
(`recording/transcriber.py`). If real-time latency proves unacceptable in the Phase 1
gate, revisit a streaming STT then — but do not reintroduce the dependency speculatively.

---

## Change 2 — Twilio webhook auth stays on the Auth Token (`server.py`)

**Decision (2026-06-25): keep `TWILIO_AUTH_TOKEN`. The API Key swap was evaluated and rejected.**

A Standard API Key (scoped, revocable) was trialed in place of the account Auth Token,
but it **cannot do what this server needs**: Twilio signs inbound webhooks
(`X-Twilio-Signature`) with the account **Auth Token** — an API Key Secret cannot validate
that signature, so `TWILIO_VALIDATE=true` would reject every legitimate `POST /voice`
(403). The `TwilioFrameSerializer` auto-hang-up also expects the account/Auth-Token
credential pair. The swap was reverted.

**Credential model (in place):**
```
Twilio Account SID (not secret on its own)
  └── Auth Token (TWILIO_AUTH_TOKEN — validates webhooks + REST/auto-hang-up)
```

Treat the Auth Token as a password: keep it only in `.env` (never committed) and rotate it
on any suspected leak, on any team member departure, and quarterly as routine. If
finer-grained scoping is ever required, the correct design is a *hybrid*: the Auth Token
for `X-Twilio-Signature` validation and an API Key (SK SID + Secret) only for outbound
REST, not a wholesale swap.

**Current `server.py` (in place — do not change):**

```python
TWILIO_ACCOUNT_SID = os.environ.get("TWILIO_ACCOUNT_SID")
TWILIO_AUTH_TOKEN = os.environ.get("TWILIO_AUTH_TOKEN")

# _twilio_signature_ok(): HMAC-SHA1 keyed by the Auth Token (what Twilio signs with)
digest = hmac.new(TWILIO_AUTH_TOKEN.encode(), payload.encode("utf-8"), hashlib.sha1).digest()

# Validation gate + warning
if TWILIO_VALIDATE and TWILIO_AUTH_TOKEN:
    ...
elif not TWILIO_AUTH_TOKEN:
    logger.warning("/voice signature validation DISABLED (no TWILIO_AUTH_TOKEN set)")

# Serializer auto-hang-up uses the account SID + Auth Token pair
serializer = TwilioFrameSerializer(
    stream_sid=stream_sid,
    call_sid=call_sid,
    account_sid=TWILIO_ACCOUNT_SID,
    auth_token=TWILIO_AUTH_TOKEN,
)
```

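To make the Auth Token dependency concrete, here is a self-contained sketch of the `X-Twilio-Signature` check for a form-encoded `POST`, following Twilio's documented scheme (request URL plus key-sorted parameters, HMAC-SHA1, base64). The function names and all sample values are illustrative assumptions; the real check lives in `_twilio_signature_ok()` in `server.py`.

```python
import base64
import hashlib
import hmac


def compute_twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    # Full request URL followed by each POST parameter name and value,
    # concatenated in key-sorted order.
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def signature_ok(auth_token: str, url: str, params: dict[str, str], header_value: str) -> bool:
    expected = compute_twilio_signature(auth_token, url, params)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))


# Placeholder values for illustration only.
url = "https://avc-phone.activeblue.net/voice"
params = {"CallSid": "CA123", "From": "+13055550101", "To": "+13055550100"}
header = compute_twilio_signature("example-auth-token", url, params)

assert signature_ok("example-auth-token", url, params, header)
# A different secret (for example an API Key Secret) cannot reproduce the signature.
assert not signature_ok("example-api-key-secret", url, params, header)
```

The second assertion is the reason the API Key swap failed: the server can only verify what Twilio signed if it holds the same secret Twilio signed with.
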
**Auth Token rotation procedure:**
1. Generate a new primary Auth Token in the Twilio console (use the secondary-token flow)
2. Update `TWILIO_AUTH_TOKEN` in `.env`
3. Restart the service — no rebuild needed
4. Verify one test call succeeds (signature validation + auto-hang-up both rely on it)
5. Retire the old token in the Twilio console

Rotate on: any suspected leak, any team member departure, quarterly as routine.

---

## Change 3 — `.env`

No swap. `.env` keeps `TWILIO_AUTH_TOKEN` and the Whisper STT vars; there is **no**
`TWILIO_API_KEY_*` or `DEEPGRAM_*` (those were trialed and removed with Changes 1/2).

**Full `.env` reference:**
```env
# Twilio — Auth Token validates webhooks + drives auto-hang-up. Never committed.
TWILIO_ACCOUNT_SID=AC...
TWILIO_AUTH_TOKEN=
TWILIO_PHONE_NUMBER=+1...
TWILIO_VALIDATE=true

# STT: Whisper (faster-whisper, real-time in-call; large-v3 also used post-call in Phase 3)
WHISPER_MODEL=medium
WHISPER_DEVICE=cuda
WHISPER_COMPUTE=float16

# LLM: Ollama
OLLAMA_URL=http://127.0.0.1:11434/v1
OLLAMA_MODEL=activeblue-avc:latest
LLM_PROVIDER=ollama
LLM_TEMPERATURE=0.3
LLM_MAX_TOKENS=160

# Anthropic (optional LLM swap + monitoring + synthetic data)
ANTHROPIC_API_KEY=
ANTHROPIC_MODEL=claude-sonnet-4-6

# TTS: Kokoro
KOKORO_VOICE=af_heart
KOKORO_MODEL_DIR=/home/tocmo0nlord/pipecat-run/models

# Odoo
ODOO_URL=https://avc.activeblue.net
ODOO_DB=avc
ODOO_USER=
ODOO_API_KEY=
ODOO_TARGET=crm
ODOO_STAGE_ID=
ODOO_TEAM_ID=
ODOO_USER_ID=

# Server
PUBLIC_HOST=avc-phone.activeblue.net
PORT=8200
BIND_HOST=127.0.0.1
MAX_CONCURRENT_CALLS=2
STREAM_TOKEN=

# Call behaviour
AGENT_NAME=AVA
ENABLE_TOOLS=
VAD_CONFIDENCE=0.5
VAD_MIN_VOLUME=0.3
VAD_START_SECS=0.2
VAD_STOP_SECS=0.5

# Monitoring (Phase 4)
MONITORING_ENABLED=true
MONITORING_SCHEDULE=0 2 * * *

# A/B model routing (Phase 5 only)
AB_SPLIT_PERCENT=0
AB_MODEL_B=
```

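A startup check along the following lines could catch an empty Auth Token or a leftover variable from the reverted trials. It is a hedged sketch: `check_env`, the choice of which variables count as required, and the message wording are assumptions, not existing project code.

```python
from collections.abc import Mapping

# Assumption: this subset is treated as mandatory for Phase 1.
REQUIRED_VARS = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN", "OLLAMA_URL", "OLLAMA_MODEL")
# Prefixes the spec says must no longer appear in .env.
REMOVED_PREFIXES = ("TWILIO_API_KEY_", "DEEPGRAM_")


def check_env(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the env looks sane."""
    problems = [f"missing or empty: {name}" for name in REQUIRED_VARS if not env.get(name)]
    problems += [
        f"stale variable from a reverted trial: {name}"
        for name in sorted(env)
        if name.startswith(REMOVED_PREFIXES)
    ]
    return problems


sample = {
    "TWILIO_ACCOUNT_SID": "AC...",
    "TWILIO_AUTH_TOKEN": "",
    "OLLAMA_URL": "http://127.0.0.1:11434/v1",
    "OLLAMA_MODEL": "activeblue-avc:latest",
    "DEEPGRAM_API_KEY": "leftover",
}
assert check_env(sample) == [
    "missing or empty: TWILIO_AUTH_TOKEN",
    "stale variable from a reverted trial: DEEPGRAM_API_KEY",
]
```

In practice the function would be called with `os.environ` once at service start, logging each problem before the first call is accepted.
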
---

## Model Configuration

### Current production model: `activeblue-avc:latest`

| Property | Value | Notes |
|----------|-------|-------|
| Base | `llama3.1:8b-instruct-q4_K_M` | Llama 3.1 8B, Q4_K_M quantization |
| ID | `366a6cc15bb7` | Rebuilt clean 2026-06-23 |
| Size | 4.9GB | Down from 8.7GB Q8_0 |
| VRAM usage | ~4.5GB | Leaves 11.5GB headroom on RTX 5080 |
| Context | 4096 tokens | Sufficient for any phone call |
| Temperature | 0.3 | Low — maximizes JSON schema compliance |
| Top-p | 0.9 | Standard |
| Adapter | None | 44-pair LoRA adapter discarded |

### Modelfile (rebuild reference)

```
FROM llama3.1:8b-instruct-q4_K_M

PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER num_ctx 4096
PARAMETER temperature 0.3
PARAMETER top_p 0.9

TEMPLATE "{{- range .Messages }}<|start_header_id|>{{ .Role }}<|end_header_id|>
{{ .Content }}<|eot_id|>
{{- end }}<|start_header_id|>assistant<|end_header_id|>
"
```

### Why Q4_K_M not Q8_0

Q8_0 consumed ~8.5GB VRAM for weights alone. Under telephony load this caused
inference latency spikes. Q4_K_M cuts weight VRAM to ~4.5GB with negligible quality
difference at 8B scale.

### Why no adapter

The 44-pair LoRA adapter was adding noise, not signal. The minimum viable dataset is
200+ pairs per intent category. The adapter will be rebuilt in Phase 5 with 500+ pairs
in JSON output format.

### Ollama inventory (current)

```
activeblue-avc:latest        366a6cc15bb7  4.9GB  production
llama3.1:8b-instruct-q4_K_M  46e0c10c039e  4.9GB  base
nomic-embed-text:latest      0a109f422b47  274MB  embeddings
```

### Phase 5 training note

Axolotl pulls from HuggingFace in safetensors format, not Ollama GGUF:
```bash
# Phase 5 only — do not run now
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct
# ~16GB on disk, separate from Ollama storage
```

---

## Build Phases

Claude Code must not scaffold Phase N+1 until Phase N gate is marked complete.

### Phase 1 — Reliable call loop

**Goal:** Every utterance gets a response. Zero silent failures. AVC hangs up — not
the caller.

- [x] Change 1: STT — Deepgram evaluated, reverted; staying on Whisper (`medium`)
- [x] Change 2: Twilio auth — API Key evaluated, reverted; staying on Auth Token
- [x] Change 3: `.env` — Auth Token + Whisper vars; `OLLAMA_MODEL=activeblue-avc:latest`
- [ ] Verify `EndCallProcessor` termination in Twilio call logs (AVC side, not caller)
- [ ] Verify `AudioHeartbeat` diagnostic logging active
- [ ] Verify `MAX_CONCURRENT_CALLS` capacity gating works

**Gate — all five must pass:**
1. 10 consecutive test calls — zero silent non-responses
2. Zero zombie pipeline instances after call ends (`ps`/`pgrep` — service runs as a bare
   systemd/host process, not Docker)
3. Call termination from AVC side confirmed in Twilio call logs
4. JSON parse failure rate visible in logs — measurable not invisible
5. Response latency P95 under 3 seconds from STT end-of-utterance to first TTS audio

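Gate item 5 needs an agreed definition of P95. One option, shown here as a sketch, is the nearest-rank percentile over per-turn latencies in seconds. The function names and the nearest-rank choice are assumptions; the spec does not say how the percentile is computed.

```python
def p95(latencies_s: list[float]) -> float:
    """Nearest-rank 95th percentile of the given latency samples."""
    if not latencies_s:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_s)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_gate_passes(latencies_s: list[float], limit_s: float = 3.0) -> bool:
    # The gate requires P95 strictly under the limit.
    return p95(latencies_s) < limit_s


# One slow turn out of twenty still passes; two slow turns do not.
assert latency_gate_passes([1.0] * 19 + [5.0])
assert not latency_gate_passes([1.0] * 18 + [5.0, 5.0])
```

With only a handful of samples the nearest-rank P95 equals the maximum, so the gate is effectively "no turn at or above 3 seconds" until enough turns have been logged.
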
### Phase 2 — Accuracy (RAG + validation)

- [ ] Populate `rag/data/*.jsonl` with real AVC data (human task — see RAG section)
- [ ] ChromaDB RAG retriever wired into pipeline
- [ ] Response validator: JSON schema + factual cross-check + PHI leak scan
- [ ] Keyword blocklist (uncertainty phrases → handoff)
- [ ] Intent classifier routing
- [ ] Turn counter: max 3 failed turns before forced handoff + termination

**Gate:** 20 manual test calls, zero hallucinations on AVC-specific facts

### Phase 3 — Booking

- [ ] Real-time calendar availability check (`odoo/calendar.py`)
- [ ] Whisper large-v3 post-call transcription (`recording/transcriber.py`)
- [ ] Recording + transcript attached to Odoo lead chatter
- [ ] Staff review flow confirmed in Odoo

**Gate:** Staff receives, reviews, and confirms a lead end-to-end

### Phase 4 — Monitoring

- [ ] Transcript index (`recordings/index.jsonl`)
- [ ] Claude monitoring job
- [ ] Dashboard: toggle, alert queue, one-click apply, playback, quality tagging

**Gate:** First monitoring run produces actionable suggestions

### Phase 5 — Fine-tuning

- [ ] Pull HuggingFace base (see model section)
- [ ] Synthetic data generation via Claude API in JSON output format
- [ ] Real call exporter using staff quality tags
- [ ] Axolotl QLoRA on RTX 5080
- [ ] Model registry + versioning + A/B routing

**Gate:** New model outperforms baseline over 50+ calls

---

## Repository Structure

```
avc-phone-ai/
├── CLAUDE.md                    ← this file
├── README.md
├── .env                         ← never committed
├── .env.example
├── .gitignore                   ← includes .env, recordings/, *.gguf
│
├── bot.py                       ← Pipecat pipeline (kept as-is, see Change 1)
├── server.py                    ← Twilio webhook server (kept as-is, see Change 2)
├── practice.py                  ← AVC facts + Odoo persistence
├── extract.py                   ← post-call appointment extraction
├── odoo_client.py               ← Odoo XML-RPC client
│
├── rag/                         ← Phase 2
│   ├── store.py
│   ├── loader.py
│   ├── retriever.py
│   └── data/
│       ├── avc_locations.jsonl
│       ├── avc_providers.jsonl
│       ├── avc_services.jsonl
│       ├── avc_hours.jsonl
│       ├── avc_insurance.jsonl
│       └── avc_faqs.jsonl
│
├── recording/                   ← Phase 3
│   ├── transcriber.py           ← Whisper large-v3 post-call only
│   └── storage.py
│
├── monitoring/                  ← Phase 4
│   ├── monitor.py
│   ├── analyzer.py
│   ├── diff_engine.py
│   ├── scheduler.py
│   └── dashboard/
│       ├── app.py
│       └── static/
│
├── training/                    ← Phase 5 stub
│   └── README.md
│
├── tests/
│   ├── test_bot.py
│   ├── test_server.py
│   ├── test_odoo_client.py
│   ├── test_extract.py
│   └── fixtures/
│       └── sample_transcripts.jsonl
│
├── scripts/
│   ├── deploy.sh
│   └── smoke_test.sh
│
├── avc-phone.service            ← existing systemd unit
└── traefik-avc-phone.yml        ← existing Traefik config
```

---

## Infrastructure

| Component | Host | Address | Notes |
|-----------|------|---------|-------|
| Pipecat pipeline | `miaai` | `10.10.1.221` | Python async, systemd |
| Ollama LLM | `miaai` | `http://127.0.0.1:11434/v1` | `activeblue-avc:latest` |
| ChromaDB (Phase 2) | `miaai` | `http://10.10.1.221:8001` | Docker volume |
| Twilio webhook | `miaai` | `https://avc-phone.activeblue.net` | Traefik + Let's Encrypt |
| Monitoring dashboard | `miaai` | `https://avc-monitor.activeblue.net` | internal only |
| Odoo CRM | — | `https://avc.activeblue.net` | XML-RPC, db: `avc` |
| Recordings | `miaai` | `/home/tocmo0nlord/avc-phone/recordings/` | local only |
| Gitea | — | `https://git.activeblue.net/tocmo0nlord/avc-phone-ai` | user: `tocmo0nlord` |

---

## RAG Store (Phase 2)

**Stack:** ChromaDB + `nomic-embed-text:latest` (already in Ollama)
**Collection:** `avc_knowledge`
**Retrieval:** Top-3 chunks per query on caller's current turn only

### JSONL record format

```json
{
  "id": "hours-kendall-weekday",
  "text": "The Kendall location is open Monday through Friday 8:00 AM to 5:00 PM.",
  "tags": ["hours", "kendall"],
  "last_updated": "2026-06-23"
}
```

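A loader could validate each line against the record format above before anything is embedded. The sketch below assumes one JSON object per line in the `.jsonl` files (the example above is pretty-printed for readability); `parse_record` and its error messages are hypothetical, not the planned `rag/loader.py` API.

```python
import json
from datetime import date

# Field names and types taken from the record format shown above.
REQUIRED_FIELDS = {"id": str, "text": str, "tags": list, "last_updated": str}


def parse_record(line: str) -> dict:
    """Parse one JSONL line and check it against the documented record format."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("record must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), expected_type):
            raise ValueError(f"field {field!r} missing or not {expected_type.__name__}")
    if not all(isinstance(tag, str) for tag in record["tags"]):
        raise ValueError("tags must be strings")
    date.fromisoformat(record["last_updated"])  # raises ValueError on a malformed date
    return record


line = json.dumps({
    "id": "hours-kendall-weekday",
    "text": "The Kendall location is open Monday through Friday 8:00 AM to 5:00 PM.",
    "tags": ["hours", "kendall"],
    "last_updated": "2026-06-23",
})
assert parse_record(line)["id"] == "hours-kendall-weekday"
```

Rejecting malformed records at load time keeps bad data out of the `avc_knowledge` collection, where it would be much harder to spot.
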
### Data files — populated before Phase 2, not before Phase 1

| File | Content |
|------|---------|
| `avc_locations.jsonl` | Address, phone, fax, parking per location |
| `avc_providers.jsonl` | Name, title, specialty, locations, languages |
| `avc_services.jsonl` | Exam types, procedures |
| `avc_hours.jsonl` | Hours per location, holiday closures, after-hours |
| `avc_insurance.jsonl` | Accepted plans per location |
| `avc_faqs.jsonl` | Approved Q&A pairs |

**Note:** `practice.py` already contains real AVC location and insurance data scraped
from `advancedvisioncareflorida.com`. Use it as the seed for the JSONL files rather
than starting from scratch.

---

## Claude Monitoring (Phase 4)

### What it analyzes

- Facts stated by AVA contradicting RAG store
- System prompt violations
- Calls that should have been handoffs
- High failed turn counts — model or prompt signal
- RAG gaps (AVA said "I don't have that" — should it be added?)
- Phrasing that caused caller confusion

### Output schema

```json
{
  "call_sid": "CA...",
  "severity": "high",
  "issue_type": "factual_error",
  "description": "AVA stated Kendall closes at 6pm. RAG store says 5pm.",
  "suggested_action": "rag_update",
  "suggested_change": {
    "file": "rag/data/avc_hours.jsonl",
    "record_id": "hours-kendall-weekday",
    "field": "text",
    "old": "...open until 6pm...",
    "new": "...open until 5pm..."
  }
}
```

`suggested_action`: `rag_update` | `prompt_change` | `blocklist_add` | `flag_for_review`

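Before a suggestion reaches the alert queue, the monitoring job could check it against the schema above. This is a sketch under assumptions: `validate_suggestion` is a hypothetical helper, and the rule that `rag_update` must carry a `suggested_change` is inferred from the example rather than stated in the spec.

```python
ALLOWED_ACTIONS = {"rag_update", "prompt_change", "blocklist_add", "flag_for_review"}
REQUIRED_KEYS = ("call_sid", "severity", "issue_type", "description", "suggested_action")


def validate_suggestion(suggestion: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the suggestion is usable."""
    errors = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in suggestion]
    action = suggestion.get("suggested_action")
    if action is not None and action not in ALLOWED_ACTIONS:
        errors.append(f"unknown suggested_action: {action}")
    # Assumption: a RAG update is only actionable with a concrete change attached.
    if action == "rag_update" and "suggested_change" not in suggestion:
        errors.append("rag_update requires suggested_change")
    return errors


good = {
    "call_sid": "CA...",
    "severity": "high",
    "issue_type": "factual_error",
    "description": "AVA stated Kendall closes at 6pm. RAG store says 5pm.",
    "suggested_action": "rag_update",
    "suggested_change": {"file": "rag/data/avc_hours.jsonl", "record_id": "hours-kendall-weekday"},
}
assert validate_suggestion(good) == []
assert validate_suggestion({**good, "suggested_action": "delete_everything"}) == [
    "unknown suggested_action: delete_everything"
]
```

Validating model output here matters because the dashboard's one-click apply acts on these objects directly.
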
### Dashboard

FastAPI + HTML/JS at `https://avc-monitor.activeblue.net` (internal only).

| Feature | Description |
|---------|-------------|
| Enable/disable toggle | Pauses scheduler without redeployment |
| Alert queue | Suggestions sorted by severity |
| One-click apply | Applies change, commits via Gitea API to `avc-phone-ai` |
| Call playback | Audio + transcript side-by-side |
| Quality tagging | Staff tags calls from dashboard |
| Manual trigger | `POST /monitor/run` |

---

## Fine-Tuning Pipeline (Phase 5 — stub)

> Not scaffolded until Phase 4 complete and monitoring has run minimum two weeks.
> See `training/README.md` — populated at Phase 5 start.

- Synthetic data: Claude API generates Q&A in JSON output format — schema not style
- Real calls: staff-tagged `"good"` + corrected bad calls
- Target: 500+ pairs per intent before first Axolotl run
- QLoRA via Axolotl on RTX 5080, base: HuggingFace `meta-llama/Llama-3.1-8B-Instruct`
- Versioned Ollama models: `activeblue-avc:vN`
- A/B routing: promote when new version wins on booking + hallucination rate over 50+ calls

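One way the A/B routing could use `AB_SPLIT_PERCENT` and `AB_MODEL_B` is a deterministic split keyed on the call SID, so a given call is always served by the same model. The sketch below is an assumption about a Phase 5 design, not existing code; `pick_model` and the version tag `activeblue-avc:v2` are hypothetical.

```python
import hashlib


def pick_model(call_sid: str, split_percent: int, model_a: str, model_b: str) -> str:
    """Deterministically route a call: the same call_sid always maps to the same model."""
    if not model_b or split_percent <= 0:
        return model_a  # routing disabled (AB_SPLIT_PERCENT=0 or AB_MODEL_B empty)
    digest = hashlib.sha256(call_sid.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return model_b if bucket < split_percent else model_a


# With the documented defaults every call stays on the production model.
assert pick_model("CA123", 0, "activeblue-avc:latest", "") == "activeblue-avc:latest"
# A 100% split sends every call to model B.
assert pick_model("CA123", 100, "activeblue-avc:latest", "activeblue-avc:v2") == "activeblue-avc:v2"
```

Hashing instead of random choice makes per-call results reproducible, which helps when comparing booking and hallucination rates between the two arms.
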
---

## HIPAA and Compliance

- AVA identifies as automated at call start — no exceptions
- No PHI in ChromaDB — practice information only
- Recordings on `miaai` only — no cloud storage
- Odoo API user: minimum permissions, not admin
- All endpoints HTTPS via Traefik
- `.env` never committed

---

## Deploy Script (`scripts/deploy.sh`)

```bash
#!/bin/bash
set -e
cd /home/tocmo0nlord/avc-phone
git pull origin main
pip install -r requirements.txt --quiet
systemctl restart avc-phone
systemctl status avc-phone --no-pager
echo "[deploy] Done."
```

---

## Development Conventions

- Python 3.13 (matches `miaai` miniconda environment)
- Async throughout — Pipecat is async-native
- `loguru` for all logging — already in use, keep consistent
- Structured log lines for all diagnostic events
- `python-dotenv` for local dev, env injection in prod
- Secrets never hardcoded
- Every module has `if __name__ == "__main__":` for isolated testing

---

## Key Dependencies (current)

```
pipecat-ai==1.3.0          # installed at /opt/miniconda3
faster-whisper             # real-time STT (already installed in pipecat-run venv)
kokoro-tts                 # already installed
ollama                     # already installed
scipy / numpy              # already installed (pipecat deps)
chromadb                   # add for Phase 2
sentence-transformers      # add for Phase 2
anthropic                  # for monitoring + optional LLM swap
openai-whisper             # large-v3 for post-call transcription (Phase 3)
fastapi / uvicorn          # already installed
loguru                     # already installed
httpx                      # already installed
```

---

## Open Items

- [ ] Confirm `TWILIO_AUTH_TOKEN` in `.env` is current (rotate if leaked/stale)
- [ ] Confirm `ODOO_STAGE_ID`, `ODOO_TEAM_ID`, `ODOO_USER_ID` from live `avc` db
- [ ] Confirm AVA voice — `af_heart` is current default, confirm with AVC before go-live
- [ ] Populate `rag/data/*.jsonl` before Phase 2 (seed from `practice.py` data)
- [ ] Define Odoo confirmed appointment flow: lead → opportunity → calendar event
- [ ] Staff training on monitoring dashboard quality tagging

---

*Active Blue LLC | git.activeblue.net/tocmo0nlord/avc-phone-ai*