Log/surface the reason, pin LLM warm for latency, doc insurance rule

- Reason visibility: the reason WAS extracted ("disintegrated eyes") but only
  lived in the Odoo description note. Add it to the post-call log line and to
  the Odoo lead title so it's visible at a glance.
- Latency: split the timing. Whisper takes ~0.1s, so the latency is LLM-side.
  The ~3s tail was cold model reloads after Ollama's keep-alive expired.
  server.py now warms + pins the model on startup (keep_alive=-1; ollama ps
  shows UNTIL=Forever), removing cold first-turn stalls. Whisper size left
  alone (not the bottleneck).
- CLAUDE.md: insurance rule (never suggest/guess the plan), latency note.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tocmo0nlord
2026-06-27 04:24:10 +00:00
parent 8b52097713
commit ba36ae6891
4 changed files with 34 additions and 3 deletions


@@ -123,6 +123,7 @@ async def extract_and_record(messages, ollama_url, model, call_sid=None, caller_
     where = persist_appointment(record)
     logger.info(
         f"Post-call appointment saved ({where}): {record['patient_name']} / "
-        f"{record['location']} / ins={record['insurance']} / when={record['preferred_time']}"
+        f"{record['location']} / reason={record['reason']} / ins={record['insurance']} / "
+        f"when={record['preferred_time']}"
     )
     return record
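
The server.py hunk is not shown here, so the following is only a sketch of what the warm-and-pin step from the commit message might look like, not the code from this commit. It assumes Ollama's HTTP API, where a POST to /api/generate with a model name and no prompt loads the model, and keep_alive=-1 asks Ollama to keep it loaded indefinitely. The helper names build_warm_request and warm_model, the timeout value, and the example URL and model name are hypothetical.

```python
import json
import urllib.request


def build_warm_request(ollama_url: str, model: str) -> urllib.request.Request:
    """Build a prompt-less /api/generate request (hypothetical helper).

    Sending a model name without a prompt asks Ollama to load the model;
    keep_alive=-1 asks it to keep the model resident with no expiry.
    """
    payload = {"model": model, "keep_alive": -1}
    return urllib.request.Request(
        f"{ollama_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def warm_model(ollama_url: str, model: str, timeout: float = 120.0) -> bool:
    """Send the warm-up request once at startup; return True on HTTP 200.

    Connection failures and timeouts are reported as False so that the
    caller can log a warning instead of crashing on startup.
    """
    request = build_warm_request(ollama_url, model)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False


if __name__ == "__main__":
    # Assumed defaults: local Ollama on its standard port, placeholder model.
    ok = warm_model("http://localhost:11434", "llama3")
    print("model pinned" if ok else "warm-up failed; first turn may be slow")
```

After a successful call, `ollama ps` should list the model with UNTIL set to Forever, which is the check the commit message mentions.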