- Reason visibility: the reason WAS extracted ("disintegrated eyes") but only
lived in the Odoo description note. Add it to the post-call log line and to
the Odoo lead title so it's visible at a glance.
- Latency: split the timing. Whisper takes ~0.1s; the latency is on the LLM
side. The ~3s tail came from cold model reloads after Ollama's keep-alive
expired. server.py now warms and pins the model on startup (keep_alive=-1;
ollama ps shows UNTIL=Forever), removing cold first-turn stalls. Whisper
size left alone (not the bottleneck).
- CLAUDE.md: add the insurance rule (never suggest/guess the plan) and a
latency note.
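
For reference, a minimal sketch of the warm-and-pin step described in the
latency item. This is not the actual server.py code. It assumes Ollama is
listening on its default local address, and the function names and the
model name in the usage note are hypothetical. It relies on Ollama's
documented behavior that a /api/generate request with no prompt loads the
model, and that keep_alive=-1 keeps it loaded indefinitely.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; adjust if the server differs.
OLLAMA_URL = "http://localhost:11434"


def build_warmup_request(model: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a generate request with no prompt, which only loads the model.

    keep_alive=-1 asks Ollama to keep the model in memory indefinitely
    instead of unloading it after the default idle timeout.
    """
    payload = {"model": model, "keep_alive": -1}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def warm_model(model: str, base_url: str = OLLAMA_URL, timeout: float = 120.0) -> bool:
    """Send the warm-up request; return True if Ollama answered with HTTP 200."""
    try:
        request = build_warmup_request(model, base_url)
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, HTTP errors, and timeouts.
        return False
```

Calling warm_model("some-model") once at startup, with whatever model name
the server actually uses, should make ollama ps report UNTIL as Forever for
that model.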
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>