Add vision LLM path for receipt vendor/category identification

When RECEIPT_VISION_MODE=vision (default), uploaded receipt images are sent
directly to the vision-capable LLM (llama3.2-vision via Ollama) instead of the
OCR text excerpt. The model can read logos, stylised fonts, and layouts that
Tesseract OCR mangles (Home Depot, HMSHost/Sergio's, etc.).

Architecture:
- amount + date: always from Tesseract regex (deterministic, never LLM)
- vendor + category: vision LLM when image available, text LLM as fallback
- Fallthrough: if vision call fails for any reason, text path is tried next
- PDF/TXT/HTML receipts: always use text path (not visual media)

Revert instantly without a rebuild:
  echo "RECEIPT_VISION_MODE=text" >> /root/odoo/odoo-ai/.env
  docker compose up -d agent-service

config.py: add receipt_vision_mode setting (default 'vision')
expenses_agent.py: _VISION_MIMETYPES, _get_vision_mode() helper,
  dual-path _parse_receipt_text (b64/mimetype params), _act() passes b64
tests: 92 passing (4 new vision tests; 2 existing prompt tests pinned to
  text mode via _get_vision_mode patch)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
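
The path selection described in the message can be sketched roughly as follows. This is an illustration only, not the code in expenses_agent.py: the function name `identify_vendor_category`, the contents of `VISION_MIMETYPES`, and the two injected callables are all hypothetical stand-ins, since the real `_parse_receipt_text` body and `_VISION_MIMETYPES` values are not shown in this commit view.

```python
from typing import Callable, Optional

# Hypothetical MIME set; the real _VISION_MIMETYPES is not shown here.
VISION_MIMETYPES = {"image/png", "image/jpeg", "image/webp"}


def identify_vendor_category(
    ocr_text: str,
    b64: Optional[str],
    mimetype: Optional[str],
    mode: str,
    vision_call: Callable[[str, str], dict],
    text_call: Callable[[str], dict],
) -> dict:
    """Use the vision path when allowed; fall through to text on any failure."""
    use_vision = (
        mode == "vision"
        and b64 is not None
        and mimetype in VISION_MIMETYPES
    )
    if use_vision:
        try:
            return vision_call(b64, mimetype)
        except Exception:
            # Any vision failure falls through to the text path below.
            pass
    # PDF/TXT/HTML receipts, text mode, and failed vision calls end up here.
    return text_call(ocr_text)
```

Passing the two LLM calls in as parameters is a choice made for this sketch so the fallthrough can be exercised without a running Ollama instance. Amount and date extraction is deliberately absent, because the message states those always come from the Tesseract regex.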
2026-05-21 01:06:55 -04:00
parent db06fede5f
commit a736f3352b
3 changed files with 258 additions and 45 deletions
--- a/agent_service/config.py
+++ b/agent_service/config.py
@@ -50,6 +50,11 @@ class Settings(BaseSettings):
    postgres_min_connections: int = 2
    postgres_max_connections: int = 10

+    # Receipt OCR / vision
+    # 'vision' — use vision LLM for vendor+category when an image is uploaded (default)
+    # 'text'   — use Tesseract OCR text only (set RECEIPT_VISION_MODE=text to revert)
+    receipt_vision_mode: str = 'vision'
+
    # Rate limiting
    dispatch_rate_limit_per_user: int = 30  # requests per minute
    directive_timeout_minutes: int = 10
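
For readers who want to see how a mode switch like `receipt_vision_mode` might be resolved at runtime, here is a minimal standalone sketch using only the standard library. It is not the `_get_vision_mode()` helper from this commit, whose body is not shown. The name `get_vision_mode`, the `env` parameter, and the choice to fall back to 'vision' for unrecognised values are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

VALID_MODES = ("vision", "text")


def get_vision_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the receipt mode from the environment (hypothetical helper)."""
    source = os.environ if env is None else env
    raw = source.get("RECEIPT_VISION_MODE", "vision").strip().lower()
    # Assumption: unknown values fall back to the documented default.
    return raw if raw in VALID_MODES else "vision"
```

Reading the value on each call rather than caching it at import time would make it easy for tests to pin text mode by patching a single helper, which is the approach the commit message describes for the two existing prompt tests.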