Add vision LLM path for receipt vendor/category identification

When RECEIPT_VISION_MODE=vision (default), uploaded receipt images are sent
directly to the vision-capable LLM (llama3.2-vision via Ollama) instead of
the OCR text excerpt.  The model can read logos, stylised fonts, and layouts
that Tesseract OCR mangles (Home Depot, HMSHost/Sergio's, etc.).

Architecture:
- amount + date: always from Tesseract regex (deterministic, never LLM)
- vendor + category: vision LLM when image available, text LLM as fallback
- Fallthrough: if vision call fails for any reason, text path is tried next
- PDF/TXT/HTML receipts: always use text path (not visual media)

Revert instantly without a rebuild:
  echo "RECEIPT_VISION_MODE=text" >> /root/odoo/odoo-ai/.env
  docker compose up -d agent-service

config.py: add receipt_vision_mode setting (default 'vision')
expenses_agent.py: _VISION_MIMETYPES, _get_vision_mode() helper,
  dual-path _parse_receipt_text (b64/mimetype params), _act() passes b64
tests: 92 passing — 4 new vision tests, 2 existing prompt tests
  pinned to text mode via _get_vision_mode patch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Carlos Garcia
2026-05-21 01:06:55 -04:00
parent db06fede5f
commit a736f3352b
3 changed files with 258 additions and 45 deletions

View File

@@ -50,6 +50,11 @@ class Settings(BaseSettings):
postgres_min_connections: int = 2
postgres_max_connections: int = 10
# Receipt OCR / vision
# 'vision' — use vision LLM for vendor+category when an image is uploaded (default)
# 'text' — use Tesseract OCR text only (set RECEIPT_VISION_MODE=text to revert)
receipt_vision_mode: str = 'vision'
# Rate limiting
dispatch_rate_limit_per_user: int = 30 # requests per minute
directive_timeout_minutes: int = 10