Add vision LLM path for receipt vendor/category identification

When RECEIPT_VISION_MODE=vision (default), uploaded receipt images are sent
directly to the vision-capable LLM (llama3.2-vision via Ollama) instead of the
OCR text excerpt. The model can read logos, stylised fonts, and layouts that
Tesseract OCR mangles (Home Depot, HMSHost/Sergio's, etc.).

Architecture:
- amount + date: always from Tesseract regex (deterministic, never LLM)
- vendor + category: vision LLM when image available, text LLM as fallback
- Fallthrough: if vision call fails for any reason, text path is tried next
- PDF/TXT/HTML receipts: always use text path (not visual media)

Revert instantly without a rebuild:
  echo "RECEIPT_VISION_MODE=text" >> /root/odoo/odoo-ai/.env
  docker compose up -d agent-service

config.py: add receipt_vision_mode setting (default 'vision')
expenses_agent.py: _VISION_MIMETYPES, _get_vision_mode() helper,
  dual-path _parse_receipt_text (b64/mimetype params), _act() passes b64
tests: 92 passing; 4 new vision tests, 2 existing prompt tests pinned to
  text mode via _get_vision_mode patch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
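
The vendor/category routing described under "Architecture" can be sketched roughly as follows. This is an illustration only, not the code in expenses_agent.py: the function name `pick_vendor_category`, the contents of `VISION_MIMETYPES`, and the two injected callables are assumptions made for the example. Amount and date are left out because the commit says they always come from the Tesseract regex path.

```python
from typing import Callable, Optional

# Assumed contents; the commit names _VISION_MIMETYPES but does not list its members.
VISION_MIMETYPES = {"image/png", "image/jpeg"}


def pick_vendor_category(
    ocr_text: str,
    b64: Optional[str],
    mimetype: Optional[str],
    mode: str,
    vision_call: Callable[[str], dict],
    text_call: Callable[[str], dict],
) -> dict:
    """Use the vision path for images in 'vision' mode, else the text path."""
    is_image = b64 is not None and mimetype in VISION_MIMETYPES
    if mode == "vision" and is_image:
        try:
            return vision_call(b64)
        except Exception:
            # Fallthrough: any vision failure means the text path is tried next.
            pass
    # PDF/TXT/HTML receipts, 'text' mode, and failed vision calls all end up here.
    return text_call(ocr_text)
```

Passing the two LLM calls in as parameters keeps the sketch testable without a running Ollama instance; the real code may well call the model directly.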
@@ -50,6 +50,11 @@ class Settings(BaseSettings):
     postgres_min_connections: int = 2
     postgres_max_connections: int = 10
 
+    # Receipt OCR / vision
+    # 'vision' - use vision LLM for vendor+category when an image is uploaded (default)
+    # 'text'   - use Tesseract OCR text only (set RECEIPT_VISION_MODE=text to revert)
+    receipt_vision_mode: str = 'vision'
+
     # Rate limiting
     dispatch_rate_limit_per_user: int = 30  # requests per minute
     directive_timeout_minutes: int = 10
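
For readers unfamiliar with how the setting is consumed, here is a stdlib-only sketch of reading the mode from the environment. It is not the `_get_vision_mode()` helper from this commit, whose body is not shown here; the name `get_vision_mode`, the normalisation step, and the choice to fall back to 'vision' on unrecognised values are all assumptions for illustration.

```python
import os
from typing import Mapping, Optional

VALID_MODES = ("vision", "text")
DEFAULT_MODE = "vision"  # matches the default in the Settings class above


def get_vision_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'vision' or 'text' based on RECEIPT_VISION_MODE."""
    source = os.environ if env is None else env
    raw = source.get("RECEIPT_VISION_MODE", DEFAULT_MODE)
    mode = raw.strip().lower()
    # Assumption: unknown values fall back to the default instead of raising.
    return mode if mode in VALID_MODES else DEFAULT_MODE
```

Accepting an optional mapping makes it easy to pin the mode in tests, similar in spirit to patching `_get_vision_mode` as the commit message describes.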