Add vision OCR via Ollama vision model with Tesseract fallback

Introduces the VISION_OCR_MODEL setting. When it is set (e.g. llama3.2-vision:11b), receipt images are transcribed by the Ollama vision model first, with Tesseract as the fallback.

Also improves Tesseract preprocessing with adaptive binarisation (autocontrast + threshold at 140) for better accuracy on thermal receipts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 18:43:21 -04:00
parent 9f38fb013c
commit 5b924e60de
2 changed files with 63 additions and 10 deletions
--- a/agent_service/config.py
+++ b/agent_service/config.py
@@ -16,6 +16,10 @@ class Settings(BaseSettings):
    ollama_model: str = 'activeblue-chat'
    ollama_timeout: int = 120
    ollama_max_concurrent: int = 2
+    # Set to a vision-capable model (e.g. llama3.2-vision:11b) to try
+    # vision OCR on receipt images first, with Tesseract as the fallback.
+    # Leave empty to use the Tesseract pipeline only.
+    vision_ocr_model: str = ''

    # Anthropic / Claude
    anthropic_api_key: str = ''
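
A minimal sketch of how the `vision_ocr_model` setting might gate the OCR path, assuming the behaviour described in the commit message (vision model first, Tesseract as fallback). The function `transcribe_receipt` and the two injected callables are hypothetical names chosen for illustration. They are not taken from this commit, and the actual Ollama and Tesseract calls are deliberately left out.

```python
from typing import Callable, Optional


def transcribe_receipt(
    image_bytes: bytes,
    vision_ocr_model: str,
    vision_ocr: Callable[[bytes, str], Optional[str]],
    tesseract_ocr: Callable[[bytes], str],
) -> str:
    """Try the vision model when one is configured, else use Tesseract.

    `vision_ocr` and `tesseract_ocr` are hypothetical stand-ins for the
    real backends; they are injected so the control flow can be shown
    (and tested) without either dependency installed.
    """
    if vision_ocr_model:  # empty string means "vision OCR disabled"
        try:
            text = vision_ocr(image_bytes, vision_ocr_model)
        except Exception:
            # Assumption: any vision failure should degrade to Tesseract
            # rather than abort the whole request.
            text = None
        if text and text.strip():
            return text
    return tesseract_ocr(image_bytes)
```

Passing the backends in as callables is only one possible design. It keeps the fallback decision in one place and makes it easy to check that an empty setting never touches the vision model.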