Remove vision OCR — use Tesseract-only pipeline for receipt parsing

The llama3.2-vision model produced unreliable structured data (wrong vendors, amounts, and dates), which made expense reports worse than those built from Tesseract + LLM extraction. This commit removes _ocr_image_vision(), the vision JSON fast path in _parse_receipt_text(), _match_category(), and the vision_ocr_model config setting entirely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 22:32:26 -04:00
parent ec6b41943f
commit 0320591344
4 changed files with 4 additions and 247 deletions
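
For orientation, the two-stage flow the commit message describes (Tesseract OCR to plain text, then LLM extraction of structured fields) might look roughly like the sketch below. This is a hypothetical illustration, not the repository's code: `parse_receipt`, `ocr_fn`, and `extract_fn` are invented names, and the real signatures of the remaining pipeline functions are not visible in this diff.

```python
from typing import Callable, Dict


def parse_receipt(
    image_bytes: bytes,
    ocr_fn: Callable[[bytes], str],
    extract_fn: Callable[[str], Dict[str, str]],
) -> Dict[str, str]:
    """Hypothetical two-stage pipeline: OCR to text, then field extraction.

    ocr_fn stands in for a Tesseract wrapper and extract_fn for an LLM
    call; both are assumptions made for this sketch.
    """
    text = ocr_fn(image_bytes)
    if not text.strip():
        # Nothing legible came back from OCR, so skip extraction.
        return {}
    return extract_fn(text)


# Usage with stand-in callables (no Tesseract or LLM required).
fields = parse_receipt(
    b"fake-image",
    ocr_fn=lambda _img: "ACME 12.50",
    extract_fn=lambda text: {"vendor": text.split()[0]},
)
```

With a single OCR path there is no branch that accepts model-produced JSON directly, which is the fast path the commit message says was removed.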
--- a/agent_service/config.py
+++ b/agent_service/config.py
@@ -16,10 +16,6 @@ class Settings(BaseSettings):
    ollama_model: str = 'activeblue-chat'
    ollama_timeout: int = 300
    ollama_max_concurrent: int = 2
-    # Set to a vision-capable model (e.g. llama3.2-vision:11b) to use
-    # vision OCR for receipt images instead of Tesseract.  Leave empty
-    # to keep the Tesseract pipeline.
-    vision_ocr_model: str = ''

    # Anthropic / Claude
    anthropic_api_key: str = ''