Remove vision OCR; use Tesseract-only pipeline for receipt parsing

The llama3.2-vision model produced unreliable structured data
(wrong vendors, amounts, and dates), making expense reports worse
than those built from Tesseract + LLM extraction. This commit
removes _ocr_image_vision(), the vision JSON fast path in
_parse_receipt_text(), _match_category(), and the vision_ocr_model
config setting.
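
For reviewers, the resulting single-path flow might look roughly like the sketch below. It is not the project's code: `Receipt`, `parse_receipt`, and the injected `ocr` and `extract` callables are hypothetical names that stand in for the Tesseract call and the LLM extraction step so the example runs without dependencies.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Receipt:
    # Hypothetical result type; the real field set is not shown in this commit.
    vendor: str
    amount: str
    date: str


def parse_receipt(
    image_bytes: bytes,
    ocr: Callable[[bytes], str],
    extract: Callable[[str], dict],
) -> Receipt:
    """Single-path pipeline: OCR to plain text, then structured extraction.

    `ocr` stands in for the Tesseract call and `extract` for the LLM
    extraction step (both are assumptions, injected for illustration).
    """
    text = ocr(image_bytes)   # the only OCR path; no vision-model branch
    fields = extract(text)    # always runs; no vision JSON fast path skips it
    return Receipt(
        vendor=fields.get("vendor", ""),
        amount=fields.get("amount", ""),
        date=fields.get("date", ""),
    )
```

Passing stub callables (for example, a lambda that returns fixed text) is enough to exercise the flow in a unit test without Tesseract or an LLM installed.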

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Carlos Garcia
2026-05-20 22:32:26 -04:00
parent ec6b41943f
commit 0320591344
4 changed files with 4 additions and 247 deletions


@@ -16,10 +16,6 @@ class Settings(BaseSettings):
     ollama_model: str = 'activeblue-chat'
     ollama_timeout: int = 300
     ollama_max_concurrent: int = 2
-    # Set to a vision-capable model (e.g. llama3.2-vision:11b) to use
-    # vision OCR for receipt images instead of Tesseract. Leave empty
-    # to keep the Tesseract pipeline.
-    vision_ocr_model: str = ''
     # Anthropic / Claude
     anthropic_api_key: str = ''