fix: improve OCR accuracy for rotated/sideways receipt photos

- Dockerfile: add tesseract-ocr-osd for orientation detection data
- receipt_parser: resize large phone photos to 1800px, convert to
  grayscale, sharpen before OCR; use psm 1 (auto + OSD) so rotated
  receipts are correctly oriented before text extraction
- expenses_agent: tighten amount extraction prompt to pick the FINAL
  total, not subtotal or tax line, reducing misreads like 42.90->409.00
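
For reference, the receipt_parser steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the helper names (target_size, preprocess, tesseract_command, ocr) are hypothetical, the sketch assumes "1800px" caps the longer edge, and it assumes Pillow plus the tesseract CLI are installed.

```python
import subprocess
from pathlib import Path

MAX_SIDE = 1800  # value from the commit message; assumed to cap the longer edge


def target_size(width: int, height: int, max_side: int = MAX_SIDE) -> tuple[int, int]:
    """Scale (width, height) down so the longer edge is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # small images are left alone, never upscaled
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def preprocess(src: Path, dst: Path) -> None:
    """Resize, convert to grayscale, and sharpen an image (requires Pillow)."""
    from PIL import Image, ImageFilter  # third-party; imported lazily

    with Image.open(src) as img:
        img = img.resize(target_size(*img.size), Image.LANCZOS)
        img = img.convert("L")  # 8-bit grayscale
        img = img.filter(ImageFilter.SHARPEN)
        img.save(dst)


def tesseract_command(image: Path) -> list[str]:
    """Build a tesseract CLI call that prints recognized text to stdout."""
    # --psm 1 selects automatic page segmentation with OSD, which relies on
    # the orientation data shipped in the tesseract-ocr-osd package.
    return ["tesseract", str(image), "stdout", "--psm", "1"]


def ocr(src: Path, work: Path) -> str:
    """Preprocess src into work, then run tesseract on the result."""
    preprocess(src, work)
    result = subprocess.run(
        tesseract_command(work), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Something like ocr(Path("receipt.jpg"), Path("/tmp/receipt.png")) would then return the extracted text; both paths are placeholders.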

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Carlos Garcia
2026-05-16 01:51:29 -04:00
parent 8a9d772b8e
commit c2d1078d79
3 changed files with 33 additions and 6 deletions

@@ -8,6 +8,7 @@ WORKDIR /app
 RUN apt-get update && apt-get install -y --no-install-recommends \
 gcc libpq-dev \
 tesseract-ocr \
+tesseract-ocr-osd \
 && rm -rf /var/lib/apt/lists/*
 COPY requirements.txt .