odoo-ai/agent_service/tools/receipt_parser.py at beac16a6a955bd6f78fd5a35f97e3a0a8cfc0cc9

Carlos Garcia 69519393c1 Add EasyOCR engine for receipt image parsing

EasyOCR (deep-learning OCR) replaces Tesseract as the default engine for
receipt images. It handles phone photos, thermal paper, dot-matrix fonts,
and rotated images significantly better than Tesseract without requiring
manual preprocessing pipelines.

Key design decisions:
- OCR_ENGINE=easyocr (default) | tesseract — switchable via .env, no rebuild
- EasyOCR Reader is a module-level singleton: model loaded once per container
  start, not per receipt
- Falls back to Tesseract automatically if EasyOCR raises an error or returns
  fewer than 20 characters
- EXIF rotation fix still applied before EasyOCR (phone photo orientation)
- Images resized to max 2000px width for speed before passing to EasyOCR
- _easyocr_to_text() groups detections into visual lines (y-overlap) and
  sorts left-to-right within each line for clean single-string output
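
The line-grouping step described in the last bullet could look roughly like the sketch below. It is not the code from receipt_parser.py: the function name group_detections_into_lines and the min_overlap threshold are hypothetical, and the only thing assumed about EasyOCR is that readtext() yields (bbox, text, confidence) tuples where bbox is four [x, y] corner points.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Detection = Tuple[Sequence[Point], str, float]  # (bbox corners, text, confidence)


def group_detections_into_lines(
    detections: List[Detection], min_overlap: float = 0.5
) -> str:
    """Group OCR detections into visual lines and join them into one string.

    Hypothetical helper: min_overlap is an assumed threshold, expressed as a
    fraction of the shorter box's height.
    """
    boxes = []
    for bbox, text, _conf in detections:
        xs = [p[0] for p in bbox]
        ys = [p[1] for p in bbox]
        boxes.append((min(ys), max(ys), min(xs), text))

    # Process boxes top to bottom, ordered by vertical centre.
    boxes.sort(key=lambda b: (b[0] + b[1]) / 2)

    lines = []  # each entry: (top, bottom, [(left_x, text), ...])
    for top, bottom, left, text in boxes:
        if lines:
            line_top, line_bottom, items = lines[-1]
            overlap = min(bottom, line_bottom) - max(top, line_top)
            shorter = min(bottom - top, line_bottom - line_top)
            # Same visual line if the vertical overlap is large enough.
            if shorter > 0 and overlap / shorter >= min_overlap:
                items.append((left, text))
                continue
        lines.append((top, bottom, [(left, text)]))

    # Within each line, order fragments left to right.
    return "\n".join(
        " ".join(text for _, text in sorted(items, key=lambda it: it[0]))
        for _, _, items in lines
    )


sample = [
    ([[200, 10], [260, 10], [260, 30], [200, 30]], "4.50", 0.90),
    ([[10, 12], [120, 12], [120, 32], [10, 32]], "Coffee", 0.95),
    ([[10, 50], [100, 50], [100, 70], [10, 70]], "TOTAL", 0.90),
    ([[200, 52], [260, 52], [260, 72], [200, 72]], "4.50", 0.90),
]
print(group_detections_into_lines(sample))
```

Each new box is compared only against the first box of the current line, which keeps the sketch short; a real implementation may track the line's running bounds instead.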

Revert: echo "OCR_ENGINE=tesseract" >> .env && docker compose up -d agent-service

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-21 01:22:22 -04:00

14 KiB
