odoo-ai/agent_service/tools/receipt_parser.py at 1536d83376ae85eace10b2ab796795c09b6ee96a

Carlos Garcia 1536d83376 Improve OCR preprocessing and amount extraction robustness

Image preprocessing (receipt_parser.py):
- Add ImageOps.exif_transpose() — fixes portrait photos stored with EXIF
  rotation metadata (most phone photos); without this Tesseract reads a
  rotated image and produces garbage
- Upscale images < 600px wide for better character recognition
- Raise binarization threshold 140→160 for faint thermal-print receipts
- Try PSM 6 (single text block) first, with PSM 4 and PSM 11 as
  fallbacks; PSM 6 is better suited to a single-column receipt layout
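
The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in receipt_parser.py: the function names `preprocess_receipt` and `ocr_with_fallback`, the `min_chars` cutoff used to decide when to fall back, and the injected `run_ocr` callable are all assumptions made for the example. Only the 600px width, the 160 threshold, and the PSM order 6, 4, 11 come from the commit message.

```python
from typing import Callable, Sequence


def preprocess_receipt(image, min_width: int = 600, threshold: int = 160):
    """Rotate, upscale, and binarize a receipt photo (hypothetical helper)."""
    # Pillow is imported lazily so ocr_with_fallback below works without it.
    from PIL import Image, ImageOps

    # Apply the EXIF orientation tag so portrait phone photos are upright.
    image = ImageOps.exif_transpose(image)
    gray = image.convert("L")

    # Upscale narrow images; the aspect ratio is preserved.
    if gray.width < min_width:
        scale = min_width / gray.width
        new_size = (min_width, round(gray.height * scale))
        gray = gray.resize(new_size, Image.LANCZOS)

    # Binarize: pixels brighter than the threshold become white, the rest black.
    # Whether the real code uses > or >= is an assumption here.
    return gray.point(lambda p: 255 if p > threshold else 0)


def ocr_with_fallback(
    image,
    run_ocr: Callable[[object, str], str],
    psm_modes: Sequence[int] = (6, 4, 11),
    min_chars: int = 10,
) -> str:
    """Try each page segmentation mode in order (hypothetical helper).

    Returns the first result with at least min_chars characters, or the
    longest result seen if none reaches that length.
    """
    best = ""
    for psm in psm_modes:
        text = run_ocr(image, f"--psm {psm}").strip()
        if len(text) >= min_chars:
            return text
        if len(text) > len(best):
            best = text
    return best
```

With pytesseract installed, `run_ocr` could be `lambda img, cfg: pytesseract.image_to_string(img, config=cfg)`. Passing the OCR call in as a parameter keeps the fallback order testable without a Tesseract install.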

Amount extraction (expenses_agent.py):
- Add Pass 2, a bottom-of-receipt line scan used when the labeled Total:
  regex fails; reads lines bottom-to-top in the last 50% of the text,
  skipping change/tip lines. Handles the 'T0TAL' OCR misread and the
  amount-on-next-line layout
- Add _SKIP_LINE_RE and _ANY_DOLLAR_RE module-level patterns
- 8 new tests covering garbled total, change-skip, USD suffix, etc.
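
A rough sketch of the two-pass extraction described above, for illustration only. The names `_SKIP_LINE_RE` and `_ANY_DOLLAR_RE` come from the commit message, but their patterns here are guesses; `_TOTAL_RE`, `_to_amount`, and `extract_total` are hypothetical names, and the actual code in expenses_agent.py may differ.

```python
import re
from decimal import Decimal
from typing import Optional

# Pass 1 pattern (assumed): a labeled total with the amount on the same line.
# The lookbehind keeps "Subtotal" from matching.
_TOTAL_RE = re.compile(
    r"(?<![A-Za-z])total[ \t]*:?[ \t]*\$?[ \t]*(\d+(?:,\d{3})*\.\d{2})",
    re.IGNORECASE,
)

# Lines that carry an amount but are never the total (assumed pattern).
_SKIP_LINE_RE = re.compile(r"\b(?:change|tip)\b", re.IGNORECASE)

# Any amount with two decimals and an optional dollar sign (assumed pattern).
_ANY_DOLLAR_RE = re.compile(r"\$?\s*(\d+(?:,\d{3})*\.\d{2})\b")


def _to_amount(raw: str) -> Decimal:
    """Convert a matched amount such as '1,234.56' to a Decimal."""
    return Decimal(raw.replace(",", ""))


def extract_total(text: str) -> Optional[Decimal]:
    """Return the receipt total, or None if no plausible amount is found."""
    # Pass 1: labeled "Total:" on a single line.
    match = _TOTAL_RE.search(text)
    if match:
        return _to_amount(match.group(1))

    # Pass 2: scan the last 50% of non-empty lines from the bottom up,
    # skipping change/tip lines, and take the first amount found.
    lines = [line for line in text.splitlines() if line.strip()]
    for line in reversed(lines[len(lines) // 2:]):
        if _SKIP_LINE_RE.search(line):
            continue
        match = _ANY_DOLLAR_RE.search(line)
        if match:
            return _to_amount(match.group(1))
    return None
```

For example, `extract_total("Cafe\nCoffee 3.50\nBagel 4.25\nT0TAL 7.75\nCHANGE 2.25\nThank you")` falls through to Pass 2 because `T0TAL` does not match the labeled pattern, skips the change line, and picks up the amount on the garbled total line.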

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-20 23:33:38 -04:00

7.7 KiB
