fix(expenses): LAYAL CAFE $2.80 bug, United Airlines rotation & date

LAYAL CAFE ($2.80 instead of $42.90):
- Add (?!\s*tax) lookahead to _TOTAL_RE so "Total Taxes $2.80" is never
  confused with the receipt total when OCR drops the "Taxes" word
- Change Pass 1 from matches[-1] to max() so the largest labeled amount
  always wins, regardless of line order in the OCR output

United Airlines (Subway/$0/wrong date):
- Add OSD-based rotation correction in receipt_parser.py: after EXIF
  transpose, ask Tesseract's orientation-detection engine (--psm 0) what
  angle to rotate; applies to receipts photographed lying sideways where
  EXIF metadata cannot help
- Add month-name date patterns (DD MON YYYY / MON DD YYYY) to
  _extract_date_from_text for airline/hotel receipts that print dates
  like "05 MAY 2026" instead of "05/07/26"

85 tests, all passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
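
The LAYAL CAFE fix combines a negative lookahead with a switch from "last match" to "largest match". The real `_TOTAL_RE` is not shown in this commit, so the sketch below uses a hypothetical stand-in pattern (`TOTAL_RE`) and a hypothetical helper (`extract_total`) only to illustrate how the two changes might interact:

```python
import re

# Hypothetical stand-in for _TOTAL_RE; the real pattern is not part of this diff.
# (?!\s*tax) rejects "Total Taxes ..." so the tax line is not read as the total.
TOTAL_RE = re.compile(
    r"\btotal\b(?!\s*tax)[^\d$\n]*\$?\s*(\d+(?:\.\d{2})?)",
    re.IGNORECASE,
)


def extract_total(text):
    """Return the largest labeled total found in the OCR text, or None."""
    amounts = [float(m.group(1)) for m in TOTAL_RE.finditer(text)]
    # max() instead of amounts[-1]: line order in the OCR output no longer matters.
    return max(amounts) if amounts else None


# Tax line is labeled: the lookahead skips it.
labeled = "Subtotal $40.10\nTotal $42.90\nTotal Taxes $2.80\n"
# OCR lost the word "Taxes": the lookahead cannot help, but max() still picks 42.90.
dropped = "Total $42.90\nTotal $2.80\n"

print(extract_total(labeled), extract_total(dropped))
```

In this sketch the lookahead handles the case where the tax label survives OCR, and `max()` covers the case where it does not.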
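
The month-name date patterns are described in the commit message but are not included in the hunk shown here. As a rough sketch of what DD MON YYYY / MON DD YYYY matching could look like, the following uses hypothetical names (`DD_MON_YYYY`, `MON_DD_YYYY`, `extract_month_name_date`); the actual patterns inside `_extract_date_from_text` may differ:

```python
import re
from datetime import date

# Three-letter month abbreviations mapped to month numbers.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}
_MON = "|".join(MONTHS)

# Hypothetical patterns; [A-Z]* lets full month names such as "March" match too.
DD_MON_YYYY = re.compile(
    rf"\b(\d{{1,2}})\s+({_MON})[A-Z]*\.?,?\s+(\d{{4}})\b", re.IGNORECASE
)
MON_DD_YYYY = re.compile(
    rf"\b({_MON})[A-Z]*\.?\s+(\d{{1,2}}),?\s+(\d{{4}})\b", re.IGNORECASE
)


def extract_month_name_date(text):
    """Return the first month-name date in text as a datetime.date, or None."""
    m = DD_MON_YYYY.search(text)
    if m:
        day, mon, year = int(m.group(1)), m.group(2), int(m.group(3))
    else:
        m = MON_DD_YYYY.search(text)
        if not m:
            return None
        mon, day, year = m.group(1), int(m.group(2)), int(m.group(3))
    try:
        return date(year, MONTHS[mon[:3].upper()], day)
    except ValueError:
        return None  # e.g. "31 FEB 2026" produced by an OCR misread


print(extract_month_name_date("DEPARTS 05 MAY 2026 08:15"))
```

Returning `None` for impossible dates is a design choice made for this sketch, so a misread day cannot turn into a wrong but plausible-looking date.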
@@ -100,6 +100,23 @@ def _ocr_image_tesseract(data: bytes, filename: str) -> str:
     except Exception:
         pass  # exif_transpose requires Pillow >= 6.0
 
+    # ── Step 1b: Content-based rotation correction ───────────────────────
+    # EXIF transpose (Step 1) only corrects for phone-tilt metadata.
+    # If the receipt was physically laid sideways in the frame (e.g. a
+    # landscape receipt photographed with the phone upright), the pixels
+    # are genuinely rotated and EXIF can't help. Ask Tesseract's OSD
+    # engine to detect the text orientation and rotate to correct it.
+    try:
+        osd = pytesseract.image_to_osd(img, config='--psm 0')
+        _am = re.search(r'Rotate:\s*(\d+)', osd)
+        if _am:
+            _angle = int(_am.group(1))
+            if _angle:
+                img = img.rotate(_angle, expand=True)
+                logger.debug('OSD: rotated %s by %d°', filename, _angle)
+    except Exception:
+        pass  # OSD unavailable or not enough text — proceed without correction
+
     # ── Step 2: Resize to working width (1800px) ──────────────────────────
     max_w = 1800
     if img.width > max_w:
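
One detail of Step 1b worth checking is rotation direction. Pillow's `Image.rotate` turns the image counter-clockwise for positive angles, while Tesseract's `Rotate:` value is commonly described as the clockwise correction to apply. If that holds for the Tesseract build in use (an assumption that should be verified against a known sideways receipt), the OSD value needs converting before it is passed to Pillow; 0° and 180° are unaffected either way. The sketch below uses hypothetical helper names (`parse_osd_rotation`, `pillow_angle`) and an illustrative OSD string, and deliberately avoids calling Tesseract so it runs on its own:

```python
import re


def parse_osd_rotation(osd_text):
    """Extract the 'Rotate:' value (degrees) from Tesseract OSD output; 0 if absent."""
    m = re.search(r"Rotate:\s*(\d+)", osd_text)
    return int(m.group(1)) % 360 if m else 0


def pillow_angle(clockwise_degrees):
    """Convert a clockwise correction into the counter-clockwise angle
    that Pillow's Image.rotate expects.

    Assumption: the OSD 'Rotate' value is a clockwise correction.
    """
    return (360 - clockwise_degrees) % 360


# Illustrative OSD output in the format produced by image_to_osd (values made up).
sample_osd = (
    "Page number: 0\n"
    "Orientation in degrees: 270\n"
    "Rotate: 90\n"
    "Orientation confidence: 2.50\n"
    "Script: Latin\n"
    "Script confidence: 1.20\n"
)

rotate = parse_osd_rotation(sample_osd)
print(rotate, pillow_angle(rotate))
```

Under that assumption the call would become `img.rotate(pillow_angle(_angle), expand=True)`; a test image with a known 90° tilt would confirm which direction is correct.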