feat: OCR via tesseract, dedup, category selection for expense receipts
- Dockerfile: install tesseract-ocr so Pillow+pytesseract can OCR receipt images - operational_store: JSON-serialize raw_data before passing to asyncpg JSONB - receipt_parser: add SHA256 hash + date extracted from filename timestamps - expenses_agent: deduplicate receipts by hash before creating expense records - expenses_agent: fetch all expensable Odoo products, pass list to LLM for category selection (Meals, Flights, etc.) per receipt - expenses_agent: pass date_hint from filename (e.g. 20260509_180857.jpg -> 2026-05-09) as fallback when OCR text is unavailable - expenses_tools: add get_expense_products() to fetch all expensable products Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -7,6 +7,7 @@ WORKDIR /app
|
||||
|
||||
RUN apt-get update && apt-get install -y --no-install-recommends \
|
||||
gcc libpq-dev \
|
||||
tesseract-ocr \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
COPY requirements.txt .
|
||||
|
||||
Reference in New Issue
Block a user