feat: OCR via tesseract, dedup, category selection for expense receipts

- Dockerfile: install tesseract-ocr so Pillow+pytesseract can OCR receipt images
- operational_store: JSON-serialize raw_data before passing to asyncpg JSONB
- receipt_parser: add SHA256 hash + date extracted from filename timestamps
- expenses_agent: deduplicate receipts by hash before creating expense records
- expenses_agent: fetch all expensable Odoo products, pass list to LLM for
  category selection (Meals, Flights, etc.) per receipt
- expenses_agent: pass date_hint from filename (e.g. 20260509_180857.jpg -> 2026-05-09)
  as fallback when OCR text is unavailable
- expenses_tools: add get_expense_products() to fetch all expensable products
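
The hash, date-hint, and dedup bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the helper names (`receipt_hash`, `date_hint_from_filename`, `dedupe_receipts`) and the `YYYYMMDD_HHMMSS` filename pattern are assumptions inferred from the example `20260509_180857.jpg`.

```python
import hashlib
import re
from datetime import date
from typing import Iterable, List, Optional, Tuple

# Assumed filename layout: YYYYMMDD_HHMMSS somewhere in the name (hypothetical).
_FILENAME_TS = re.compile(r'(\d{4})(\d{2})(\d{2})_\d{6}')


def receipt_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw receipt bytes."""
    return hashlib.sha256(data).hexdigest()


def date_hint_from_filename(filename: str) -> Optional[str]:
    """Return an ISO date parsed from the filename, or None if absent/invalid."""
    match = _FILENAME_TS.search(filename)
    if not match:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        # Digits matched the pattern but do not form a real calendar date.
        return None


def dedupe_receipts(
    receipts: Iterable[Tuple[str, bytes]],
) -> List[Tuple[str, bytes]]:
    """Keep the first receipt for each distinct content hash, preserving order."""
    seen = set()
    unique = []
    for name, data in receipts:
        digest = receipt_hash(data)
        if digest in seen:
            continue
        seen.add(digest)
        unique.append((name, data))
    return unique
```

Hashing the file contents rather than the filename means the same photo uploaded twice under different names is still treated as a duplicate.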

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Carlos Garcia
2026-05-16 01:40:32 -04:00
parent 6ab9624ec6
commit ef6dad5a81
5 changed files with 96 additions and 21 deletions

@@ -107,6 +107,17 @@ class ExpensesTools:
            logger.warning('get_default_expense_product failed: %s', exc)
            return None

    async def get_expense_products(self) -> list:
        """Return all expensable products for category selection."""
        try:
            return await self._o.search_read(
                'product.product',
                [('can_be_expensed', '=', True)],
                ['id', 'name'], limit=100)
        except Exception as exc:
            logger.warning('get_expense_products failed: %s', exc)
            return []

    async def create_expense_sheet(self, name: str, employee_id: int):
        return await self._o.create('hr.expense.sheet', {
            'name': name,