feat: OCR via tesseract, dedup, category selection for expense receipts

- Dockerfile: install tesseract-ocr so Pillow+pytesseract can OCR receipt images
- operational_store: JSON-serialize raw_data before passing to asyncpg JSONB
- receipt_parser: add SHA256 hash + date extracted from filename timestamps
- expenses_agent: deduplicate receipts by hash before creating expense records
- expenses_agent: fetch all expensable Odoo products, pass list to LLM for category selection (Meals, Flights, etc.) per receipt
- expenses_agent: pass date_hint from filename (e.g. 20260509_180857.jpg -> 2026-05-09) as fallback when OCR text is unavailable
- expenses_tools: add get_expense_products() to fetch all expensable products

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 01:40:32 -04:00
parent 6ab9624ec6
commit ef6dad5a81
5 changed files with 96 additions and 21 deletions
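The receipt_parser and expenses_agent changes described in the commit message are not shown in this excerpt. As a rough sketch only, the hashing, filename date hint, and dedup steps might look like the following. All names here (`sha256_hex`, `date_hint_from_filename`, `dedupe_by_hash`, the `sha256` dict key) are hypothetical and are not taken from the actual diff; the filename pattern is assumed from the `20260509_180857.jpg` example.

```python
import hashlib
import re
from datetime import datetime
from typing import Optional

# Assumed pattern: YYYYMMDD_HHMMSS somewhere in the file name,
# based on the example in the commit message.
_TIMESTAMP_RE = re.compile(r'(\d{8})_(\d{6})')


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA256 digest of the raw receipt bytes."""
    return hashlib.sha256(data).hexdigest()


def date_hint_from_filename(filename: str) -> Optional[str]:
    """Return an ISO date parsed from the file name, or None if absent/invalid."""
    match = _TIMESTAMP_RE.search(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), '%Y%m%d').date().isoformat()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None


def dedupe_by_hash(receipts: list) -> list:
    """Keep the first receipt seen for each hash, preserving input order."""
    seen = set()
    unique = []
    for receipt in receipts:
        digest = receipt['sha256']  # hypothetical key name
        if digest in seen:
            continue
        seen.add(digest)
        unique.append(receipt)
    return unique
```

Keeping the first occurrence and preserving order is one possible policy; the real agent may choose differently.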
--- a/agent_service/tools/expenses_tools.py
+++ b/agent_service/tools/expenses_tools.py
@@ -107,6 +107,17 @@ class ExpensesTools:
            logger.warning('get_default_expense_product failed: %s', exc)
            return None

+    async def get_expense_products(self) -> list:
+        """Return all expensable products for category selection."""
+        try:
+            return await self._o.search_read(
+                'product.product',
+                [('can_be_expensed', '=', True)],
+                ['id', 'name'], limit=100)
+        except Exception as exc:
+            logger.warning('get_expense_products failed: %s', exc)
+            return []
+
    async def create_expense_sheet(self, name: str, employee_id: int):
        return await self._o.create('hr.expense.sheet', {
            'name': name,