Pass 1 (unchanged): same date + amount within 0.05 + vendor similarity >= 60%.
Pass 2 (new): same vendor (>= 80% similarity) + same date, regardless
of amount, to catch receipts where OCR misread the total.
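
A minimal sketch of the two-pass check described above, assuming vendor similarity is a case-insensitive difflib ratio. The `Receipt` dataclass and the function names are hypothetical, not the actual `_find_semantic_duplicate` code.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Receipt:
    vendor: str
    date: str      # ISO date string, e.g. "2026-05-09"
    amount: float


def vendor_similarity(a: str, b: str) -> float:
    """Case-insensitive difflib ratio between two vendor names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def is_duplicate(a: Receipt, b: Receipt) -> bool:
    """Return True when either pass flags the pair as the same receipt."""
    if a.date != b.date:
        return False  # both passes require the same date
    sim = vendor_similarity(a.vendor, b.vendor)
    # Pass 1: amounts agree within 0.05 and vendors are loosely similar.
    if abs(a.amount - b.amount) <= 0.05 and sim >= 0.60:
        return True
    # Pass 2: vendors are strongly similar; the amount is ignored because
    # OCR may have misread the total on one of the photos.
    return sim >= 0.80
```

With this sketch, two same-day photos from the same vendor are flagged even when OCR read 42.90 on one and 409.00 on the other.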
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaced 'pick the largest one' guidance with 'bottom-most total' and
'return 0 if no clear total found' to avoid picking line items or tips.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- master_agent: thread raw user message into extra_context and peer_data so
expenses_agent can check it directly without relying on LLM intent_summary
- master_agent: when receipts are in extra_context always route to expenses_agent,
so replies like 'skip duplicates' still trigger expense processing
- expenses_agent: _plan() checks peer_data raw_message alongside task so
skip/keep keywords are detected even when master rewrites the intent
- ab_ai_mail: wrap clarification message HTML in Markup() so Odoo does not
re-escape the tags; use <br> instead of <br/>
- ab_ai_mail: convert newlines in agent plain-text replies to <br> for proper
line-break rendering in Discuss
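
The newline conversion in the last bullet could look roughly like this. It is a stdlib-only sketch: `plain_text_to_html` is a hypothetical name, and in the Odoo module the result would presumably still be wrapped in Markup() so the tags are not re-escaped.

```python
from html import escape


def plain_text_to_html(text: str) -> str:
    """Escape a plain-text reply, then turn newlines into <br> tags."""
    normalized = text.replace("\r\n", "\n")
    # Escape first so user-visible "<" and "&" survive as text,
    # then add the only markup we actually want rendered.
    escaped = escape(normalized, quote=False)
    return escaped.replace("\n", "<br>")
```

Escaping before inserting the tags keeps any literal angle brackets in the agent reply from being interpreted as HTML.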
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- expenses_agent: extract transaction time (HH:MM) from OCR receipt text
- expenses_agent: _find_semantic_duplicate uses time to rule out false positives (>30 min apart = different receipts)
- expenses_agent: pause when duplicates found, set mode=awaiting_dup_approval, ask user before creating sheet
- expenses_agent: _report formats approval message listing each dup pair with vendor/amount/date/times/filenames
- ab_ai_mail: _find_pending_attachments recognises dup-approval bot message so ZIP re-attaches on user reply
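
A rough sketch of the time-based check, assuming the first HH:MM match in the OCR text is the transaction time. The regex and function names are illustrative only, and times on either side of midnight are not handled.

```python
import re
from typing import Optional

# 24-hour HH:MM; a trailing ":SS" is tolerated but ignored.
_TIME_RE = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")


def extract_time_minutes(ocr_text: str) -> Optional[int]:
    """Return the first HH:MM found as minutes since midnight, or None."""
    match = _TIME_RE.search(ocr_text)
    if match is None:
        return None
    return int(match.group(1)) * 60 + int(match.group(2))


def times_rule_out_duplicate(
    a: Optional[int], b: Optional[int], max_gap_min: int = 30
) -> bool:
    """True when both times are known and more than max_gap_min apart."""
    if a is None or b is None:
        return False  # a missing time cannot rule anything out
    return abs(a - b) > max_gap_min
```

A pair that otherwise looks like a duplicate would only be reported when `times_rule_out_duplicate` returns False.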
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After parsing all receipts, identify photos that are different shots of
the same physical receipt by comparing amount + date + vendor similarity
(difflib ratio >= 0.6). When a duplicate is found, keep whichever photo
produced the most OCR text (clearest shot) and report the skipped ones.
Zero-amount receipts (OCR failed entirely) are excluded from semantic
dedup to avoid false positives.
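
One way the keep-the-clearest-shot rule could be sketched. The dict keys (`vendor`, `date`, `amount`, `ocr_text`) and the 0.05 amount tolerance are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def _same_receipt(a: Dict, b: Dict) -> bool:
    """Amount + date + vendor similarity (difflib ratio >= 0.6)."""
    ratio = SequenceMatcher(None, a["vendor"].lower(), b["vendor"].lower()).ratio()
    return (
        a["date"] == b["date"]
        and abs(a["amount"] - b["amount"]) <= 0.05  # assumed tolerance
        and ratio >= 0.6
    )


def dedupe_keep_clearest(receipts: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Return (kept, skipped); the photo with the most OCR text wins."""
    kept: List[Dict] = []
    skipped: List[Dict] = []
    for receipt in receipts:
        if receipt["amount"] == 0:
            kept.append(receipt)  # OCR failed: never treated as a duplicate
            continue
        for i, other in enumerate(kept):
            if other["amount"] != 0 and _same_receipt(other, receipt):
                if len(receipt["ocr_text"]) > len(other["ocr_text"]):
                    skipped.append(other)
                    kept[i] = receipt
                else:
                    skipped.append(receipt)
                break
        else:
            kept.append(receipt)
    return kept, skipped
```

The `skipped` list is what would feed the report of discarded photos.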
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Dockerfile: add tesseract-ocr-osd for orientation detection data
- receipt_parser: resize large phone photos to 1800px, convert to
grayscale, sharpen before OCR; use psm 1 (auto + OSD) so rotated
receipts are correctly oriented before text extraction
- expenses_agent: tighten amount extraction prompt to pick the FINAL
total, not subtotal or tax line, reducing misreads like 42.90->409.00
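
A hedged sketch of the preprocessing steps in the receipt_parser bullet, using Pillow and pytesseract. It assumes 1800px refers to the longest side; `scaled_size` and `ocr_receipt` are hypothetical names, not the actual receipt_parser functions.

```python
from typing import Tuple

MAX_SIDE = 1800  # assumed to apply to the longest side of the photo


def scaled_size(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Shrink (never enlarge) so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def ocr_receipt(path: str) -> str:
    """Resize, grayscale, sharpen, then OCR with page segmentation mode 1."""
    # Third-party imports are local so scaled_size works without them.
    from PIL import Image, ImageFilter, ImageOps
    import pytesseract

    with Image.open(path) as img:
        img = img.resize(scaled_size(*img.size), Image.LANCZOS)
        img = ImageOps.grayscale(img)
        img = img.filter(ImageFilter.SHARPEN)
        # --psm 1: automatic page segmentation with orientation/script detection
        return pytesseract.image_to_string(img, config="--psm 1")
```

Mode 1 relies on the OSD traineddata, which is why the Dockerfile change accompanies it.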
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ab_ai_bot: raise requests.post timeout 120s -> 600s so long OCR+LLM
runs don't silently drop the reply in Discuss
- upload: run parse_upload in ThreadPoolExecutor so tesseract OCR
doesn't block the FastAPI event loop
- expenses_agent: parse all receipts concurrently with asyncio.gather
(Ollama semaphore caps parallelism at 2); reduces 13-receipt LLM
time from ~39s sequential to ~20s parallel
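
The concurrency pattern in the last two bullets can be sketched as follows. `parse_all`, `run_blocking_ocr`, and the `llm_call` parameter are placeholders; the real code talks to Ollama and tesseract.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Awaitable, Callable, List


async def parse_all(
    texts: List[str],
    llm_call: Callable[[str], Awaitable[Any]],
    max_parallel: int = 2,
) -> List[Any]:
    """Parse every receipt concurrently, at most max_parallel LLM calls at once."""
    sem = asyncio.Semaphore(max_parallel)

    async def guarded(text: str) -> Any:
        async with sem:  # waits here while max_parallel calls are in flight
            return await llm_call(text)

    # gather preserves input order in its result list
    return await asyncio.gather(*(guarded(t) for t in texts))


async def run_blocking_ocr(
    ocr_func: Callable[[bytes], Any], data: bytes, executor: ThreadPoolExecutor
) -> Any:
    """Run a blocking OCR function in a worker thread, off the event loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, ocr_func, data)
```

With a cap of 2, the best-case wall time is roughly half the sequential time, which is consistent with the ~39s to ~20s figure above.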
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Dockerfile: install tesseract-ocr so Pillow+pytesseract can OCR receipt images
- operational_store: JSON-serialize raw_data before passing to asyncpg JSONB
- receipt_parser: add SHA256 hash + date extracted from filename timestamps
- expenses_agent: deduplicate receipts by hash before creating expense records
- expenses_agent: fetch all expensable Odoo products, pass list to LLM for
category selection (Meals, Flights, etc.) per receipt
- expenses_agent: pass date_hint from filename (e.g. 20260509_180857.jpg -> 2026-05-09)
as fallback when OCR text is unavailable
- expenses_tools: add get_expense_products() to fetch all expensable products
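
A small sketch of the hash dedup and filename date hint, assuming filenames follow the YYYYMMDD_HHMMSS pattern shown above. The function names are illustrative.

```python
import hashlib
import re
from datetime import date
from typing import List, Optional, Tuple

# Matches camera-style names such as 20260509_180857.jpg
_FILENAME_DATE = re.compile(r"(?<!\d)(20\d{2})(\d{2})(\d{2})_\d{6}")


def date_hint_from_filename(name: str) -> Optional[str]:
    """Return an ISO date parsed from the filename timestamp, or None."""
    match = _FILENAME_DATE.search(name)
    if match is None:
        return None
    year, month, day = match.groups()
    try:
        date(int(year), int(month), int(day))  # reject e.g. month 13
    except ValueError:
        return None
    return f"{year}-{month}-{day}"


def dedupe_by_hash(files: List[Tuple[str, bytes]]) -> List[Tuple[str, bytes]]:
    """Keep the first file for each distinct SHA256 of the raw bytes."""
    seen = set()
    unique = []
    for name, data in files:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        unique.append((name, data))
    return unique
```

Hash dedup only catches byte-identical uploads; two different photos of the same receipt hash differently.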
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Discuss bot now reads ir.attachment from incoming messages; file-only
messages no longer silently dropped
- ZIP files are described (contents listed) and bot asks clarifying
question before acting; user's follow-up reply looks back for pending
attachments so files don't need to be re-uploaded
- receipt_parser: extracts text from ZIP (recursive), JPG/PNG/etc (OCR),
PDF (pdfplumber), HTML, TXT
- expenses_agent: full rewrite fixing broken method signatures; adds
create_expense_sheet / create_expense / attach_receipt flow driven by
LLM receipt parsing (Ollama, HIPAA-locked)
- master_agent: extra_context threads receipts + user_id into directives
- FastAPI /upload multipart endpoint; registered in main.py
- Odoo /ai/upload controller proxies files to agent service
- ab_ai_bot: dispatch_message_with_files() for multipart uploads
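
A minimal sketch of recursive ZIP listing, as could be used to describe an archive before asking the clarifying question. `list_zip_contents` and the depth limit are assumptions, not the actual receipt_parser API.

```python
import io
import zipfile
from typing import List


def list_zip_contents(data: bytes, prefix: str = "", max_depth: int = 3) -> List[str]:
    """List file paths inside a ZIP, descending into nested .zip members."""
    names: List[str] = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = prefix + info.filename
            if info.filename.lower().endswith(".zip") and max_depth > 0:
                # Read the nested archive into memory and recurse;
                # max_depth guards against deeply nested archives.
                nested = zf.read(info)
                names.extend(list_zip_contents(nested, path + "/", max_depth - 1))
            else:
                names.append(path)
    return names
```

Each returned path could then be dispatched by extension to OCR, pdfplumber, or a plain-text reader.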
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>