5 Commits

Author SHA1 Message Date
Carlos Garcia
11cc261923 fix: vision OCR receipt extraction — skip second LLM call, fix total truncation
receipt_parser: change _ocr_image_vision() to extract structured JSON
{vendor,amount,date,time,category} directly from the image instead of
transcribing raw text, so the downstream LLM extraction step is
unnecessary and the two-step error-compounding is eliminated.
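
For illustration only, a minimal sketch of how a caller might validate such a structured response. It assumes the vision model returns a single JSON object with the five keys listed above. The name parse_vision_receipt, the amount coercion, and the sample values are hypothetical and not taken from receipt_parser.

```python
import json

# Keys named in the commit message; anything else in the payload is ignored.
EXPECTED_KEYS = ("vendor", "amount", "date", "time", "category")


def parse_vision_receipt(raw):
    """Return a dict with the five expected keys, or None if raw is not a JSON object.

    Hypothetical helper: missing keys become None, and the amount is coerced
    to float when possible so downstream code sees one numeric type.
    """
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):  # json.JSONDecodeError is a ValueError
        return None
    if not isinstance(data, dict):
        return None

    fields = {key: data.get(key) for key in EXPECTED_KEYS}
    try:
        fields["amount"] = float(fields["amount"])
    except (TypeError, ValueError):
        fields["amount"] = None
    return fields


# Made-up sample payload, not real model output.
sample = '{"vendor": "In-N-Out", "amount": "12.50", "date": "2026-05-16", "time": "12:31", "category": "Meals"}'
receipt = parse_vision_receipt(sample)
```

Returning None for anything that is not a JSON object leaves room for the caller to fall back to a text-based extraction path.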

expenses_agent: add _match_category() helper to map vision category
labels to expense product names via substring/fuzzy match; add fast
path in _parse_receipt_text() that detects pre-extracted vision JSON
(text starts with '{') and skips the second LLM submit call entirely.
Fix text[:2000] truncation that discarded receipt totals — now keeps
first 1500 + last 1500 chars of long receipts so the grand total at
the bottom is always included.
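
The three ideas in this paragraph can be sketched roughly as follows. This is not the code from expenses_agent: the function names, the 0.6 fuzzy cutoff, the newline separator, and the use of difflib.get_close_matches are assumptions made for the example. Only the '{' prefix check and the 1500 + 1500 split come from the message above.

```python
import difflib


def match_category(label, product_names, cutoff=0.6):
    """Map a category label to a product name: substring match first, then fuzzy.

    Hypothetical stand-in for _match_category(); the cutoff is an assumption.
    """
    wanted = label.strip().lower()
    if not wanted:
        return None
    for name in product_names:
        lowered = name.lower()
        if wanted in lowered or lowered in wanted:
            return name
    by_lowered = {name.lower(): name for name in product_names}
    close = difflib.get_close_matches(wanted, list(by_lowered), n=1, cutoff=cutoff)
    return by_lowered[close[0]] if close else None


def is_vision_json(text):
    """Fast-path check described in the commit: the text starts with '{'."""
    return text.startswith("{")


def keep_head_and_tail(text, head=1500, tail=1500, separator="\n"):
    """Keep the first `head` and last `tail` characters of a long receipt.

    Short texts are returned unchanged. The separator is an assumption that
    keeps the two halves from fusing into one word.
    """
    if len(text) <= head + tail:
        return text
    return text[:head] + separator + text[-tail:]


# A long made-up receipt whose total sits at the very end.
long_receipt = "x" * 5000 + "TOTAL 42.00"
trimmed = keep_head_and_tail(long_receipt)
```

With the earlier text[:2000] slice the "TOTAL 42.00" suffix of long_receipt would be cut off, whereas trimmed still ends with it.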

tests: fix stale test_act_enters_awaiting_confirmation_on_first_pass
(confirmation gate was removed); add TestMatchCategory and three new
tests for the vision JSON fast path and LLM fallthrough.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 21:49:31 -04:00
Carlos Garcia
9f38fb013c docs: label test file and add TEST_EXPENSES_AGENT.md
Adds module-level label and cross-reference to the new doc.
TEST_EXPENSES_AGENT.md documents every test group, case, and the
real-world bug each test guards against (e.g. In-N-Out OCR mismatch).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 18:35:07 -04:00
Carlos Garcia
469025b6f2 test: fix bad vendor example in pass2 similarity test
'Restaurant A' vs 'Restaurant Z' differ by 1 char so difflib scores
them at ~91% -- correctly above the 80% threshold. Use clearly
different vendors (Starbucks Coffee vs McDonalds Burger) instead.
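
The scores can be checked with a short sketch. It assumes the comparison uses difflib.SequenceMatcher.ratio() on lowercased vendor names, which the message does not state; vendor_similarity is a hypothetical name.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.80  # the 80% threshold mentioned in the commit message


def vendor_similarity(a, b):
    """Similarity in [0, 1]; lowercasing is an assumption of this sketch."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# 11 of 12 characters match on each side, so the ratio is 2 * 11 / 24.
near = vendor_similarity("Restaurant A", "Restaurant Z")

# Clearly different vendors share only a few scattered characters.
far = vendor_similarity("Starbucks Coffee", "McDonalds Burger")
```

Here near comes out at 22/24 (about 0.917), above the threshold, while far falls well below it, which is why the second pair makes the better negative example.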

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 18:32:38 -04:00
Carlos Garcia
1c5f6e7ca3 test: fix _ext import (only exists in ab_ai_mail, not receipt_parser)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 18:31:30 -04:00
Carlos Garcia
6fcd830e6f test: unit tests for expenses agent dedup, plan, act, and receipt parser
- TestFindSemanticDuplicate: 18 cases covering Pass1 (amount match),
  Pass2 (OCR mismatch / high vendor similarity), time window, filenames,
  zero-amount exclusion, multi-candidate index correctness
- test_plan_*: keyword detection for confirm/skip/keep-all, mode routing
- test_act_*: confirmation gate, byte-dedup, no-employee escalation,
  confirmed creation with mocked Odoo tools
- TestParseUpload: ZIP extraction, directory skipping, filename date
  parsing, SHA256 consistency, b64 round-trip
- TestTextToHtml: escaping, newline to <br>, empty string
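
As a rough picture of the behavior the last two groups exercise, here is a self-contained sketch. The functions parse_zip_upload and text_to_html are hypothetical stand-ins written for this example, not the project's implementations, and the returned dict layout is an assumption.

```python
import base64
import hashlib
import html
import io
import zipfile


def parse_zip_upload(zip_bytes):
    """Extract files from a ZIP, skipping directory entries.

    Each file yields its name, a SHA256 hex digest, and a base64 payload.
    """
    entries = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no receipt data
            data = archive.read(info)
            entries.append({
                "filename": info.filename,
                "sha256": hashlib.sha256(data).hexdigest(),
                "b64": base64.b64encode(data).decode("ascii"),
            })
    return entries


def text_to_html(text):
    """Escape HTML special characters and turn newlines into <br>."""
    if not text:
        return ""
    return html.escape(text).replace("\n", "<br>")


# Build a small in-memory archive with one directory and one file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("receipts/", "")
    archive.writestr("receipts/2026-05-16_lunch.jpg", b"fake-image-bytes")

entries = parse_zip_upload(buffer.getvalue())
```

Decoding entries[0]["b64"] returns the original bytes, and hashing them again reproduces the stored digest, which is the round-trip and consistency property the tests describe.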

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 18:11:32 -04:00