Test Suite: test_expenses_agent.py
Overview
Unit tests for the receipt-to-expense-report pipeline in the ActiveBlue AI agent service. All tests run without a live Odoo instance, database, or LLM — external dependencies are mocked so the suite completes in under one second.
File: tests/test_expenses_agent.py
Modules under test:
- agent_service/agents/expenses_agent.py
- agent_service/tools/receipt_parser.py
- addons/activeblue_ai/models/ab_ai_mail.py
Running the Tests
cd /root/odoo/odoo-ai
# First time only — create the test venv
python3 -m venv .venv-test
source .venv-test/bin/activate
pip install -r requirements-test.txt
# Every subsequent run
source .venv-test/bin/activate
python -m pytest tests/test_expenses_agent.py -v
Expected output: 41 passed, 5 skipped (the 5 skipped tests require the Odoo
environment to import ab_ai_mail.py; they are skipped cleanly on the agent service host).
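One way such a clean skip could be arranged is sketched below. This is an illustration only, not the suite's actual mechanism: it assumes the Odoo-dependent tests can be gated on whether a top-level `odoo` package is importable, and the class name `TestTextToHtmlSketch` is hypothetical.

```python
import importlib.util
import unittest

# Assumption: ab_ai_mail.py is importable only where the top-level
# `odoo` package is installed, so its presence is used as the gate.
ODOO_AVAILABLE = importlib.util.find_spec("odoo") is not None


@unittest.skipUnless(ODOO_AVAILABLE, "requires the Odoo environment")
class TestTextToHtmlSketch(unittest.TestCase):
    """Hypothetical stand-in for a test class that needs Odoo imports."""

    def test_runs_only_with_odoo(self):
        # Reached only when the gate above let the class run.
        self.assertTrue(ODOO_AVAILABLE)
```

pytest collects `unittest.TestCase` classes, so a class gated this way is reported as skipped rather than failed on a host without Odoo.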
Test Groups
1. TestFindSemanticDuplicate (18 tests)
What it tests: ExpensesAgent._find_semantic_duplicate(parsed, candidates)
This is the core deduplication algorithm that prevents the same physical receipt from being entered as multiple expense records. It runs in two passes:
| Pass | Trigger condition | Purpose |
|---|---|---|
| 1 | Same date + amount within $0.05 + vendor similarity ≥ 60% | Standard duplicate (two photos of the same receipt) |
| 2 | Same date + vendor similarity ≥ 80% (amount may differ) | OCR amount misread (e.g. $8.55 parsed as $15.00) |
Cases covered:
| Test | What it verifies |
|---|---|
| test_exact_match | Identical parsed fields → flagged as duplicate |
| test_amount_within_threshold | Amounts within $0.05 → duplicate |
| test_amount_just_over_threshold | Amounts $0.06+ apart → Pass 1 misses, Pass 2 catches via vendor+date |
| test_different_date_not_duplicate | Same vendor/amount, different date → not a dup |
| test_zero_amount_not_deduplicated | Zero-amount receipts are too ambiguous to dedup |
| test_vendor_similarity_above_threshold | "IN-N-OUT HOUSTON" vs "In-N-Out Houston" → duplicate |
| test_vendor_similarity_below_threshold_pass1 | Unrelated vendors, same amount+date → not a dup |
| test_time_within_window_is_dup | Transaction times within 30 min → duplicate |
| test_time_outside_window_not_dup | Transaction times > 30 min apart → different transactions |
| test_one_time_missing_does_not_exclude | Only one receipt has a time → time check skipped, other signals decide |
| test_filename_vendor_same_amount_date_is_dup | Vendor is raw filename → same amount+date sufficient |
| test_no_candidates | Empty candidate list → returns None |
| test_returns_correct_index_multiple_candidates | Correct index returned when multiple candidates present |
| test_pass2_catches_ocr_amount_mismatch | The In-N-Out bug: $8.55 vs $15.00, same vendor+date → caught by Pass 2 |
| test_pass2_requires_high_vendor_similarity | Clearly different vendors (Starbucks vs McDonalds) → not flagged |
| test_pass2_same_date_required | Different dates, high vendor similarity → not a dup |
| test_pass2_respects_time_window | High vendor similarity but > 30 min apart → different visit |
| test_pass2_skips_filename_vendors | Filename-looking vendors excluded from Pass 2 |
| test_pass2_zero_amount_not_deduplicated | Zero-amount edge case in Pass 2 |
Why this matters: In production, a user uploaded two photos of the same In-N-Out receipt.
OCR misread one total as $15.00 (correct: $8.55). Pass 1 missed it because the amounts
differed by $6.45. Pass 2 was added specifically to catch this: if the vendor name and date
match closely, the receipts are flagged as a likely OCR error regardless of the amount.
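The two-pass logic described in this section can be sketched roughly as follows. This is a simplified illustration, not the code in expenses_agent.py: the dict keys (`vendor`, `date`, `amount`, `time`), the `HH:MM` time format, and the use of `difflib.SequenceMatcher` for vendor similarity are all assumptions, and the special handling of filename-like vendors is omitted.

```python
from datetime import datetime
from difflib import SequenceMatcher

AMOUNT_TOLERANCE = 0.05    # Pass 1: amounts within $0.05
PASS1_VENDOR_MIN = 0.60    # Pass 1: vendor similarity >= 60%
PASS2_VENDOR_MIN = 0.80    # Pass 2: vendor similarity >= 80%
TIME_WINDOW_MINUTES = 30


def vendor_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] (assumed metric)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def times_conflict(t1, t2):
    """True only when both times are known and more than 30 minutes apart."""
    if not t1 or not t2:
        return False  # a missing time never excludes a candidate
    fmt = "%H:%M"
    delta = abs(datetime.strptime(t1, fmt) - datetime.strptime(t2, fmt))
    return delta.total_seconds() > TIME_WINDOW_MINUTES * 60


def find_semantic_duplicate(parsed, candidates):
    """Return the index of the first matching candidate, or None."""
    if not parsed.get("amount"):
        return None  # zero-amount receipts are too ambiguous to dedup
    passes = ((PASS1_VENDOR_MIN, True), (PASS2_VENDOR_MIN, False))
    for min_similarity, check_amount in passes:
        for index, cand in enumerate(candidates):
            if cand.get("date") != parsed.get("date"):
                continue
            if times_conflict(parsed.get("time"), cand.get("time")):
                continue
            if check_amount:
                diff = abs(cand.get("amount", 0) - parsed["amount"])
                if diff > AMOUNT_TOLERANCE + 1e-9:  # epsilon for float noise
                    continue
            if vendor_similarity(parsed["vendor"], cand["vendor"]) >= min_similarity:
                return index
    return None
```

With this sketch, a receipt parsed as "IN-N-OUT HOUSTON" for $15.00 is matched against a same-day "In-N-Out Houston" candidate for $8.55 in the second pass, which mirrors the In-N-Out case above.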
2. test_plan_* (11 tests)
What it tests: ExpensesAgent._plan() — keyword detection from the user's raw message
The _plan() method inspects both the LLM-rewritten task (intent summary) and the
raw_message (original user text) to determine:
- user_confirmed: did the user approve the parsed receipt list?
- user_dup_decision: did the user say to skip or keep duplicates?
- mode: create_from_receipts vs read
| Test | What it verifies |
|---|---|
| test_plan_confirm_keyword_sets_confirmed | "confirm" → user_confirmed=True |
| test_plan_looks_good_sets_confirmed | "looks good" → user_confirmed=True |
| test_plan_go_ahead_sets_confirmed | "go ahead" → user_confirmed=True |
| test_plan_no_keyword_not_confirmed | "create an expense report" → user_confirmed=False |
| test_plan_keep_all_sets_dup_decision | "confirm, keep all" → user_confirmed=True, user_dup_decision=keep_all |
| test_plan_skip_sets_dup_decision | "skip duplicates" → user_dup_decision=skip |
| test_plan_default_dup_decision_is_skip | "confirm" with no dup instruction → default skip |
| test_plan_mode_is_read_without_receipts | No receipts attached → mode=read |
| test_plan_mode_is_create_with_receipts | Receipts present → mode=create_from_receipts |
| test_plan_task_field_also_checked | Confirm keyword in LLM-rewritten task field also works |
Why this matters: The MasterAgent rewrites the user's message as an intent_summary
before passing it to the expenses agent. Short replies like "confirm" or "skip" can be lost
in the rewrite. The agent checks both fields so these keywords are never missed.
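A minimal sketch of checking both fields might look like the following. The function name `plan_sketch`, its signature, and the plain substring matching are assumptions made for illustration; only the keywords and result values are taken from the table above.

```python
# Keywords taken from the test table; the real list may be longer.
CONFIRM_KEYWORDS = ("confirm", "looks good", "go ahead")


def plan_sketch(task, raw_message, has_receipts):
    """Hypothetical stand-in for the keyword detection in _plan()."""
    # Search the LLM-rewritten task and the original user text together,
    # so a short reply survives even if the rewrite dropped it.
    text = f"{task} {raw_message}".lower()
    return {
        "user_confirmed": any(keyword in text for keyword in CONFIRM_KEYWORDS),
        # "skip" is the default when no duplicate instruction is given.
        "user_dup_decision": "keep_all" if "keep all" in text else "skip",
        "mode": "create_from_receipts" if has_receipts else "read",
    }
```

Plain substring matching is the simplest choice here, but it would also match words such as "unconfirmed"; a real implementation may need word-boundary checks.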
3. test_act_* (4 tests)
What it tests: ExpensesAgent._act() — the creation pipeline and confirmation gate
_act() is the most complex method. It orchestrates byte dedup → concurrent LLM parse →
semantic dedup → confirmation gate → Odoo write. These tests mock the Odoo tools
(ExpensesTools) and the LLM parse to control inputs precisely.
| Test | What it verifies |
|---|---|
| test_act_enters_awaiting_confirmation_on_first_pass | First call (not confirmed) → returns [], sets mode to awaiting_confirmation, populates _confirmation_items |
| test_act_creates_sheet_when_confirmed | Second call (confirmed) → calls create_expense_sheet and create_expense, returns action strings |
| test_act_deduplicates_byte_identical_receipts | Two receipts with the same SHA256 → only one create_expense call |
| test_act_no_employee_returns_empty_and_escalates | No employee record found → returns [], adds escalation message |
Why this matters: The confirmation gate is the last human checkpoint before any data is written to Odoo. These tests verify that: (a) no records are written on the first pass, (b) records are written exactly once on confirmation, and (c) byte-identical files are never double-entered regardless of the dup decision.
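The three properties above can be checked with a mock in the way sketched below. The `act_sketch` function is a deliberately tiny, hypothetical stand-in for `_act()`, and the argument lists passed to `create_expense_sheet` and `create_expense` are assumptions; only the mocking pattern is the point.

```python
import hashlib
from unittest.mock import MagicMock


def act_sketch(receipts, tools, confirmed):
    """Hypothetical gate: byte dedup first, write only once confirmed."""
    unique = {}
    for receipt in receipts:
        digest = hashlib.sha256(receipt["data"]).hexdigest()
        unique.setdefault(digest, receipt)  # keep the first copy of each file
    if not confirmed:
        return []  # first pass: nothing is written
    sheet_id = tools.create_expense_sheet()
    actions = []
    for receipt in unique.values():
        tools.create_expense(sheet_id, receipt["filename"])
        actions.append(f"created expense for {receipt['filename']}")
    return actions


# Two uploads with identical bytes but different filenames.
receipts = [
    {"filename": "a.jpg", "data": b"same-bytes"},
    {"filename": "b.jpg", "data": b"same-bytes"},
]
tools = MagicMock()  # stands in for ExpensesTools

first = act_sketch(receipts, tools, confirmed=False)
tools.create_expense.assert_not_called()        # (a) no writes before confirmation

second = act_sketch(receipts, tools, confirmed=True)
tools.create_expense_sheet.assert_called_once()  # (b) written exactly once
assert tools.create_expense.call_count == 1      # (c) byte-identical files collapse
```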
4. TestParseUpload (8 tests)
What it tests: receipt_parser.parse_upload(filename, data) — file parsing and metadata extraction
parse_upload is the entry point for all receipt data. It handles file type detection,
ZIP extraction, OCR invocation, and metadata harvesting.
| Test | What it verifies |
|---|---|
| test_text_file_parsed | Plain .txt file → correct text, mimetype, sha256 |
| test_date_extracted_from_filename | 20260509_180857.jpg → date_from_name=2026-05-09 |
| test_no_date_in_plain_filename | receipt.txt → date_from_name=None |
| test_zip_extracted | ZIP containing 2 files → 2 receipt dicts returned |
| test_zip_skips_directories | ZIP with directory entry (subdir/) → directory skipped, only files returned |
| test_empty_zip_returns_empty | Empty ZIP → [] |
| test_sha256_is_consistent | Same bytes with different filenames → same SHA256 |
| test_b64_decodes_to_original | Base64 field round-trips back to original bytes |
Why this matters: date_from_name is used as a fallback when OCR cannot find a date in
the receipt text. SHA256 is the key for byte-exact deduplication. If either is wrong, expenses
land on the wrong date or duplicates slip through.
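The metadata fields and ZIP handling described here could be produced along the following lines. This is a sketch under assumptions, not the contents of receipt_parser.py: the dict keys `sha256`, `b64`, and `date_from_name` follow the names used in the table, while the helper names, the list return type for single files, and the `YYYYMMDD` filename pattern are guesses. OCR and mimetype detection are left out.

```python
import base64
import hashlib
import io
import re
import zipfile
from datetime import datetime


def date_from_name(filename):
    """Return an ISO date if the filename contains a standalone YYYYMMDD run."""
    match = re.search(r"(?<!\d)(\d{8})(?!\d)", filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date().isoformat()
    except ValueError:  # eight digits that are not a real calendar date
        return None


def describe(filename, data):
    """Build the per-file metadata dict (hypothetical shape)."""
    return {
        "filename": filename,
        "sha256": hashlib.sha256(data).hexdigest(),  # depends on bytes only
        "b64": base64.b64encode(data).decode("ascii"),
        "date_from_name": date_from_name(filename),
    }


def parse_upload_sketch(filename, data):
    """Return one dict per file; ZIP archives are expanded, directories skipped."""
    if not filename.lower().endswith(".zip"):
        return [describe(filename, data)]
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [
            describe(info.filename, archive.read(info))
            for info in archive.infolist()
            if not info.is_dir()
        ]
```

Because the hash is computed from the bytes alone, renaming a file does not change its SHA256, which is what makes it usable as a byte-exact dedup key.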
5. TestTextToHtml (5 tests — skipped without Odoo env)
What it tests: ab_ai_mail._text_to_html(text) — plain text → safe HTML conversion
Bot replies are plain text from the agent service. _text_to_html converts them to safe
HTML for display in Odoo Discuss.
| Test | What it verifies |
|---|---|
| test_plain_text_unchanged | Normal text passes through |
| test_newline_becomes_br | `\n` → `<br>` so multi-line replies render correctly |
| test_html_special_chars_escaped | `<script>` → `&lt;script&gt;` (XSS prevention) |
| test_ampersand_escaped | `&` → `&amp;` |
| test_empty_string | Empty input → empty output |
Why this matters: If escaping is broken, a vendor name containing < or & (common in
OCR output) would break the Discuss message rendering or create an XSS vector.
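A conversion with the behaviour listed in the table can be sketched with the standard library alone. This is an assumed implementation for illustration, not the body of `_text_to_html` in ab_ai_mail.py; the name `text_to_html_sketch` is hypothetical.

```python
import html


def text_to_html_sketch(text):
    """Escape HTML special characters first, then turn newlines into <br>."""
    # Escaping must happen before inserting <br>, otherwise the tag
    # itself would be escaped to &lt;br&gt;.
    return html.escape(text).replace("\n", "<br>")
```

For example, `text_to_html_sketch("Tom & Jerry's\n<b>")` escapes the ampersand, the quote, and the angle brackets, and emits a single `<br>` for the newline.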
Adding New Tests
When modifying the dedup algorithm, add a test in TestFindSemanticDuplicate that
reproduces the real-world case that motivated the change. Name it after the vendor or
scenario (e.g. test_pass2_catches_ocr_amount_mismatch was named after the In-N-Out bug).
When adding a new confirmation keyword (e.g. "yes please"), add a corresponding
test_plan_* case.
When changing the _act() flow, add or update a test_act_* case that verifies the
new state transition with mocked Odoo tools.