odoo-ai/tests/TEST_EXPENSES_AGENT.md
Carlos Garcia 9f38fb013c docs: label test file and add TEST_EXPENSES_AGENT.md
Adds module-level label and cross-reference to the new doc.
TEST_EXPENSES_AGENT.md documents every test group, case, and the
real-world bug each test guards against (e.g. In-N-Out OCR mismatch).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 18:35:07 -04:00


Test Suite: test_expenses_agent.py

Overview

Unit tests for the receipt-to-expense-report pipeline in the ActiveBlue AI agent service. All tests run without a live Odoo instance, database, or LLM — external dependencies are mocked so the suite completes in under one second.

File: tests/test_expenses_agent.py
Modules under test:

  • agent_service/agents/expenses_agent.py
  • agent_service/tools/receipt_parser.py
  • addons/activeblue_ai/models/ab_ai_mail.py

Running the Tests

cd /root/odoo/odoo-ai

# First time only — create the test venv
python3 -m venv .venv-test
source .venv-test/bin/activate
pip install -r requirements-test.txt

# Every subsequent run
source .venv-test/bin/activate
python -m pytest tests/test_expenses_agent.py -v

Expected output: 41 passed, 5 skipped (the 5 skipped tests require the Odoo environment to import ab_ai_mail.py; they are skipped cleanly on the agent service host).
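One way such a clean skip can be arranged is sketched below. It assumes that importing ab_ai_mail.py requires the `odoo` package and that the guard is a plain availability check; the real suite may use a different mechanism (for example a pytest skip marker). The class name `TestNeedsOdoo` is hypothetical.

```python
import importlib.util
import unittest

# Assumption: ab_ai_mail.py is importable only where the `odoo` package exists.
ODOO_AVAILABLE = importlib.util.find_spec("odoo") is not None


@unittest.skipUnless(ODOO_AVAILABLE, "requires the Odoo environment")
class TestNeedsOdoo(unittest.TestCase):
    """Hypothetical test class that is reported as skipped without Odoo."""

    def test_module_imports(self):
        # Only reached when Odoo is installed on the host.
        import odoo  # noqa: F401
```

pytest collects `unittest.TestCase` classes and reports tests guarded this way as skipped rather than failed.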


Test Groups

1. TestFindSemanticDuplicate (19 tests)

What it tests: ExpensesAgent._find_semantic_duplicate(parsed, candidates)

This is the core deduplication algorithm that prevents the same physical receipt from being entered as multiple expense records. It runs in two passes:

| Pass | Trigger condition | Purpose |
|------|-------------------|---------|
| 1 | Same date + amount within $0.05 + vendor similarity ≥ 60% | Standard duplicate (two photos of the same receipt) |
| 2 | Same date + vendor similarity ≥ 80% (amount may differ) | OCR amount misread (e.g. $8.55 parsed as $15.00) |

Cases covered:

| Test | What it verifies |
|------|------------------|
| test_exact_match | Identical parsed fields → flagged as duplicate |
| test_amount_within_threshold | Amounts within $0.05 → duplicate |
| test_amount_just_over_threshold | Amounts $0.06+ apart → Pass 1 misses, Pass 2 catches via vendor+date |
| test_different_date_not_duplicate | Same vendor/amount, different date → not a dup |
| test_zero_amount_not_deduplicated | Zero-amount receipts are too ambiguous to dedup |
| test_vendor_similarity_above_threshold | "IN-N-OUT HOUSTON" vs "In-N-Out Houston" → duplicate |
| test_vendor_similarity_below_threshold_pass1 | Unrelated vendors, same amount+date → not a dup |
| test_time_within_window_is_dup | Transaction times within 30 min → duplicate |
| test_time_outside_window_not_dup | Transaction times > 30 min apart → different transactions |
| test_one_time_missing_does_not_exclude | Only one receipt has a time → time check skipped, other signals decide |
| test_filename_vendor_same_amount_date_is_dup | Vendor is raw filename → same amount+date sufficient |
| test_no_candidates | Empty candidate list → returns None |
| test_returns_correct_index_multiple_candidates | Correct index returned when multiple candidates present |
| test_pass2_catches_ocr_amount_mismatch | The In-N-Out bug: $8.55 vs $15.00, same vendor+date → caught by Pass 2 |
| test_pass2_requires_high_vendor_similarity | Clearly different vendors (Starbucks vs McDonalds) → not flagged |
| test_pass2_same_date_required | Different dates, high vendor similarity → not a dup |
| test_pass2_respects_time_window | High vendor similarity but > 30 min apart → different visit |
| test_pass2_skips_filename_vendors | Filename-looking vendors excluded from Pass 2 |
| test_pass2_zero_amount_not_deduplicated | Zero-amount edge case in Pass 2 |

Why this matters: In production, a user uploaded two photos of the same In-N-Out receipt. OCR misread one total as $15.00 (correct: $8.55). Pass 1 missed it because the amounts differed by $6.45, far beyond the $0.05 tolerance. Pass 2 was added specifically to catch this: if the date is the same and the vendor names are at least 80% similar, the pair is flagged as a likely duplicate with an OCR-misread amount, regardless of how far apart the amounts are.
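The two passes can be sketched roughly as follows. This is a minimal illustration, not the code in expenses_agent.py: the function name `find_semantic_duplicate`, the dict keys (`vendor`, `date`, `amount`, `time`), the "HH:MM" time format, and the use of `difflib.SequenceMatcher` as the similarity metric are all assumptions, and the filename-vendor handling is omitted.

```python
from difflib import SequenceMatcher
from typing import Optional

AMOUNT_TOLERANCE = 0.05    # dollars (Pass 1)
PASS1_VENDOR_MIN = 0.60    # vendor similarity floor for Pass 1
PASS2_VENDOR_MIN = 0.80    # stricter floor for Pass 2
TIME_WINDOW_MINUTES = 30   # receipts further apart count as separate visits


def vendor_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] (assumed metric)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def _minutes(hhmm: str) -> int:
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def _times_conflict(a: dict, b: dict) -> bool:
    """True only when both receipts carry a time and they are too far apart."""
    ta, tb = a.get("time"), b.get("time")
    if not ta or not tb:
        return False  # one time missing: let the other signals decide
    return abs(_minutes(ta) - _minutes(tb)) > TIME_WINDOW_MINUTES


def find_semantic_duplicate(parsed: dict, candidates: list) -> Optional[int]:
    """Return the index of the first likely duplicate, or None."""
    if not parsed.get("amount"):
        return None  # zero-amount receipts are too ambiguous to dedup

    def comparable(cand: dict) -> bool:
        return (
            bool(cand.get("amount"))
            and cand["date"] == parsed["date"]
            and not _times_conflict(parsed, cand)
        )

    # Pass 1: same date, amount within tolerance, loosely similar vendor.
    for i, cand in enumerate(candidates):
        if not comparable(cand):
            continue
        close = round(abs(cand["amount"] - parsed["amount"]), 2) <= AMOUNT_TOLERANCE
        if close and vendor_similarity(cand["vendor"], parsed["vendor"]) >= PASS1_VENDOR_MIN:
            return i

    # Pass 2: same date, strongly similar vendor, amount ignored (OCR misread).
    for i, cand in enumerate(candidates):
        if comparable(cand) and vendor_similarity(cand["vendor"], parsed["vendor"]) >= PASS2_VENDOR_MIN:
            return i
    return None


# The In-N-Out case, with an illustrative date:
original = {"vendor": "IN-N-OUT HOUSTON", "date": "2026-05-09", "amount": 8.55}
misread = {"vendor": "In-N-Out Houston", "date": "2026-05-09", "amount": 15.00}
print(find_semantic_duplicate(misread, [original]))  # 0 (caught by Pass 2)
```

Running Pass 1 over all candidates before Pass 2 means an amount-matching candidate is preferred over one that matches on vendor and date alone.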


2. test_plan_* (10 tests)

What it tests: ExpensesAgent._plan() — keyword detection from the user's raw message

The _plan() method inspects both the LLM-rewritten task (intent summary) and the raw_message (original user text) to determine:

  • user_confirmed: did the user approve the parsed receipt list?
  • user_dup_decision: did the user say to skip or keep duplicates?
  • mode: create_from_receipts vs read

| Test | What it verifies |
|------|------------------|
| test_plan_confirm_keyword_sets_confirmed | "confirm" → user_confirmed=True |
| test_plan_looks_good_sets_confirmed | "looks good" → user_confirmed=True |
| test_plan_go_ahead_sets_confirmed | "go ahead" → user_confirmed=True |
| test_plan_no_keyword_not_confirmed | "create an expense report" → user_confirmed=False |
| test_plan_keep_all_sets_dup_decision | "confirm, keep all" → user_confirmed=True, user_dup_decision=keep_all |
| test_plan_skip_sets_dup_decision | "skip duplicates" → user_dup_decision=skip |
| test_plan_default_dup_decision_is_skip | "confirm" with no dup instruction → default skip |
| test_plan_mode_is_read_without_receipts | No receipts attached → mode=read |
| test_plan_mode_is_create_with_receipts | Receipts present → mode=create_from_receipts |
| test_plan_task_field_also_checked | Confirm keyword in LLM-rewritten task field also works |

Why this matters: The MasterAgent rewrites the user's message as an intent_summary before passing it to the expenses agent. Short replies like "confirm" or "skip" can be lost in the rewrite. The agent checks both fields so these keywords are never missed.
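A minimal sketch of this dual-field keyword check is shown here. The function name `plan`, its parameters, and the plain substring matching are assumptions for illustration; only the keywords and flag values come from the tests listed for this group.

```python
CONFIRM_KEYWORDS = ("confirm", "looks good", "go ahead")
KEEP_ALL_KEYWORDS = ("keep all",)


def plan(task: str, raw_message: str, has_receipts: bool) -> dict:
    """Derive the three planning flags from both message fields."""
    # Search the rewritten task and the raw message together, so a short
    # reply dropped by the rewrite is still seen.
    text = f"{task}\n{raw_message}".lower()
    keep_all = any(keyword in text for keyword in KEEP_ALL_KEYWORDS)
    return {
        "user_confirmed": any(keyword in text for keyword in CONFIRM_KEYWORDS),
        "user_dup_decision": "keep_all" if keep_all else "skip",  # skip is the default
        "mode": "create_from_receipts" if has_receipts else "read",
    }
```

Substring matching is deliberately naive in this sketch: "confirm" would also match "unconfirmed", so a production version would likely match on word boundaries.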


3. test_act_* (4 tests)

What it tests: ExpensesAgent._act() — the creation pipeline and confirmation gate

_act() is the most complex method. It orchestrates byte dedup → concurrent LLM parse → semantic dedup → confirmation gate → Odoo write. These tests mock the Odoo tools (ExpensesTools) and the LLM parse to control inputs precisely.

| Test | What it verifies |
|------|------------------|
| test_act_enters_awaiting_confirmation_on_first_pass | First call (not confirmed) → returns [], sets mode to awaiting_confirmation, populates _confirmation_items |
| test_act_creates_sheet_when_confirmed | Second call (confirmed) → calls create_expense_sheet and create_expense, returns action strings |
| test_act_deduplicates_byte_identical_receipts | Two receipts with the same SHA256 → only one create_expense call |
| test_act_no_employee_returns_empty_and_escalates | No employee record found → returns [], adds escalation message |

Why this matters: The confirmation gate is the last human checkpoint before any data is written to Odoo. These tests verify that: (a) no records are written on the first pass, (b) records are written exactly once on confirmation, and (c) byte-identical files are never double-entered regardless of the dup decision.
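The testing pattern behind these checks can be sketched with `unittest.mock`. The class `ConfirmationGateSketch` below is a hypothetical, much-reduced stand-in for `_act()` (no LLM parse, no semantic dedup, no employee lookup), and the arguments passed to `create_expense_sheet` and `create_expense` are invented for illustration; only the method names come from the tests above.

```python
import hashlib
from unittest.mock import Mock


class ConfirmationGateSketch:
    """Hypothetical stand-in for the confirmation gate inside _act()."""

    def __init__(self, tools):
        self.tools = tools
        self.mode = "create_from_receipts"
        self.confirmation_items = []

    def act(self, receipts, user_confirmed):
        # Byte dedup: keep one receipt per SHA256 digest.
        unique = {}
        for data in receipts:
            unique.setdefault(hashlib.sha256(data).hexdigest(), data)

        if not user_confirmed:
            # First pass: present the list to the user and write nothing.
            self.mode = "awaiting_confirmation"
            self.confirmation_items = list(unique)
            return []

        sheet_id = self.tools.create_expense_sheet()
        actions = [f"created sheet {sheet_id}"]
        for digest in unique:
            self.tools.create_expense(sheet_id, digest)
            actions.append(f"created expense for {digest[:8]}")
        return actions


def test_gate_writes_only_after_confirmation():
    tools = Mock()  # stands in for ExpensesTools
    tools.create_expense_sheet.return_value = 42
    agent = ConfirmationGateSketch(tools)
    receipts = [b"same bytes", b"same bytes"]

    # (a) nothing is written on the first pass
    assert agent.act(receipts, user_confirmed=False) == []
    assert agent.mode == "awaiting_confirmation"
    tools.create_expense_sheet.assert_not_called()
    tools.create_expense.assert_not_called()

    # (b) one sheet on confirmation, (c) one expense for two identical files
    actions = agent.act(receipts, user_confirmed=True)
    tools.create_expense_sheet.assert_called_once()
    assert tools.create_expense.call_count == 1
    assert len(actions) == 2
```

Because the mock records every call, the test can assert on what was not written as well as on what was.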


4. TestParseUpload (8 tests)

What it tests: receipt_parser.parse_upload(filename, data) — file parsing and metadata extraction

parse_upload is the entry point for all receipt data. It handles file type detection, ZIP extraction, OCR invocation, and metadata harvesting.

| Test | What it verifies |
|------|------------------|
| test_text_file_parsed | Plain .txt file → correct text, mimetype, sha256 |
| test_date_extracted_from_filename | 20260509_180857.jpg → date_from_name=2026-05-09 |
| test_no_date_in_plain_filename | receipt.txt → date_from_name=None |
| test_zip_extracted | ZIP containing 2 files → 2 receipt dicts returned |
| test_zip_skips_directories | ZIP with directory entry (subdir/) → directory skipped, only files returned |
| test_empty_zip_returns_empty | Empty ZIP → [] |
| test_sha256_is_consistent | Same bytes with different filenames → same SHA256 |
| test_b64_decodes_to_original | Base64 field round-trips back to original bytes |

Why this matters: date_from_name is used as a fallback when OCR cannot find a date in the receipt text. SHA256 is the key for byte-exact deduplication. If either is wrong, expenses land on the wrong date or duplicates slip through.
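The metadata side of this can be sketched with the standard library alone. The helper names (`date_from_filename`, `describe_file`, `expand_upload`), the YYYYMMDD filename pattern, and the dict keys other than `date_from_name` are assumptions; OCR and mimetype detection are left out, and the real `parse_upload` may differ.

```python
import base64
import hashlib
import io
import re
import zipfile
from datetime import date
from typing import Optional

# Eight digits not embedded in a longer digit run, read as YYYYMMDD.
_DATE_IN_NAME = re.compile(r"(?<!\d)(\d{4})(\d{2})(\d{2})(?!\d)")


def date_from_filename(filename: str) -> Optional[str]:
    """Return an ISO date found in the filename, or None."""
    match = _DATE_IN_NAME.search(filename)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        return None  # eight digits that are not a calendar date


def describe_file(filename: str, data: bytes) -> dict:
    """Metadata for one file; the hash depends on the bytes only."""
    return {
        "filename": filename,
        "sha256": hashlib.sha256(data).hexdigest(),
        "b64": base64.b64encode(data).decode("ascii"),
        "date_from_name": date_from_filename(filename),
    }


def expand_upload(filename: str, data: bytes) -> list:
    """One dict per file; ZIP archives are expanded, directories skipped."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return [describe_file(filename, data)]
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [
            describe_file(info.filename, archive.read(info))
            for info in archive.infolist()
            if not info.is_dir()
        ]
```

Hashing the raw bytes rather than anything derived from the filename is what lets two differently named copies of the same photo collapse to a single key.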


5. TestTextToHtml (5 tests — skipped without Odoo env)

What it tests: ab_ai_mail._text_to_html(text) — plain text → safe HTML conversion

Bot replies are plain text from the agent service. _text_to_html converts them to safe HTML for display in Odoo Discuss.

| Test | What it verifies |
|------|------------------|
| test_plain_text_unchanged | Normal text passes through |
| test_newline_becomes_br | `\n` → `<br>` so multi-line replies render correctly |
| test_html_special_chars_escaped | `<script>` → `&lt;script&gt;` (XSS prevention) |
| test_ampersand_escaped | `&` → `&amp;` |
| test_empty_string | Empty input → empty output |

Why this matters: If escaping is broken, a vendor name containing < or & (common in OCR output) would break the Discuss message rendering or create an XSS vector.
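A conversion with the behavior these tests describe can be sketched in two steps using the standard library's `html.escape`. This is an assumed shape, not the code in ab_ai_mail.py, which may wrap the result differently or use Odoo's own markup helpers.

```python
import html


def text_to_html(text: str) -> str:
    """Escape HTML metacharacters, then turn newlines into <br> tags."""
    # Escape first: doing it after the replacement would also escape
    # the <br> tags that were just inserted.
    return html.escape(text).replace("\n", "<br>")
```

For example, `text_to_html("Tom & Jerry's\n<b>Grill</b>")` yields `Tom &amp; Jerry&#x27;s<br>&lt;b&gt;Grill&lt;/b&gt;`, since `html.escape` also escapes quotes by default.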


Adding New Tests

When modifying the dedup algorithm, add a test in TestFindSemanticDuplicate that reproduces the real-world case that motivated the change. Name it after the vendor or scenario (e.g. test_pass2_catches_ocr_amount_mismatch is named for the scenario behind the In-N-Out bug).

When adding a new confirmation keyword (e.g. "yes please"), add a corresponding test_plan_* case.

When changing the _act() flow, add or update a test_act_* case that verifies the new state transition with mocked Odoo tools.