Carlos Garcia 9f38fb013c docs: label test file and add TEST_EXPENSES_AGENT.md

Adds module-level label and cross-reference to the new doc.
TEST_EXPENSES_AGENT.md documents every test group, case, and the
real-world bug each test guards against (e.g. In-N-Out OCR mismatch).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-16 18:35:07 -04:00

Test Suite: `test_expenses_agent.py`

Overview

Unit tests for the receipt-to-expense-report pipeline in the ActiveBlue AI agent service. All tests run without a live Odoo instance, database, or LLM — external dependencies are mocked so the suite completes in under one second.

File: `tests/test_expenses_agent.py`

Modules under test:

- `agent_service/agents/expenses_agent.py`
- `agent_service/tools/receipt_parser.py`
- `addons/activeblue_ai/models/ab_ai_mail.py`

Running the Tests

```bash
cd /root/odoo/odoo-ai

# First time only: create the test venv
python3 -m venv .venv-test
source .venv-test/bin/activate
pip install -r requirements-test.txt

# Every subsequent run
source .venv-test/bin/activate
python -m pytest tests/test_expenses_agent.py -v
```

Expected output: 41 passed, 5 skipped (the 5 skipped tests require the Odoo environment to import ab_ai_mail.py; they are skipped cleanly on the agent service host).
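The mechanism behind the clean skip is not shown here. One common way to get this behavior is an import guard. The following is a minimal sketch that assumes the Odoo-side module depends on a top-level package importable as `odoo`; the class name `TestNeedsOdoo` and the helper `run_guarded_suite` are hypothetical.

```python
import importlib.util
import unittest

# Assumption: the Odoo-only module needs the top-level `odoo` package.
ODOO_AVAILABLE = importlib.util.find_spec("odoo") is not None


@unittest.skipUnless(ODOO_AVAILABLE, "requires the Odoo environment")
class TestNeedsOdoo(unittest.TestCase):  # hypothetical class name
    def test_placeholder(self):
        # A real test would import the Odoo-side module here and exercise it.
        import odoo  # noqa: F401

        self.assertTrue(ODOO_AVAILABLE)


def run_guarded_suite():
    """Run the guarded class and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNeedsOdoo)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

pytest collects `unittest.TestCase` classes, so a guard like this is reported as skipped rather than as an import error on hosts without Odoo.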

Test Groups

1. `TestFindSemanticDuplicate` (18 tests)

What it tests: `ExpensesAgent._find_semantic_duplicate(parsed, candidates)`

This is the core deduplication algorithm that prevents the same physical receipt from being entered as multiple expense records. It runs in two passes:

| Pass | Trigger condition | Purpose |
|------|-------------------|---------|
| 1 | Same date + amount within $0.05 + vendor similarity ≥ 60% | Standard duplicate (two photos of the same receipt) |
| 2 | Same date + vendor similarity ≥ 80% (amount may differ) | OCR amount misread (e.g. `$8.55` parsed as `$15.00`) |

Cases covered:

| Test | What it verifies |
|------|------------------|
| `test_exact_match` | Identical parsed fields → flagged as duplicate |
| `test_amount_within_threshold` | Amounts within $0.05 → duplicate |
| `test_amount_just_over_threshold` | Amounts $0.06+ apart → Pass 1 misses, Pass 2 catches via vendor+date |
| `test_different_date_not_duplicate` | Same vendor/amount, different date → not a dup |
| `test_zero_amount_not_deduplicated` | Zero-amount receipts are too ambiguous to dedup |
| `test_vendor_similarity_above_threshold` | "IN-N-OUT HOUSTON" vs "In-N-Out Houston" → duplicate |
| `test_vendor_similarity_below_threshold_pass1` | Unrelated vendors, same amount+date → not a dup |
| `test_time_within_window_is_dup` | Transaction times within 30 min → duplicate |
| `test_time_outside_window_not_dup` | Transaction times > 30 min apart → different transactions |
| `test_one_time_missing_does_not_exclude` | Only one receipt has a time → time check skipped, other signals decide |
| `test_filename_vendor_same_amount_date_is_dup` | Vendor is raw filename → same amount+date sufficient |
| `test_no_candidates` | Empty candidate list → returns None |
| `test_returns_correct_index_multiple_candidates` | Correct index returned when multiple candidates present |
| `test_pass2_catches_ocr_amount_mismatch` | The In-N-Out bug: `$8.55` vs `$15.00`, same vendor+date → caught by Pass 2 |
| `test_pass2_requires_high_vendor_similarity` | Clearly different vendors (Starbucks vs McDonalds) → not flagged |
| `test_pass2_same_date_required` | Different dates, high vendor similarity → not a dup |
| `test_pass2_respects_time_window` | High vendor similarity but > 30 min apart → different visit |
| `test_pass2_skips_filename_vendors` | Filename-looking vendors excluded from Pass 2 |
| `test_pass2_zero_amount_not_deduplicated` | Zero-amount edge case in Pass 2 |

Why this matters: In production, a user uploaded two photos of the same In-N-Out receipt. OCR misread one total as $15.00 (correct: $8.55). Pass 1 missed it because the amounts differed by $6.45, far outside the $0.05 tolerance. Pass 2 was added specifically to catch this: if the date is the same and the vendor names are at least 80% similar, the receipts are flagged as a likely duplicate with an OCR-misread amount, regardless of how far apart the amounts are.
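The two-pass rule can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `expenses_agent.py`: the dict keys (`vendor`, `date`, `amount`, `time`), the `HH:MM` time format, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions, and the filename-vendor special cases are left out.

```python
from datetime import datetime
from difflib import SequenceMatcher

AMOUNT_TOLERANCE = 0.05      # Pass 1: amounts within $0.05
PASS1_VENDOR_MIN = 0.60      # Pass 1: vendor similarity >= 60%
PASS2_VENDOR_MIN = 0.80      # Pass 2: vendor similarity >= 80%
TIME_WINDOW_MINUTES = 30


def vendor_similarity(a, b):
    """Case-insensitive similarity in [0, 1] (assumed metric)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def times_conflict(a, b):
    """True only when both times are present and more than 30 min apart."""
    if not a or not b:
        return False  # a missing time never excludes a candidate
    fmt = "%H:%M"  # assumed time format
    delta = abs(datetime.strptime(a, fmt) - datetime.strptime(b, fmt))
    return delta.total_seconds() > TIME_WINDOW_MINUTES * 60


def find_duplicate_sketch(parsed, candidates):
    """Return the index of the first matching candidate, or None."""
    if not parsed.get("amount"):
        return None  # zero-amount receipts are too ambiguous to dedup
    for pass_no in (1, 2):  # Pass 1 scans every candidate before Pass 2 runs
        for index, cand in enumerate(candidates):
            if not cand.get("amount") or cand["date"] != parsed["date"]:
                continue
            if times_conflict(parsed.get("time"), cand.get("time")):
                continue
            sim = vendor_similarity(parsed["vendor"], cand["vendor"])
            diff = abs(parsed["amount"] - cand["amount"])
            amount_close = diff <= AMOUNT_TOLERANCE + 1e-9  # float slack
            if pass_no == 1 and amount_close and sim >= PASS1_VENDOR_MIN:
                return index
            if pass_no == 2 and sim >= PASS2_VENDOR_MIN:
                return index
    return None
```

With these thresholds, an `$8.55` receipt and a `$15.00` receipt from the same vendor on the same date fall through Pass 1 and are caught by Pass 2, which is the In-N-Out scenario described above.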

2. `test_plan_*` (11 tests)

What it tests: `ExpensesAgent._plan()` (keyword detection from the user's raw message)

The `_plan()` method inspects both the LLM-rewritten `task` (intent summary) and the `raw_message` (original user text) to determine:

- `user_confirmed`: did the user approve the parsed receipt list?
- `user_dup_decision`: did the user say to skip or keep duplicates?
- `mode`: `create_from_receipts` vs `read`

| Test | What it verifies |
|------|------------------|
| `test_plan_confirm_keyword_sets_confirmed` | "confirm" → `user_confirmed=True` |
| `test_plan_looks_good_sets_confirmed` | "looks good" → `user_confirmed=True` |
| `test_plan_go_ahead_sets_confirmed` | "go ahead" → `user_confirmed=True` |
| `test_plan_no_keyword_not_confirmed` | "create an expense report" → `user_confirmed=False` |
| `test_plan_keep_all_sets_dup_decision` | "confirm, keep all" → `user_confirmed=True`, `user_dup_decision=keep_all` |
| `test_plan_skip_sets_dup_decision` | "skip duplicates" → `user_dup_decision=skip` |
| `test_plan_default_dup_decision_is_skip` | "confirm" with no dup instruction → default `skip` |
| `test_plan_mode_is_read_without_receipts` | No receipts attached → `mode=read` |
| `test_plan_mode_is_create_with_receipts` | Receipts present → `mode=create_from_receipts` |
| `test_plan_task_field_also_checked` | Confirm keyword in LLM-rewritten `task` field also works |

Why this matters: The MasterAgent rewrites the user's message as an intent_summary before passing it to the expenses agent. Short replies like "confirm" or "skip" can be lost in the rewrite. The agent checks both fields so these keywords are never missed.
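A rough sketch of checking both fields is shown below. The function name `plan_sketch`, its signature, and the returned dict are hypothetical; only the keywords and the resulting values come from the test table. Plain substring matching is an assumption and would also match words such as "unconfirmed".

```python
CONFIRM_KEYWORDS = ("confirm", "looks good", "go ahead")


def plan_sketch(task, raw_message, receipts):
    """Derive the three planning flags from both message fields."""
    # Search the rewritten task and the raw text together so that a short
    # reply dropped by the rewrite is still seen.
    text = f"{task}\n{raw_message}".lower()
    return {
        "user_confirmed": any(k in text for k in CONFIRM_KEYWORDS),
        # "skip" is the default when no duplicate instruction is given.
        "user_dup_decision": "keep_all" if "keep all" in text else "skip",
        "mode": "create_from_receipts" if receipts else "read",
    }
```

For example, a rewritten task of "User approves the parsed receipts" combined with the raw message "confirm, keep all" still yields `user_confirmed=True` and `user_dup_decision=keep_all`, because the raw text is searched as well.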

3. `test_act_*` (4 tests)

What it tests: `ExpensesAgent._act()` (the creation pipeline and confirmation gate)

`_act()` is the most complex method. It orchestrates byte dedup → concurrent LLM parse → semantic dedup → confirmation gate → Odoo write. These tests mock the Odoo tools (`ExpensesTools`) and the LLM parse to control inputs precisely.

| Test | What it verifies |
|------|------------------|
| `test_act_enters_awaiting_confirmation_on_first_pass` | First call (not confirmed) → returns `[]`, sets mode to `awaiting_confirmation`, populates `_confirmation_items` |
| `test_act_creates_sheet_when_confirmed` | Second call (confirmed) → calls `create_expense_sheet` and `create_expense`, returns action strings |
| `test_act_deduplicates_byte_identical_receipts` | Two receipts with the same SHA256 → only one `create_expense` call |
| `test_act_no_employee_returns_empty_and_escalates` | No employee record found → returns `[]`, adds escalation message |

Why this matters: The confirmation gate is the last human checkpoint before any data is written to Odoo. These tests verify that: (a) no records are written on the first pass, (b) records are written exactly once on confirmation, and (c) byte-identical files are never double-entered regardless of the dup decision.
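The gate behavior can be illustrated with a toy stand-in and a `MagicMock` in place of the Odoo tools. Everything here is a simplified assumption: `act_sketch` is not the real `_act()`, the LLM parse, semantic dedup, and employee lookup are omitted, and the keyword arguments passed to `create_expense_sheet` and `create_expense` are invented for the example.

```python
import hashlib
from unittest.mock import MagicMock


def act_sketch(state, receipts, tools):
    """Toy pipeline: byte dedup, then confirmation gate, then writes."""
    # Byte dedup: keep the first receipt seen for each SHA256 digest.
    unique = {}
    for receipt in receipts:
        digest = hashlib.sha256(receipt["data"]).hexdigest()
        unique.setdefault(digest, receipt)
    items = list(unique.values())

    # Confirmation gate: nothing is written until the user has confirmed.
    if not state.get("user_confirmed"):
        state["mode"] = "awaiting_confirmation"
        state["_confirmation_items"] = items
        return []

    # Hypothetical call signatures for the mocked Odoo tools.
    sheet_id = tools.create_expense_sheet(name="Receipts")
    actions = [f"created sheet {sheet_id}"]
    for item in items:
        expense_id = tools.create_expense(sheet_id=sheet_id, name=item["filename"])
        actions.append(f"created expense {expense_id} from {item['filename']}")
    return actions


def make_mock_tools():
    """Mocked tools whose calls can be counted after each pass."""
    tools = MagicMock()
    tools.create_expense_sheet.return_value = 1
    tools.create_expense.return_value = 10
    return tools
```

A test in this style calls the function once unconfirmed and asserts `tools.create_expense.assert_not_called()`, then sets the confirmed flag, calls it again, and asserts on `call_count`.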

4. `TestParseUpload` (8 tests)

What it tests: `receipt_parser.parse_upload(filename, data)` (file parsing and metadata extraction)

`parse_upload` is the entry point for all receipt data. It handles file type detection, ZIP extraction, OCR invocation, and metadata harvesting.

| Test | What it verifies |
|------|------------------|
| `test_text_file_parsed` | Plain `.txt` file → correct text, mimetype, sha256 |
| `test_date_extracted_from_filename` | `20260509_180857.jpg` → `date_from_name=2026-05-09` |
| `test_no_date_in_plain_filename` | `receipt.txt` → `date_from_name=None` |
| `test_zip_extracted` | ZIP containing 2 files → 2 receipt dicts returned |
| `test_zip_skips_directories` | ZIP with directory entry (`subdir/`) → directory skipped, only files returned |
| `test_empty_zip_returns_empty` | Empty ZIP → `[]` |
| `test_sha256_is_consistent` | Same bytes with different filenames → same SHA256 |
| `test_b64_decodes_to_original` | Base64 field round-trips back to original bytes |

Why this matters: date_from_name is used as a fallback when OCR cannot find a date in the receipt text. SHA256 is the key for byte-exact deduplication. If either is wrong, expenses land on the wrong date or duplicates slip through.
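The metadata side of this can be sketched with the standard library alone. The helper names, the dict keys `filename`, `sha256`, and `b64`, and the `YYYYMMDD` filename pattern are assumptions drawn from the test table; OCR, text extraction, and mimetype detection are omitted, and returning a one-element list for a non-ZIP upload is a choice made for this sketch.

```python
import base64
import hashlib
import io
import re
import zipfile
from datetime import datetime

# Assumed pattern: a standalone run of exactly eight digits (YYYYMMDD).
_DATE_IN_NAME = re.compile(r"(?<!\d)(\d{8})(?!\d)")


def date_from_name(filename):
    """Return an ISO date found in the filename, or None."""
    match = _DATE_IN_NAME.search(filename)
    if not match:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date().isoformat()
    except ValueError:
        return None  # eight digits that do not form a valid date


def describe_file(filename, data):
    """Build the per-receipt metadata dict for one file's bytes."""
    return {
        "filename": filename,
        "sha256": hashlib.sha256(data).hexdigest(),
        "b64": base64.b64encode(data).decode("ascii"),
        "date_from_name": date_from_name(filename),
    }


def parse_upload_sketch(filename, data):
    """Return one dict per file, expanding ZIP archives and skipping dirs."""
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            return [
                describe_file(info.filename, archive.read(info))
                for info in archive.infolist()
                if not info.is_dir()
            ]
    return [describe_file(filename, data)]
```

Because the digest is computed from the bytes only, the same photo uploaded under two filenames produces the same `sha256`, which is what byte-exact deduplication relies on.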

5. `TestTextToHtml` (5 tests — skipped without Odoo env)

What it tests: `ab_ai_mail._text_to_html(text)` (plain text → safe HTML conversion)

Bot replies are plain text from the agent service. `_text_to_html` converts them to safe HTML for display in Odoo Discuss.

| Test | What it verifies |
|------|------------------|
| `test_plain_text_unchanged` | Normal text passes through |
| `test_newline_becomes_br` | `\n` → `<br>` so multi-line replies render correctly |
| `test_html_special_chars_escaped` | `<script>` → `&lt;script&gt;` (XSS prevention) |
| `test_ampersand_escaped` | `&` → `&amp;` |
| `test_empty_string` | Empty input → empty output |

Why this matters: If escaping is broken, a vendor name containing < or & (common in OCR output) would break the Discuss message rendering or create an XSS vector.
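The conversion these tests describe can be approximated with the standard library. This is a sketch assuming `html.escape`; the actual helper in `ab_ai_mail.py` may use Odoo or markupsafe utilities instead. The function name is hypothetical.

```python
import html


def text_to_html_sketch(text):
    """Escape HTML special characters, then turn newlines into <br>."""
    # Escape first: the <br> tags added afterwards must stay real markup.
    return html.escape(text).replace("\n", "<br>")
```

The order matters: replacing newlines before escaping would turn the inserted `<br>` tags into `&lt;br&gt;` text.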

Adding New Tests

When modifying the dedup algorithm, add a test in `TestFindSemanticDuplicate` that reproduces the real-world case that motivated the change. Name it after the vendor or scenario (e.g. `test_pass2_catches_ocr_amount_mismatch` is named for the scenario behind the In-N-Out bug).

When adding a new confirmation keyword (e.g. "yes please"), add a corresponding `test_plan_*` case.

When changing the `_act()` flow, add or update a `test_act_*` case that verifies the new state transition with mocked Odoo tools.
