docs: label test file and add TEST_EXPENSES_AGENT.md

Adds a module-level label and a cross-reference to the new doc. TEST_EXPENSES_AGENT.md documents every test group, case, and the real-world bug each test guards against (e.g. In-N-Out OCR mismatch).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
187
tests/TEST_EXPENSES_AGENT.md
Normal file
@@ -0,0 +1,187 @@
# Test Suite: `test_expenses_agent.py`

## Overview

Unit tests for the receipt-to-expense-report pipeline in the ActiveBlue AI agent service.
All tests run without a live Odoo instance, database, or LLM: external dependencies are
mocked so the suite completes in under one second.

**File:** `tests/test_expenses_agent.py`
**Modules under test:**
- `agent_service/agents/expenses_agent.py`
- `agent_service/tools/receipt_parser.py`
- `addons/activeblue_ai/models/ab_ai_mail.py`

---

## Running the Tests

```bash
cd /root/odoo/odoo-ai

# First time only: create the test venv
python3 -m venv .venv-test
source .venv-test/bin/activate
pip install -r requirements-test.txt

# Every subsequent run
source .venv-test/bin/activate
python -m pytest tests/test_expenses_agent.py -v
```

Expected output: **41 passed, 5 skipped** (the 5 skipped tests require the Odoo
environment to import `ab_ai_mail.py`; they are skipped cleanly on the agent service host).
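
The test file's actual skip mechanism is not shown in this commit. As a minimal sketch of one way to get a clean skip, assuming the Odoo-only tests are gated on whether the `odoo` package is importable (`HAS_ODOO` and `TestTextToHtmlSketch` are hypothetical names):

```python
import importlib.util
import unittest

# Assumption: availability of the Odoo environment is detected by looking
# for the `odoo` package. The real suite may use a different check.
HAS_ODOO = importlib.util.find_spec("odoo") is not None


@unittest.skipUnless(HAS_ODOO, "requires the Odoo environment")
class TestTextToHtmlSketch(unittest.TestCase):
    """Placeholder group; the real tests would import ab_ai_mail here."""

    def test_placeholder(self):
        # Only reached when the Odoo package is present.
        self.assertTrue(HAS_ODOO)
```

pytest collects `unittest.TestCase` classes and reports tests skipped this way as skipped, not failed.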

---

## Test Groups

### 1. `TestFindSemanticDuplicate` (19 tests)

**What it tests:** `ExpensesAgent._find_semantic_duplicate(parsed, candidates)`

This is the core deduplication algorithm that prevents the same physical receipt from being
entered as multiple expense records. It runs in two passes:

| Pass | Trigger condition | Purpose |
|------|-------------------|---------|
| 1 | Same date + amount within $0.05 + vendor similarity ≥ 60% | Standard duplicate (two photos of the same receipt) |
| 2 | Same date + vendor similarity ≥ 80% (amount may differ) | OCR amount misread (e.g. `$8.55` parsed as `$15.00`) |

**Cases covered:**

| Test | What it verifies |
|------|-----------------|
| `test_exact_match` | Identical parsed fields → flagged as duplicate |
| `test_amount_within_threshold` | Amounts within $0.05 → duplicate |
| `test_amount_just_over_threshold` | Amounts $0.06+ apart → Pass 1 misses, Pass 2 catches via vendor+date |
| `test_different_date_not_duplicate` | Same vendor/amount, different date → not a duplicate |
| `test_zero_amount_not_deduplicated` | Zero-amount receipts are too ambiguous to dedup |
| `test_vendor_similarity_above_threshold` | "IN-N-OUT HOUSTON" vs "In-N-Out Houston" → duplicate |
| `test_vendor_similarity_below_threshold_pass1` | Unrelated vendors, same amount+date → not a duplicate |
| `test_time_within_window_is_dup` | Transaction times within 30 min → duplicate |
| `test_time_outside_window_not_dup` | Transaction times > 30 min apart → different transactions |
| `test_one_time_missing_does_not_exclude` | Only one receipt has a time → time check skipped, other signals decide |
| `test_filename_vendor_same_amount_date_is_dup` | Vendor is raw filename → same amount+date sufficient |
| `test_no_candidates` | Empty candidate list → returns `None` |
| `test_returns_correct_index_multiple_candidates` | Correct index returned when multiple candidates present |
| `test_pass2_catches_ocr_amount_mismatch` | **The In-N-Out bug:** `$8.55` vs `$15.00`, same vendor+date → caught by Pass 2 |
| `test_pass2_requires_high_vendor_similarity` | Clearly different vendors (Starbucks vs McDonalds) → not flagged |
| `test_pass2_same_date_required` | Different dates, high vendor similarity → not a duplicate |
| `test_pass2_respects_time_window` | High vendor similarity but > 30 min apart → different visit |
| `test_pass2_skips_filename_vendors` | Filename-looking vendors excluded from Pass 2 |
| `test_pass2_zero_amount_not_deduplicated` | Zero-amount edge case in Pass 2 |

**Why this matters:** In production, a user uploaded two photos of the same In-N-Out receipt.
OCR misread one total as `$15.00` (correct: `$8.55`). Pass 1 missed it because the amounts
differed by $6.45. Pass 2 was added specifically to catch this: if the dates are the same and
the vendor names match closely, the receipts are flagged as a likely OCR error regardless of
the amount.
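
The two passes can be sketched as follows. This is an illustration under stated assumptions, not the agent's actual code: the similarity metric (`difflib.SequenceMatcher`), the dict field names, and the function names are hypothetical, and the 30-minute time window and filename-vendor handling are left out for brevity.

```python
from __future__ import annotations

from difflib import SequenceMatcher

AMOUNT_TOLERANCE = 0.05   # Pass 1: amounts within $0.05
PASS1_VENDOR_SIM = 0.60   # Pass 1: vendor similarity >= 60%
PASS2_VENDOR_SIM = 0.80   # Pass 2: vendor similarity >= 80%


def vendor_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]. The metric is an assumption."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def find_semantic_duplicate(parsed: dict, candidates: list[dict]) -> int | None:
    """Return the index of the first likely duplicate, or None."""
    if not parsed.get("amount"):
        return None  # zero-amount receipts are too ambiguous to dedup

    # Pass 1: same date, near-identical amount, loosely similar vendor.
    for i, cand in enumerate(candidates):
        close_amount = abs(cand["amount"] - parsed["amount"]) <= AMOUNT_TOLERANCE + 1e-9
        if (cand["date"] == parsed["date"] and close_amount
                and vendor_similarity(cand["vendor"], parsed["vendor"]) >= PASS1_VENDOR_SIM):
            return i

    # Pass 2: same date, strongly similar vendor; the amount may differ
    # because OCR can misread the total.
    for i, cand in enumerate(candidates):
        if (cand["date"] == parsed["date"]
                and vendor_similarity(cand["vendor"], parsed["vendor"]) >= PASS2_VENDOR_SIM):
            return i
    return None
```

With these thresholds, the In-N-Out case (`$15.00` vs `$8.55`, same date, vendor names differing only in case) falls through Pass 1 and is caught by Pass 2.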

---

### 2. `test_plan_*` (10 tests)

**What it tests:** `ExpensesAgent._plan()`, which detects keywords in the user's raw message

The `_plan()` method inspects both the LLM-rewritten `task` (intent summary) and the
`raw_message` (original user text) to determine:
- `user_confirmed`: did the user approve the parsed receipt list?
- `user_dup_decision`: did the user say to skip or keep duplicates?
- `mode`: `create_from_receipts` vs `read`

| Test | What it verifies |
|------|-----------------|
| `test_plan_confirm_keyword_sets_confirmed` | "confirm" → `user_confirmed=True` |
| `test_plan_looks_good_sets_confirmed` | "looks good" → `user_confirmed=True` |
| `test_plan_go_ahead_sets_confirmed` | "go ahead" → `user_confirmed=True` |
| `test_plan_no_keyword_not_confirmed` | "create an expense report" → `user_confirmed=False` |
| `test_plan_keep_all_sets_dup_decision` | "confirm, keep all" → `user_confirmed=True`, `user_dup_decision=keep_all` |
| `test_plan_skip_sets_dup_decision` | "skip duplicates" → `user_dup_decision=skip` |
| `test_plan_default_dup_decision_is_skip` | "confirm" with no dup instruction → default `skip` |
| `test_plan_mode_is_read_without_receipts` | No receipts attached → `mode=read` |
| `test_plan_mode_is_create_with_receipts` | Receipts present → `mode=create_from_receipts` |
| `test_plan_task_field_also_checked` | Confirm keyword in LLM-rewritten `task` field also works |

**Why this matters:** The MasterAgent rewrites the user's message as an `intent_summary`
before passing it to the expenses agent. Short replies like "confirm" or "skip" can be lost
in the rewrite. The agent checks both fields so these keywords are never missed.
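
A minimal sketch of checking both fields, assuming simple substring matching. The keyword tuple, the `plan` function, and its signature are hypothetical; only the three output fields and the keywords listed in the table come from this document.

```python
# Assumption: confirmation is detected by substring match on these phrases.
CONFIRM_KEYWORDS = ("confirm", "looks good", "go ahead")


def plan(task: str, raw_message: str, has_receipts: bool) -> dict:
    """Derive the plan flags from the rewritten task and the raw user text."""
    # Search both fields so a short reply lost in the rewrite is still seen.
    text = f"{task} {raw_message}".lower()
    return {
        "user_confirmed": any(kw in text for kw in CONFIRM_KEYWORDS),
        # "skip" is the default when no duplicate instruction is given.
        "user_dup_decision": "keep_all" if "keep all" in text else "skip",
        "mode": "create_from_receipts" if has_receipts else "read",
    }
```

For example, `plan("User approves the list", "confirm, keep all", True)` sets `user_confirmed` even though the rewritten task no longer contains the keyword.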

---

### 3. `test_act_*` (4 tests)

**What it tests:** `ExpensesAgent._act()`, the creation pipeline and confirmation gate

`_act()` is the most complex method. It orchestrates byte dedup → concurrent LLM parse →
semantic dedup → confirmation gate → Odoo write. These tests mock the Odoo tools
(`ExpensesTools`) and the LLM parse to control inputs precisely.

| Test | What it verifies |
|------|-----------------|
| `test_act_enters_awaiting_confirmation_on_first_pass` | First call (not confirmed) → returns `[]`, sets mode to `awaiting_confirmation`, populates `_confirmation_items` |
| `test_act_creates_sheet_when_confirmed` | Second call (confirmed) → calls `create_expense_sheet` and `create_expense`, returns action strings |
| `test_act_deduplicates_byte_identical_receipts` | Two receipts with the same SHA256 → only one `create_expense` call |
| `test_act_no_employee_returns_empty_and_escalates` | No employee record found → returns `[]`, adds escalation message |

**Why this matters:** The confirmation gate is the last human checkpoint before any data is
written to Odoo. These tests verify that: (a) no records are written on the first pass,
(b) records are written exactly once on confirmation, and (c) byte-identical files are never
double-entered regardless of the dup decision.
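
The gate behaviour can be illustrated with a stripped-down stand-in and a mocked tools object. `GateSketch` and the argument lists passed to `create_expense_sheet` and `create_expense` are hypothetical; the LLM parse, semantic dedup, and employee lookup are omitted.

```python
from __future__ import annotations

import hashlib
from unittest.mock import MagicMock


class GateSketch:
    """Simplified stand-in for the confirmation gate (not the real agent)."""

    def __init__(self, tools):
        self.tools = tools
        self.mode = "create_from_receipts"
        self.confirmation_items: list[str] = []

    def act(self, receipts: list[bytes], user_confirmed: bool) -> list[str]:
        # Byte dedup: keep one entry per SHA256 digest.
        unique = {hashlib.sha256(data).hexdigest(): data for data in receipts}

        if not user_confirmed:
            # First pass: record what would be created, write nothing.
            self.mode = "awaiting_confirmation"
            self.confirmation_items = list(unique)
            return []

        # Confirmed: one sheet, then one expense per unique receipt.
        sheet_id = self.tools.create_expense_sheet()
        actions = [f"created sheet {sheet_id}"]
        for digest in unique:
            self.tools.create_expense(sheet_id, digest)
            actions.append(f"created expense for {digest[:8]}")
        return actions


tools = MagicMock()
tools.create_expense_sheet.return_value = 7
agent = GateSketch(tools)
first = agent.act([b"receipt", b"receipt"], user_confirmed=False)
second = agent.act([b"receipt", b"receipt"], user_confirmed=True)
```

Here `first` is empty and no tool is called until the confirmed second pass, where the two byte-identical receipts produce a single `create_expense` call.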

---

### 4. `TestParseUpload` (8 tests)

**What it tests:** `receipt_parser.parse_upload(filename, data)`, file parsing and metadata extraction

`parse_upload` is the entry point for all receipt data. It handles file type detection,
ZIP extraction, OCR invocation, and metadata harvesting.

| Test | What it verifies |
|------|-----------------|
| `test_text_file_parsed` | Plain `.txt` file → correct text, mimetype, sha256 |
| `test_date_extracted_from_filename` | `20260509_180857.jpg` → `date_from_name=2026-05-09` |
| `test_no_date_in_plain_filename` | `receipt.txt` → `date_from_name=None` |
| `test_zip_extracted` | ZIP containing 2 files → 2 receipt dicts returned |
| `test_zip_skips_directories` | ZIP with directory entry (`subdir/`) → directory skipped, only files returned |
| `test_empty_zip_returns_empty` | Empty ZIP → `[]` |
| `test_sha256_is_consistent` | Same bytes with different filenames → same SHA256 |
| `test_b64_decodes_to_original` | Base64 field round-trips back to original bytes |

**Why this matters:** `date_from_name` is used as a fallback when OCR cannot find a date in
the receipt text. SHA256 is the key for byte-exact deduplication. If either is wrong, expenses
land on the wrong date or duplicates slip through.
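
The two metadata pieces called out above can be sketched like this. The regex, the helper names `date_from_name` and `fingerprint`, and the returned dict keys are assumptions for illustration; the real `parse_upload` may extract them differently.

```python
from __future__ import annotations

import base64
import hashlib
import re
from datetime import date

# Assumption: camera-style names carry the date as a standalone YYYYMMDD run.
_NAME_DATE = re.compile(r"(?<!\d)(\d{4})(\d{2})(\d{2})(?!\d)")


def date_from_name(filename: str) -> str | None:
    """Return an ISO date found in the filename, or None."""
    match = _NAME_DATE.search(filename)
    if match is None:
        return None
    try:
        return date(int(match[1]), int(match[2]), int(match[3])).isoformat()
    except ValueError:
        return None  # eight digits that do not form a valid calendar date


def fingerprint(data: bytes) -> dict:
    """Content-only metadata: the filename plays no part in the hash."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "b64": base64.b64encode(data).decode("ascii"),
    }
```

Because the digest depends only on the bytes, the same photo uploaded under two names yields the same SHA256, which is what byte-exact deduplication relies on.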

---

### 5. `TestTextToHtml` (5 tests, skipped without Odoo env)

**What it tests:** `ab_ai_mail._text_to_html(text)`, plain text → safe HTML conversion

Bot replies are plain text from the agent service. `_text_to_html` converts them to safe
HTML for display in Odoo Discuss.

| Test | What it verifies |
|------|-----------------|
| `test_plain_text_unchanged` | Normal text passes through |
| `test_newline_becomes_br` | `\n` → `<br>` so multi-line replies render correctly |
| `test_html_special_chars_escaped` | `<script>` → `&lt;script&gt;` (XSS prevention) |
| `test_ampersand_escaped` | `&` → `&amp;` |
| `test_empty_string` | Empty input → empty output |

**Why this matters:** If escaping is broken, a vendor name containing `<` or `&` (common in
OCR output) would break the Discuss message rendering or create an XSS vector.
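
The conversion the table describes can be approximated with the standard library. This is a sketch of the behaviour, not the Odoo module's implementation; `text_to_html` is a hypothetical stand-in for `_text_to_html`.

```python
import html


def text_to_html(text: str) -> str:
    """Escape HTML special characters, then turn newlines into <br>."""
    # Escape first so the <br> tags added afterwards are not escaped.
    return html.escape(text).replace("\n", "<br>")
```

The order matters: escaping after inserting `<br>` would turn the line breaks into literal `&lt;br&gt;` text.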

---

## Adding New Tests

When modifying the dedup algorithm, add a test in `TestFindSemanticDuplicate` that
reproduces the real-world case that motivated the change. Name it after the vendor or
scenario (e.g. `test_pass2_catches_ocr_amount_mismatch` covers the In-N-Out bug).

When adding a new confirmation keyword (e.g. "yes please"), add a corresponding
`test_plan_*` case.

When changing the `_act()` flow, add or update a `test_act_*` case that verifies the
new state transition with mocked Odoo tools.
@@ -1,12 +1,31 @@
 """
-Unit tests for ExpensesAgent logic.
+ActiveBlue AI — Expenses Agent Unit Tests
+==========================================
+Suite: test_expenses_agent.py
+Module: agent_service/agents/expenses_agent.py
+agent_service/tools/receipt_parser.py
+addons/activeblue_ai/models/ab_ai_mail.py
 
-Covers:
-- _find_semantic_duplicate (two-pass dedup algorithm)
-- _plan() (keyword detection → user_confirmed, user_dup_decision)
-- _act() confirmation gate (enters awaiting_confirmation before writing records)
-- parse_upload (ZIP extraction, filename date parsing)
-- _text_to_html (HTML escaping and newline conversion)
+Purpose
+-------
+Verify the core business logic of the expenses agent without requiring
+a live Odoo instance, database, or LLM. All external dependencies
+(ORM, HTTP, Ollama) are mocked. Tests run in < 1 second.
+
+Run
+---
+source .venv-test/bin/activate
+python -m pytest tests/test_expenses_agent.py -v
+
+Test groups
+-----------
+TestFindSemanticDuplicate — two-pass duplicate-detection algorithm
+test_plan_* — intent keyword → user_confirmed / user_dup_decision
+test_act_* — _act() confirmation gate and expense creation
+TestParseUpload — receipt_parser ZIP handling and metadata
+TestTextToHtml — HTML escaping (skipped without Odoo env)
+
+See tests/TEST_EXPENSES_AGENT.md for full documentation.
 """
 from __future__ import annotations
 