Commit Graph

107 Commits

Author SHA1 Message Date
Carlos Garcia
e6c3d08990 Fix receipt parsing quality and approval endpoint
Receipt quality: replace LLM amount/date extraction with regex.
LLM was hallucinating 2021/2022 dates and returning '198.40 USD' strings.
Amounts now use deterministic regex (Total:/Grand Total:/Amount Due:).
Dates: filename timestamp > OCR regex > today (no LLM date guessing).
LLM only asked for vendor name + product category.

Approval: fix GET /approval/pending 500 by using correct column
name 'started_at' instead of 'created_at' (which does not exist).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 23:02:11 -04:00
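The deterministic extraction described in the commit above could look roughly like the following. This is a minimal sketch, not the repository's code: the function names, the exact regular expressions, and the ISO date format searched for in the OCR text are all assumptions made for illustration.

```python
import re
from datetime import date, datetime

# Hypothetical patterns; the real ones are not shown in the commit message.
_TOTAL_RE = re.compile(
    r"(?im)^\s*(?:grand\s+total|amount\s+due|total)\s*:\s*\$?\s*(\d+(?:,\d{3})*\.\d{2})"
)
_FILENAME_TS_RE = re.compile(r"(\d{8})_\d{6}")
_OCR_DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


def extract_amount(ocr_text: str) -> float:
    """Return the last labelled total in the text, or 0.0 if none matches."""
    matches = _TOTAL_RE.findall(ocr_text)
    if not matches:
        return 0.0
    return float(matches[-1].replace(",", ""))


def pick_date(filename: str, ocr_text: str, today: date) -> date:
    """Priority: filename timestamp, then a date in the OCR text, then today."""
    m = _FILENAME_TS_RE.search(filename)
    if m:
        try:
            return datetime.strptime(m.group(1), "%Y%m%d").date()
        except ValueError:
            pass  # eight digits that are not a real date; fall through
    m = _OCR_DATE_RE.search(ocr_text)
    if m:
        try:
            return date.fromisoformat(m.group(1))
        except ValueError:
            pass
    return today
```

Anchoring the total pattern at the start of a line keeps "Subtotal:" from matching, and taking the last match favours the total printed at the bottom of the receipt.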
Carlos Garcia
0320591344 Remove vision OCR — use Tesseract-only pipeline for receipt parsing
The llama3.2-vision model was producing unreliable structured data
(wrong vendors, amounts, dates), which made expense reports worse than
those from Tesseract + LLM extraction. Removes _ocr_image_vision(), the
vision JSON fast path in _parse_receipt_text(), _match_category(),
and the vision_ocr_model config setting entirely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 22:32:26 -04:00
Carlos Garcia
ec6b41943f fix: vision OCR JSON failures — add format='json' and repair fallback
Three receipts per batch were failing with JSONDecodeError (e.g.
"Expecting ':' delimiter: line 1 column 90") because activeblue-chat
(llama3.2-vision) occasionally outputs near-JSON with trailing commas,
single-quoted strings, or unquoted keys.

Two-layer fix:
1. Add format='json' to the Ollama chat call — Ollama JSON mode forces
   syntactically valid output at the sampler level, eliminating most
   structural errors.
2. Add _repair_json() fallback that runs on any remaining JSONDecodeError:
   strips trailing commas, converts single→double quotes, and quotes
   unquoted keys. If repair succeeds, the result is re-serialised as
   canonical JSON before being returned.

Also re-serialise with json.dumps() on success so the fast path in
_parse_receipt_text always receives clean, canonical JSON regardless of
whitespace or key ordering in the model's original output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 22:24:50 -04:00
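A repair fallback of the kind described in the commit above might be sketched as follows. The name repair_json and the regular expressions are assumptions; the actual _repair_json() is not shown in the log. The quote conversion is deliberately naive and assumes values contain no apostrophes or embedded double quotes.

```python
import json
import re


def repair_json(raw: str) -> str:
    """Best-effort repair of near-JSON; returns canonical JSON or raises."""
    try:
        return json.dumps(json.loads(raw))
    except json.JSONDecodeError:
        pass
    fixed = raw.strip()
    # Single-quoted strings -> double-quoted strings (naive, see lead-in).
    fixed = re.sub(r"'([^'\"]*)'", r'"\1"', fixed)
    # Bare identifiers used as keys: {vendor: ...} -> {"vendor": ...}
    fixed = re.sub(r'([{,]\s*)([A-Za-z_]\w*)(\s*:)', r'\1"\2"\3', fixed)
    # Trailing commas before a closing brace or bracket.
    fixed = re.sub(r",\s*([}\]])", r"\1", fixed)
    # Raises json.JSONDecodeError if the text is still not valid JSON.
    return json.dumps(json.loads(fixed))
```

Re-serialising with json.dumps() on both paths means the caller always receives the same canonical form, whether or not a repair was needed.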
Carlos Garcia
9fa391c720 fix: reduce hallucination in receipt extraction — conservative prompts + date injection
Two sources of hallucinated values in receipt parsing:

1. The LLM extraction prompt had no explicit "don't guess" constraint, so
   when Tesseract produced garbled OCR text the LLM substituted plausible-
   looking values (wrong vendor names, wrong totals) instead of returning
   safe defaults.

2. The date field asked the LLM to extract the date from the OCR text even
   when date_hint (from the filename timestamp, e.g. 20260509_180857.jpg)
   was already available — a reliable signal that was being ignored.

expenses_agent._parse_receipt_text:
- LLM path: new prompt leads with "copy values EXACTLY, do NOT guess or
  infer"; adds "if OCR looks corrupted, return safe default rather than
  a more logical value"; injects date_hint directly as an authoritative
  value when available so the LLM never needs to extract the date.
- Vision fast path: normalise "null" string for date the same way as time;
  prefer date_hint over a null date returned by the vision model.

receipt_parser._ocr_image_vision:
- Vision prompt now leads with the same "copy exactly, do not guess"
  constraint and explicitly accepts null for date/time when not clearly
  visible, matching the conservative tone of the LLM extraction prompt.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 22:19:20 -04:00
Carlos Garcia
cc025695ac fix: prevent master agent asking for clarification when receipts are uploaded
When a zip/image arrives via /upload, the LLM was classifying the
message as needs_clarification=True (because the chat body was just a
filename like "download (8).zip", not an instruction), and the early
return on line 91 fired before the receipts safety guard on line 106,
so the guard never executed.

master_agent: move the receipts safety guard to BEFORE the
needs_clarification early-return. If extra_context contains receipts,
unconditionally set needs_clarification=False and ensure expenses_agent
is in the agents list — the LLM cannot veto an upload with a question.

upload router: normalize empty or filename-only messages (e.g. when the
user drops a file in Discuss chat with no text) to
"Create an expense report from these uploaded receipts." so the LLM
intent classification also has a sensible string to work with.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 22:13:46 -04:00
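The ordering fix in the commit above can be illustrated with a small sketch. The function names, the dictionary shapes, and the filename comparison are hypothetical; only the default message string and the rule "receipts override needs_clarification" come from the commit message.

```python
DEFAULT_UPLOAD_MESSAGE = "Create an expense report from these uploaded receipts."


def normalize_message(message: str, filenames: list[str]) -> str:
    """Replace an empty or filename-only chat body with a usable instruction."""
    text = (message or "").strip()
    if not text or text in filenames:
        return DEFAULT_UPLOAD_MESSAGE
    return text


def route(intent: dict, extra_context: dict) -> dict:
    """Apply the receipts guard before the clarification early return."""
    if extra_context.get("receipts"):
        # An upload cannot be vetoed by a clarification question.
        intent["needs_clarification"] = False
        agents = intent.setdefault("agents", [])
        if "expenses_agent" not in agents:
            agents.append("expenses_agent")
    if intent.get("needs_clarification"):
        return {"action": "clarify", "agents": []}
    return {"action": "dispatch", "agents": intent.get("agents", [])}
```

With the guard placed first, an intent classified as needs_clarification=True is still dispatched to expenses_agent whenever receipts are present.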
Carlos Garcia
68b7b3f0f3 fix: add missing approval workflow columns to ab_directive_log (migration 002)
/approval/pending was returning 500 UndefinedColumnError because the
approval router and MCP get_pending_approvals tool both query columns
(agent_name, action_type, description, context_data, approver_id,
approval_note, updated_at) that were never added in the initial schema
migration 001.

Adds migration 002 to ALTER TABLE ab_directive_log with all seven
missing columns (all nullable so existing rows are unaffected) and an
index on updated_at for efficient polling.

Deploy: after pulling on miaai, run:
  cd /root/odoo/odoo-ai && docker compose exec agent-service \
    alembic -c agent_service/migrations/alembic.ini upgrade head

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 22:03:59 -04:00
Carlos Garcia
70145c9e04 fix: chat attachment detection — 3-method fallback + deferred retry
ab_ai_mail.py: when a user sends a file via Odoo 18 Discuss, the zip
was going through /dispatch (text-only) instead of /upload, causing the
bot to respond "I'm unable to locate the zip file" because attachment_ids
was empty in the message_post override.

Root cause: Odoo 18 Discuss links file attachments to mail.message
records via three different mechanisms depending on the upload path, and
we only checked one (the Many2many relation table).

Fixes:
1. Three-method attachment detection in message_post:
   - Method 1: result.attachment_ids (Many2many relation table)
   - Method 2: ir.attachment with res_model='mail.message' (Odoo 15+ style)
   - Method 3: attachment IDs parsed from href URLs in the HTML body
2. Deferred retry in _agent_thread: if att_data is still empty but a
   message_id is known, sleep 1s then re-read via a fresh DB cursor so
   we see data committed after message_post returned (timing race fix)
3. Skip zero-byte attachments and warn instead of silently using them
4. Pass message_id to the background thread (new kwarg, backward compat)
5. Add debug logging so future issues can be diagnosed from Odoo logs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 22:01:38 -04:00
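Method 3 from the commit above (attachment IDs parsed from href URLs in the HTML body) might be sketched like this. It assumes the links use the /web/content/<id> or /web/image/<id> URL shapes; the helper name and the regular expression are illustrative and not taken from ab_ai_mail.py.

```python
import re

# Assumed URL shapes: /web/content/<id> and /web/image/<id>.
_ATTACHMENT_HREF_RE = re.compile(
    r"""href=["'][^"']*/web/(?:content|image)/(\d+)"""
)


def attachment_ids_from_body(body_html: str) -> list[int]:
    """Return attachment IDs linked in the body, in order, without duplicates."""
    seen: list[int] = []
    for raw in _ATTACHMENT_HREF_RE.findall(body_html or ""):
        att_id = int(raw)
        if att_id not in seen:
            seen.append(att_id)
    return seen
```

In a three-step fallback this would run only when the first two lookups return nothing, since parsing HTML with a regular expression is the least reliable of the three.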
Carlos Garcia
11cc261923 fix: vision OCR receipt extraction — skip second LLM call, fix total truncation
receipt_parser: change _ocr_image_vision() to extract structured JSON
{vendor,amount,date,time,category} directly from the image instead of
transcribing raw text, so the downstream LLM extraction step is
unnecessary and the two-step error-compounding is eliminated.

expenses_agent: add _match_category() helper to map vision category
labels to expense product names via substring/fuzzy match; add fast
path in _parse_receipt_text() that detects pre-extracted vision JSON
(text starts with '{') and skips the second LLM submit call entirely.
Fix text[:2000] truncation that discarded receipt totals — now keeps
first 1500 + last 1500 chars of long receipts so the grand total at
the bottom is always included.

tests: fix stale test_act_enters_awaiting_confirmation_on_first_pass
(confirmation gate was removed); add TestMatchCategory and three new
tests for the vision JSON fast path and LLM fallthrough.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 21:49:31 -04:00
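The truncation fix in the commit above (first 1500 plus last 1500 characters instead of text[:2000]) can be sketched as below. The function name and the separator placed between the two halves are assumptions.

```python
def keep_head_and_tail(text: str, head: int = 1500, tail: int = 1500) -> str:
    """Keep the start and the end of long OCR text so the total survives."""
    if len(text) <= head + tail:
        return text
    # The separator is an illustrative choice, not taken from the source.
    return text[:head] + "\n...\n" + text[-tail:]
```

Receipts print the grand total last, so any prefix-only cut risks dropping the one value the parser needs most.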
Carlos Garcia
7a0aad3f37 fix: three bugs blocking bot presence and approval UI
1. OdooClient missing self._timeout — every _xmlrpc_call raised
   AttributeError, making the odoo health check permanently fail.
   Fix: set self._timeout = XMLRPC_TIMEOUT in __init__.

2. action_ping only accepted ollama=='ok' but health.py now returns
   'warming' when the model is not yet hot in VRAM. Fix: treat
   warming as passing so the bot goes online and the model loads
   on the first real request.

3. /ai/approval/pending declared methods=['GET'] on a type='json'
   route — Odoo JSON-RPC always POSTs, so every browser call got
   405 METHOD NOT ALLOWED. Fix: change to methods=['POST'].

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 20:53:49 -04:00
Carlos Garcia
b23ab77ee9 fix: bot presence stays offline after vision model change
ping() was calling ollama.AsyncClient.list() which parses /api/tags with
ollama==0.3.3 pydantic models. Vision models carry metadata fields that 0.3.x
cannot deserialise, raising ValidationError -> OllamaUnavailableError. This
set the ollama field of /health/detailed to 'error: ...' instead of 'ok', so
the REQUIRED_SYSTEMS check in ab_ai_bot.py failed and the bot never went online
even though the service was up.

Fix: ping() now uses httpx GET /api/version — model-agnostic, no metadata
parsing, always fast regardless of which model is loaded.

Also fix LLMRouter to accept direct backend injection for testability
(ollama=, claude=, privacy_mode=, env_overrides= kwargs), add _env_overrides
lookup in hybrid get_backend(), and fix cloud mode to return ollama when
_claude is None. All 6 test_llm_router tests now pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 19:15:49 -04:00
2f9791f925 Update CLAUDE.md 2026-05-20 21:26:34 +00:00
e67dc06a22 CLAUDE.md 2026-05-20 21:21:45 +00:00
564f1a9479 fix: raise Ollama timeout to 300s, add model pre-warming, improve health check
- OllamaBackend enforces _MIN_TIMEOUT=300s (overrides OLLAMA_TIMEOUT env var)
- warm_model() background task loads activeblue-chat into VRAM at startup
- health/detailed reports "warming" vs "ok" via Ollama ps() API
- README updated with May 2026 changes and test coverage details
2026-05-20 05:03:15 +00:00
20a69313d7 Add comprehensive unit tests for all agent service components
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 04:00:45 +00:00
6c22a9a128 feat: elearning_agent — reduce tools 14 → 8 so it registers at startup
- Merge get_course_stats + get_enrolled_users + get_slide_completion → get_course_details
- Fold publish_course into update_course via website_published param
- Drop flag_low_completion (replaced by post_chatter_note) and suggest_next_course
  (still callable internally via peer-bus suggest_courses request)
- elearning_tools: add get_course_details(), extend update_course() signature
- ARCHITECTURE.md: mark elearning_agent as registered

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:02:51 -04:00
233f461480 fix: align peer_bus signature, bot presence SQL, XML-RPC timeout
- All specialist agents: handle_peer_request(request_type, params, directive_id)
  replaces handle_peer_request(request: dict) so callers pass structured args
- ab_ai_bot: force-write bus_presence.status via SQL so Odoo 18 WebSocket presence
  shows the correct colour immediately (ORM compute does not trigger on last_poll writes)
- odoo_client: wrap XML-RPC executor calls in asyncio.wait_for to enforce timeout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:02:51 -04:00
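The XML-RPC timeout change in the commit above (executor calls wrapped in asyncio.wait_for) follows a common pattern, sketched here. call_with_timeout and slow_call are hypothetical names; slow_call stands in for a blocking XML-RPC request.

```python
import asyncio
import time


def slow_call(seconds: float) -> str:
    """Stand-in for a blocking XML-RPC request."""
    time.sleep(seconds)
    return "ok"


async def call_with_timeout(func, *args, timeout: float):
    """Run a blocking function in the default executor with a deadline."""
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(None, func, *args)
    # Raises asyncio.TimeoutError if the deadline passes first.
    return await asyncio.wait_for(future, timeout)
```

One caveat of this pattern: the timeout releases the awaiting coroutine, but the worker thread keeps running until the blocking call returns on its own.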
Carlos Garcia
93f2a101fa refactor: remove scripted file intercept — LLM owns all responses
Previously ab_ai_mail.py intercepted file uploads before reaching the
LLM and responded with a hardcoded clarification template. The LLM had
no involvement in the file upload response.

Changes:
- ab_ai_mail.py: remove _post_file_clarification, _find_pending_attachments,
  _describe_zip, and the two-step pending-attachment lookup. All messages
  (text, files, or both) are dispatched to the agent service immediately.
  Files with no text pass an empty message — the LLM decides what to do.
- upload.py: default message changed from hardcoded receipt instruction
  to '' so the LLM determines intent from file content.
- master_agent._synthesize: always runs through the LLM for both single
  and multi-agent cases — no raw templates reach the user.
- master_system.txt: add FILE UPLOADS routing rule so the LLM knows to
  route receipts to expenses_agent without asking for clarification.

New flow: upload → parse → LLM classifies → agent acts → LLM synthesizes
natural response → user sees it. Zero scripted intercepts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 21:05:38 -04:00
Carlos Garcia
0bd1810405 fix: create expense report immediately — remove broken confirmation gate
The old flow required a "confirm" reply after showing a parsed-receipt
table, but that follow-up dispatch call carries no receipts (they only
exist in the /upload context). The confirmation gate was architecturally
broken: the second turn would always create nothing.

Fix: create the expense sheet immediately when receipts are present.
Byte-exact and semantic duplicates are auto-skipped; the count of
skipped items is reported in the success message. The report is always
created in Odoo as a draft so users can review amounts and submit
manually via Odoo > Expenses.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 20:58:47 -04:00
Carlos Garcia
8d1727b498 feat: sysops_agent — Docker/git self-management with auto-heal
Adds a new specialist agent that gives the AI system control over its
own infrastructure:

- sysops_tools.py: docker SDK (ps/logs/restart) + git CLI (pull/status/log)
  + Odoo channel notifier for autonomous action broadcasts
- sysops_agent.py: BaseAgent subclass handling on-demand chat requests,
  auto_heal() triggered by health failures, and sweep() for audits
- Background auto-heal loop (main.py): runs every 2 minutes, calls
  _get_failing_systems() and triggers auto_heal() when degraded
- health.py: extracted _get_failing_systems() helper reused by both
  the /health/detailed endpoint and the auto-heal loop
- docker-compose.yml: mount docker socket + /root/odoo workspace +
  SSH keys for git authentication
- Dockerfile: add git to apt-get
- requirements.txt: add docker==7.1.0 Python SDK

Auto-heal behavior:
  - Detects failing containers, restarts them, notifies all bot DM channels
  - Ollama (192.168.2.9) is flagged as external and skipped
  - On-demand via chat: "restart agent", "check logs", "pull latest code"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 17:01:57 -04:00
Carlos Garcia
f4991dd920 fix: presence window 24h → 10min to match cron heartbeat
Bot green dot stays on for 10 minutes after each successful health
check (2× the ~5-min cron cycle). A failed check sets last_poll to
1 hour in the past, going offline immediately. If the cron stops
entirely, the dot goes offline on its own after 10 minutes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 16:36:48 -04:00
Carlos Garcia
160f96a549 fix: override bus.presence._compute_status so bot shows online
Odoo 18's _compute_status treats future last_poll as MORE disconnected
(absolute delta). Override forces status='online' when last_poll > now,
which is set 24h ahead by _sync_bot_user_presence when the health check
passes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 16:27:40 -04:00
Carlos Garcia
eeea45b37f fix: explicit per-system health checks gate online status
action_ping now checks db, odoo, ollama, and master_agent individually.
All four must report 'ok' for the bot to go online. Presence is updated
immediately inside action_ping (not as a separate cron step), so every
ping — whether from the cron or a manual button press — atomically checks
all systems and sets the correct online/offline/error state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 16:07:59 -04:00
Carlos Garcia
99cc19195a fix: keep bot presence online for 24h instead of racing the 30s timer
Set last_poll and last_presence 24h ahead when the service is confirmed
online, so status stays 'online' until the cron explicitly marks it down.
The previous 10min offset still expired between cron runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 16:06:08 -04:00
Carlos Garcia
a0fc1396a9 fix: Odoo 18 field errors, routing quality, bot presence, and add architecture docs
- expenses_tools: remove 'date' from hr.expense.sheet field lists (Odoo 18
  uses accounting_date; querying 'date' raised ValueError at runtime)
- master_system.txt: add few-shot routing examples so Llama 3.1 8B correctly
  outputs agents=[] for general questions instead of defaulting to expenses_agent
- ab_ai_bot.py: increase bot presence last_poll offset from 90s to 10min so
  the green dot stays on between cron runs (cron fires every ~5min in practice,
  not every 20s as configured)
- ARCHITECTURE.md: full system documentation covering component layout, request
  flow, LLM routing, agent registry, access control, health/presence mechanism,
  known issues fixed today, and future self-healing concept

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 15:47:48 -04:00
Carlos Garcia
b76d01b64f Fix vision OCR response parsing for dict-returning ollama client versions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 11:59:11 -04:00
Carlos Garcia
5b924e60de Add vision OCR via Ollama vision model with Tesseract fallback
Introduces VISION_OCR_MODEL setting. When set (e.g. llama3.2-vision:11b),
receipt images are transcribed by the Ollama vision model before falling
back to Tesseract. Also improves Tesseract preprocessing with adaptive
binarisation (autocontrast + threshold at 140) for better accuracy on
thermal receipts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 18:43:21 -04:00
Carlos Garcia
9f38fb013c docs: label test file and add TEST_EXPENSES_AGENT.md
Adds module-level label and cross-reference to the new doc.
TEST_EXPENSES_AGENT.md documents every test group, case, and the
real-world bug each test guards against (e.g. In-N-Out OCR mismatch).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 18:35:07 -04:00
Carlos Garcia
469025b6f2 test: fix bad vendor example in pass2 similarity test
'Restaurant A' vs 'Restaurant Z' differ by 1 char so difflib scores
them at ~91% -- correctly above the 80% threshold. Use clearly
different vendors (Starbucks Coffee vs McDonalds Burger) instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 18:32:38 -04:00
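The similarity score mentioned in the commit above can be checked directly with difflib. The helper name and the lower-casing step are assumptions; the commit only states that difflib is used and that the threshold is 80%.

```python
from difflib import SequenceMatcher


def vendor_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 2 * matched characters / total characters."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# 11 of 12 characters match on each side: 22 / 24, about 0.917.
near = vendor_similarity("Restaurant A", "Restaurant Z")
# Clearly different vendors score far below the 0.8 threshold.
far = vendor_similarity("Starbucks Coffee", "McDonalds Burger")
```

This is why a one-character difference made a poor negative example: short strings that share a long prefix score high regardless of the final character.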
Carlos Garcia
1c5f6e7ca3 test: fix _ext import (only exists in ab_ai_mail, not receipt_parser)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 18:31:30 -04:00
Carlos Garcia
92ba6bd069 test: add requirements-test.txt for isolated test venv
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 18:14:05 -04:00
Carlos Garcia
6fcd830e6f test: unit tests for expenses agent dedup, plan, act, and receipt parser
- TestFindSemanticDuplicate: 18 cases covering Pass1 (amount match),
  Pass2 (OCR mismatch / high vendor similarity), time window, filenames,
  zero-amount exclusion, multi-candidate index correctness
- test_plan_*: keyword detection for confirm/skip/keep-all, mode routing
- test_act_*: confirmation gate, byte-dedup, no-employee escalation,
  confirmed creation with mocked Odoo tools
- TestParseUpload: ZIP extraction, directory skipping, filename date
  parsing, SHA256 consistency, b64 round-trip
- TestTextToHtml: escaping, newline to <br>, empty string

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 18:11:32 -04:00
Carlos Garcia
af1d27be89 feat: pre-creation confirmation step with inline duplicate warnings
Before writing any expense records the bot now posts a numbered table
of parsed vendor/amount/date for every receipt, with duplicate entries
flagged inline. User replies 'confirm' (skips dups) or 'confirm, keep
all'. This catches OCR amount misreads before they land in Odoo.

Also removes the separate awaiting_dup_approval step; duplicate review
is now part of the single confirmation table.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 16:54:25 -04:00
Carlos Garcia
12576ead1b feat: two-pass dedup catches same-vendor OCR amount misreads
Pass 1 unchanged: same date + amount within 0.05 + vendor similarity 60%.
Pass 2 (new): same vendor (>= 80% similarity) + same date, regardless
of amount, to catch receipts where OCR misread the total.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 16:48:51 -04:00
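The two passes described in the commit above could be expressed as a single predicate. This is a sketch under stated assumptions: the Receipt dataclass and function names are hypothetical, and only the thresholds (0.05, 60%, 80%) come from the commit message.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Receipt:
    vendor: str
    amount: float
    date: str  # ISO date string, e.g. "2026-05-09"


def _similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_semantic_duplicate(a: Receipt, b: Receipt) -> bool:
    """True when two receipts look like photos of the same purchase."""
    if a.date != b.date:
        return False
    sim = _similarity(a.vendor, b.vendor)
    # Pass 1: amounts agree within 0.05 and vendors are loosely similar.
    if abs(a.amount - b.amount) <= 0.05 and sim >= 0.6:
        return True
    # Pass 2: vendors are near-identical, so an amount mismatch is
    # treated as an OCR misread rather than a different receipt.
    return sim >= 0.8
```

Pass 2 trades some false positives (two real purchases at the same vendor on the same day) for catching misread totals.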
Carlos Garcia
774c0cc062 fix: tighten receipt amount extraction prompt to reduce OCR misreads
Replaced 'pick the largest one' guidance with 'bottom-most total' and
'return 0 if no clear total found' to avoid picking line items or tips.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 16:47:48 -04:00
Carlos Garcia
bb1e93fabb fix: widen actions_taken to list[Any] and improve bot error replies
DispatchResponse declared actions_taken as list[dict] but agents return
list[str], causing a 422 on every successful upload.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 16:31:45 -04:00
Carlos Garcia
cf3fe5e0a5 fix: await get_all() in registry router and align get_all key names
The /registry/agents endpoint returned 500 on every call because
AgentRegistry.get_all() is async but was called without await.
Also aligns get_all() dict keys (name, domain) with what the router reads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 13:38:06 -04:00
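The missing-await bug in the commit above is easy to reproduce in isolation. The class below is a simplified stand-in for AgentRegistry, and list_agents is a hypothetical caller; neither is the repository's code.

```python
import asyncio


class AgentRegistry:
    """Simplified stand-in with an async get_all(), as in the commit."""

    def __init__(self) -> None:
        self._agents = {"expenses_agent": "expenses"}

    async def get_all(self) -> list[dict]:
        return [{"name": n, "domain": d} for n, d in self._agents.items()]


async def list_agents(registry: AgentRegistry) -> list[str]:
    # Without `await`, get_all() returns a coroutine object, and indexing
    # or iterating it fails at runtime instead of yielding the agent list.
    agents = await registry.get_all()
    return [a["name"] for a in agents]
```

The keys read by the caller ("name" here) have to match the keys get_all() emits, which is the second half of the fix.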
Carlos Garcia
d87f3c3e99 Non-blocking agent dispatch: run LLM call in background thread
message_post now returns immediately after collecting attachment data.
The agent HTTP call and reply posting happen in a daemon thread, so
Odoo commits the user's message and the browser confirms receipt right
away -- instead of waiting 10+ seconds for Ollama to respond.

File clarification (no LLM) still posts inline since it's instant.
The background thread opens its own DB cursor to post the bot reply.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 12:27:03 -04:00
Carlos Garcia
7d260ca526 Fix HTML display: use plain text + _text_to_html for all bot messages
All bot messages now built as plain text and converted via _text_to_html()
which escapes content and converts newlines to <br>. This avoids raw HTML
tags appearing literally in Odoo 18 Discuss.

- _describe_zip: returns plain str (no Markup/HTML)
- _post_file_clarification: builds plain text, posts via _text_to_html()
- _find_pending_attachments: strip HTML before phrase matching
- _text_to_html: new helper shared by clarification and agent replies

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 12:21:58 -04:00
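A helper like the _text_to_html() described in the commit above might be as small as the following sketch. It uses only the standard library; the real helper may additionally wrap the result for Odoo, which is not shown here.

```python
from html import escape


def text_to_html(text: str) -> str:
    """Escape plain text, then turn newlines into <br> tags."""
    # Escaping must happen first so the inserted <br> tags stay intact.
    return escape(text or "").replace("\n", "<br>")
```

Building every message as plain text and converting once at the boundary avoids mixing escaped and unescaped fragments.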
Carlos Garcia
9e3fe974dc Fix dup approval flow: preserve raw message, force expenses routing, fix HTML rendering
- master_agent: thread raw user message into extra_context and peer_data so
  expenses_agent can check it directly without relying on LLM intent_summary
- master_agent: when receipts are in extra_context always route to expenses_agent,
  so replies like 'skip duplicates' still trigger expense processing
- expenses_agent: _plan() checks peer_data raw_message alongside task so
  skip/keep keywords are detected even when master rewrites the intent
- ab_ai_mail: wrap clarification message HTML in Markup() so Odoo does not
  re-escape the tags; use <br> instead of <br/>
- ab_ai_mail: convert agent plain-text replies newlines to <br> for proper
  line-break rendering in Discuss

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:55:46 -04:00
Carlos Garcia
462f63d11d Add duplicate approval flow with time-based dedup
- expenses_agent: extract transaction time (HH:MM) from OCR receipt text
- expenses_agent: _find_semantic_duplicate uses time to rule out false positives (>30 min apart = different receipts)
- expenses_agent: pause when duplicates found, set mode=awaiting_dup_approval, ask user before creating sheet
- expenses_agent: _report formats approval message listing each dup pair with vendor/amount/date/times/filenames
- ab_ai_mail: _find_pending_attachments recognises dup-approval bot message so ZIP re-attaches on user reply

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 02:07:37 -04:00
Carlos Garcia
f90a2ee863 feat: semantic deduplication of multiple photos of same receipt
After parsing all receipts, identify photos that are different shots of
the same physical receipt by comparing amount + date + vendor similarity
(difflib ratio >= 0.6). When a duplicate is found, keep whichever photo
produced the most OCR text (clearest shot) and report the skipped ones.

Zero-amount receipts (OCR failed entirely) are excluded from semantic
dedup to avoid false positives.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 01:56:30 -04:00
Carlos Garcia
c2d1078d79 fix: improve OCR accuracy for rotated/sideways receipt photos
- Dockerfile: add tesseract-ocr-osd for orientation detection data
- receipt_parser: resize large phone photos to 1800px, convert to
  grayscale, sharpen before OCR; use psm 1 (auto + OSD) so rotated
  receipts are correctly oriented before text extraction
- expenses_agent: tighten amount extraction prompt to pick the FINAL
  total, not subtotal or tax line, reducing misreads like 42.90->409.00

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 01:51:29 -04:00
Carlos Garcia
8a9d772b8e fix: increase timeout and parallelize receipt processing
- ab_ai_bot: raise requests.post timeout 120s -> 600s so long OCR+LLM
  runs don't silently drop the reply in Discuss
- upload: run parse_upload in ThreadPoolExecutor so tesseract OCR
  doesn't block the FastAPI event loop
- expenses_agent: parse all receipts concurrently with asyncio.gather
  (Ollama semaphore caps parallelism at 2); reduces 13-receipt LLM
  time from ~39s sequential to ~20s parallel

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 01:50:12 -04:00
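The concurrent parsing in the commit above (asyncio.gather with parallelism capped at 2) can be sketched as follows. In the commit the semaphore belongs to the Ollama backend; here it is placed in a hypothetical parse_all helper to keep the example self-contained.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def parse_all(
    receipts: Iterable[T],
    parse_one: Callable[[T], Awaitable[R]],
    limit: int = 2,
) -> list[R]:
    """Parse all receipts concurrently, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def guarded(receipt: T) -> R:
        async with sem:
            return await parse_one(receipt)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(r) for r in receipts))
```

With a cap of 2, the best case roughly halves the sequential time, which is consistent with the ~39s to ~20s figure reported for 13 receipts.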
Carlos Garcia
ef6dad5a81 feat: OCR via tesseract, dedup, category selection for expense receipts
- Dockerfile: install tesseract-ocr so Pillow+pytesseract can OCR receipt images
- operational_store: JSON-serialize raw_data before passing to asyncpg JSONB
- receipt_parser: add SHA256 hash + date extracted from filename timestamps
- expenses_agent: deduplicate receipts by hash before creating expense records
- expenses_agent: fetch all expensable Odoo products, pass list to LLM for
  category selection (Meals, Flights, etc.) per receipt
- expenses_agent: pass date_hint from filename (e.g. 20260509_180857.jpg -> 2026-05-09)
  as fallback when OCR text is unavailable
- expenses_tools: add get_expense_products() to fetch all expensable products

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 01:40:32 -04:00
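The hash-based deduplication from the commit above can be sketched with hashlib. The function name and the (filename, bytes) tuple shape are assumptions; the commit only states that receipts carry a SHA256 hash and are deduplicated by it before expense records are created.

```python
import hashlib


def dedup_by_hash(files: list[tuple[str, bytes]]) -> list[tuple[str, bytes]]:
    """Drop byte-identical files, keeping the first occurrence of each."""
    seen: set[str] = set()
    unique: list[tuple[str, bytes]] = []
    for name, data in files:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        unique.append((name, data))
    return unique
```

This only catches exact copies; two different photos of the same receipt hash differently and need the semantic comparison added in later commits.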
Carlos Garcia
6ab9624ec6 fix: harden master agent synthesize/memory, fix expense create fields
- _synthesize: short-circuit on any single-agent report (avoids extra
  Ollama call that can timeout); wrap multi-agent LLM call in try/except
- _update_memory: catch exceptions so DB/memory failures don't kill reply
- _log_directive_start: use 0 instead of NULL for channel_id (NOT NULL col)
- create_expense: drop 'description' field (not valid on hr.expense in Odoo 18)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 01:37:36 -04:00
Carlos Garcia
261252abdd fix: resolve group XML IDs via ir.model.data in access check
AGENT_ACCESS_GROUPS uses XML IDs (e.g. hr_expense.group_hr_expense_user)
but the check compared them against res.groups.full_name strings which
never matched, denying every user access to all restricted agents.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 01:28:01 -04:00
Carlos Garcia
f9ade69f55 fix: auto-activate registered agents with descriptive capabilities
The master agent was routing expense/receipt requests to finance_agent
instead of expenses_agent because only DB-registered agents appeared
in get_active_agents(). This adds auto-activation of all in-memory
registered agents with precise capability summaries so the LLM picks
the right specialist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 01:24:26 -04:00
Carlos Garcia
62d5d3f550 fix: force JSON output for Ollama intent classification; fix attachment detection
- ollama_backend: add format='json' for 'master' and receipt_parser
  callers so llama3.1:8b returns valid JSON instead of plain English
- ab_ai_mail: add debug logging to trace attachment_ids from Discuss;
  handle file-only messages and clarification look-back flow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 01:17:58 -04:00
Carlos Garcia
4b7223a139 feat: file upload + expense report creation from Discuss attachments
- Discuss bot now reads ir.attachment from incoming messages; file-only
  messages no longer silently dropped
- ZIP files are described (contents listed) and bot asks clarifying
  question before acting; user's follow-up reply looks back for pending
  attachments so files don't need to be re-uploaded
- receipt_parser: extracts text from ZIP (recursive), JPG/PNG/etc (OCR),
  PDF (pdfplumber), HTML, TXT
- expenses_agent: full rewrite fixing broken method signatures; adds
  create_expense_sheet / create_expense / attach_receipt flow driven by
  LLM receipt parsing (Ollama, HIPAA-locked)
- master_agent: extra_context threads receipts + user_id into directives
- FastAPI /upload multipart endpoint; registered in main.py
- Odoo /ai/upload controller proxies files to agent service
- ab_ai_bot: dispatch_message_with_files() for multipart uploads

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 01:02:24 -04:00
Carlos Garcia
bee8e20580 feat(elearning): add course-building capability to elearning agent
- ElearningTools: add create_course, update_course, publish_course,
  add_section, create_slide, enroll_user write methods using OdooClient
- ElearningAgent: fix all BaseAgent method signatures (_plan/_gather/
  _reason/_act/_report no longer take wrong positional args)
- Replace dead _dispatch_tool pattern with _tool_<name> methods so
  BaseAgent._run_tool() can drive them via LLM tool calls in _loop()
- Add LLM-driven course creation in _reason(): when intent is create,
  _loop() is called with a course-building system prompt and all tools;
  the LLM calls create_course → add_section → create_slide → publish
- Fix handle_peer_request signature to match BaseAgent interface
- Fix AgentReport missing directive_id; fix SweepReport invalid kwargs
- Extend ELEARNING_TOOLS list with all new write-side tools

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 23:49:11 -04:00