Two sources of hallucinated values in receipt parsing: 1. The LLM extraction prompt had no explicit "don't guess" constraint, so when Tesseract produced garbled OCR text the LLM substituted plausible- looking values (wrong vendor names, wrong totals) instead of returning safe defaults. 2. The date field asked the LLM to extract the date from the OCR text even when date_hint (from the filename timestamp, e.g. 20260509_180857.jpg) was already available — a reliable signal that was being ignored. expenses_agent._parse_receipt_text: - LLM path: new prompt leads with "copy values EXACTLY, do NOT guess or infer"; adds "if OCR looks corrupted, return safe default rather than a more logical value"; injects date_hint directly as an authoritative value when available so the LLM never needs to extract the date. - Vision fast path: normalise "null" string for date the same way as time; prefer date_hint over a null date returned by the vision model. receipt_parser._ocr_image_vision: - Vision prompt now leads with the same "copy exactly, do not guess" constraint and explicitly accepts null for date/time when not clearly visible, matching the conservative tone of the LLM extraction prompt. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
9.8 KiB
9.8 KiB