- Defer Takeout sidecar enrichment until after indexing so its UPDATE statements actually match rows. Previously it ran first and silently no-op'd on the very first scan because no files existed in the DB yet.
- Preserve user review decisions across incremental and regroup rescans. The grouping phase wipes duplicate_groups/duplicate_members, which also wiped reviewed=1 / is_keeper flags. Now snapshots reviewed groups by (method, frozenset of member file_ids) before the wipe and re-applies them to any post-regrouping group whose member set is unchanged.
- Replace 2-hex-char phash bucketing with multi-index pigeonhole (16 nibble buckets per hash). At threshold=10, the previous bucketing missed any near-duplicate pair that differed in the first byte, since they landed in different buckets and were never compared. Caches imagehash.hex_to_hash() per phash and dedups pair comparisons.
- Rewrite _suggested_keeper_by_resolution: previous implementation had a dead inner score() function and the lambda was missing the date tie-breaker (left as a TODO comment). Now picks largest pixels, ties by file size, then by oldest exif_datetime.
- Filter phash candidates to length(phash)=16 to skip malformed hashes rather than relying on the silent except in the comparison loop.
- Reject /api/scan/reset while a scan is running. Resetting mid-scan wiped tables the running scan thread was still writing to.
- Also clears stale 'redundant' file status (not just 'keeper') when a file no longer appears in any group after regrouping.
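
For reviewers, the multi-index pigeonhole bucketing from the phash item above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this change: `near_duplicate_pairs` and `hamming` are hypothetical names, hashes are parsed into plain integers instead of going through imagehash.hex_to_hash(), and the input is assumed to be a mapping of file_id to a hex phash string.

```python
import string
from collections import defaultdict
from itertools import combinations

HEX_DIGITS = set(string.hexdigits)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit hashes."""
    return bin(a ^ b).count("1")


def near_duplicate_pairs(phashes: dict[int, str], threshold: int = 10) -> set[tuple[int, int]]:
    """Hypothetical helper: return (file_id, file_id) pairs within `threshold` bits.

    Pigeonhole argument: a 16-hex-char hash has 16 nibbles. Two hashes that
    differ in at most `threshold` bits differ in at most `threshold` nibbles,
    so for threshold < 16 they agree on at least one (position, nibble) key
    and therefore share at least one bucket.
    """
    # Parse each hash once; skip anything that is not exactly 16 hex chars.
    parsed: dict[int, int] = {}
    normalized: dict[int, str] = {}
    for file_id, phash in phashes.items():
        if len(phash) != 16 or not set(phash) <= HEX_DIGITS:
            continue
        normalized[file_id] = phash.lower()
        parsed[file_id] = int(phash, 16)

    # Each hash lands in 16 buckets, one per (nibble position, nibble value).
    buckets: dict[tuple[int, str], list[int]] = defaultdict(list)
    for file_id, phash in normalized.items():
        for position, nibble in enumerate(phash):
            buckets[(position, nibble)].append(file_id)

    # Compare each candidate pair at most once, even if it shares many buckets.
    seen: set[tuple[int, int]] = set()
    matches: set[tuple[int, int]] = set()
    for members in buckets.values():
        for pair in combinations(sorted(members), 2):
            if pair in seen:
                continue
            seen.add(pair)
            if hamming(parsed[pair[0]], parsed[pair[1]]) <= threshold:
                matches.add(pair)
    return matches


example = {
    1: "0000000000000000",
    2: "ff00000000000000",  # differs from file 1 only in the first byte (8 bits)
    3: "ffffffffffffffff",
    4: "zz00000000000000",  # malformed, skipped
    5: "abc",               # wrong length, skipped
}
result = near_duplicate_pairs(example, threshold=10)
```

In this example, files 1 and 2 have different leading hex characters, so a 2-hex-char prefix bucket would never compare them, while the nibble buckets pair them through any of the 14 positions where they agree.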