Fix correctness bugs in scanner and reset endpoint
- Defer Takeout sidecar enrichment until after indexing so its UPDATE statements actually match rows. Previously it ran first and silently no-op'd on the very first scan because no files existed in the DB yet.
- Preserve user review decisions across incremental and regroup rescans. The grouping phase wipes duplicate_groups/duplicate_members, which also wiped reviewed=1 / is_keeper flags. Now snapshots reviewed groups by (method, frozenset of member file_ids) before the wipe and re-applies them to any post-regrouping group whose member set is unchanged.
- Replace 2-hex-char phash bucketing with multi-index pigeonhole (16 nibble buckets per hash). At threshold=10, the previous bucketing missed any near-duplicate pair that differed in the first byte, since they landed in different buckets and were never compared. Caches imagehash.hex_to_hash() per phash and dedups pair comparisons.
- Rewrite _suggested_keeper_by_resolution: previous implementation had a dead inner score() function and the lambda was missing the date tie-breaker (left as a TODO comment). Now picks largest pixels, ties by file size, then by oldest exif_datetime.
- Filter phash candidates to length(phash)=16 to skip malformed hashes rather than relying on the silent except in the comparison loop.
- Reject /api/scan/reset while a scan is running. Resetting mid-scan wiped tables the running scan thread was still writing to.
- Also clears stale 'redundant' file status (not just 'keeper') when a file no longer appears in any group after regrouping.
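
The pigeonhole bucketing described above can be sketched roughly as follows. This is an illustrative sketch, not the code from this commit: the names `hamming_hex`, `candidate_pairs`, and `near_duplicates` are hypothetical, hashes are assumed to be 16-character lowercase hex strings, and a plain integer XOR stands in for the cached imagehash.hex_to_hash() comparison. The reasoning is that two 64-bit hashes within Hamming distance 10 can differ in at most 10 of their 16 nibbles, so they must agree on at least 6 nibble positions and therefore share at least one (position, nibble) bucket.

```python
from collections import defaultdict
from itertools import combinations


def hamming_hex(a: str, b: str) -> int:
    """Bit-level Hamming distance between two equal-length hex strings."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def candidate_pairs(phashes: dict[int, str]) -> set[tuple[int, int]]:
    """Pairs of file ids that share at least one (position, nibble) bucket."""
    buckets = defaultdict(list)
    for file_id, h in phashes.items():
        if len(h) != 16:  # skip malformed hashes up front
            continue
        for pos, nibble in enumerate(h):  # 16 buckets per hash
            buckets[(pos, nibble)].append(file_id)

    pairs = set()  # a set deduplicates pairs that meet in several buckets
    for ids in buckets.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


def near_duplicates(
    phashes: dict[int, str], threshold: int = 10
) -> list[tuple[int, int]]:
    """Candidate pairs whose Hamming distance is within the threshold."""
    return sorted(
        (a, b)
        for a, b in candidate_pairs(phashes)
        if hamming_hex(phashes[a], phashes[b]) <= threshold
    )
```

For example, `ffffffffffffffff` and `00ffffffffffffff` differ only in the first byte (distance 8). Bucketing on the first two hex characters would never compare them, while the per-nibble buckets pair them through any of positions 2 to 15.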
@@ -223,6 +223,10 @@ def scan_resume():
 def scan_reset(confirm: str = Query("")):
     if confirm != "RESET":
         raise HTTPException(400, "Pass ?confirm=RESET to confirm")
+    if sc.scan_state["status"] == "running":
+        raise HTTPException(
+            400, "A scan is currently running — pause it before resetting"
+        )
     con = get_db()
     cur = con.cursor()
     cur.execute("DELETE FROM duplicate_members")