Adds a path-penalty score that downranks files in folders named Trashed,
Dups, Backup, Copy, Old, Archive, plus a penalty for repeated path segments
(e.g. Desktop/Desktop/Files) and for very deep paths. Also captures file
mtime and uses it as a tiebreaker, since older files are usually the originals.
Applied to all four detection passes (sha256, phash, exif, filesize+dim)
and to auto-resolve-exact.
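A minimal sketch of how such a score could combine with the mtime
tiebreaker. The weights, the depth threshold, the case-insensitive exact
folder-name match and the helper names path_penalty/keeper_sort_key are all
hypothetical; the commit does not state the real values.

```python
from pathlib import PurePosixPath

# Hypothetical weights; the real values are not stated in the commit.
PENALIZED_FOLDERS = {"trashed", "dups", "backup", "copy", "old", "archive"}
FOLDER_PENALTY = 10
REPEAT_PENALTY = 5
DEEP_PATH_THRESHOLD = 8  # folders beyond this depth each add DEPTH_PENALTY
DEPTH_PENALTY = 1


def path_penalty(path: str) -> int:
    """Higher score means the file is less likely to be the original copy."""
    # Folder names only (drop the filename and the root "/"), compared case-insensitively.
    folders = [p.lower() for p in PurePosixPath(path).parent.parts if p != "/"]
    penalty = 0
    # Each folder named like a backup or discard location adds a penalty.
    penalty += FOLDER_PENALTY * sum(1 for f in folders if f in PENALIZED_FOLDERS)
    # Repeated segments such as Desktop/Desktop/Files add a penalty per repeat.
    penalty += REPEAT_PENALTY * (len(folders) - len(set(folders)))
    # Very deep paths add a small penalty per extra level.
    penalty += DEPTH_PENALTY * max(0, len(folders) - DEEP_PATH_THRESHOLD)
    return penalty


def keeper_sort_key(path: str, mtime: float) -> tuple:
    # Lowest penalty wins; among equal penalties the older mtime wins.
    return (path_penalty(path), mtime)
```

Usage: min(candidates, key=lambda c: keeper_sort_key(*c)) over
(path, mtime) pairs picks the suggested keeper.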
New file_mtime column with idempotent migration.
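An idempotent column migration can be sketched as "check, then add", so it
is safe to run on every startup. This assumes SQLite and a table named
files; both the table name and the helper ensure_file_mtime_column are
hypothetical.

```python
import sqlite3


def ensure_file_mtime_column(conn: sqlite3.Connection) -> None:
    """Add files.file_mtime only if it does not exist yet (safe to re-run)."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(files)")}
    if "file_mtime" not in columns:
        # Existing rows get NULL until the next scan fills in their mtime.
        conn.execute("ALTER TABLE files ADD COLUMN file_mtime REAL")
        conn.commit()
```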
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Captures every review action (keeper, redundant, skip, unreview, auto-resolve,
rescan-restore) together with the file's sha256 at decision time, so a downstream tool can detect
stale decisions before touching disk.
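A sketch of how a downstream tool might use the recorded digest: re-hash
the file and treat a missing file or a changed digest as a stale decision.
The ReviewAction record, record() and is_stale() are hypothetical names,
not this project's schema.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large media files are not read into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


@dataclass(frozen=True)
class ReviewAction:
    action: str   # e.g. "keeper", "redundant", "skip"
    path: Path
    sha256: str   # digest captured when the decision was made


def record(action: str, path: Path) -> ReviewAction:
    return ReviewAction(action, path, sha256_of(path))


def is_stale(entry: ReviewAction) -> bool:
    """True if the file is gone or its content changed since the decision."""
    if not entry.path.exists():
        return True
    return sha256_of(entry.path) != entry.sha256
```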
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Defer Takeout sidecar enrichment until after indexing so its UPDATE
statements actually match rows. Previously it ran first and silently
no-op'd on the very first scan because no files existed in the DB yet.
- Preserve user review decisions across incremental and regroup rescans.
The grouping phase wipes duplicate_groups/duplicate_members, which
also wiped reviewed=1 / is_keeper flags. Now snapshots reviewed groups
by (method, frozenset of member file_ids) before the wipe and re-applies
them to any post-regrouping group whose member set is unchanged.
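- Replace 2-hex-char phash bucketing with multi-index pigeonhole
  (16 nibble buckets per hash). At threshold=10, the previous bucketing
  missed any near-duplicate pair that differed in the first byte, since
  they landed in different buckets and were never compared. With at most
  10 differing bits spread over 16 nibbles, a pair within threshold still
  matches exactly in at least 6 nibble positions, so it always meets in
  some bucket. Caches imagehash.hex_to_hash() per phash and dedups pair
  comparisons.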
- Rewrite _suggested_keeper_by_resolution: the previous implementation had
  a dead inner score() function and the lambda was missing the date
  tiebreaker (left as a TODO comment). Now picks the highest pixel count,
  breaking ties by file size and then by oldest exif_datetime.
- Filter phash candidates to length(phash)=16 to skip malformed hashes
rather than relying on the silent except in the comparison loop.
- Reject /api/scan/reset while a scan is running. Resetting mid-scan
wiped tables the running scan thread was still writing to.
- Clear stale 'redundant' file status (not just 'keeper') when a file
  no longer appears in any group after regrouping.
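The snapshot-and-reapply step for review decisions can be sketched with
plain dicts keyed by (method, frozenset of member file_ids), so member
order does not matter but any membership change does. The group dict shape
and both function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

GroupKey = Tuple[str, FrozenSet[int]]


def snapshot_reviewed(groups: Iterable[dict]) -> Dict[GroupKey, dict]:
    """Capture review state before duplicate_groups/members are wiped.

    Assumed group shape: {"method": str, "file_ids": [int, ...],
    "reviewed": bool, "keepers": {int, ...}}.
    """
    snap: Dict[GroupKey, dict] = {}
    for g in groups:
        if g["reviewed"]:
            snap[(g["method"], frozenset(g["file_ids"]))] = {"keepers": set(g["keepers"])}
    return snap


def reapply_reviewed(new_groups: List[dict], snap: Dict[GroupKey, dict]) -> int:
    """Restore review state on regrouped groups whose member set is unchanged."""
    restored = 0
    for g in new_groups:
        saved = snap.get((g["method"], frozenset(g["file_ids"])))
        if saved is not None:
            g["reviewed"] = True
            g["keepers"] = set(saved["keepers"])
            restored += 1
    return restored
```

A group that gained or lost a member gets a different key and is left
unreviewed, which is the conservative choice: the user never saw that set.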
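The multi-index pigeonhole scheme can be sketched as follows. To stay
self-contained this version compares hashes as plain 64-bit integers
instead of imagehash.hex_to_hash() objects; near_duplicate_pairs and the
dict-of-hex-strings input are hypothetical.

```python
import string
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def near_duplicate_pairs(phashes: Dict[int, str], threshold: int = 10) -> Set[Tuple[int, int]]:
    """Return (id_a, id_b) pairs whose 64-bit phashes differ in <= threshold bits.

    Each 16-hex-char hash is indexed under 16 (position, nibble) buckets. A pair
    within threshold (< 16) differs in at most threshold nibbles, so it shares
    at least one bucket and is guaranteed to be compared.
    """
    if not 0 <= threshold < 16:
        raise ValueError("pigeonhole guarantee needs 0 <= threshold < 16")
    buckets: Dict[Tuple[int, str], List[int]] = defaultdict(list)
    values: Dict[int, int] = {}
    for fid, h in phashes.items():
        # Skip malformed hashes up front instead of catching errors later.
        if len(h) != 16 or any(c not in string.hexdigits for c in h):
            continue
        values[fid] = int(h, 16)  # parse each hash once (the "cache")
        for pos, nibble in enumerate(h.lower()):
            buckets[(pos, nibble)].append(fid)

    compared: Set[Tuple[int, int]] = set()
    matches: Set[Tuple[int, int]] = set()
    for members in buckets.values():
        for a, b in combinations(sorted(members), 2):
            if (a, b) in compared:
                continue  # a pair often shares several buckets; compare it once
            compared.add((a, b))
            if bin(values[a] ^ values[b]).count("1") <= threshold:
                matches.add((a, b))
    return matches
```

Hashes 1 and 5 in the test below differ only in their first byte plus one
nibble (10 bits): first-byte bucketing would never compare them, while
here they meet in the 13 buckets where both hold nibble "0".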
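The keeper ordering above maps onto a single sort key. This sketch assumes
"ties by file size" means the larger file wins, that files are dicts with
hypothetical width/height/file_size/exif_datetime fields, and that a
missing exif_datetime sorts after any known date.

```python
from datetime import datetime
from typing import List, Optional


def suggested_keeper_by_resolution(files: List[dict]) -> Optional[dict]:
    """Highest pixel count wins; ties go to larger file size, then oldest EXIF date."""
    if not files:
        return None

    def key(f: dict) -> tuple:
        pixels = (f.get("width") or 0) * (f.get("height") or 0)
        dt = f.get("exif_datetime")
        # min() picks the smallest key, so negate the bigger-is-better fields;
        # (dt is None) puts undated files after dated ones.
        return (-pixels, -(f.get("file_size") or 0), dt is None, dt or datetime.max)

    return min(files, key=key)
```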
New GET /api/browse endpoint lists subdirectories at any path.
UI gets a folder icon button next to each path input; the button opens
a browsable directory tree modal. Escape or Cancel closes it, clicking
a folder navigates into it, and Select confirms the choice.
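A framework-agnostic sketch of what a handler like /api/browse could
return. The response shape (path, parent, dirs), the case-insensitive sort,
not following symlinks and returning an empty list for unreadable
directories are all assumptions, not the endpoint's confirmed behavior.

```python
import os
from typing import Optional


def list_subdirectories(path: str) -> dict:
    """List the immediate subdirectories of path for a directory-picker modal."""
    path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(path):
        raise NotADirectoryError(path)
    dirs = []
    try:
        with os.scandir(path) as it:
            for entry in it:
                try:
                    if entry.is_dir(follow_symlinks=False):
                        dirs.append(entry.name)
                except OSError:
                    continue  # entry vanished or cannot be stat'ed; skip it
    except PermissionError:
        pass  # unreadable directory: return an empty listing instead of failing
    parent: Optional[str] = os.path.dirname(path)
    if parent == path:
        parent = None  # filesystem root has no parent to navigate up to
    return {"path": path, "parent": parent, "dirs": sorted(dirs, key=str.lower)}
```

Returning parent lets the modal offer an "up" entry without the client
doing its own path arithmetic.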
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>