paramiko's SSHClient.open_sftp() allocates an exec channel before the
SFTP subsystem request, which Synology DSM closes immediately with
'Channel closed'. Manual sftp(1) and WinSCP avoid this by going straight
to the SFTP subsystem on a fresh channel.
Replaced SSHClient with direct paramiko.Transport + SFTPClient.from_transport,
matching the OpenSSH/WinSCP flow. Also raised the flow-control windows to
128 MB, since Synology has been observed to bail mid-handshake with the
default 1 MB.
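A minimal sketch of the replacement flow (host/user/password are
placeholders; key auth works the same via transport.connect(pkey=...)):

    import paramiko

    WINDOW = 128 * 1024 * 1024  # 128 MB flow-control window

    # Open the transport directly; no exec channel is ever requested.
    transport = paramiko.Transport((host, 22))
    transport.connect(username=user, password=password)

    # Fresh channel straight into the SFTP subsystem, like sftp(1)/WinSCP.
    sftp = paramiko.SFTPClient.from_transport(transport, window_size=WINDOW)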
test_connection_verbose now reports per-step status (connect+auth,
open_sftp, listdir /, stat base_path, write probe). API returns the
steps array so the UI can show exactly which step failed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds 'Destinations' sidebar entry + view + add/edit/delete/test modal.
Generate-keypair button shows the public key for the user to paste into
the remote authorized_keys.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Foundation for the move/quarantine pipeline. Lets users register one or
more remote SFTP destinations through the API; credentials are stored at
rest under /data/sftp/{id}.{password|key} (mode 600), and a test endpoint
verifies connectivity + write access.
Endpoints:
GET /api/sftp/destinations
POST /api/sftp/destinations — create
PUT /api/sftp/destinations/{id} — update
DELETE /api/sftp/destinations/{id}
POST /api/sftp/destinations/{id}/test — connect, stat base_path, mkdir probe
POST /api/sftp/keypair — generate ED25519 keypair
Host keys pinned per-destination on first connect (TOFU); subsequent
mismatches are rejected. paramiko added to requirements.
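The pinning check itself is small; a sketch, assuming the pinned key is
stored base64-encoded per destination:

    import base64
    import paramiko

    def check_host_key(transport: paramiko.Transport, pinned_b64: str | None) -> str:
        """TOFU: pin the key on first connect, reject any later mismatch."""
        key = transport.get_remote_server_key()
        seen = base64.b64encode(key.asbytes()).decode()
        if pinned_b64 is None:
            return seen          # first connect: caller persists this as the pin
        if seen != pinned_b64:
            raise ValueError("host key mismatch for pinned destination")
        return pinned_b64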
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Both _folder_priority and _path_penalty were scanning the entire path
string including the basename. A file named 'mytrashed_pic.jpg' in
/photos/MobileBackup/ would falsely match the 'trash' token.
Now only directory segments are checked; the filename itself no longer
influences keeper selection.
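The fix boils down to dropping the basename before token matching; a
sketch with an abbreviated token list (the real lists live in
_folder_priority/_path_penalty):

    from pathlib import PurePosixPath

    TOKENS = ("trash", "dups", "backup")  # abbreviated

    def _path_penalty(path: str) -> int:
        # Only directory segments are scanned; the basename never contributes.
        segments = PurePosixPath(path).parent.parts
        return sum(1 for seg in segments for tok in TOKENS if tok in seg.lower())

With this, /photos/MobileBackup/mytrashed_pic.jpg scores only on
MobileBackup; the 'trash' in the filename is ignored.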
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The detail-panel insertion logic mixed parent contexts: it called
grid.parentNode.insertBefore() but used a child-of-grid as the reference
node. insertBefore requires the reference node to be a child of the
target parent — it threw 'node is not a child of this node' on every
click.
Replaced the inter-row positioning with a simple insert-after-grid. The
visual outcome is the same, since panel.scrollIntoView() handles user
focus.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
After every scan, automatically kick off a background thread that
generates a JPEG thumbnail for every file in a duplicate group and
caches it locally at /data/thumbs/. Idempotent — already-cached files
are skipped.
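A sketch of the background pass (names hypothetical; the real pass feeds
the status counters):

    import threading
    from pathlib import Path

    def pregenerate_thumbs(files, make_thumbnail, cache=Path("/data/thumbs")):
        """files: iterable of (file_id, source_path); make_thumbnail: callable(src, dest)."""
        def worker():
            done = skipped = failed = 0
            for file_id, src in files:
                dest = cache / f"{file_id}.jpg"
                if dest.exists():
                    skipped += 1           # idempotent: already cached
                    continue
                try:
                    make_thumbnail(src, dest)
                    done += 1
                except OSError:
                    failed += 1            # reported via /api/thumbs/status
        threading.Thread(target=worker, daemon=True).start()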
New endpoints:
POST /api/thumbs/generate — start pre-gen for all files
POST /api/thumbs/generate?only_in_groups=true — only dup-group files
GET /api/thumbs/status — progress (total/done/skipped/failed)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Thumbnails (256px JPEG, q80) generated on first request and cached at
/data/thumbs/<shard>/<file_id>.jpg — i.e. on the workstation's local SSD,
not the NAS. Subsequent requests serve straight from cache, never
re-fetching from /photos.
HEIC/HEIF decoded via pillow-heif so iPhone photos finally render.
Videos cached as a single ffmpeg-extracted frame, not regenerated each
request. New DELETE /api/thumb/cache endpoint to wipe it.
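A sketch of the decode-and-cache step; the two-character shard prefix is
an assumption, and pillow-heif's opener registration is what unlocks HEIC:

    from pathlib import Path
    from PIL import Image
    from pillow_heif import register_heif_opener

    register_heif_opener()  # teaches Pillow to decode HEIC/HEIF

    def thumb_path(file_id: str, root: Path = Path("/data/thumbs")) -> Path:
        return root / file_id[:2] / f"{file_id}.jpg"   # shard scheme assumed

    def make_thumbnail(src: str, dest: Path, size: int = 256) -> None:
        dest.parent.mkdir(parents=True, exist_ok=True)
        with Image.open(src) as im:
            im.thumbnail((size, size))                  # 256px max edge
            im.convert("RGB").save(dest, "JPEG", quality=80)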
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Use QUOTE_ALL and sanitise NUL/CR/LF in the path/filename/exif fields.
The default csv dialect rejected fields containing line terminators with
'need to escape, but no escapechar set'.
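In csv-module terms the change is roughly this (a sketch; field order
follows the commit):

    import csv

    def _clean(value: str) -> str:
        # NUL breaks the csv module outright; CR/LF are flattened to spaces.
        return value.replace("\x00", "").replace("\r", " ").replace("\n", " ")

    def write_report(rows, out_path="report.csv"):
        """rows: iterables of (path, filename, exif) strings."""
        with open(out_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh, quoting=csv.QUOTE_ALL)
            for row in rows:
                writer.writerow([_clean(field) for field in row])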
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a path-penalty score that downranks files in folders named Trashed,
Dups, Backup, Copy, Old, Archive, plus a penalty for repeated path segments
(e.g. Desktop/Desktop/Files) and very deep paths. Also captures and uses
file mtime as a tiebreaker — older files are usually the originals.
Applied to all four detection passes (sha256, phash, exif, filesize+dim)
and to auto-resolve-exact.
New file_mtime column with idempotent migration.
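A sketch of the combined ranking (weights hypothetical; the mtime field
follows the new file_mtime column):

    import os

    TOKENS = ("trashed", "dups", "backup", "copy", "old", "archive")

    def path_penalty(path: str) -> int:
        segs = [s.lower() for s in os.path.dirname(path).split(os.sep) if s]
        score = sum(2 for s in segs if any(tok in s for tok in TOKENS))
        score += sum(1 for a, b in zip(segs, segs[1:]) if a == b)  # Desktop/Desktop
        score += max(0, len(segs) - 6)                             # very deep paths
        return score

    def suggest_keeper(files):
        # Lowest penalty wins; ties go to the oldest mtime, since older
        # files are usually the originals.
        return min(files, key=lambda f: (path_penalty(f["path"]), f["file_mtime"]))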
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Captures every review action (keeper, redundant, skip, unreview, auto-resolve,
rescan-restore) with sha256 at decision time so a downstream tool can detect
stale decisions before touching disk.
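The downstream check is then a straight hash comparison; a sketch:

    import hashlib
    from pathlib import Path

    def is_stale(path: str, sha256_at_decision: str) -> bool:
        """True if the file changed (or vanished) since the review decision."""
        p = Path(path)
        if not p.exists():
            return True
        return hashlib.sha256(p.read_bytes()).hexdigest() != sha256_at_decision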
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
build-deb.sh used 'cp -r app/ source/', which creates source as a copy
of app when source doesn't yet exist, dropping the app/ wrapper that the
Dockerfile's COPY app/ /app/ depends on. The 2>/dev/null || true on the
cp lines hid the resulting failures, so the .deb shipped a broken
/opt/dupfinder/source/ that build-from-source could not use.
Pre-create the source dir and copy each item to its explicit
destination path. Bump package version to 1.0.1.
Also rework dupfinder-setup.sh's image-prep step: prefer a local
image, then a quiet registry pull, then build from the bundled
source. Removes the loud registry-not-found error that scared users
when the (unpublished) tocmo0nlord/dupfinder image wasn't on Docker
Hub.
The .deb install instructions in the README pointed at a URL that
doesn't exist — Gitea exposes the Debian registry as an apt repo, not
as plain file downloads. Switched the README to the apt-repo flow
(add a sources.list line, then apt install).
Also fixed build-deb.sh: Gitea's Debian package endpoint returns
HTTP 405 for token-bearer auth; it requires HTTP basic auth (user +
token-as-password) and the literal /upload suffix on the URL.
Package built and pushed to the registry — apt install works now.
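For reference, the shape of the working upload in requests terms (a
sketch; the token variable and the exact .deb filename are placeholders):

    import requests

    # Gitea's Debian registry wants basic auth (user + token-as-password)
    # and the literal /upload suffix; a Bearer token header gets HTTP 405.
    with open("dupfinder_1.0.1_all.deb", "rb") as deb:
        resp = requests.put(
            "http://192.168.1.64:3000/api/packages/tocmo0nlord/debian"
            "/pool/bookworm/main/upload",
            auth=("tocmo0nlord", token),
            data=deb,
        )
    resp.raise_for_status()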
Lay out the three install paths (Windows installer, .deb package, manual
docker compose) with concrete numbered steps and a 'pick your method'
table at the top so users don't have to read past their own platform.
Add a using-it walkthrough, a scan-mode explanation, and a short
troubleshooting section.
- Defer Takeout sidecar enrichment until after indexing so its UPDATE
statements actually match rows. Previously it ran first and silently
no-op'd on the very first scan because no files existed in the DB yet.
- Preserve user review decisions across incremental and regroup rescans.
The grouping phase wipes duplicate_groups/duplicate_members, which
also wiped reviewed=1 / is_keeper flags. Now snapshots reviewed groups
by (method, frozenset of member file_ids) before the wipe and re-applies
them to any post-regrouping group whose member set is unchanged (see the
snapshot sketch after this list).
- Replace 2-hex-char phash bucketing with multi-index pigeonhole
(16 nibble buckets per hash). At threshold=10, the previous bucketing
missed any near-duplicate pair that differed in the first byte, since
they landed in different buckets and were never compared. Caches
imagehash.hex_to_hash() per phash and dedups pair comparisons (see the
bucketing sketch after this list).
- Rewrite _suggested_keeper_by_resolution: previous implementation had
a dead inner score() function and the lambda was missing the date
tie-breaker (left as a TODO comment). Now picks largest pixels, ties
by file size, then by oldest exif_datetime.
- Filter phash candidates to length(phash)=16 to skip malformed hashes
rather than relying on the silent except in the comparison loop.
- Reject /api/scan/reset while a scan is running. Resetting mid-scan
wiped tables the running scan thread was still writing to.
- Also clears stale 'redundant' file status (not just 'keeper') when
a file no longer appears in any group after regrouping.
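The review-decision snapshot above, as a sketch (record shapes
hypothetical):

    def snapshot_reviews(groups):
        """Key reviewed groups by (method, frozenset of member file_ids)."""
        return {
            (g["method"], frozenset(g["member_ids"])): g["keeper_id"]
            for g in groups if g["reviewed"]
        }

    def reapply_reviews(snapshot, new_groups):
        for g in new_groups:
            keeper = snapshot.get((g["method"], frozenset(g["member_ids"])))
            if keeper is not None:   # member set unchanged: restore the decision
                g["reviewed"], g["keeper_id"] = True, keeper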
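And the pigeonhole argument behind the new bucketing: a 64-bit phash is
16 nibbles, and a pair within Hamming distance 10 can differ in at most
10 of them, so at least 6 nibbles agree exactly and the pair is
guaranteed to share a (position, nibble) bucket. A sketch:

    from collections import defaultdict
    from itertools import combinations

    def candidate_pairs(phashes):
        """phashes: {file_id: 16-hex-char phash}. Yields each candidate pair once."""
        buckets = defaultdict(list)              # (position, nibble) -> file_ids
        for fid, h in phashes.items():
            for pos, nibble in enumerate(h):
                buckets[(pos, nibble)].append(fid)
        seen = set()
        for ids in buckets.values():
            for pair in combinations(ids, 2):
                if pair not in seen:             # dedup across shared buckets
                    seen.add(pair)
                    yield pair                   # full Hamming check happens later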
dupfinder-setup.sh now verifies nvidia-container-toolkit is present
when a GPU is detected. If missing, prints install instructions and
offers to install it automatically (adds NVIDIA repo, installs toolkit,
configures Docker runtime, restarts Docker).
Without this toolkit Docker silently falls back to CPU even when a
GPU is present and the compose file has the device reservation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Instead of walk-everything-first then index, workers now receive files
the instant os.walk yields them. The thread pool is open before the
walk starts; each discovered file is submitted immediately. Completed
futures are drained after each directory to keep memory flat.
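A sketch of the streaming shape (the indexing callable is a placeholder):

    import os
    from concurrent.futures import ThreadPoolExecutor

    def discover_and_index(root, index_file, workers=8):
        """Submit each file the instant os.walk yields it."""
        found, pending = 0, []
        with ThreadPoolExecutor(max_workers=workers) as pool:  # pool open before the walk
            for dirpath, _dirs, files in os.walk(root):
                for name in files:
                    pending.append(pool.submit(index_file, os.path.join(dirpath, name)))
                    found += 1
                still = []                   # drain completed futures per directory
                for fut in pending:
                    if fut.done():
                        fut.result()         # surface worker exceptions promptly
                    else:
                        still.append(fut)
                pending = still
        return found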
Progress message shows:
"Discovering & indexing (8w): 1,234 — 5,678 found so far"
then once walk finishes:
"Indexing (8w): 8,000 / 9,100"
UI: merged Discovery + Indexing into a single "Discover + Index" phase pill.
Indeterminate progress bar stays on until total file count is known.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace single-threaded indexing loop with ThreadPoolExecutor.
Default workers = min(cpu_count*2, 16), tunable via DUPFINDER_WORKERS
env var. Pre-loads all existing DB records in one query instead of
N per-file queries. Progress message shows worker count and live
done/total count. Skipped files bulk-stamped in batches of 500.
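The worker-count rule as a sketch (the fallback when cpu_count() is
unknown is an assumption):

    import os

    def worker_count() -> int:
        override = int(os.environ.get("DUPFINDER_WORKERS", "0"))
        return override or min((os.cpu_count() or 4) * 2, 16)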
On an 8-core machine over NAS: ~4-8x faster indexing phase.
On NVMe: up to 16x faster with 16 workers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
debian/control, postinst, prerm, postrm — standard dpkg package lifecycle
debian/files/opt/dupfinder/dupfinder-setup.sh — interactive setup:
checks Docker, detects NVIDIA GPU, prompts for photos/data paths,
writes docker-compose.override.yml with GPU reservation if present,
pulls image from registry (builds from source as fallback)
debian/files/usr/local/bin/dupfinder — CLI wrapper:
setup / start / stop / restart / status / logs / open / update
debian/files/etc/systemd/system/dupfinder.service — systemd unit,
guards against starting before setup has run
debian/build-deb.sh — builds .deb and uploads to Gitea package registry;
prints the exact apt sources.list line on success
Install on any Debian/Ubuntu machine:
echo "deb [trusted=yes] http://192.168.1.64:3000/api/packages/tocmo0nlord/debian bookworm main" \
| sudo tee /etc/apt/sources.list.d/dupfinder.list
sudo apt update && sudo apt install dupfinder
sudo dupfinder setup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GPU:
- Switch Dockerfile base to pytorch/pytorch:2.3.1-cuda12.1-cudnn8-runtime
- Add gpu_hasher.py: batched 2D DCT on GPU via PyTorch matrix multiply,
256 images/batch, produces imagehash-compatible 64-bit hex hashes,
auto-falls back to CPU when CUDA unavailable (see the sketch after
this list)
- Replace per-image phash loop in scanner.py with phasher.hash_files()
- docker-compose.yml: add nvidia GPU device reservation
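The matrix-multiply trick referenced above: the 2D DCT-II of each 32x32
image X is D @ X @ D.T for an orthonormal DCT basis D, so a whole batch
reduces to two batched matmuls. A sketch (the bit packing shown is the
plain bit-string-to-hex form; treat it as illustrative):

    import torch

    def dct_matrix(n: int, device="cuda") -> torch.Tensor:
        """Orthonormal DCT-II basis; dct2(X) = D @ X @ D.T."""
        k = torch.arange(n, device=device, dtype=torch.float32)
        d = torch.cos(torch.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
        d[0] *= (1.0 / n) ** 0.5
        d[1:] *= (2.0 / n) ** 0.5
        return d

    def phash_batch(gray: torch.Tensor) -> list[str]:
        """gray: (N, 32, 32) float tensor of grayscale images."""
        d = dct_matrix(gray.shape[-1], gray.device)
        low = (d @ gray @ d.T)[:, :8, :8].reshape(-1, 64)   # keep low frequencies
        med = low.median(dim=1, keepdim=True).values
        bits = (low > med).to(torch.uint8)
        return ["%016x" % int("".join(map(str, b.tolist())), 2) for b in bits]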
Hang fix:
- takeout.is_takeout_folder() now caps at 50 directories (was walking the
entire tree — blocked for minutes on 65k+ file libraries; see the sketch
below)
- Add "Not a Takeout folder" status message so takeout phase is never silent
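A sketch of the capped walk (the sidecar heuristic shown is hypothetical):

    import os

    def is_takeout_folder(root: str, max_dirs: int = 50) -> bool:
        """Bail after max_dirs directories instead of walking the whole tree."""
        for i, (dirpath, _dirs, files) in enumerate(os.walk(root)):
            if i >= max_dirs:
                return False
            # Hypothetical marker: Takeout exports ship .json sidecars per photo.
            if any(name.endswith(".json") for name in files):
                return True
        return False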
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- build-release.ps1: builds Docker image, saves to tar, bundles
everything into dist\ ready to copy to a flash drive
- installer/install.ps1: checks WSL2, Docker Desktop, loads image
(or builds from source as fallback), prompts for photo/data paths,
writes docker-compose.override.yml, starts container, creates
desktop shortcut
- installer/uninstall.ps1: stops container, optionally removes image
and data, removes shortcut and app directory
- installer/dupfinder-start-stop.ps1: start/stop/restart/open helper
copied to target machine during install; desktop shortcut uses -Action open
which polls until the app is responsive before launching browser
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Scanner now updates message every 250 files during os.walk so the UI
shows a live count. Progress bar switches to an indeterminate animated
pulse during discovery and takeout phases (no known total yet), then
reverts to a normal percentage bar once indexing begins.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New GET /api/browse endpoint lists subdirectories at any path.
UI gets a folder icon button next to each path input that opens
a browsable directory tree modal. Escape or Cancel closes it,
clicking a folder navigates into it, Select confirms the choice.
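The handler is essentially os.scandir filtered to directories; a
framework-agnostic sketch:

    import os

    def browse(path: str = "/"):
        """Backs GET /api/browse: list subdirectories at `path`."""
        dirs = sorted(
            entry.name
            for entry in os.scandir(path)
            if entry.is_dir(follow_symlinks=False)
        )
        return {"path": path, "dirs": dirs}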
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>