Commit Graph

12 Commits

Author SHA1 Message Date
Carlos
293355b724 SFTP: switch to Transport-based connection (fixes Synology 'Channel closed')
paramiko's SSHClient.open_sftp() allocates an exec channel before the
SFTP subsystem request, which Synology DSM closes immediately with
'Channel closed'. Manual sftp(1) and WinSCP avoid this by going straight
to the SFTP subsystem on a fresh channel.

Replaced SSHClient with direct paramiko.Transport + SFTPClient.from_transport,
matching the OpenSSH/WinSCP flow. Also raised the flow-control window to
128 MB, since Synology has been observed to bail mid-handshake with the
default 1 MB.
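
A minimal sketch of the Transport-based flow described above, not the code from this commit. The function name, its parameters, and the choice to apply the 128 MB window to both the transport and the SFTP channel are assumptions made for illustration. Host key checking is left out here, so a real caller would still need to verify the server key.

```python
WINDOW_SIZE = 128 * 1024 * 1024  # 128 MB, the figure quoted in the commit message


def open_sftp_via_transport(host, port, username, password=None, pkey=None,
                            window_size=WINDOW_SIZE):
    """Open an SFTP session directly on a paramiko Transport (hypothetical helper)."""
    # Imported lazily so this module can be loaded where paramiko is not installed.
    import paramiko

    transport = paramiko.Transport((host, port), default_window_size=window_size)
    try:
        # No hostkey argument is passed, so the server key is NOT verified here.
        transport.connect(username=username, password=password, pkey=pkey)
        sftp = paramiko.SFTPClient.from_transport(transport, window_size=window_size)
        if sftp is None:
            raise OSError("SFTP channel could not be opened")
    except Exception:
        transport.close()
        raise
    # The caller owns both objects and should close sftp first, then transport.
    return transport, sftp
```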

test_connection_verbose now reports per-step status (connect+auth,
open_sftp, listdir /, stat base_path, write probe). API returns the
steps array so the UI can show exactly which step failed.
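
The per-step reporting could be sketched roughly as below. The `run_steps` helper and the shape of the returned dictionary are hypothetical, and the real test_connection_verbose may be structured differently. The idea is only that each named step records its own outcome and the run stops at the first failure.

```python
from typing import Callable


def run_steps(steps: list[tuple[str, Callable[[], object]]]) -> dict:
    """Run named steps in order, stopping at the first one that raises."""
    results = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            results.append({"step": name, "ok": False,
                            "error": f"{type(exc).__name__}: {exc}"})
            break  # later steps depend on earlier ones, so do not continue
        results.append({"step": name, "ok": True, "error": None})
    ok = len(results) == len(steps) and all(r["ok"] for r in results)
    return {"ok": ok, "steps": results}


def _fail_open_sftp() -> None:
    # Stand-in for the failure mode named in the commit message.
    raise OSError("Channel closed")


report = run_steps([
    ("connect+auth", lambda: None),
    ("open_sftp", _fail_open_sftp),
    ("listdir /", lambda: None),
])
```

With the stand-in failure above, `report["steps"]` holds two entries and the second carries the error text, which is the information a UI would need to show which step failed.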

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 21:43:56 -04:00
Carlos
7436b23db3 Stage 2 #1: SFTP destinations CRUD + connection test
Foundation for the move/quarantine pipeline. Lets users register one or
more remote SFTP destinations through the API, store credentials at rest
under /data/sftp/{id}.{password|key} (mode 600), and verify connectivity
+ write access via a test endpoint.

Endpoints:
  GET    /api/sftp/destinations
  POST   /api/sftp/destinations             — create
  PUT    /api/sftp/destinations/{id}        — update
  DELETE /api/sftp/destinations/{id}
  POST   /api/sftp/destinations/{id}/test   — connect, stat base_path, mkdir probe
  POST   /api/sftp/keypair                  — generate ED25519 keypair

Host keys pinned per-destination on first connect (TOFU); subsequent
mismatches are rejected. paramiko added to requirements.
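
Trust-on-first-use pinning can be illustrated with a small stdlib-only sketch. The in-memory `pins` dictionary, the SHA-256 fingerprint format, and the exception name are assumptions for the example. The commit does not say how pins are stored or compared.

```python
import hashlib


class HostKeyMismatch(Exception):
    """Raised when a destination presents a key different from the pinned one."""


def check_host_key(pins: dict[str, str], destination_id: str, key_bytes: bytes) -> str:
    """Pin the key on first connect; reject any later key that differs."""
    fingerprint = hashlib.sha256(key_bytes).hexdigest()
    pinned = pins.get(destination_id)
    if pinned is None:
        pins[destination_id] = fingerprint  # first connect: trust and remember
        return "pinned"
    if pinned != fingerprint:
        raise HostKeyMismatch(f"host key changed for destination {destination_id}")
    return "match"


pins: dict[str, str] = {}
first = check_host_key(pins, "nas-1", b"server-key-A")
second = check_host_key(pins, "nas-1", b"server-key-A")
try:
    check_host_key(pins, "nas-1", b"server-key-B")
    rejected = False
except HostKeyMismatch:
    rejected = True
```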

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 20:04:42 -04:00
Carlos
759288b37e Pre-generate all thumbnails up-front, not on scroll
After every scan, automatically kick off a background thread that
generates a JPEG thumbnail for every file in a duplicate group and
caches it locally at /data/thumbs/. Idempotent — already-cached files
are skipped.

New endpoints:
  POST /api/thumbs/generate            — start pre-gen for all files
  POST /api/thumbs/generate?only_in_groups=true  — only dup-group files
  GET  /api/thumbs/status              — progress (total/done/skipped/failed)
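
A rough sketch of how an idempotent background pre-generation pass might track the total/done/skipped/failed counters. The class and function names, and the `is_cached` and `generate` callables, are hypothetical stand-ins for whatever the project actually uses.

```python
import threading
from typing import Callable, Iterable


class ThumbProgress:
    """Thread-safe counters matching the fields the status endpoint reports."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts = {"total": 0, "done": 0, "skipped": 0, "failed": 0}

    def set_total(self, total: int) -> None:
        with self._lock:
            self._counts["total"] = total

    def bump(self, key: str) -> None:
        with self._lock:
            self._counts[key] += 1

    def snapshot(self) -> dict:
        with self._lock:
            return dict(self._counts)


def pregenerate(file_ids: Iterable[int], is_cached: Callable[[int], bool],
                generate: Callable[[int], None], progress: ThumbProgress) -> None:
    ids = list(file_ids)
    progress.set_total(len(ids))
    for fid in ids:
        if is_cached(fid):
            progress.bump("skipped")  # idempotent: cached files are not redone
            continue
        try:
            generate(fid)
            progress.bump("done")
        except Exception:
            progress.bump("failed")  # one bad file must not stop the pass


cache = {2}  # pretend file 2 already has a thumbnail


def fake_generate(fid: int) -> None:
    if fid == 3:
        raise ValueError("unreadable image")
    cache.add(fid)


progress = ThumbProgress()
worker = threading.Thread(
    target=pregenerate,
    args=([1, 2, 3, 4], cache.__contains__, fake_generate, progress),
    daemon=True,
)
worker.start()
worker.join()
status = progress.snapshot()
```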

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 16:33:19 -04:00
Carlos
4c21e9fa1c Add workstation-local thumbnail cache + HEIC support
Thumbnails (256px JPEG, q80) generated on first request and cached at
/data/thumbs/<shard>/<file_id>.jpg — i.e. on the workstation's local SSD,
not the NAS. Subsequent requests serve straight from cache, never
re-fetching from /photos.
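
The generate-once, serve-from-cache behaviour might look something like the sketch below. The sharding rule (file_id modulo 256, as two hex digits) is purely an assumption, since the commit only shows a `<shard>` placeholder, and `render` stands in for the real thumbnail generator.

```python
import tempfile
from pathlib import Path
from typing import Callable


def thumb_path(cache_root: Path, file_id: int) -> Path:
    # Hypothetical sharding scheme: 256 subdirectories keyed by file_id % 256.
    shard = f"{file_id % 256:02x}"
    return cache_root / shard / f"{file_id}.jpg"


def get_or_create_thumb(cache_root: Path, file_id: int,
                        render: Callable[[], bytes]) -> tuple[bytes, bool]:
    """Return (thumbnail bytes, served_from_cache)."""
    path = thumb_path(cache_root, file_id)
    if path.exists():
        return path.read_bytes(), True
    data = render()  # only reached on a cache miss
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(data)
    tmp.replace(path)  # rename into place so readers never see a partial file
    return data, False


render_calls = []


def fake_render() -> bytes:
    render_calls.append(1)
    return b"jpeg-bytes"


with tempfile.TemporaryDirectory() as root:
    first_hit = get_or_create_thumb(Path(root), 300, fake_render)
    second_hit = get_or_create_thumb(Path(root), 300, fake_render)
    shard_name = thumb_path(Path(root), 300).parent.name
```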

HEIC/HEIF decoded via pillow-heif so iPhone photos finally render.
Videos cached as a single ffmpeg-extracted frame, not regenerated each
request. New DELETE /api/thumb/cache endpoint to wipe it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 16:29:29 -04:00
Carlos
81b38cb5bb CSV export: path column now contains directory only
Filename was duplicated in both columns; trimmed the basename off path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 16:03:34 -04:00
Carlos
d95bf69be0 Fix CSV export crash on filenames with embedded newlines
Use QUOTE_ALL + sanitise NUL/CR/LF in path/filename/exif fields. Default
csv dialect rejected fields containing line terminators with 'need to
escape, but no escapechar set'.
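
For illustration, the fix could be sketched as follows. The function names and the choice to replace CR/LF with a space (rather than drop them) are assumptions. The commit only says that NUL/CR/LF are sanitised and QUOTE_ALL is used.

```python
import csv
import io

# NUL is dropped; CR and LF become spaces so a field stays on one line.
_CONTROL = str.maketrans({"\x00": "", "\r": " ", "\n": " "})


def sanitise(value: object) -> str:
    return "" if value is None else str(value).translate(_CONTROL)


def write_rows(rows, out) -> None:
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    for row in rows:
        writer.writerow([sanitise(cell) for cell in row])


buf = io.StringIO()
write_rows([["/photos/2019", "img\n001.jpg", None]], buf)
exported = buf.getvalue()
# exported == '"/photos/2019","img 001.jpg",""\r\n'
```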

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 13:17:11 -04:00
Carlos
14c6012808 Smarter keeper selection: folder-name + mtime signals
Adds a path-penalty score that downranks files in folders named Trashed,
Dups, Backup, Copy, Old, Archive, plus a penalty for repeated path segments
(e.g. Desktop/Desktop/Files) and very deep paths. Also captures and uses
file mtime as a tiebreaker — older files are usually the originals.
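
A sketch of what such a path-penalty score might look like. The weights (10 per penalised folder, 5 per repeated segment, 1 per level beyond a depth of 8) and the function names are invented for the example and are not taken from the commit.

```python
from pathlib import PurePosixPath

PENALISED = {"trashed", "dups", "backup", "copy", "old", "archive"}


def path_penalty(path: str, deep_threshold: int = 8) -> int:
    """Higher score means the file is a less likely keeper (hypothetical weights)."""
    folders = [p.lower() for p in PurePosixPath(path).parent.parts if p != "/"]
    penalty = 10 * sum(1 for f in folders if f in PENALISED)
    # Consecutive repeated segments such as Desktop/Desktop.
    penalty += 5 * sum(1 for a, b in zip(folders, folders[1:]) if a == b)
    # Very deep paths: one point per level beyond the threshold.
    penalty += max(0, len(folders) - deep_threshold)
    return penalty


def pick_keeper(files: list[tuple[str, float]]) -> tuple[str, float]:
    """files holds (path, mtime); lowest penalty wins, older mtime breaks ties."""
    return min(files, key=lambda f: (path_penalty(f[0]), f[1]))


candidates = [
    ("/photos/Backup/img.jpg", 100.0),
    ("/photos/2019/img.jpg", 200.0),
    ("/photos/2020/img.jpg", 150.0),
]
keeper = pick_keeper(candidates)
```

Here the Backup copy loses despite being oldest, and the two unpenalised copies are separated by mtime.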

Applied to all four detection passes (sha256, phash, exif, filesize+dim)
and to auto-resolve-exact.

New file_mtime column with idempotent migration.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 10:56:52 -04:00
Carlos
6a4134762c Add decisions audit log for future move/delete tool
Captures every review action (keeper, redundant, skip, unreview, auto-resolve,
rescan-restore) with sha256 at decision time so a downstream tool can detect
stale decisions before touching disk.
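
How a downstream tool might use the recorded hash can be sketched like this. The `Decision` fields and the `is_stale` helper are hypothetical, since the actual log schema is not shown in the commit.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    file_id: int
    action: str   # e.g. "keeper", "redundant", "skip"
    sha256: str   # content hash captured when the decision was made


def is_stale(decision: Decision, current_bytes: bytes) -> bool:
    """True if the file content changed since the decision was recorded."""
    return hashlib.sha256(current_bytes).hexdigest() != decision.sha256


original = b"original image bytes"
decision = Decision(file_id=1, action="redundant",
                    sha256=hashlib.sha256(original).hexdigest())
unchanged = is_stale(decision, original)
edited = is_stale(decision, b"edited image bytes")
```

A move or delete tool would presumably skip any decision for which `is_stale` returns True rather than act on it.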

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 01:40:54 -04:00
Carlos
3001be3a92 Fix correctness bugs in scanner and reset endpoint
- Defer Takeout sidecar enrichment until after indexing so its UPDATE
  statements actually match rows. Previously it ran first and silently
  no-op'd on the very first scan because no files existed in the DB yet.

- Preserve user review decisions across incremental and regroup rescans.
  The grouping phase wipes duplicate_groups/duplicate_members, which
  also wiped reviewed=1 / is_keeper flags. Now snapshots reviewed groups
  by (method, frozenset of member file_ids) before the wipe and re-applies
  them to any post-regrouping group whose member set is unchanged.

- Replace 2-hex-char phash bucketing with multi-index pigeonhole
  (16 nibble buckets per hash). At threshold=10, the previous bucketing
  missed any near-duplicate pair that differed in the first byte, since
  they landed in different buckets and were never compared. Caches
  imagehash.hex_to_hash() per phash and dedups pair comparisons.

- Rewrite _suggested_keeper_by_resolution: previous implementation had
  a dead inner score() function and the lambda was missing the date
  tie-breaker (left as a TODO comment). Now picks largest pixels, ties
  by file size, then by oldest exif_datetime.

- Filter phash candidates to length(phash)=16 to skip malformed hashes
  rather than relying on the silent except in the comparison loop.

- Reject /api/scan/reset while a scan is running. Resetting mid-scan
  wiped tables the running scan thread was still writing to.

- Also clears stale 'redundant' file status (not just 'keeper') when
  a file no longer appears in any group after regrouping.
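
The multi-index pigeonhole idea from the phash item above can be sketched with the standard library alone. This is an illustration under assumptions, not the scanner's code: it uses plain integer XOR instead of imagehash, and the function names are made up. A 16-hex-char hash has 16 nibbles, and two hashes within Hamming distance 10 can differ in at most 10 of them, so they must agree exactly on at least 6 and therefore share at least one (position, nibble) bucket.

```python
from collections import defaultdict
from itertools import combinations

HEX_DIGITS = set("0123456789abcdef")


def hamming(a: str, b: str) -> int:
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def near_duplicate_pairs(phashes: dict[int, str], threshold: int = 10):
    """Return sorted (id_a, id_b, distance) for pairs within the threshold.

    The no-miss guarantee holds while threshold is below 16 (the nibble count).
    """
    # Skip malformed hashes up front instead of catching errors later.
    valid = {fid: h.lower() for fid, h in phashes.items()
             if len(h) == 16 and set(h.lower()) <= HEX_DIGITS}
    buckets = defaultdict(list)
    for fid, h in valid.items():
        for pos, nibble in enumerate(h):
            buckets[(pos, nibble)].append(fid)
    seen = set()
    pairs = []
    for members in buckets.values():
        for a, b in combinations(members, 2):
            key = (a, b) if a < b else (b, a)
            if key in seen:
                continue  # pair already compared via another bucket
            seen.add(key)
            distance = hamming(valid[key[0]], valid[key[1]])
            if distance <= threshold:
                pairs.append((key[0], key[1], distance))
    return sorted(pairs)


hashes = {
    1: "ffffffffffffffff",
    2: "0fffffffffffffff",  # differs from 1 only in the first nibble (4 bits)
    3: "0000000000000000",
    4: "abc",               # malformed, ignored
}
found = near_duplicate_pairs(hashes)
```

Files 1 and 2 start with different first bytes, so prefix bucketing on the first two hex characters would never have compared them, while the per-nibble index does.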
2026-04-24 00:42:13 -04:00
tocmo
356f922940 feat: replace Cancel with Pause/Resume — survives server restarts
- scanner.py: replace cancel_requested with pause_requested throughout;
  pause during walk drains in-flight futures gracefully then saves state;
  phash phase processes in 500-image chunks with pause check between each;
  _save_pause_state() persists files_indexed/phashes_done/last_phase to DB;
  init_db() already detects killed-mid-scan (running→paused) on startup

- main.py: add POST /api/scan/pause and POST /api/scan/resume endpoints;
  /api/scan/cancel kept as alias; scan_status now returns folder_path,
  files_indexed, phashes_done; scan_reset clears all new fields

- index.html: "Cancel" → "⏸ Pause" button; new #paused-area banner shows
  folder, files indexed, phashes done with "▶ Resume" and "Full reset"
  buttons; updateScanUI handles paused status; pauseScan()/resumeScan()
  JS functions added; chip gains .paused amber style
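
The chunked pause check can be sketched as below. The function name, the returned dictionary, and the use of `threading.Event` are assumptions for illustration. The commit only states that the phash phase works in 500-image chunks with a pause check between them and that progress is persisted.

```python
import threading
from typing import Callable, Sequence


def process_in_chunks(items: Sequence, handle: Callable[[object], None],
                      pause_requested: threading.Event,
                      start: int = 0, chunk_size: int = 500) -> dict:
    """Process items from `start`; stop cleanly between chunks if paused."""
    done = start
    while done < len(items):
        if pause_requested.is_set():
            # A real scanner would persist `done` here so a restart can resume.
            return {"status": "paused", "done": done}
        for item in items[done:done + chunk_size]:
            handle(item)
        done = min(done + chunk_size, len(items))
    return {"status": "complete", "done": done}


items = list(range(1200))
processed = []
pause = threading.Event()


def handle(item: int) -> None:
    processed.append(item)
    if item == 499 and len(processed) == 500:
        pause.set()  # simulate a pause request arriving during the first chunk


first_run = process_in_chunks(items, handle, pause)
pause.clear()
second_run = process_in_chunks(items, handle, pause, start=first_run["done"])
```

The in-flight chunk always finishes before the pause takes effect, so the saved offset lands on a chunk boundary and no item is processed twice on resume.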

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 02:11:00 -04:00
tocmo
c19825c523 Add server-side folder picker
New GET /api/browse endpoint lists subdirectories at any path.
UI gets a folder icon button next to each path input that opens
a browsable directory tree modal. Escape or Cancel closes it,
clicking a folder navigates into it, Select confirms the choice.
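
The directory listing behind such an endpoint might be as simple as the sketch below. The function name and response shape are hypothetical. A real endpoint would probably also want to confine browsing to an allowed root, which this example does not attempt.

```python
import os
import tempfile


def list_subdirectories(path: str) -> dict:
    """Return the immediate subdirectories of `path`, sorted by name."""
    with os.scandir(path) as entries:
        # Symlinks are not followed, so a link to a directory is not listed.
        dirs = sorted(e.name for e in entries if e.is_dir(follow_symlinks=False))
    return {"path": path, "dirs": dirs}


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "b"))
    os.mkdir(os.path.join(root, "a"))
    with open(os.path.join(root, "notes.txt"), "w") as fh:
        fh.write("not a directory")
    listing = list_subdirectories(root)
```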

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 23:55:42 -04:00
tocmo
868da9016d Initial implementation of duplicate finder
Full project per spec: FastAPI backend, 4-method duplicate detection
(SHA-256, phash, EXIF, filesize), Google Takeout pre-processor,
4 scan modes, and dark-theme vanilla JS gallery frontend.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 23:42:58 -04:00