Commit Graph

14 Commits

Author SHA1 Message Date
Carlos
76e89a7313 Fix .deb install path and Gitea upload auth
The .deb install instructions in the README pointed at a URL that
doesn't exist — Gitea exposes the Debian registry as an apt repo, not
as plain file downloads. Switched the README to the apt-repo flow
(add a sources.list line, then apt install).

Also fixed build-deb.sh: Gitea's Debian package endpoint returns
HTTP 405 for token-bearer auth; it requires HTTP basic auth (user +
token-as-password) and the literal /upload suffix on the URL.

Package built and pushed to the registry — apt install works now.
2026-04-24 00:55:11 -04:00
Carlos
90790b648d Rewrite README install instructions for end users
Lay out the three install paths (Windows installer, .deb package, manual
docker compose) with concrete numbered steps and a 'pick your method'
table at the top so users don't have to read past their own platform.
Add a using-it walkthrough, a scan-mode explanation, and a short
troubleshooting section.
2026-04-24 00:48:20 -04:00
Carlos
3001be3a92 Fix correctness bugs in scanner and reset endpoint
- Defer Takeout sidecar enrichment until after indexing so its UPDATE
  statements actually match rows. Previously it ran first and silently
  no-op'd on the very first scan because no files existed in the DB yet.

- Preserve user review decisions across incremental and regroup rescans.
  The grouping phase wipes duplicate_groups/duplicate_members, which
  also wiped reviewed=1 / is_keeper flags. Now snapshots reviewed groups
  by (method, frozenset of member file_ids) before the wipe and re-applies
  them to any post-regrouping group whose member set is unchanged.

- Replace 2-hex-char phash bucketing with multi-index pigeonhole
  (16 nibble buckets per hash). At threshold=10, the previous bucketing
  missed any near-duplicate pair that differed in the first byte, since
  they landed in different buckets and were never compared. Caches
  imagehash.hex_to_hash() per phash and dedups pair comparisons.

- Rewrite _suggested_keeper_by_resolution: previous implementation had
  a dead inner score() function and the lambda was missing the date
  tie-breaker (left as a TODO comment). Now picks largest pixels, ties
  by file size, then by oldest exif_datetime.

- Filter phash candidates to length(phash)=16 to skip malformed hashes
  rather than relying on the silent except in the comparison loop.

- Reject /api/scan/reset while a scan is running. Resetting mid-scan
  wiped tables the running scan thread was still writing to.

- Also clears stale 'redundant' file status (not just 'keeper') when
  a file no longer appears in any group after regrouping.
2026-04-24 00:42:13 -04:00
tocmo
356f922940 feat: replace Cancel with Pause/Resume — survives server restarts
- scanner.py: replace cancel_requested with pause_requested throughout;
  pause during walk drains in-flight futures gracefully then saves state;
  phash phase processes in 500-image chunks with pause check between each;
  _save_pause_state() persists files_indexed/phashes_done/last_phase to DB;
  init_db() already detects killed-mid-scan (running→paused) on startup

- main.py: add POST /api/scan/pause and POST /api/scan/resume endpoints;
  /api/scan/cancel kept as alias; scan_status now returns folder_path,
  files_indexed, phashes_done; scan_reset clears all new fields

- index.html: "Cancel" → "⏸ Pause" button; new #paused-area banner shows
  folder, files indexed, phashes done with "▶ Resume" and "Full reset"
  buttons; updateScanUI handles paused status; pauseScan()/resumeScan()
  JS functions added; chip gains .paused amber style

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 02:11:00 -04:00
tocmo
f37bd76fed Fix GPU setup: check and install nvidia-container-toolkit
dupfinder-setup.sh now verifies nvidia-container-toolkit is present
when a GPU is detected. If missing, prints install instructions and
offers to install it automatically (adds NVIDIA repo, installs toolkit,
configures Docker runtime, restarts Docker).

Without this toolkit Docker silently falls back to CPU even when a
GPU is present and the compose file has the device reservation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 01:57:51 -04:00
tocmo
a6748de6e0 Pipeline discovery and indexing — workers start immediately
Instead of walk-everything-first then index, workers now receive files
the instant os.walk yields them. The thread pool is open before the
walk starts; each discovered file is submitted immediately. Completed
futures are drained after each directory to keep memory flat.

Progress message shows:
  "Discovering & indexing (8w): 1,234 — 5,678 found so far"
  then once walk finishes:
  "Indexing (8w): 8,000 / 9,100"

UI: merged Discovery + Indexing into a single "Discover + Index" phase pill.
Indeterminate progress bar stays on until total file count is known.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 01:54:43 -04:00
tocmo
fef364162c Parallel SHA-256 indexing with thread pool
Replace single-threaded indexing loop with ThreadPoolExecutor.
Default workers = min(cpu_count*2, 16), tunable via DUPFINDER_WORKERS
env var. Pre-loads all existing DB records in one query instead of
N per-file queries. Progress message shows worker count and live
done/total count. Skipped files bulk-stamped in batches of 500.

On an 8-core machine over NAS: ~4-8x faster indexing phase.
On NVMe: up to 16x faster with 16 workers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 01:48:30 -04:00
tocmo
f9164b4fa0 Add Debian package and Gitea APT repository support
debian/control, postinst, prerm, postrm — standard dpkg package lifecycle
debian/files/opt/dupfinder/dupfinder-setup.sh — interactive setup:
  checks Docker, detects NVIDIA GPU, prompts for photos/data paths,
  writes docker-compose.override.yml with GPU reservation if present,
  pulls image from registry (builds from source as fallback)
debian/files/usr/local/bin/dupfinder — CLI wrapper:
  setup / start / stop / restart / status / logs / open / update
debian/files/etc/systemd/system/dupfinder.service — systemd unit,
  guards against starting before setup has run
debian/build-deb.sh — builds .deb and uploads to Gitea package registry;
  prints the exact apt sources.list line on success

Install on any Debian/Ubuntu machine:
  echo "deb [trusted=yes] http://192.168.1.64:3000/api/packages/tocmo0nlord/debian bookworm main" \
    | sudo tee /etc/apt/sources.list.d/dupfinder.list
  sudo apt update && sudo apt install dupfinder
  sudo dupfinder setup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 01:42:45 -04:00
tocmo
c110a8e4f9 GPU-accelerated phash + fix discovery/takeout hang
GPU:
- Switch Dockerfile base to pytorch/pytorch:2.3.1-cuda12.1-cudnn8-runtime
- Add gpu_hasher.py: batched 2D DCT on GPU via PyTorch matrix multiply,
  256 images/batch, produces imagehash-compatible 64-bit hex hashes,
  auto-falls back to CPU when CUDA unavailable
- Replace per-image phash loop in scanner.py with phasher.hash_files()
- docker-compose.yml: add nvidia GPU device reservation

Hang fix:
- takeout.is_takeout_folder() now caps at 50 directories (was walking
  entire tree — blocked for minutes on 65k+ file libraries)
- Add "Not a Takeout folder" status message so takeout phase is never silent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 01:37:28 -04:00
tocmo
1d46b9945d Add portable flash-drive installer
- build-release.ps1: builds Docker image, saves to tar, bundles
  everything into dist\ ready to copy to a flash drive
- installer/install.ps1: checks WSL2, Docker Desktop, loads image
  (or builds from source as fallback), prompts for photo/data paths,
  writes docker-compose.override.yml, starts container, creates
  desktop shortcut
- installer/uninstall.ps1: stops container, optionally removes image
  and data, removes shortcut and app directory
- installer/dupfinder-start-stop.ps1: start/stop/restart/open helper
  copied to target machine during install; desktop shortcut uses -Action open
  which polls until the app is responsive before launching browser

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 01:32:32 -04:00
tocmo
b519e065cb Fix discovery phase appearing frozen
Scanner now updates message every 250 files during os.walk so the UI
shows a live count. Progress bar switches to an indeterminate animated
pulse during discovery and takeout phases (no known total yet), then
reverts to a normal percentage bar once indexing begins.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 01:25:41 -04:00
tocmo
6e7bb241ad Add .gitignore, remove pycache and db from tracking 2026-04-04 23:55:53 -04:00
tocmo
c19825c523 Add server-side folder picker
New GET /api/browse endpoint lists subdirectories at any path.
UI gets a folder icon button next to each path input that opens
a browsable directory tree modal. Escape or Cancel closes it,
clicking a folder navigates into it, Select confirms the choice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 23:55:42 -04:00
tocmo
868da9016d Initial implementation of duplicate finder
Full project per spec: FastAPI backend, 4-method duplicate detection
(SHA-256, phash, EXIF, filesize), Google Takeout pre-processor,
4 scan modes, and dark-theme vanilla JS gallery frontend.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 23:42:58 -04:00