GPU:
- Switch Dockerfile base to `pytorch/pytorch:2.3.1-cuda12.1-cudnn8-runtime`
- Add `gpu_hasher.py`: batched 2D DCT on GPU via PyTorch matrix multiply, 256 images/batch, produces imagehash-compatible 64-bit hex hashes, auto-falls back to CPU when CUDA is unavailable
- Replace per-image phash loop in `scanner.py` with `phasher.hash_files()`
- `docker-compose.yml`: add NVIDIA GPU device reservation

Hang fix:
- `takeout.is_takeout_folder()` now caps at 50 directories (it was walking the entire tree, which blocked for minutes on 65k+ file libraries)
- Add "Not a Takeout folder" status message so the Takeout phase is never silent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
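The hang fix bounds how much of the tree the Takeout check inspects. Below is a minimal sketch of that idea. It assumes the check is a top-down `os.walk` that stops after 50 directories. The function name `looks_like_takeout` and its marker heuristics are hypothetical, not the repository's actual rules. The markers it looks for are a `Takeout` or `Google Photos` directory, or a `<file>.json` sidecar next to its media file.

```python
import os

# Hypothetical markers; the real is_takeout_folder() heuristics may differ.
TAKEOUT_DIR_NAMES = {"Takeout", "Google Photos"}


def looks_like_takeout(root: str, max_dirs: int = 50) -> bool:
    """Return True if a Takeout marker appears within the first `max_dirs` directories."""
    for visited, (dirpath, _dirnames, filenames) in enumerate(os.walk(root), start=1):
        if os.path.basename(dirpath) in TAKEOUT_DIR_NAMES:
            return True
        names = set(filenames)
        # A sidecar such as "IMG_0001.jpg.json" sitting next to "IMG_0001.jpg".
        if any(n.endswith(".json") and n[:-5] in names for n in names):
            return True
        if visited >= max_dirs:
            break  # Cap reached: give up instead of walking a 65k+ file tree.
    return False
```

The cap limits the work to at most `max_dirs` directory listings, however large the library is. The trade-off is that a marker buried deeper than the cap is missed. When this returns `False`, a caller would report "Not a Takeout folder" so the phase is never silent.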
Duplicate Finder
A self-hosted Docker web app that scans a photo/video library, detects duplicates using four methods, and lets you review them in a gallery UI. No files are ever moved, renamed, or deleted — all decisions are recorded in SQLite only.
Quick start
```bash
# 1. Edit docker-compose.yml: set your photos volume path
# 2. Build and run
docker compose up -d --build
# 3. Open http://localhost:8765
# 4. Enter the folder path in the UI and click Scan
```
Volume mounts
| Container path | Purpose |
|---|---|
| `/photos` | Your photo library, mounted read-only |
| `/data` | SQLite database persistence |
Edit docker-compose.yml to point these at your NAS paths.
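To confirm from inside the container that the mounts behave as the table describes, you could read the mount flags with `os.statvfs`. This is a sketch that assumes a Linux container. `is_read_only_mount` is a hypothetical helper, not part of the app.

```python
import os


def is_read_only_mount(path: str) -> bool:
    """True if the filesystem holding `path` is mounted read-only (POSIX only)."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


if __name__ == "__main__":
    # Per the table: /photos should report read-only and /data writable.
    for mount in ("/photos", "/data"):
        if os.path.exists(mount):
            state = "read-only" if is_read_only_mount(mount) else "writable"
            print(f"{mount}: {state}")
```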
Detection methods
| Method | Color | Description |
|---|---|---|
| SHA-256 | Blue | Byte-identical files |
| Perceptual hash | Purple | Visually similar photos (Hamming distance ≤ 10 between 64-bit pHashes) |
| EXIF timestamp + device | Amber | Same camera, same moment |
| File size + dimensions | Gray | Same size and resolution (low confidence) |
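The perceptual-hash method relies on a DCT-based pHash. According to the commit message, `gpu_hasher.py` computes a batched 2D DCT as a matrix multiply and emits imagehash-compatible 64-bit hex strings. The sketch below is a single-image, pure-Python CPU version of that computation. Its assumptions:
- The image has already been converted to grayscale and resized to 32×32, which is what `imagehash.phash` does with its default `hash_size=8`.
- The helper names are hypothetical.
- Individual bits may differ from imagehash's output when a coefficient sits within floating-point error of the median.

```python
import math
import statistics

HASH_SIZE = 8              # keep an 8x8 low-frequency block -> 64-bit hash
IMG_SIZE = HASH_SIZE * 4   # 32x32 grayscale input, as imagehash.phash uses by default


def dct_matrix(n_rows: int, n: int) -> list[list[float]]:
    """First n_rows rows of the unnormalized DCT-II basis: D[k][i] = cos(pi*k*(2i+1)/(2n))."""
    return [[math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n)]
            for k in range(n_rows)]


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def phash_hex(pixels: list[list[float]]) -> str:
    """pHash of a 32x32 grayscale grid via the 2D DCT written as D @ X @ D.T."""
    d = dct_matrix(HASH_SIZE, IMG_SIZE)        # 8x32: only the low-frequency rows
    d_t = [list(col) for col in zip(*d)]       # 32x8
    low = matmul(matmul(d, pixels), d_t)       # 8x8 top-left DCT coefficients
    flat = [v for row in low for v in row]
    med = statistics.median(flat)
    bits = "".join("1" if v > med else "0" for v in flat)  # row-major, first bit is the MSB
    return f"{int(bits, 2):016x}"


def hamming(hex_a: str, hex_b: str) -> int:
    """Number of differing bits between two hex-encoded hashes."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def is_similar(hex_a: str, hex_b: str, threshold: int = 10) -> bool:
    return hamming(hex_a, hex_b) <= threshold
```

Writing the DCT as `D @ X @ D.T` makes batching natural. The same 8×32 basis matrix multiplies every image, which is presumably how the GPU version handles 256 images per batch. Only the 8×8 low-frequency corner is computed, since the other coefficients would be discarded anyway.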
Scan modes
| Mode | Description |
|---|---|
| Incremental | Only re-hashes changed/new files. Prior decisions preserved. |
| New files only | Indexes newly added files. Existing decisions untouched. |
| Rebuild groups | Re-runs detection on existing index. No re-hashing. |
| Full reset | Wipes everything and scans from scratch. |
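Incremental mode re-hashes only files that are new or changed. Below is a minimal sketch of that decision. It assumes a change is detected by comparing file size and modification time (`st_mtime_ns`) against a hypothetical `files` table. The app's real schema and change test are not documented here.

```python
import hashlib
import os
import sqlite3


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    # Hypothetical schema for illustration only.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY, size INTEGER NOT NULL,"
        " mtime_ns INTEGER NOT NULL, sha256 TEXT NOT NULL)"
    )
    return conn


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so large videos never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_rehash(conn: sqlite3.Connection, path: str) -> bool:
    """True for a new file, or one whose size or mtime differs from the index."""
    st = os.stat(path)
    row = conn.execute(
        "SELECT size, mtime_ns FROM files WHERE path = ?", (path,)
    ).fetchone()
    return row is None or row != (st.st_size, st.st_mtime_ns)


def scan_incremental(conn: sqlite3.Connection, paths: list[str]) -> list[str]:
    """Hash only what changed; return the paths that were (re)hashed."""
    rehashed = []
    for path in paths:
        if needs_rehash(conn, path):
            st = os.stat(path)
            with conn:  # commits the row for this file
                conn.execute(
                    "INSERT OR REPLACE INTO files (path, size, mtime_ns, sha256)"
                    " VALUES (?, ?, ?, ?)",
                    (path, st.st_size, st.st_mtime_ns, sha256_of(path)),
                )
            rehashed.append(path)
    return rehashed
```

In this model, "Rebuild groups" would reuse the stored `sha256` column instead of calling `sha256_of` again.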
Google Takeout
The scanner automatically detects Google Takeout folder structures and reads .json sidecar files to restore correct capture timestamps and original filenames. Takeout files are flagged in the UI.
What "redundant" means
Marking a file redundant only writes to the database. Nothing is moved, renamed, or deleted. This tool produces a decision record only. A separate tool handles file actions.
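In database terms, a decision is just a row. Below is a minimal sketch that assumes a hypothetical `decisions` table; the `keep`/`redundant` status values and all names are illustrative. Marking a file redundant writes one row and never touches the file itself. A separate tool could later read the list back.

```python
import sqlite3


def open_decisions(db_path: str = ":memory:") -> sqlite3.Connection:
    # Hypothetical schema; the app's real tables are not documented in this README.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS decisions ("
        " path TEXT PRIMARY KEY,"
        " status TEXT NOT NULL CHECK (status IN ('keep', 'redundant')),"
        " decided_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def mark_redundant(conn: sqlite3.Connection, path: str) -> None:
    # Only a database row is written; the file at `path` is never opened, moved, or deleted.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO decisions (path, status) VALUES (?, 'redundant')",
            (path,),
        )


def redundant_paths(conn: sqlite3.Connection) -> list[str]:
    """The decision record a separate file-action tool might consume."""
    rows = conn.execute(
        "SELECT path FROM decisions WHERE status = 'redundant' ORDER BY path"
    )
    return [row[0] for row in rows]
```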
Tech stack
- Python 3.12, FastAPI, Uvicorn
- SQLite (stdlib `sqlite3`)
- Pillow, imagehash, pillow-heif
- Vanilla JS single-page frontend
- Docker / docker-compose
Languages
Python 53.5%, HTML 34%, PowerShell 12.2%, Dockerfile 0.3%