diff --git a/README.md b/README.md
index deb128d..229822e 100644
--- a/README.md
+++ b/README.md
@@ -1,56 +1,157 @@
 # Duplicate Finder
 
-A self-hosted Docker web app that scans a photo/video library, detects duplicates using four methods, and lets you review them in a gallery UI. **No files are ever moved, renamed, or deleted** — all decisions are recorded in SQLite only.
+Self-hosted web app that scans your photo and video library, finds duplicates four different ways, and lets you review them in a browser. **It never moves, renames, or deletes anything** — every decision is recorded in a SQLite database. A separate tool (coming later) will act on those decisions.
 
-## Quick start
+> Once installed, open **http://localhost:8765** in any browser to use it.
+
+---
+
+## Pick your install method
+
+| You have… | Use this |
+|---|---|
+| **Windows 10/11** | [Windows installer](#windows-1011) (one PowerShell command) |
+| **Debian / Ubuntu / Proxmox LXC** | [.deb package](#debian--ubuntu--proxmox) (`apt install`) |
+| **Anything else with Docker** | [Docker Compose](#manual-docker-compose) (manual) |
+
+All three installs end up running the same Docker container.
+
+---
+
+### Windows 10/11
+
+**What you need:** Docker Desktop (the installer will check for it and offer to download).
+
+1. Download the latest release zip from the Gitea **Releases** page and extract it anywhere.
+2. Right-click `installer\install.ps1` → **Run with PowerShell** (or open an elevated PowerShell and run it).
+3. When prompted, type the path to your photos folder (e.g. `D:\Photos`) and a folder for the database (default is fine).
+4. The installer starts the container and puts a **DupFinder** shortcut on your desktop.
+
+**Day-to-day use:** double-click the desktop shortcut, or browse to http://localhost:8765.
+
+**Uninstall:** run `installer\uninstall.ps1` as administrator.
+
+---
+
+### Debian / Ubuntu / Proxmox
+
+**What you need:** Docker Engine. If you don't have it: `curl -fsSL https://get.docker.com | sh`.
 
 ```bash
-# 1. Edit docker-compose.yml — set your photos volume path
-# 2. Build and run
-docker compose up -d --build
-# 3. Open http://localhost:8765
-# 4. Enter folder path in UI and click Scan
+# 1. Install the package
+wget http://192.168.1.64:3000/tocmo0nlord/-/packages/debian/dupfinder/1.0.0/files/amd64/dupfinder_1.0.0_amd64.deb
+sudo apt install ./dupfinder_1.0.0_amd64.deb
+
+# 2. Run first-time setup (asks for photos path + data path)
+sudo dupfinder setup
+
+# 3. Start it
+sudo dupfinder start
 ```
 
-## Volume mounts
+**Manage the service:**
 
-| Container path | Purpose |
+| Command | What it does |
 |---|---|
-| `/photos` | Your photo library — mounted **read-only** |
-| `/data` | SQLite database persistence |
+| `sudo dupfinder start` | Start the container |
+| `sudo dupfinder stop` | Stop the container |
+| `sudo dupfinder restart` | Restart |
+| `sudo dupfinder status` | Show systemd status |
+| `sudo dupfinder logs` | Tail the logs |
+| `dupfinder open` | Open in your default browser |
 
-Edit `docker-compose.yml` to point these at your NAS paths.
+The service auto-starts on boot via systemd (`dupfinder.service`).
 
-## Detection methods
+**Uninstall:** `sudo apt remove dupfinder` (your photos and database are left untouched).
 
-| Method | Color | Description |
+---
+
+### Manual Docker Compose
+
+For NAS appliances (Synology, Unraid, TrueNAS), Mac, or any host where you'd rather wire it up yourself.
+
+1. Clone the repo:
+   ```bash
+   git clone http://192.168.1.64:3000/tocmo0nlord/duplicate-finder.git
+   cd duplicate-finder
+   ```
+2. Open `docker-compose.yml` and change the two volume paths under `dup-finder:`:
+   ```yaml
+   volumes:
+     - /your/photos/path:/photos:ro  # ← your photo library (read-only)
+     - /your/data/path:/data  # ← where the SQLite DB lives
+   ```
+3. Build and start:
+   ```bash
+   docker compose up -d --build
+   ```
+4. Open http://localhost:8765.
+
+To stop: `docker compose down`. To update later: `git pull && docker compose up -d --build`.
+
+> **GPU acceleration (optional):** the compose file requests an NVIDIA GPU for faster perceptual hashing. If you don't have one, delete the `deploy.resources.reservations.devices` block — the app falls back to CPU automatically.
+
+---
+
+## Using it
+
+1. Open http://localhost:8765.
+2. Click **Browse** and pick the folder you want to scan (it's relative to the container — usually just `/photos`).
+3. Pick a scan mode (see below) and click **Scan**.
+4. When it finishes, review the duplicate groups. Each group shows the suggested keeper highlighted; click any other photo to pick it instead, or **Keep all** to skip the group.
+5. When you're done, click **Download CSV** to export all decisions.
+
+### Scan modes
+
+| Mode | When to use |
+|---|---|
+| **Incremental** *(default)* | Day-to-day rescans. Re-hashes only changed/new files. Past review decisions are preserved. |
+| **New files only** | Fastest option. Indexes only files added since the last scan. |
+| **Rebuild groups** | Re-runs duplicate detection on the existing index without re-hashing. |
+| **Full reset** | Wipes the entire index and starts from scratch. |
+
+### Detection methods
+
+| Method | UI color | What it catches |
 |---|---|---|
-| SHA-256 | Blue | Byte-identical files |
-| Perceptual hash | Purple | Visually similar photos (hamming ≤ 10) |
-| EXIF timestamp + device | Amber | Same camera, same moment |
-| File size + dimensions | Gray | Same size and resolution (low confidence) |
+| **SHA-256** | Blue | Byte-identical files |
+| **Perceptual hash** | Purple | Visually similar photos (hamming ≤ 10) |
+| **EXIF timestamp + device** | Amber | Same camera, same moment |
+| **File size + dimensions** | Gray | Same size and resolution (low confidence) |
 
-## Scan modes
+### Google Takeout
 
-| Mode | Description |
-|---|---|
-| Incremental | Only re-hashes changed/new files. Prior decisions preserved. |
-| New files only | Indexes newly added files. Existing decisions untouched. |
-| Rebuild groups | Re-runs detection on existing index. No re-hashing. |
-| Full reset | Wipes everything and scans from scratch. |
+Point it at a Google Photos Takeout export and it auto-detects the structure, reads the `.json` sidecars, and restores the correct capture timestamps and original filenames. Takeout files get a flag in the UI.
 
-## Google Takeout
+---
 
-The scanner automatically detects Google Takeout folder structures and reads `.json` sidecar files to restore correct capture timestamps and original filenames. Takeout files are flagged in the UI.
+## Troubleshooting
+
+**The page won't load at http://localhost:8765**
+Check the container is up: `docker ps | grep dup-finder`. If not, see the logs: `docker compose logs dup-finder` (or `sudo dupfinder logs` on Debian).
+
+**"Permission denied" reading photos**
+The `/photos` mount is read-only by design, but the container still needs read access. Make sure your user (or the docker daemon) can read the folder you mounted.
+
+**Scan is stuck on "phash"**
+Perceptual hashing is the slowest phase — large libraries (>50k photos) on CPU can take hours. Add an NVIDIA GPU and the `deploy.resources` block in compose to get a 10-50× speedup.
+
+**I marked the wrong file as keeper**
+Open the group again and click **Unreview**, then re-decide.
+
+---
 
 ## What "redundant" means
 
-Marking a file redundant **only writes to the database**. Nothing is moved, renamed, or deleted. This tool produces a decision record only. A separate tool handles file actions.
+When you mark a file redundant, **only the database is updated**. Nothing on disk changes. This tool produces a decision record. A future companion tool will use that record to actually move or delete files.
+
+---
 
 ## Tech stack
 
 - Python 3.12, FastAPI, Uvicorn
 - SQLite (stdlib `sqlite3`)
 - Pillow, imagehash, pillow-heif
+- PyTorch + CUDA for batched perceptual hashing
 - Vanilla JS single-page frontend
 - Docker / docker-compose
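
Note on the "hamming ≤ 10" threshold in the Detection methods table: the README does not show how the comparison is done, so the following is only a minimal sketch of what such a check could look like. It assumes perceptual hashes are held as 64-bit integers; the names `hamming_distance`, `is_visual_duplicate`, and `HAMMING_THRESHOLD` are hypothetical and not taken from the project.

```python
# Sketch only: assumes each perceptual hash is a 64-bit integer.
# The project's real comparison code is not shown in this diff.

HAMMING_THRESHOLD = 10  # value quoted in the README's Detection methods table


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions in which the two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_visual_duplicate(hash_a: int, hash_b: int,
                        threshold: int = HAMMING_THRESHOLD) -> bool:
    """Treat two images as visually similar when few hash bits differ."""
    return hamming_distance(hash_a, hash_b) <= threshold


if __name__ == "__main__":
    base = 0xFFFF_0000_FFFF_0000
    near = base ^ 0x3FF   # flips exactly 10 bits
    far = base ^ 0x7FF    # flips exactly 11 bits
    print(is_visual_duplicate(base, near))  # within the threshold
    print(is_visual_duplicate(base, far))   # just outside the threshold
```

A lower threshold would catch fewer near-duplicates but produce fewer false matches; the value 10 is simply the one the README states.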