# Duplicate Finder

Self-hosted web app that scans your photo and video library, finds duplicates four different ways, and lets you review them in a browser. **It never moves, renames, or deletes anything** — every decision is recorded in a SQLite database. A separate tool (coming later) will act on those decisions.

> Once installed, open **http://localhost:8765** in any browser to use it.

---

## Pick your install method

| You have… | Use this |
|---|---|
| **Windows 10/11** | [Windows installer](#windows-1011) (one PowerShell command) |
| **Debian / Ubuntu / Proxmox LXC** | [.deb package](#debian--ubuntu--proxmox) (`apt install`) |
| **Anything else with Docker** | [Docker Compose](#manual-docker-compose) (manual) |

All three installs end up running the same Docker container.

---

### Windows 10/11

**What you need:** Docker Desktop (the installer will check for it and offer to download it).

1. Download the latest release zip from the Gitea **Releases** page and extract it anywhere.
2. Right-click `installer\install.ps1` → **Run with PowerShell** (or open an elevated PowerShell and run it).
3. When prompted, type the path to your photos folder (e.g. `D:\Photos`) and a folder for the database (default is fine).
4. The installer starts the container and puts a **DupFinder** shortcut on your desktop.

**Day-to-day use:** double-click the desktop shortcut, or browse to http://localhost:8765.

**Uninstall:** run `installer\uninstall.ps1` as administrator.

---

### Debian / Ubuntu / Proxmox

**What you need:** Docker Engine. If you don't have it: `curl -fsSL https://get.docker.com | sh`.

```bash
# 1. Add the Gitea apt repo
echo "deb [trusted=yes] http://192.168.1.64:3000/api/packages/tocmo0nlord/debian bookworm main" \
  | sudo tee /etc/apt/sources.list.d/dupfinder.list

# 2. Install
sudo apt update
sudo apt install dupfinder

# 3. Run first-time setup (asks for photos path + data path)
sudo dupfinder setup

# 4. Start it
sudo dupfinder start
```

> The repo says `bookworm` (Debian 12). For Ubuntu/other distros the package still works — the codename in the URL is just how Gitea organizes the registry.

> **One-shot install without the apt repo:**
> ```bash
> curl -u tocmo0nlord: -O \
>   http://192.168.1.64:3000/api/packages/tocmo0nlord/debian/pool/bookworm/main/dupfinder_1.0.0_amd64.deb
> sudo apt install ./dupfinder_1.0.0_amd64.deb
> ```

**Manage the service:**

| Command | What it does |
|---|---|
| `sudo dupfinder start` | Start the container |
| `sudo dupfinder stop` | Stop the container |
| `sudo dupfinder restart` | Restart |
| `sudo dupfinder status` | Show systemd status |
| `sudo dupfinder logs` | Tail the logs |
| `dupfinder open` | Open in your default browser |

The service auto-starts on boot via systemd (`dupfinder.service`).

**Uninstall:** `sudo apt remove dupfinder` (your photos and database are left untouched).

---

### Manual Docker Compose

For NAS appliances (Synology, Unraid, TrueNAS), Mac, or any host where you'd rather wire it up yourself.

1. Clone the repo:
   ```bash
   git clone http://192.168.1.64:3000/tocmo0nlord/duplicate-finder.git
   cd duplicate-finder
   ```
2. Open `docker-compose.yml` and change the two volume paths under `dup-finder:`:
   ```yaml
   volumes:
     - /your/photos/path:/photos:ro   # ← your photo library (read-only)
     - /your/data/path:/data          # ← where the SQLite DB lives
   ```
3. Build and start:
   ```bash
   docker compose up -d --build
   ```
4. Open http://localhost:8765.

To stop: `docker compose down`. To update later: `git pull && docker compose up -d --build`.

> **GPU acceleration (optional):** the compose file requests an NVIDIA GPU for faster perceptual hashing. If you don't have one, delete the `deploy.resources.reservations.devices` block — the app falls back to CPU automatically.

---

## Using it

1. Open http://localhost:8765.
2. Click **Browse** and pick the folder you want to scan (it's relative to the container — usually just `/photos`).
3. Pick a scan mode (see below) and click **Scan**.
4. When it finishes, review the duplicate groups. Each group shows the suggested keeper highlighted; click any other photo to pick it instead, or **Keep all** to skip the group.
5. When you're done, click **Download CSV** to export all decisions.

### Scan modes

| Mode | When to use |
|---|---|
| **Incremental** *(default)* | Day-to-day rescans. Re-hashes only changed/new files. Past review decisions are preserved. |
| **New files only** | Fastest option. Indexes only files added since the last scan. |
| **Rebuild groups** | Re-runs duplicate detection on the existing index without re-hashing. |
| **Full reset** | Wipes the entire index and starts from scratch. |

### Detection methods

| Method | UI color | What it catches |
|---|---|---|
| **SHA-256** | Blue | Byte-identical files |
| **Perceptual hash** | Purple | Visually similar photos (hamming ≤ 10) |
| **EXIF timestamp + device** | Amber | Same camera, same moment |
| **File size + dimensions** | Gray | Same size and resolution (low confidence) |

### Google Takeout

Point it at a Google Photos Takeout export and it auto-detects the structure, reads the `.json` sidecars, and restores the correct capture timestamps and original filenames. Takeout files get a flag in the UI.

---

## Troubleshooting

**The page won't load at http://localhost:8765**
Check the container is up: `docker ps | grep dup-finder`. If not, see the logs: `docker compose logs dup-finder` (or `sudo dupfinder logs` on Debian).

**"Permission denied" reading photos**
The `/photos` mount is read-only by design, but the container still needs read access. Make sure your user (or the docker daemon) can read the folder you mounted.

**Scan is stuck on "phash"**
Perceptual hashing is the slowest phase — large libraries (>50k photos) on CPU can take hours. Add an NVIDIA GPU and the `deploy.resources` block in compose to get a 10-50× speedup.
**I marked the wrong file as keeper**
Open the group again and click **Unreview**, then re-decide.

---

## What "redundant" means

When you mark a file redundant, **only the database is updated**. Nothing on disk changes. This tool produces a decision record. A future companion tool will use that record to actually move or delete files.

---

## Tech stack

- Python 3.12, FastAPI, Uvicorn
- SQLite (stdlib `sqlite3`)
- Pillow, imagehash, pillow-heif
- PyTorch + CUDA for batched perceptual hashing
- Vanilla JS single-page frontend
- Docker / docker-compose
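As a rough illustration of the simplest detection method above — SHA-256, which catches byte-identical files — grouping by content digest can be sketched like this. It is a simplified stand-in for the idea, not the app's actual scanner:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so large videos never load fully into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicate_groups(root: Path) -> dict[str, list[Path]]:
    """Map content digest -> files that share it, keeping only groups of 2+."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[sha256_of(path)].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

A real scanner would typically pre-filter by file size before hashing (files with different sizes can't be identical), which is also why the size + dimensions method in the table exists as a cheap low-confidence signal.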