Duplicate Finder

Self-hosted web app that scans your photo and video library, finds duplicates four different ways, and lets you review them in a browser. It never moves, renames, or deletes anything — every decision is recorded in a SQLite database. A separate tool (coming later) will act on those decisions.

Once installed, open http://localhost:8765 in any browser to use it.


Pick your install method

You have…                     | Use this
------------------------------|-------------------------------------------
Windows 10/11                 | Windows installer (one PowerShell command)
Debian / Ubuntu / Proxmox LXC | .deb package (apt install)
Anything else with Docker     | Docker Compose (manual)

All three installs end up running the same Docker container.


Windows 10/11

What you need: Docker Desktop (the installer checks for it and offers to download it if it is missing).

  1. Download the latest release zip from the Gitea Releases page and extract it anywhere.
  2. Right-click installer\install.ps1 → Run with PowerShell (or open an elevated PowerShell and run it).
  3. When prompted, type the path to your photos folder (e.g. D:\Photos) and a folder for the database (default is fine).
  4. The installer starts the container and puts a DupFinder shortcut on your desktop.

Day-to-day use: double-click the desktop shortcut, or browse to http://localhost:8765.

Uninstall: run installer\uninstall.ps1 as administrator.


Debian / Ubuntu / Proxmox

What you need: Docker Engine. If you don't have it: curl -fsSL https://get.docker.com | sh.

# 1. Add the Gitea apt repo
echo "deb [trusted=yes] http://192.168.1.64:3000/api/packages/tocmo0nlord/debian bookworm main" \
  | sudo tee /etc/apt/sources.list.d/dupfinder.list

# 2. Install
sudo apt update
sudo apt install dupfinder

# 3. Run first-time setup (asks for photos path + data path)
sudo dupfinder setup

# 4. Start it
sudo dupfinder start

The repo says bookworm (Debian 12). For Ubuntu/other distros the package still works — the codename in the URL is just how Gitea organizes the registry.

One-shot install without the apt repo:

curl -u tocmo0nlord:<your-token> -O \
  http://192.168.1.64:3000/api/packages/tocmo0nlord/debian/pool/bookworm/main/dupfinder_1.0.0_amd64.deb
sudo apt install ./dupfinder_1.0.0_amd64.deb

Manage the service:

Command                | What it does
-----------------------|----------------------------
sudo dupfinder start   | Start the container
sudo dupfinder stop    | Stop the container
sudo dupfinder restart | Restart
sudo dupfinder status  | Show systemd status
sudo dupfinder logs    | Tail the logs
dupfinder open         | Open in your default browser

The service auto-starts on boot via systemd (dupfinder.service).

Uninstall: sudo apt remove dupfinder (your photos and database are left untouched).


Manual Docker Compose

For NAS appliances (Synology, Unraid, TrueNAS), Mac, or any host where you'd rather wire it up yourself.

  1. Clone the repo:
    git clone http://192.168.1.64:3000/tocmo0nlord/duplicate-finder.git
    cd duplicate-finder
    
  2. Open docker-compose.yml and change the two volume paths under the dup-finder service:
    volumes:
      - /your/photos/path:/photos:ro       # ← your photo library (read-only)
      - /your/data/path:/data              # ← where the SQLite DB lives
    
  3. Build and start:
    docker compose up -d --build
    
  4. Open http://localhost:8765.

To stop: docker compose down. To update later: git pull && docker compose up -d --build.

GPU acceleration (optional): the compose file requests an NVIDIA GPU for faster perceptual hashing. If you don't have one, delete the deploy.resources.reservations.devices block — the app falls back to CPU automatically.
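
The automatic CPU fallback can be pictured with the sketch below. It is a minimal illustration rather than the app's actual code: pick_device is a hypothetical helper name, and the only real API it relies on is PyTorch's torch.cuda.is_available().

```python
def pick_device() -> str:
    """Return "cuda" when a usable NVIDIA GPU is visible, otherwise "cpu".

    Hypothetical helper for illustration; the app's real selection logic
    is not shown in this README.
    """
    try:
        import torch  # optional at this point: absence simply means CPU
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("hashing on:", pick_device())
```

Run inside the container, this prints "cuda" only if the GPU reservation in the compose file took effect.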


Using it

  1. Open http://localhost:8765.
  2. Click Browse and pick the folder you want to scan (it's relative to the container — usually just /photos).
  3. Pick a scan mode (see below) and click Scan.
  4. When it finishes, review the duplicate groups. Each group shows the suggested keeper highlighted; click any other photo to pick it instead, or Keep all to skip the group.
  5. When you're done, click Download CSV to export all decisions.

Scan modes

Mode                  | When to use
----------------------|------------------------------------------------------------
Incremental (default) | Day-to-day rescans. Re-hashes only changed/new files. Past review decisions are preserved.
New files only        | Fastest option. Indexes only files added since the last scan.
Rebuild groups        | Re-runs duplicate detection on the existing index without re-hashing.
Full reset            | Wipes the entire index and starts from scratch.
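
To make the Incremental mode concrete, here is a minimal sketch of one common way to decide which files need re-hashing: compare each file's size and modification time against what was recorded at the last scan. This README does not say how the app detects changes, so the index shape and the files_to_rehash name are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed index shape: path -> (size_in_bytes, mtime_ns) as seen at the last scan.
Index = Dict[str, Tuple[int, int]]


def files_to_rehash(previous: Index, current: Index) -> List[str]:
    """Return paths that are new, or whose size or mtime differs from the last scan."""
    return sorted(
        path for path, stat in current.items()
        if previous.get(path) != stat
    )


previous = {"/photos/a.jpg": (1000, 111), "/photos/b.jpg": (2000, 222)}
current = {
    "/photos/a.jpg": (1000, 111),  # unchanged: skipped
    "/photos/b.jpg": (2048, 333),  # modified: re-hashed
    "/photos/c.jpg": (3000, 444),  # new: hashed for the first time
}
print(files_to_rehash(previous, current))
```

Under the same assumption, "New files only" would keep just the paths missing from the previous index, which is why it is the fastest mode.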

Detection methods

Method                  | UI color | What it catches
------------------------|----------|-------------------------------------------
SHA-256                 | Blue     | Byte-identical files
Perceptual hash         | Purple   | Visually similar photos (hamming ≤ 10)
EXIF timestamp + device | Amber    | Same camera, same moment
File size + dimensions  | Gray     | Same size and resolution (low confidence)
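
The first two methods can be sketched with the standard library alone. This is an illustration under stated assumptions, not the app's implementation: the function names are hypothetical, and perceptual hashes are represented as plain integers (the app itself lists imagehash in its tech stack). Only the threshold of 10 comes from this README.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List

PHASH_THRESHOLD = 10  # "hamming <= 10" from the detection methods list


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_identical(paths: Iterable[Path]) -> List[List[Path]]:
    """Group byte-identical files: same SHA-256 digest means same group."""
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in paths:
        by_digest[sha256_of(path)].append(path)
    return [group for group in by_digest.values() if len(group) > 1]


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two perceptual hashes."""
    return bin(a ^ b).count("1")


def visually_similar(a: int, b: int) -> bool:
    """Treat two hashes as a match when they differ in at most 10 bits."""
    return hamming(a, b) <= PHASH_THRESHOLD
```

A SHA-256 match is exact, while the Hamming threshold trades recall for precision: a lower threshold would miss more re-encoded copies, a higher one would group more unrelated photos.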

Google Takeout

Point it at a Google Photos Takeout export and it auto-detects the structure, reads the .json sidecars, and restores the correct capture timestamps and original filenames. Takeout files get a flag in the UI.
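
As a rough sketch of what reading a sidecar involves: Takeout sidecars are JSON files that typically carry the original filename in title and the capture time as Unix seconds in photoTakenTime.timestamp. Those field names are an assumption about the export format, and read_sidecar is a hypothetical helper; this README does not state which fields the app actually reads.

```python
import json
from datetime import datetime, timezone
from typing import Tuple


def read_sidecar(text: str) -> Tuple[str, datetime]:
    """Return (original filename, capture time in UTC) from sidecar JSON text."""
    meta = json.loads(text)
    # Assumed layout: the timestamp is Unix seconds stored as a string.
    taken = int(meta["photoTakenTime"]["timestamp"])
    return meta["title"], datetime.fromtimestamp(taken, tz=timezone.utc)


sample = '{"title": "IMG_0001.jpg", "photoTakenTime": {"timestamp": "1609459200"}}'
name, captured = read_sidecar(sample)
print(name, captured.isoformat())
```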


Troubleshooting

The page won't load at http://localhost:8765
Check the container is up: docker ps | grep dup-finder. If not, see the logs: docker compose logs dup-finder (or sudo dupfinder logs on Debian).

"Permission denied" reading photos
The /photos mount is read-only by design, but the container still needs read access. Make sure your user (or the docker daemon) can read the folder you mounted.

Scan is stuck on "phash"
Perceptual hashing is the slowest phase. Large libraries (>50k photos) on CPU can take hours. Adding an NVIDIA GPU and keeping the deploy.resources block in the compose file gives a 10-50× speedup.

I marked the wrong file as keeper
Open the group again and click Unreview, then re-decide.
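
For the first item (page won't load), a quick scripted check can tell a stopped container apart from a browser problem. This is a minimal sketch using only the Python standard library; is_up is a hypothetical helper, and it treats any HTTP error status as "not up".

```python
import urllib.error
import urllib.request


def is_up(url: str = "http://localhost:8765", timeout: float = 3.0) -> bool:
    """Return True if the web UI answers with a success or redirect status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("web UI reachable:", is_up())
```

If this reports False while docker ps shows the container running, the port mapping is the next thing to check.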


What "redundant" means

When you mark a file redundant, only the database is updated. Nothing on disk changes. This tool produces a decision record. A future companion tool will use that record to actually move or delete files.
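
Because nothing on disk changes, the exported CSV can be inspected safely before any companion tool exists. The sketch below only lists files and never touches them. The column names path and decision and the value redundant are hypothetical, since this README does not document the CSV layout; adjust them to match a real export.

```python
import csv
import io
from typing import List


def redundant_paths(csv_text: str) -> List[str]:
    """Return the paths marked redundant. Column names here are assumptions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["path"] for row in reader if row["decision"] == "redundant"]


# Made-up sample data in the assumed layout, for illustration only.
sample = "path,decision\n/photos/a.jpg,keep\n/photos/a (1).jpg,redundant\n"
for path in redundant_paths(sample):
    print("would act on:", path)
```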


Tech stack

  • Python 3.12, FastAPI, Uvicorn
  • SQLite (stdlib sqlite3)
  • Pillow, imagehash, pillow-heif
  • PyTorch + CUDA for batched perceptual hashing
  • Vanilla JS single-page frontend
  • Docker / docker-compose