Rewrite README install instructions for end users

Lay out the three install paths (Windows installer, .deb package, manual
docker compose) with concrete numbered steps and a 'pick your method'
table at the top so users don't have to read past their own platform.
Add a using-it walkthrough, a scan-mode explanation, and a short
troubleshooting section.
Carlos
2026-04-24 00:48:20 -04:00
parent 3001be3a92
commit 90790b648d

README.md
# Duplicate Finder
Self-hosted web app that scans your photo and video library, finds duplicates four different ways, and lets you review them in a browser. **It never moves, renames, or deletes anything**; every decision is recorded in a SQLite database. A separate tool (coming later) will act on those decisions.
> Once installed, open **http://localhost:8765** in any browser to use it.
---
## Pick your install method
| You have… | Use this |
|---|---|
| **Windows 10/11** | [Windows installer](#windows-1011) (one PowerShell command) |
| **Debian / Ubuntu / Proxmox LXC** | [.deb package](#debian--ubuntu--proxmox) (`apt install`) |
| **Anything else with Docker** | [Docker Compose](#manual-docker-compose) (manual) |
All three installs end up running the same Docker container.
---
### Windows 10/11
**What you need:** Docker Desktop (the installer checks for it and offers to download it).
1. Download the latest release zip from the Gitea **Releases** page and extract it anywhere.
2. Right-click `installer\install.ps1` and choose **Run with PowerShell** (or open an elevated PowerShell and run it).
3. When prompted, type the path to your photos folder (e.g. `D:\Photos`) and a folder for the database (default is fine).
4. The installer starts the container and puts a **DupFinder** shortcut on your desktop.
**Day-to-day use:** double-click the desktop shortcut, or browse to http://localhost:8765.
**Uninstall:** run `installer\uninstall.ps1` as administrator.
---
### Debian / Ubuntu / Proxmox
**What you need:** Docker Engine. If you don't have it: `curl -fsSL https://get.docker.com | sh`.
```bash
# 1. Install the package
wget http://192.168.1.64:3000/tocmo0nlord/-/packages/debian/dupfinder/1.0.0/files/amd64/dupfinder_1.0.0_amd64.deb
sudo apt install ./dupfinder_1.0.0_amd64.deb
# 2. Run first-time setup (asks for photos path + data path)
sudo dupfinder setup
# 3. Start it
sudo dupfinder start
```
**Manage the service:**
| Command | What it does |
|---|---|
| `sudo dupfinder start` | Start the container |
| `sudo dupfinder stop` | Stop the container |
| `sudo dupfinder restart` | Restart |
| `sudo dupfinder status` | Show systemd status |
| `sudo dupfinder logs` | Tail the logs |
| `dupfinder open` | Open in your default browser |
The service auto-starts on boot via systemd (`dupfinder.service`).
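Under the hood these subcommands are thin wrappers around Docker Compose and systemd. A minimal sketch of that dispatch, assuming an install directory of `/opt/dupfinder` (a hypothetical path, not confirmed by this README); it echoes each mapped command instead of running it:

```shell
# Sketch only - not the shipped dupfinder script. Prints the command each
# subcommand maps to, so you can see what the wrapper is doing.
dupfinder_dispatch() {
  compose="docker compose -f /opt/dupfinder/docker-compose.yml"  # assumed install path
  case "$1" in
    start)   echo "$compose up -d" ;;
    stop)    echo "$compose down" ;;
    restart) echo "$compose restart" ;;
    status)  echo "systemctl status dupfinder.service" ;;
    logs)    echo "$compose logs -f dup-finder" ;;
    open)    echo "xdg-open http://localhost:8765" ;;
    *)       echo "usage: dupfinder {start|stop|restart|status|logs|open}" >&2; return 1 ;;
  esac
}
```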
**Uninstall:** `sudo apt remove dupfinder` (your photos and database are left untouched).
---
### Manual Docker Compose
For NAS appliances (Synology, Unraid, TrueNAS), Mac, or any host where you'd rather wire it up yourself.
1. Clone the repo:
```bash
git clone http://192.168.1.64:3000/tocmo0nlord/duplicate-finder.git
cd duplicate-finder
```
2. Open `docker-compose.yml` and change the two volume paths under `dup-finder:`:
```yaml
volumes:
  - /your/photos/path:/photos:ro   # ← your photo library (read-only)
  - /your/data/path:/data          # ← where the SQLite DB lives
```
3. Build and start:
```bash
docker compose up -d --build
```
4. Open http://localhost:8765.
To stop: `docker compose down`. To update later: `git pull && docker compose up -d --build`.
> **GPU acceleration (optional):** the compose file requests an NVIDIA GPU for faster perceptual hashing. If you don't have one, delete the `deploy.resources.reservations.devices` block — the app falls back to CPU automatically.
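If you're unsure which lines to delete, that block has the standard Compose GPU-reservation shape (a sketch; compare with your actual `docker-compose.yml`):

```yaml
services:
  dup-finder:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```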
---
## Using it
1. Open http://localhost:8765.
2. Click **Browse** and pick the folder you want to scan (it's relative to the container — usually just `/photos`).
3. Pick a scan mode (see below) and click **Scan**.
4. When it finishes, review the duplicate groups. Each group shows the suggested keeper highlighted; click any other photo to pick it instead, or **Keep all** to skip the group.
5. When you're done, click **Download CSV** to export all decisions.
### Scan modes
| Mode | When to use |
|---|---|
| **Incremental** *(default)* | Day-to-day rescans. Re-hashes only changed/new files. Past review decisions are preserved. |
| **New files only** | Fastest option. Indexes only files added since the last scan. |
| **Rebuild groups** | Re-runs duplicate detection on the existing index without re-hashing. |
| **Full reset** | Wipes the entire index and starts from scratch. |
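As a rough shell analogue of what Incremental mode skips: only files changed since the last scan get re-hashed. The app checks timestamps against its SQLite index; the marker file here stands in for that purely for illustration:

```shell
# Illustrative only: list files modified since the "last scan", where the
# last-scan time is represented by a marker file's mtime.
changed_since() {
  lib="$1"     # library directory to scan
  marker="$2"  # file whose mtime records the previous scan
  find "$lib" -type f -newer "$marker" -print
}
```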
### Detection methods
| Method | UI color | What it catches |
|---|---|---|
| **SHA-256** | Blue | Byte-identical files |
| **Perceptual hash** | Purple | Visually similar photos (Hamming distance ≤ 10) |
| **EXIF timestamp + device** | Amber | Same camera, same moment |
| **File size + dimensions** | Gray | Same size and resolution (low confidence) |
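The SHA-256 method can be approximated from the shell with GNU coreutils. A minimal sketch of grouping byte-identical files by content hash, not the app's actual code:

```shell
# Hash every file, sort by hash, and print only groups whose hash repeats
# (the SHA-256 hex digest is 64 chars, hence -w64). GNU uniq required.
find_exact_dupes() {
  find "$1" -type f -print0 \
    | xargs -0 sha256sum \
    | sort \
    | uniq -w64 --all-repeated=separate
}
```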
### Google Takeout
Point it at a Google Photos Takeout export and it auto-detects the structure, reads the `.json` sidecars, and restores the correct capture timestamps and original filenames. Takeout files get a flag in the UI.
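For reference, a Takeout `.json` sidecar carries the original filename and capture time in roughly this shape (abridged example; field names follow Google's export format):

```json
{
  "title": "IMG_1234.JPG",
  "photoTakenTime": { "timestamp": "1577836800" }
}
```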
---
## Troubleshooting
**The page won't load at http://localhost:8765**
Check the container is up: `docker ps | grep dup-finder`. If not, see the logs: `docker compose logs dup-finder` (or `sudo dupfinder logs` on Debian).
**"Permission denied" reading photos**
The `/photos` mount is read-only by design, but the container still needs read access. Make sure your user (or the docker daemon) can read the folder you mounted.
**Scan is stuck on "phash"**
Perceptual hashing is the slowest phase — large libraries (>50k photos) on CPU can take hours. Add an NVIDIA GPU and the `deploy.resources` block in compose to get a 10-50× speedup.
**I marked the wrong file as keeper**
Open the group again and click **Unreview**, then re-decide.
---
## What "redundant" means
When you mark a file redundant, **only the database is updated**. Nothing on disk changes. This tool produces a decision record. A future companion tool will use that record to actually move or delete files.
---
## Tech stack
- Python 3.12, FastAPI, Uvicorn
- SQLite (stdlib `sqlite3`)
- Pillow, imagehash, pillow-heif
- PyTorch + CUDA for batched perceptual hashing
- Vanilla JS single-page frontend
- Docker / docker-compose