# Duplicate Finder

A self-hosted Docker web app that scans a photo/video library, detects duplicates using four methods, and lets you review them in a gallery UI. **No files are ever moved, renamed, or deleted**; all decisions are recorded in SQLite only.

## Quick start

```bash
# 1. Edit docker-compose.yml: set the host path of your photos volume
# 2. Build and run
docker compose up -d --build
# 3. Open http://localhost:8765
# 4. Enter the folder to scan in the UI, as seen inside the container (e.g. /photos), and click Scan
```

## Volume mounts

| Container path | Purpose |
|---|---|
| `/photos` | Your photo library, mounted **read-only** |
| `/data` | Persistent storage for the SQLite database |

In `docker-compose.yml`, map each container path to the matching host directory, for example a folder on your NAS.

## Detection methods

| Method | UI color | Description |
|---|---|---|
| SHA-256 | Blue | Byte-identical files |
| Perceptual hash | Purple | Visually similar photos (Hamming distance ≤ 10) |
| EXIF timestamp + device | Amber | Same capture timestamp from the same camera |
| File size + dimensions | Gray | Same file size and pixel dimensions (low confidence) |
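
To make the four methods concrete, here is a minimal sketch that treats each one as a way of grouping already-indexed files. It is illustrative only, not the project's implementation: `FileRecord`, `exact_groups`, `phash_groups`, and the `by_*` key functions are hypothetical names, and it assumes perceptual hashes are stored as 64-bit integers (imagehash's default `phash` size, convertible with `int(str(h), 16)`). Three methods reduce to bucketing by an exact key; the perceptual method links any two files whose hashes differ in at most 10 bits.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Hashable, List, Optional, Sequence


@dataclass(frozen=True)
class FileRecord:
    # Hypothetical index row; field names are illustrative, not the app's schema.
    path: str
    sha256: str
    phash: int                 # assumed: 64-bit perceptual hash stored as an int
    exif_time: Optional[str]   # e.g. EXIF DateTimeOriginal
    device: Optional[str]      # e.g. EXIF Make + Model
    size: int                  # bytes
    width: int
    height: int


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large videos never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def by_sha256(r: FileRecord) -> Optional[Hashable]:
    return r.sha256


def by_exif(r: FileRecord) -> Optional[Hashable]:
    # Files missing either field cannot be matched by this method.
    return (r.exif_time, r.device) if r.exif_time and r.device else None


def by_size_dims(r: FileRecord) -> Optional[Hashable]:
    return (r.size, r.width, r.height)


def exact_groups(records: Sequence[FileRecord],
                 key: Callable[[FileRecord], Optional[Hashable]]) -> List[List[str]]:
    """Bucket records by an exact key; keep only buckets with 2+ files."""
    buckets = defaultdict(list)
    for r in records:
        k = key(r)
        if k is not None:
            buckets[k].append(r.path)
    return [paths for paths in buckets.values() if len(paths) > 1]


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def phash_groups(records: Sequence[FileRecord], max_distance: int = 10) -> List[List[str]]:
    """Link every pair within max_distance bits (union-find), return groups of 2+."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if hamming(records[i].phash, records[j].phash) <= max_distance:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, r in enumerate(records):
        groups[find(i)].append(r.path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

The pairwise loop is O(n²), which is fine for a sketch but slow on large libraries; BK-trees or hash-prefix bucketing are common alternatives. Linking pairs this way is single-linkage clustering, so A and C can land in one group through B even when they are more than 10 bits apart; this README does not say whether the app groups transitively. With imagehash objects, `h1 - h2` already returns the Hamming distance.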
## Scan modes

| Mode | Description |
|---|---|
| Incremental | Re-hashes only new and changed files. Prior decisions are preserved. |
| New files only | Indexes only newly added files. Existing decisions are left untouched. |
| Rebuild groups | Re-runs duplicate detection on the existing index without re-hashing any files. |
| Full reset | Clears the entire database, including prior decisions, and scans from scratch. |
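
The difference between the modes comes down to which files get (re-)hashed. The sketch below assumes, hypothetically, that a change is detected by comparing each file's size and modification time against values stored at the last scan; the README does not say which signal the app actually uses. `plan_scan`, `stat_tree`, `ScanPlan`, and the mode strings are illustrative names.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (size in bytes, mtime in nanoseconds): an assumed change signal.
Signature = Tuple[int, int]

MODES = {"incremental", "new_files_only", "rebuild_groups", "full_reset"}


@dataclass
class ScanPlan:
    to_hash: List[str] = field(default_factory=list)  # files to (re-)hash
    reuse: List[str] = field(default_factory=list)    # stored hashes reused as-is


def stat_tree(root: str) -> Dict[str, Signature]:
    """Collect a signature for every file under root (only reads metadata)."""
    out: Dict[str, Signature] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            out[path] = (st.st_size, st.st_mtime_ns)
    return out


def plan_scan(current: Dict[str, Signature],
              index: Dict[str, Signature],
              mode: str) -> ScanPlan:
    """Decide which files each scan mode would hash (illustrative only)."""
    if mode not in MODES:
        raise ValueError(f"unknown scan mode: {mode!r}")
    if mode == "full_reset":
        # The index is wiped, so every file on disk is hashed again.
        return ScanPlan(to_hash=sorted(current))
    if mode == "rebuild_groups":
        # Detection re-runs on the stored index; nothing is re-hashed.
        return ScanPlan(reuse=sorted(index))
    plan = ScanPlan()
    for path, sig in sorted(current.items()):
        if path not in index:
            plan.to_hash.append(path)   # new file: both remaining modes index it
        elif mode == "incremental" and index[path] != sig:
            plan.to_hash.append(path)   # changed since the last scan
        else:
            plan.reuse.append(path)     # unchanged, or ignored by new_files_only
    return plan
```

Files that vanished from disk but remain in the index are not handled here; a real scanner would also need a policy for those.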

## Google Takeout

The scanner automatically detects Google Takeout folder structures and reads the `.json` sidecar files that accompany Takeout media to restore the correct capture timestamps and original filenames. Files from a Takeout export are flagged in the UI.
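
A sketch of the sidecar lookup, under stated assumptions: it handles only the common `<filename>.json` naming (real exports contain other variants), reads the `title` and `photoTakenTime.timestamp` fields commonly found in Google Photos sidecars, and uses a hypothetical `Takeout` directory check in place of whatever detection the app performs. All function names are illustrative.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional, Tuple


def looks_like_takeout(media: Path) -> bool:
    """Hypothetical heuristic: the file sits under a directory named 'Takeout'."""
    return "Takeout" in media.parts[:-1]


def sidecar_for(media: Path) -> Optional[Path]:
    """Return the '<filename>.json' sidecar beside a media file, if it exists.

    Assumption: only this common naming pattern is handled; other variants
    found in Takeout exports (e.g. truncated names) are ignored here.
    """
    candidate = media.with_name(media.name + ".json")
    return candidate if candidate.is_file() else None


def parse_sidecar(text: str) -> Tuple[Optional[str], Optional[datetime]]:
    """Extract (original filename, UTC capture time) from sidecar JSON text."""
    data = json.loads(text)
    title = data.get("title")
    ts = (data.get("photoTakenTime") or {}).get("timestamp")
    # The timestamp is Unix epoch seconds, stored as a string.
    taken = datetime.fromtimestamp(int(ts), tz=timezone.utc) if ts is not None else None
    return title, taken


def takeout_metadata(media: Path) -> Tuple[Optional[str], Optional[datetime]]:
    """Read sidecar metadata for a media file, or (None, None) if there is none."""
    sidecar = sidecar_for(media)
    if sidecar is None:
        return None, None
    return parse_sidecar(sidecar.read_text(encoding="utf-8"))
```

Because the sidecar timestamp is a Unix epoch, the result is returned in UTC; converting to the photo's local time would need an offset the sidecar may not provide.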

## What "redundant" means

Marking a file as redundant **only writes to the database**: nothing is moved, renamed, or deleted. This tool produces a decision record and nothing more; acting on those decisions is left to a separate tool.
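
A minimal sketch of what a database-only decision record can look like, using the stdlib `sqlite3` module. The `decisions` table, its columns, the `keep`/`redundant` values, and the helper functions are hypothetical, not the app's actual schema. Nothing in it touches the filesystem.

```python
import sqlite3
from typing import List

# Hypothetical schema: one row per file the user has ruled on.
SCHEMA = """
CREATE TABLE IF NOT EXISTS decisions (
    path       TEXT PRIMARY KEY,
    group_id   INTEGER NOT NULL,
    decision   TEXT NOT NULL CHECK (decision IN ('keep', 'redundant')),
    decided_at TEXT NOT NULL DEFAULT (datetime('now'))
)
"""


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(SCHEMA)


def mark(conn: sqlite3.Connection, path: str, group_id: int, decision: str) -> None:
    """Record (or overwrite) a decision. Only the database is written."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO decisions (path, group_id, decision) VALUES (?, ?, ?) "
            "ON CONFLICT(path) DO UPDATE SET group_id = excluded.group_id, "
            "decision = excluded.decision, decided_at = datetime('now')",
            (path, group_id, decision),
        )


def redundant_paths(conn: sqlite3.Connection) -> List[str]:
    """What a separate file-action tool might read back from the database."""
    rows = conn.execute(
        "SELECT path FROM decisions WHERE decision = 'redundant' ORDER BY path"
    )
    return [path for (path,) in rows]
```

The `ON CONFLICT ... DO UPDATE` upsert requires SQLite 3.24 or newer, so changing your mind about a file simply overwrites its row.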

## Tech stack

- Python 3.12, FastAPI, Uvicorn
- SQLite (stdlib `sqlite3`)
- Pillow (image decoding), imagehash (perceptual hashing), pillow-heif (HEIC/HEIF support)
- Vanilla JS single-page frontend
- Docker / Docker Compose