duplicate-finder/README.md
tocmo 868da9016d Initial implementation of duplicate finder
Full project per spec: FastAPI backend, 4-method duplicate detection
(SHA-256, phash, EXIF, filesize), Google Takeout pre-processor,
4 scan modes, and dark-theme vanilla JS gallery frontend.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 23:42:58 -04:00

# Duplicate Finder
A self-hosted Docker web app that scans a photo/video library, detects duplicates using four methods, and lets you review them in a gallery UI. **No files are ever moved, renamed, or deleted** — all decisions are recorded in SQLite only.
## Quick start
```bash
# 1. Edit docker-compose.yml — set your photos volume path
# 2. Build and run
docker compose up -d --build
# 3. Open http://localhost:8765
# 4. Enter folder path in UI and click Scan
```
## Volume mounts
| Container path | Purpose |
|---|---|
| `/photos` | Your photo library — mounted **read-only** |
| `/data` | SQLite database persistence |

Edit `docker-compose.yml` to point these at your NAS paths.
## Detection methods
| Method | Color | Description |
|---|---|---|
| SHA-256 | Blue | Byte-identical files |
| Perceptual hash | Purple | Visually similar photos (Hamming distance ≤ 10) |
| EXIF timestamp + device | Amber | Same camera, same moment |
| File size + dimensions | Gray | Same size and resolution (low confidence) |
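
The first two methods can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's actual code: `sha256_of`, `group_identical`, `hamming`, and `visually_similar` are hypothetical names, and the integer hashes stand in for whatever a perceptual-hash library such as imagehash would produce.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

PHASH_THRESHOLD = 10  # threshold taken from the table above


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_identical(paths: list[Path]) -> list[list[Path]]:
    """Return groups of two or more byte-identical files."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for p in paths:
        by_digest[sha256_of(p)].append(p)
    return [group for group in by_digest.values() if len(group) > 1]


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two integer hashes."""
    return bin(a ^ b).count("1")


def visually_similar(a: int, b: int, threshold: int = PHASH_THRESHOLD) -> bool:
    """Treat two perceptual hashes as a match when they differ in few bits."""
    return hamming(a, b) <= threshold
```

Byte-identical files always share a SHA-256 digest, so that method is exact. The Hamming threshold is a tolerance: a higher value catches more resized or recompressed copies but also more false matches.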
## Scan modes
| Mode | Description |
|---|---|
| Incremental | Only re-hashes changed/new files. Prior decisions preserved. |
| New files only | Indexes newly added files. Existing decisions untouched. |
| Rebuild groups | Re-runs detection on existing index. No re-hashing. |
| Full reset | Wipes everything and scans from scratch. |
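
One plausible way to decide which files an incremental scan re-hashes is to compare each file's size and modification time with what the index recorded last time. The README does not say how changes are detected, so treat the sketch below as an assumption; `IndexedFile` and `needs_rehash` are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedFile:
    """Hypothetical index row: what was recorded at the last scan."""
    size: int
    mtime_ns: int


def needs_rehash(path: str, index: dict[str, IndexedFile]) -> bool:
    """True if the file is new or its size/mtime differ from the index."""
    st = os.stat(path)
    known = index.get(path)
    return known is None or known != IndexedFile(st.st_size, st.st_mtime_ns)
```

Files for which `needs_rehash` returns `False` keep their stored hashes, which is what would let prior decisions survive an incremental scan.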
## Google Takeout
The scanner automatically detects Google Takeout folder structures and reads `.json` sidecar files to restore correct capture timestamps and original filenames. Takeout files are flagged in the UI.
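
As a rough illustration of reading a sidecar, the sketch below assumes the commonly seen Takeout layout, in which the JSON carries a `title` field and a `photoTakenTime.timestamp` field holding Unix seconds as a string. The function name `read_takeout_sidecar` is hypothetical, and real exports may differ in field names or be missing fields.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def read_takeout_sidecar(sidecar: Path) -> tuple[Optional[str], Optional[datetime]]:
    """Return (original filename, capture time in UTC) from a sidecar file."""
    meta = json.loads(sidecar.read_text(encoding="utf-8"))
    title = meta.get("title")
    # Assumed field: Unix seconds stored as a string.
    raw_ts = meta.get("photoTakenTime", {}).get("timestamp")
    taken = datetime.fromtimestamp(int(raw_ts), tz=timezone.utc) if raw_ts else None
    return title, taken
```

Either value comes back as `None` when its field is absent, so the caller can fall back to EXIF data or the file's own name.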
## What "redundant" means
Marking a file redundant **only writes to the database**. Nothing is moved, renamed, or deleted. This tool produces a decision record only. A separate tool handles file actions.
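
A database-only decision record might look like the following sketch. The `decisions` table, its columns, and the function names are hypothetical; the project's real schema is not shown in this README.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) decisions table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS decisions (
               file_path  TEXT PRIMARY KEY,
               decision   TEXT NOT NULL,
               decided_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def mark_redundant(conn: sqlite3.Connection, file_path: str) -> None:
    """Record the decision; the file on disk is never opened or modified."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO decisions (file_path, decision) "
            "VALUES (?, 'redundant')",
            (file_path,),
        )
```

Because the function only takes a path string and a connection, it works even when the photo library is mounted read-only.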
## Tech stack
- Python 3.12, FastAPI, Uvicorn
- SQLite (stdlib `sqlite3`)
- Pillow, imagehash, pillow-heif
- Vanilla JS single-page frontend
- Docker / docker-compose