Initial implementation of duplicate finder
Full project per spec: FastAPI backend, 4-method duplicate detection (SHA-256, phash, EXIF, filesize), Google Takeout pre-processor, 4 scan modes, and dark-theme vanilla JS gallery frontend.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Duplicate Finder

A self-hosted Docker web app that scans a photo/video library, detects duplicates using four methods, and lets you review them in a gallery UI. **No files are ever moved, renamed, or deleted**: all decisions are recorded in SQLite only.

## Quick start

```bash
# 1. Edit docker-compose.yml and set the host path of your photo library
# 2. Build the image and start the container in the background
docker compose up -d --build
# 3. Open http://localhost:8765 in a browser
# 4. Enter the folder to scan (a path inside the container, e.g. /photos) and click Scan
```

## Volume mounts

| Container path | Purpose |
|---|---|
| `/photos` | Your photo library, mounted **read-only** |
| `/data` | SQLite database persistence |

Edit `docker-compose.yml` to map these container paths to the corresponding host (e.g. NAS) directories.
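Because the no-write guarantee also depends on the `/photos` mount being read-only, a startup check can make a misconfigured mount obvious. The sketch below is a hypothetical guard, not part of the documented app; the function name `is_read_only` is illustrative. It relies on `access(2)` reporting `EROFS` for write checks on a read-only filesystem, so `os.access(..., os.W_OK)` returns `False` there even for root.

```python
from __future__ import annotations

import os
from pathlib import Path


def is_read_only(root: Path) -> bool:
    """True if the current process cannot write to `root`.

    Hypothetical helper: on a read-only bind mount (":ro" in docker-compose.yml),
    access(2) fails with EROFS for write checks, so os.access returns False.
    """
    if not root.is_dir():
        raise FileNotFoundError(f"library mount not found: {root}")
    return not os.access(root, os.W_OK)
```

A caller could, for example, refuse to start scanning when `is_read_only(Path("/photos"))` is `False`.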

## Detection methods

| Method | UI color | Description |
|---|---|---|
| SHA-256 | Blue | Byte-identical files |
| Perceptual hash | Purple | Visually similar photos (Hamming distance ≤ 10) |
| EXIF timestamp + device | Amber | Same camera, same capture moment |
| File size + dimensions | Gray | Same file size and pixel dimensions (low confidence) |
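The four methods in the table above can be viewed as two operations: grouping by an exact key (SHA-256 digest, EXIF timestamp + device, file size + dimensions) and comparing perceptual hashes against a distance threshold. The sketch below illustrates both with the standard library only. It assumes perceptual hashes are already available as 64-bit integers (imagehash's `phash` defaults to an 8×8 = 64-bit hash, and subtracting two `ImageHash` objects already yields their Hamming distance). All function names are hypothetical, not the project's actual code.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from collections.abc import Callable, Hashable, Iterable
from pathlib import Path

HAMMING_THRESHOLD = 10  # "hamming ≤ 10" from the detection table


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large videos never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:  # opened read-only
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_key(items: Iterable, key: Callable[[object], Hashable | None]) -> list[list]:
    """Bucket items by an exact key; items whose key is None (e.g. no EXIF) are skipped."""
    buckets: dict = defaultdict(list)
    for item in items:
        k = key(item)
        if k is not None:
            buckets[k].append(item)
    # A bucket is a duplicate group only if it holds two or more items.
    return [group for group in buckets.values() if len(group) > 1]


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def similar_pairs(hashes: dict[str, int], threshold: int = HAMMING_THRESHOLD) -> list[tuple[str, str]]:
    """Every pair of files whose perceptual hashes differ by at most `threshold` bits.

    Brute-force O(n^2) comparison, which keeps the sketch simple.
    """
    names = sorted(hashes)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if hamming(hashes[x], hashes[y]) <= threshold
    ]
```

The same `group_by_key` covers the EXIF and size methods with a different key, e.g. a hypothetical `lambda r: (r.taken_at, r.camera) if r.taken_at else None`.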

## Scan modes

| Mode | Description |
|---|---|
| Incremental | Re-hashes only new and changed files. Prior decisions are preserved. |
| New files only | Indexes only newly added files. Existing decisions are untouched. |
| Rebuild groups | Re-runs duplicate detection on the existing index, without re-hashing. |
| Full reset | Wipes everything and scans from scratch. |
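The difference between the scan modes comes down to which files get (re-)hashed. The sketch below assumes a file counts as "changed" when its size or modification time differs from what the index recorded; the mode strings, the `Fingerprint` tuple, and both function names are hypothetical, and the real scanner may detect changes differently.

```python
from __future__ import annotations

import os
from pathlib import Path

# Hypothetical fingerprint: (size in bytes, mtime in ns). A file counts as
# "changed" when either value differs from what the index recorded.
Fingerprint = tuple[int, int]


def scan_fingerprints(root: Path) -> dict[str, Fingerprint]:
    """Fingerprint every regular file under `root` using stat() only (no reads, no writes)."""
    found: dict[str, Fingerprint] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath, name)
            if path.is_file():
                st = path.stat()
                found[str(path)] = (st.st_size, st.st_mtime_ns)
    return found


def files_to_hash(mode: str, on_disk: dict[str, Fingerprint], indexed: dict[str, Fingerprint]) -> list[str]:
    """Which files need (re-)hashing for each scan mode (mode names are hypothetical)."""
    new = [p for p in on_disk if p not in indexed]
    changed = [p for p in on_disk if p in indexed and on_disk[p] != indexed[p]]
    if mode == "incremental":
        return sorted(new + changed)
    if mode == "new_only":
        return sorted(new)
    if mode == "rebuild_groups":
        return []  # detection re-runs on the existing index; nothing is re-hashed
    if mode == "full_reset":
        return sorted(on_disk)  # the caller would also wipe the index first
    raise ValueError(f"unknown scan mode: {mode!r}")
```

Comparing `stat()` results is cheap, so only the files that actually need it pay the cost of a full SHA-256 read.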

## Google Takeout

The scanner automatically detects Google Takeout folder structures and reads the `.json` sidecar files to restore the correct capture timestamps and original filenames. Files that came from Takeout are flagged in the UI.
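As an illustration of the sidecar step, the sketch below reads the `title` (original filename) and `photoTakenTime.timestamp` (epoch seconds, stored as a string) fields found in Takeout photo metadata. It assumes the simplest `<name>.json` naming next to the media file; real exports can use other sidecar names and truncate long filenames, which this sketch does not handle. The function names are hypothetical.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path


def find_sidecar(media: Path) -> Path | None:
    """Look for `<name>.json` next to the media file (assumed naming convention)."""
    candidate = media.with_name(media.name + ".json")
    return candidate if candidate.is_file() else None


def read_takeout_metadata(sidecar: Path) -> tuple[str | None, datetime | None]:
    """Return (original filename, capture time in UTC) from a Takeout sidecar."""
    data = json.loads(sidecar.read_text(encoding="utf-8"))
    title = data.get("title")
    ts = data.get("photoTakenTime", {}).get("timestamp")
    taken = datetime.fromtimestamp(int(ts), tz=timezone.utc) if ts is not None else None
    return title, taken
```

Preferring the sidecar timestamp matters because Takeout downloads usually carry the export date as their file modification time.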

## What "redundant" means

Marking a file redundant **only writes to the database**. Nothing is moved, renamed, or deleted: this tool produces a decision record, and a separate tool is responsible for acting on the files.
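A minimal sketch of such a decision record, using the stdlib `sqlite3` module listed in the tech stack. The `decisions` table, its columns, and the `keep` status are hypothetical; the point is that recording a decision is a single database write and never touches the file itself.

```python
from __future__ import annotations

import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; the project's real table and column names may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS decisions (
    path       TEXT PRIMARY KEY,
    status     TEXT NOT NULL CHECK (status IN ('keep', 'redundant')),
    decided_at TEXT NOT NULL
)
"""


def record_decision(conn: sqlite3.Connection, path: str, status: str) -> None:
    """Store (or overwrite) the decision for one file. Touches the database only."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO decisions (path, status, decided_at) VALUES (?, ?, ?)",
            (path, status, datetime.now(timezone.utc).isoformat()),
        )


def redundant_paths(conn: sqlite3.Connection) -> list[str]:
    """What a separate file-action tool would read: every path marked redundant."""
    rows = conn.execute("SELECT path FROM decisions WHERE status = 'redundant' ORDER BY path")
    return [path for (path,) in rows]
```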

## Tech stack

- Python 3.12, FastAPI, Uvicorn
- SQLite (stdlib `sqlite3`)
- Pillow, imagehash, pillow-heif
- Vanilla JS single-page frontend
- Docker / Docker Compose