GPU-accelerated phash + fix discovery/takeout hang

GPU:
- Switch Dockerfile base to pytorch/pytorch:2.3.1-cuda12.1-cudnn8-runtime
- Add gpu_hasher.py: batched 2D DCT on the GPU via PyTorch matrix multiplies,
  256 images per batch; produces imagehash-compatible 64-bit hex hashes and
  falls back to the CPU automatically when CUDA is unavailable
- Replace per-image phash loop in scanner.py with phasher.hash_files()
- docker-compose.yml: add NVIDIA GPU device reservation
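The DCT-by-matrix-multiply idea behind gpu_hasher.py can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from this commit: it assumes each image has already been converted to a 32x32 grayscale float array (imagehash.phash resizes to 32x32 before the DCT), and it mirrors the DCT, 8x8 low-frequency crop, and median-threshold steps of imagehash.phash. The 2D DCT is written as `D @ X @ D.T`, which is the form that turns into a single batched `torch.matmul` on the GPU. The names `dct_matrix`, `dct2`, `phash_hex`, `batches` and `hash_all` are hypothetical, and the CUDA/CPU device choice is left out.

```python
import math
import statistics
from typing import Iterator, List, Sequence

Matrix = List[List[float]]

IMG_SIZE = 32     # imagehash.phash default: hash_size (8) * highfreq_factor (4)
HASH_SIZE = 8     # keep the 8x8 low-frequency block -> 64 bits
BATCH_SIZE = 256  # batch size quoted in the commit message


def dct_matrix(n: int) -> Matrix:
    # Unnormalized DCT-II basis, D[k][i] = cos(pi * k * (2i + 1) / (2n)).
    # scipy.fftpack.dct (norm=None) adds a uniform factor of 2 per axis, which
    # cannot change a comparison against the median, so it is left out.
    return [[math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n)]
            for k in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


D = dct_matrix(IMG_SIZE)
D_T = [list(col) for col in zip(*D)]


def dct2(img: Matrix) -> Matrix:
    # DCT along axis 0, then axis 1, equals D @ X @ D.T. With PyTorch and D as a
    # tensor, a whole batch x of shape (batch, 32, 32) is one expression:
    # torch.matmul(torch.matmul(D, x), D.T), since matmul broadcasts batch dims.
    return matmul(matmul(D, img), D_T)


def phash_hex(img: Matrix) -> str:
    coeffs = dct2(img)
    # Row-major 8x8 low-frequency block, thresholded at its median.
    low = [v for row in coeffs[:HASH_SIZE] for v in row[:HASH_SIZE]]
    med = statistics.median(low)
    bits = "".join("1" if v > med else "0" for v in low)
    return format(int(bits, 2), "016x")  # 64 bits -> 16 hex digits, MSB first


def batches(items: Sequence, size: int = BATCH_SIZE) -> Iterator[Sequence]:
    for start in range(0, len(items), size):
        yield items[start:start + size]


def hash_all(images: Sequence[Matrix], batch_size: int = BATCH_SIZE) -> List[str]:
    hashes: List[str] = []
    for batch in batches(images, batch_size):
        # On the GPU this inner loop would be one stacked tensor operation.
        hashes.extend(phash_hex(img) for img in batch)
    return hashes
```

Only the comparison against the median matters, so any positive constant scaling of the DCT leaves the hash unchanged. An orthonormal DCT would not be a drop-in replacement, because it scales the k=0 row and column by a different factor than the rest.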

Hang fix:
- takeout.is_takeout_folder() now samples at most 50 directories (it used to
  walk the entire tree, which blocked for minutes on libraries with 65k+ files)
- Add a "Not a Takeout folder" status message so the takeout phase is never silent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
tocmo
2026-04-05 01:37:28 -04:00
parent 1d46b9945d
commit c110a8e4f9
6 changed files with 222 additions and 20 deletions

@@ -50,14 +50,19 @@ def is_takeout_folder(folder_path: str) -> bool:
    adjacent media files. If we find at least 5 such pairs, call it Takeout.
    """
    count = 0
    dirs_checked = 0
    MAX_DIRS = 50  # sample at most 50 directories; fast on any library size
    for root, dirs, files in os.walk(folder_path):
        # Skip hidden dirs
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        dirs_checked += 1
        if dirs_checked > MAX_DIRS:
            break
        file_set = set(files)
        for f in files:
            if not f.endswith(".json"):
                continue
            # Check if a media file exists that this could be a sidecar for
            base = f[:-5]  # strip .json
            if base in file_set:
                count += 1
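To try the directory cap outside the scanner, the bounded walk can be reproduced as a self-contained sketch. The hunk ends before the function's return statement, so the early `return True` once five pairs are found is an assumption based on the docstring. `takeout_phase` and its `report` callback are hypothetical stand-ins for however the scanner emits the "Not a Takeout folder" status; the `max_dirs` parameter is also added here only for illustration.

```python
import os
from typing import Callable

MAX_DIRS = 50   # same cap as the commit
MIN_PAIRS = 5   # "at least 5 such pairs" from the docstring


def is_takeout_folder(folder_path: str, max_dirs: int = MAX_DIRS) -> bool:
    count = 0
    dirs_checked = 0
    for _root, dirs, files in os.walk(folder_path):
        dirs[:] = [d for d in dirs if not d.startswith(".")]  # prune hidden dirs in place
        dirs_checked += 1
        if dirs_checked > max_dirs:
            break  # os.walk is lazy, so directories past the cap are never listed
        file_set = set(files)
        for f in files:
            # "IMG_1.jpg.json" counts only if "IMG_1.jpg" sits in the same directory.
            if f.endswith(".json") and f[:-5] in file_set:
                count += 1
                if count >= MIN_PAIRS:
                    return True  # assumed early exit; the hunk does not show the tail
    return False


def takeout_phase(folder_path: str, report: Callable[[str], None]) -> bool:
    # Hypothetical wrapper: report a negative result instead of returning silently.
    if is_takeout_folder(folder_path):
        return True
    report("Not a Takeout folder")
    return False
```

Because `os.walk` is a generator, breaking out of the loop stops all further directory listing, so the check costs roughly 50 directory scans regardless of library size. The trade-off is that an export whose sidecars first appear past the 50th directory visited is reported as not being a Takeout folder.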