duplicate-finder/app/scanner.py at fef364162c104f8742d1dcd011ea969f26a4aea6

Files

tocmo fef364162c Parallel SHA-256 indexing with thread pool

Replace single-threaded indexing loop with ThreadPoolExecutor.
Default workers = min(cpu_count*2, 16), tunable via DUPFINDER_WORKERS
env var. Pre-loads all existing DB records in one query instead of
N per-file queries. Progress message shows worker count and live
done/total count. Skipped files bulk-stamped in batches of 500.

On an 8-core machine over NAS: ~4-8x faster indexing phase.
On NVMe: up to 16x faster with 16 workers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-05 01:48:30 -04:00

30 KiB

Raw Blame History

View Raw

30 KiB Raw Blame History

30 KiB

Raw Blame History