fix: browser UA for scraper, Qdrant healthcheck endpoint

Scraper was using a bot User-Agent that triggered Cloudflare bot
detection, returning challenge pages with < 100 chars of content.
Switched to a standard Chrome UA with Accept headers.

Qdrant healthcheck used /healthz which does not exist in v1.9.0.
Changed to GET / which is always available. Added start_period: 30s
so the check does not fire before Qdrant has time to initialise.
Added --debug flag to scraper for future extraction diagnostics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Carlos Garcia
2026-05-14 11:57:34 -04:00
parent 7fb1573bac
commit 8fbf574634
2 changed files with 17 additions and 9 deletions

View File

@@ -29,10 +29,11 @@ services:
networks:
- rag_net
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:6333/healthz"]
interval: 30s
test: ["CMD", "curl", "-sf", "http://localhost:6333/"]
interval: 15s
timeout: 10s
retries: 3
retries: 5
start_period: 30s
rag-api:
build: .