fix: browser UA for scraper, Qdrant healthcheck endpoint
Scraper was using a bot User-Agent that triggered Cloudflare bot detection,
returning challenge pages with < 100 chars of content. Switched to a standard
Chrome UA with Accept headers.

Qdrant healthcheck used /healthz which does not exist in v1.9.0. Changed to
GET / which is always available. Added start_period: 30s so the check does
not fire before Qdrant has time to initialise.

Added --debug flag to scraper for future extraction diagnostics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
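The scraper source is not part of this diff, so the following is only a minimal sketch of the change the message describes, assuming the scraper is written in Python. The header strings, the names `BROWSER_HEADERS`, `build_request`, `looks_like_challenge` and `parse_args`, and the use of `urllib` are all hypothetical; only the 100-character threshold and the `--debug` flag come from the message itself.

```python
import argparse
import urllib.request

# Hypothetical header values: the commit does not show the exact strings used.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

# Threshold taken from the commit message ("< 100 chars of content").
MIN_CONTENT_CHARS = 100


def build_request(url: str) -> urllib.request.Request:
    """Return a request carrying browser-like headers (no network I/O here)."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def looks_like_challenge(extracted_text: str) -> bool:
    """Flag pages whose extracted text is shorter than the threshold."""
    return len(extracted_text.strip()) < MIN_CONTENT_CHARS


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the command line; --debug enables extraction diagnostics."""
    parser = argparse.ArgumentParser(description="scraper sketch")
    parser.add_argument("url")
    parser.add_argument(
        "--debug",
        action="store_true",
        help="print extraction diagnostics",
    )
    return parser.parse_args(argv)
```

A length check like `looks_like_challenge` is only a heuristic: a legitimately short page would be flagged too, which is one reason a `--debug` switch that prints the extracted text can be useful.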
@@ -29,10 +29,11 @@ services:
     networks:
       - rag_net
     healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:6333/healthz"]
-      interval: 30s
+      test: ["CMD", "curl", "-sf", "http://localhost:6333/"]
+      interval: 15s
       timeout: 10s
-      retries: 3
+      retries: 5
+      start_period: 30s
 
   rag-api:
     build: .
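For probing the same endpoint from outside the container (for example from a test script), the check can be approximated in Python. This is a rough sketch rather than part of the commit: the function name `qdrant_is_up` is hypothetical, and the default URL and 10-second timeout simply mirror the values in the healthcheck above.

```python
import urllib.error
import urllib.request


def qdrant_is_up(base_url: str = "http://localhost:6333/", timeout: float = 10.0) -> bool:
    """Approximate `curl -sf <base_url>`: True only for a successful HTTP response."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP 4xx/5xx (HTTPError), refused connections, and timeouts.
        return False
```

Unlike plain `curl -sf`, `urlopen` follows redirects by default, so the two are not strictly equivalent.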