odootrain

5 Commits 1 Branch 0 Tags

Author	SHA1	Message	Date
Carlos Garcia	608bb51943	fix: replace dead sitemap with crawl-based URL discovery. The Odoo 18 sitemap.xml returns 404. The fallback URL list also failed because urljoin(BASE_URL, "/applications/...") strips the /documentation/18.0 path: when the second argument is an absolute path, urljoin replaces the base URL's entire path component. Changes: - Add discover_urls_by_crawl(), which fetches each module index page and collects all internal links; it replaces the sitemap as the primary source. - crawl() now chains: sitemap → crawl discovery → hardcoded fallback. - Fix fallback_urls() to use BASE_URL + path (not urljoin) and trim the list to known-good pages. - Keep crawl discovery rate-limited (0.5s between module seeds). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 13:05:40 -04:00
Carlos Garcia	3d94c4eb25	fix: use list-form command with literal block to avoid sh syntax error. A YAML folded scalar (>) preserves newlines on more-indented continuation lines, so the shell received && on its own line, which is a syntax error in dash. Switched to the list form [/bin/sh, -c, <script>], with the script written as a YAML literal block scalar (|), so Docker passes the script verbatim to sh -c without double-wrapping. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 12:20:08 -04:00
Carlos Garcia	de051fb2e7	fix: remove qdrant healthcheck, use wait-loop in our own containers. qdrant/qdrant:v1.9.0 does not ship curl or wget, so CMD healthchecks always exit with status 127 (command not found) and the container is marked unhealthy regardless of whether Qdrant is actually running. Fix: remove the healthcheck from the qdrant service entirely. Instead, rag-api and indexer now loop on `curl http://qdrant:6333/` (curl is installed in our own python:3.11-slim image via the Dockerfile) before starting the main process. Also removes the obsolete `version` key. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 12:16:48 -04:00
Carlos Garcia	8fbf574634	fix: browser UA for scraper, Qdrant healthcheck endpoint. The scraper was using a bot User-Agent that triggered Cloudflare bot detection, so requests returned challenge pages with fewer than 100 characters of content. Switched to a standard Chrome UA with Accept headers. The Qdrant healthcheck used /healthz, which does not exist in v1.9.0; changed it to GET /, which is always available. Added start_period: 30s so that failed checks are not counted against the retries while Qdrant is still initialising. Added a --debug flag to the scraper for future extraction diagnostics. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 11:57:34 -04:00
Carlos Garcia	7fb1573bac	Initial commit: Odoo 18 RAG stack. Scraper, indexer, and FastAPI query service for Retrieval-Augmented Generation over the Odoo 18 documentation. Uses Qdrant + Ollama (nomic-embed-text + llama3.1). Integrates with the ActiveBlue PeerBus agent interface. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 11:25:55 -04:00
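The urljoin pitfall described in commit 608bb51943 can be reproduced with the standard library alone. In this sketch the full BASE_URL value and the /applications/sales.html page are illustrative assumptions; only the /documentation/18.0 prefix and the "BASE_URL + path" fix come from the commit message.

```python
from urllib.parse import urljoin

# Assumed value: the commit names BASE_URL and the /documentation/18.0 path,
# not the full URL.
BASE_URL = "https://www.odoo.com/documentation/18.0"


def broken_join(path: str) -> str:
    # An absolute path ("/...") replaces the base URL's entire path component,
    # so /documentation/18.0 is dropped.
    return urljoin(BASE_URL, path)


def fixed_join(path: str) -> str:
    # Plain concatenation keeps the /documentation/18.0 prefix; the strip
    # calls avoid a doubled or missing slash at the seam.
    return BASE_URL.rstrip("/") + "/" + path.lstrip("/")


if __name__ == "__main__":
    print(broken_join("/applications/sales.html"))
    # https://www.odoo.com/applications/sales.html
    print(fixed_join("/applications/sales.html"))
    # https://www.odoo.com/documentation/18.0/applications/sales.html
```

urljoin(BASE_URL + "/", "applications/sales.html") (trailing slash on the base, no leading slash on the path) would also keep the prefix; plain concatenation, as the commit chose, does not depend on getting both slashes right.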
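The crawl-based discovery and the sitemap → crawl discovery → hardcoded fallback chain from commit 608bb51943 could look roughly like the following standard-library sketch. discover_urls_by_crawl is the name used in the commit, but its signature here (a seed list, an injectable fetch callable, a delay parameter) is an assumption, as are the LinkCollector, internal_links and discover_urls helpers and the BASE_URL value. Here urljoin resolves each href against the page it appears on, which is the case urljoin handles correctly.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Assumed value, as in the commit's BASE_URL + path fallback.
BASE_URL = "https://www.odoo.com/documentation/18.0"


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(page_url: str, html: str) -> set[str]:
    """Absolute, fragment-free links that stay under BASE_URL."""
    collector = LinkCollector()
    collector.feed(html)
    links = set()
    for href in collector.hrefs:
        # Relative hrefs resolve against the page they appear on.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute.startswith(BASE_URL + "/"):
            links.add(absolute)
    return links


def discover_urls_by_crawl(seeds, fetch, delay: float = 0.5) -> list[str]:
    """Fetch each module index page (seed) and collect its internal links.

    `fetch` is any callable mapping a URL to HTML text (hypothetical
    interface); `delay` is the pause between module seeds.
    """
    found: set[str] = set()
    for i, seed in enumerate(seeds):
        if i:
            time.sleep(delay)  # rate-limit between module seeds
        found.add(seed)
        found |= internal_links(seed, fetch(seed))
    return sorted(found)


def discover_urls(sources) -> list[str]:
    """Try URL sources in order and return the first non-empty result."""
    for source in sources:
        try:
            urls = source()
        except Exception:
            continue  # e.g. the sitemap's 404: fall through to the next source
        if urls:
            return urls
    return []
```

Inside crawl(), the chain might then be wired as discover_urls([sitemap_urls, lambda: discover_urls_by_crawl(seeds, fetch), fallback_urls]), where sitemap_urls, seeds and fetch are hypothetical names and fallback_urls is the function named in the commit.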
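The shell failure fixed in commit 3d94c4eb25 can be checked without Docker by asking /bin/sh to parse a script without running it (the -n option). The two strings below are simplified stand-ins for what the folded and literal YAML scalars delivered; the real compose script is not shown in the commit. Passing an argument list to subprocess mirrors the list-form command: the script reaches sh -c as a single argument, with no extra shell layer.

```python
import subprocess

# Simplified stand-in for what the folded scalar (>) delivered: the
# more-indented && line kept its own line break, so the operator starts a line.
BROKEN = "echo waiting\n  && echo starting"

# Stand-in for a literal block scalar (|) written with && at the end of the
# line: sh continues the AND-list onto the next line.
FIXED = "echo waiting &&\necho starting"


def sh_parses(script: str) -> bool:
    """Return True if /bin/sh can parse `script`; -n reads without executing."""
    # The argument list hands the script to sh -c as one argument, like the
    # compose file's [/bin/sh, -c, ...] command.
    result = subprocess.run(["/bin/sh", "-n", "-c", script], capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    print(sh_parses(BROKEN))  # False
    print(sh_parses(FIXED))   # True
```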
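Commit de051fb2e7 replaces the Qdrant healthcheck with a loop on `curl http://qdrant:6333/` inside the rag-api and indexer containers. That loop is shell in the compose file; a Python equivalent, for a service that prefers to wait from inside its own process, might look like this. The function name and the timeout, interval and request_timeout defaults are assumptions.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 1.0,
                  request_timeout: float = 2.0) -> bool:
    """Poll `url` until the server answers or `timeout` seconds elapse.

    Success only means the server accepted the connection and replied,
    which is what the curl loop checks for http://qdrant:6333/.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout):
                return True
        except urllib.error.HTTPError:
            # The server replied with an error status; it is still up.
            return True
        except OSError:
            pass  # connection refused, DNS failure, timeout: not up yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical entry point: block until Qdrant answers, then start work.
    if not wait_for_http("http://qdrant:6333/"):
        raise SystemExit("Qdrant did not come up in time")
```

Like curl without -f, the sketch treats any HTTP reply, including an error status, as the server being up.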
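Commit 8fbf574634 switches the scraper to a browser User-Agent with Accept headers, and its symptom (challenge pages with fewer than 100 characters of content) suggests a simple length check. A standard-library sketch follows; the exact UA string, the Accept-Language header and the looks_like_challenge helper are assumptions, while the 100-character threshold and the --debug flag name come from the commit.

```python
import argparse
import urllib.request

# Placeholder desktop Chrome UA; the commit does not quote the exact string.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

# Challenge pages came back with fewer than 100 characters of content.
MIN_CONTENT_CHARS = 100


def build_request(url: str) -> urllib.request.Request:
    """Request object carrying the browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def looks_like_challenge(extracted_text: str) -> bool:
    """Heuristic: extracted text this short is probably not a docs page."""
    return len(extracted_text.strip()) < MIN_CONTENT_CHARS


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Odoo 18 docs scraper (sketch)")
    parser.add_argument("--debug", action="store_true",
                        help="print extraction diagnostics per page")
    return parser.parse_args(argv)
```

With --debug set, a scraper built this way could log the URL and extracted length of every page for which looks_like_challenge() returns True, which is the kind of diagnostic the commit anticipates.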
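The query path implied by the initial commit 7fb1573bac (Qdrant for retrieval, Ollama serving nomic-embed-text for embeddings and llama3.1 for generation) might be sketched as below. It calls Ollama's /api/embeddings and /api/generate endpoints and Qdrant's REST /collections/{name}/points/search endpoint. The ollama host name, the collection name odoo18_docs, the "text" payload field and the prompt wording are hypothetical, and the real service wraps this logic in FastAPI.

```python
import json
import urllib.request

# Hypothetical service addresses and collection name; 11434 and 6333 are the
# default Ollama and Qdrant HTTP ports.
OLLAMA_URL = "http://ollama:11434"
QDRANT_URL = "http://qdrant:6333"
COLLECTION = "odoo18_docs"


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


def embed(text: str, post=post_json) -> list[float]:
    # Ollama embedding endpoint; the reply is {"embedding": [...]}.
    reply = post(f"{OLLAMA_URL}/api/embeddings",
                 {"model": "nomic-embed-text", "prompt": text})
    return reply["embedding"]


def retrieve(question: str, k: int = 5, post=post_json) -> list[dict]:
    # Nearest-neighbour search; each hit carries the payload stored at index time.
    reply = post(f"{QDRANT_URL}/collections/{COLLECTION}/points/search",
                 {"vector": embed(question, post), "limit": k, "with_payload": True})
    return [hit["payload"] for hit in reply["result"]]


def answer(question: str, post=post_json) -> str:
    # Assumes the indexer stored each chunk's text under payload["text"].
    context = "\n\n".join(p.get("text", "") for p in retrieve(question, post=post))
    prompt = (
        "Answer the question using only this Odoo 18 documentation:\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    reply = post(f"{OLLAMA_URL}/api/generate",
                 {"model": "llama3.1", "prompt": prompt, "stream": False})
    return reply["response"]
```

Passing post as a parameter keeps the HTTP layer swappable, for example with a stub in tests or an async client inside the FastAPI app.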