The Odoo 18 sitemap.xml returns 404. The fallback URL list also failed
because urljoin(BASE_URL, /applications/...) strips the
/documentation/18.0 path: an absolute-path argument to urljoin replaces
the whole path component of the base URL.

Changes:
- Add discover_urls_by_crawl(): fetches each module index page and
  collects all internal links; replaces sitemap as primary source
- crawl() now chains: sitemap → crawl discovery → hardcoded fallback
- Fix fallback_urls() to use BASE_URL + path (not urljoin) and trim the
  list to known-good pages
- Keep crawl discovery rate-limited (0.5s between module seeds)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
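
For reference, a minimal sketch of the urljoin behavior described above. The BASE_URL value and the example path are assumptions chosen for illustration; the actual constants in the crawler are not shown in this message.

```python
from urllib.parse import urljoin

# Assumed value for illustration only; the real BASE_URL may differ.
BASE_URL = "https://www.odoo.com/documentation/18.0"
# Hypothetical page path standing in for the elided /applications/... entries.
path = "/applications/sales.html"

# An absolute path (leading slash) replaces the base URL's entire path,
# so the /documentation/18.0 prefix is lost.
joined = urljoin(BASE_URL, path)

# Plain concatenation keeps the prefix, which is what fallback_urls() needs.
concatenated = BASE_URL.rstrip("/") + path

print(joined)
print(concatenated)
```

Concatenation is the simpler fix here because every fallback path is already written relative to the documentation root.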