# odoo18-rag

Retrieval-Augmented Generation over the full Odoo 18 documentation.
Built for the ActiveBlue AI agent stack.

## Stack

| Component | What it does |
|---|---|
| `scraper/` | Crawls odoo.com/documentation/18.0, outputs clean JSONL |
| `indexer/` | Chunks pages, embeds with `nomic-embed-text`, loads Qdrant |
| `api/` | FastAPI service: `/ask`, `/ask/stream`, `/agent/ask`, `/health` |
| Qdrant | Vector database (Docker) |
| Ollama @ `miaai:11434` | Embeddings + generation (local, HIPAA-safe) |

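
The chunking step performed by `indexer/` is not described further in this README. As a rough illustration of the general idea only, the sketch below splits page text into overlapping word windows. The function name `chunk_words` and the `size` and `overlap` values are hypothetical and are not taken from the project's indexer.

```python
from typing import List


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of `size` words that share `overlap` words.

    Hypothetical helper: the real indexer may chunk by tokens, headings,
    or characters instead.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once the current window has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks
```

Overlapping windows keep a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.
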
## Quick start

```bash
# 1. Pull the embedding and generation models on miaai
#    (llama3.1 is the default GEN_MODEL, see "Environment variables")
ollama pull nomic-embed-text
ollama pull llama3.1

# 2. Start Qdrant + RAG API
docker compose up -d

# 3. Scrape the docs (~800 pages, ~20 min)
docker compose run --rm scraper

# 4. Index into Qdrant (~30-40 min)
docker compose run --rm indexer

# 5. Test
curl http://localhost:8000/health
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I run a payroll batch in Odoo 18?"}'
```

## Endpoints

| Method | Path | Description |
|---|---|---|
| GET | `/health` | Qdrant + Ollama connectivity |
| GET | `/stats` | Vector count, models in use |
| GET | `/modules` | List indexed Odoo modules |
| POST | `/ask` | Blocking answer + sources |
| POST | `/ask/stream` | SSE token stream |
| POST | `/agent/ask` | ActiveBlue PeerBus integration |

### Ask with module filter

```bash
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "How do reordering rules work?", "module": "inventory"}'
```

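
The same request can be issued from Python using only the standard library. This is a minimal sketch: the helper names `build_ask_request` and `ask` are illustrative, and because the response schema is not documented here beyond "answer + sources", the decoded JSON is returned unchanged.

```python
import json
import urllib.request
from typing import Optional


def build_ask_request(question: str, module: Optional[str] = None,
                      base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /ask request; `module` is left out when not given."""
    payload = {"question": question}
    if module is not None:
        payload["module"] = module
    return urllib.request.Request(
        url=f"{base_url}/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, module: Optional[str] = None, timeout: float = 120.0) -> dict:
    """Send the request and return the decoded JSON body as-is."""
    request = build_ask_request(question, module)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `ask("How do reordering rules work?", module="inventory")` mirrors the `curl` call above. The generous default timeout is an assumption, chosen because local generation can be slow.
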
### Streaming

```bash
curl -N -X POST http://localhost:8000/ask/stream \
  -H "Content-Type: application/json" \
  -d '{"question": "Explain the Quote-to-Cash workflow"}'
```

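
To consume the stream from code rather than `curl`, a client has to split the response into Server-Sent Events. The sketch below follows the standard SSE framing (`data:` lines, events separated by a blank line). Whether each payload is a raw token or a JSON object is not stated in this README, so the payload is yielded as plain text. `iter_sse_data` is a hypothetical helper name.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event in a stream of text lines."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops one optional space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (":...") and other fields (event, id, retry) are ignored.
```

With `urllib.request.urlopen`, the lines can be supplied as `(chunk.decode("utf-8") for chunk in response)`. An event that is not followed by a blank line before the stream ends is dropped, as the SSE format prescribes.
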
## Agent integration

```python
import asyncio

from api.odoo_rag_agent import OdooRagAgent


async def main() -> None:
    agent = OdooRagAgent(rag_url="http://localhost:8000")

    # Generic
    result = await agent.ask("How do I configure NACHA payments?")

    # Module-scoped
    result = await agent.ask_payroll("How do I generate a payslip batch?")
    result = await agent.ask_accounting("What is the chart of accounts?")
    result = await agent.ask_inventory("How does MTO work?")

    # Streaming
    async for token in agent.ask_stream("Explain the CRM pipeline"):
        print(token, end="", flush=True)

    # PeerBus
    response = await agent.handle_peer_message({
        "action": "ask",
        "payload": {"question": "How do I set up taxes?", "module": "accounting"},
        "request_id": "req-001",
    })


# The agent methods are coroutines, so they must run inside an event loop.
asyncio.run(main())
```

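
The PeerBus message above is a plain dictionary with `action`, `payload`, and `request_id` keys. A small builder can keep that envelope consistent across callers. This is a sketch only: `make_ask_message` and the generated `req-<hex>` id format are assumptions for illustration, not part of the PeerBus interface.

```python
import uuid
from typing import Optional


def make_ask_message(question: str, module: Optional[str] = None,
                     request_id: Optional[str] = None) -> dict:
    """Build an "ask" message in the shape shown in the example above."""
    if not question.strip():
        raise ValueError("question must not be empty")
    payload = {"question": question}
    if module:
        payload["module"] = module
    return {
        "action": "ask",
        "payload": payload,
        # Hypothetical id scheme; any unique string should work as a correlation id.
        "request_id": request_id or f"req-{uuid.uuid4().hex[:8]}",
    }
```

The result can be passed directly to `agent.handle_peer_message(...)`.
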
## Re-indexing

Odoo updates its documentation regularly. Re-scrape and re-index to stay current:

```bash
docker compose run --rm scraper
docker compose run --rm indexer python /app/indexer/indexer.py --reset
```

Or add a monthly cron job on the host (runs at 03:00 on the 1st of each month):

```cron
0 3 1 * * cd /opt/odoo18-rag && docker compose run --rm scraper && docker compose run --rm indexer python /app/indexer/indexer.py --reset
```

## Scraper options

```bash
# Single module only
docker compose run --rm scraper python /app/scraper/scraper.py --module accounting

# Quick test (first 50 pages)
docker compose run --rm scraper python /app/scraper/scraper.py --limit 50
```

## Environment variables

All settings are configurable via the `environment` section of `docker-compose.yml`:

| Variable | Default | Description |
|---|---|---|
| `OLLAMA_URL` | `http://miaai:11434` | Ollama endpoint |
| `QDRANT_URL` | `http://qdrant:6333` | Qdrant endpoint |
| `EMBED_MODEL` | `nomic-embed-text` | Embedding model |
| `GEN_MODEL` | `llama3.1` | Generation model |
| `COLLECTION_NAME` | `odoo18_docs` | Qdrant collection |

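
How the services read these variables is not shown in this README. One plausible pattern, sketched below, resolves each variable against the defaults from the table. The `Settings` class and `load_settings` function are hypothetical names, not the project's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Defaults copied from the table above.
DEFAULTS = {
    "OLLAMA_URL": "http://miaai:11434",
    "QDRANT_URL": "http://qdrant:6333",
    "EMBED_MODEL": "nomic-embed-text",
    "GEN_MODEL": "llama3.1",
    "COLLECTION_NAME": "odoo18_docs",
}


@dataclass(frozen=True)
class Settings:
    ollama_url: str
    qdrant_url: str
    embed_model: str
    gen_model: str
    collection_name: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read each variable from `env`, falling back to the documented default."""
    values = {name.lower(): env.get(name, default) for name, default in DEFAULTS.items()}
    return Settings(**values)
```

Passing the environment in as a mapping makes it possible to test overrides without touching the real process environment, for example `load_settings({"GEN_MODEL": "llama3.1:70b"})`.
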