odootrain/README.md
Carlos Garcia 7fb1573bac Initial commit: Odoo 18 RAG stack
Scraper, indexer, and FastAPI query service for Retrieval-Augmented
Generation over Odoo 18 documentation. Uses Qdrant + Ollama (nomic-embed-text
+ llama3.1). Integrates with ActiveBlue PeerBus agent interface.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 11:25:55 -04:00


# odoo18-rag
Retrieval-Augmented Generation over the full Odoo 18 documentation.
Built for the ActiveBlue AI agent stack.
## Stack
| Component | What it does |
|---|---|
| `scraper/` | Crawls odoo.com/documentation/18.0, outputs clean JSONL |
| `indexer/` | Chunks pages, embeds with `nomic-embed-text`, loads Qdrant |
| `api/` | FastAPI service exposing `/ask`, `/ask/stream`, `/agent/ask`, `/health`, `/stats`, `/modules` |
| Qdrant | Vector database (Docker) |
| Ollama @ `miaai:11434` | Embeddings + generation (local, HIPAA-safe) |
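
To make the indexer's "chunks pages" step concrete, here is a minimal sketch of overlapping character-window chunking. The function name `chunk_text` and the default `chunk_size` and `overlap` values are assumptions for illustration; the actual strategy and sizes used in `indexer/` may differ.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows.

    Hypothetical helper: the sizes are illustrative, not the indexer's
    real settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size].strip()
        if piece:
            chunks.append(piece)
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.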
## Quick start
```bash
# 1. Pull the embedding and generation models on miaai
ollama pull nomic-embed-text
ollama pull llama3.1
# 2. Start Qdrant + RAG API
docker compose up -d
# 3. Scrape the docs (~800 pages, ~20 min)
docker compose run --rm scraper
# 4. Index into Qdrant (~30-40 min)
docker compose run --rm indexer
# 5. Test
curl http://localhost:8000/health
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I run a payroll batch in Odoo 18?"}'
```
## Endpoints
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Qdrant + Ollama connectivity |
| GET | `/stats` | Vector count, models in use |
| GET | `/modules` | List indexed Odoo modules |
| POST | `/ask` | Blocking answer + sources |
| POST | `/ask/stream` | SSE token stream |
| POST | `/agent/ask` | ActiveBlue PeerBus integration |
### Ask with module filter
```bash
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "How do reordering rules work?", "module": "inventory"}'
```
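
The same request can be sent from Python using only the standard library. This is a sketch, not part of the repository: `build_ask_request` and `ask` are hypothetical helper names, and the response is assumed to be a JSON object (its exact fields are not documented here).

```python
import json
import urllib.request

RAG_URL = "http://localhost:8000"  # same base URL as the curl examples


def build_ask_request(question, module=None, base_url=RAG_URL):
    """Build a POST /ask request; `module` is included only when given."""
    payload = {"question": question}
    if module is not None:
        payload["module"] = module
    return urllib.request.Request(
        f"{base_url}/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, module=None, timeout=120.0):
    """Send the request and return the decoded JSON body (assumed JSON)."""
    request = build_ask_request(question, module)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the stack running, `ask("How do reordering rules work?", module="inventory")` would mirror the curl call above. The generous timeout is a guess, since generation on a local model can be slow.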
### Streaming
```bash
curl -N -X POST http://localhost:8000/ask/stream \
  -H "Content-Type: application/json" \
  -d '{"question": "Explain the Quote-to-Cash workflow"}'
```
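
For clients that cannot use `curl -N`, the stream can be consumed by parsing the Server-Sent Events framing directly. The sketch below assumes the endpoint emits standard `data:` lines separated by blank lines; the actual payload format of `/ask/stream` (plain tokens versus JSON) is not specified here, so the payload is returned as a raw string. `iter_sse_data` and `stream_answer` are hypothetical names.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each SSE event from raw response lines."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # Per the SSE format, one leading space after the colon is dropped.
            buffer.append(value[1:] if value.startswith(" ") else value)
    if buffer:
        # Lenient choice: flush a trailing event that lacks a final blank line.
        yield "\n".join(buffer)


def stream_answer(question: str, base_url: str = "http://localhost:8000") -> Iterator[str]:
    """POST to /ask/stream and yield event payloads as they arrive."""
    request = urllib.request.Request(
        f"{base_url}/ask/stream",
        data=json.dumps({"question": question}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        yield from iter_sse_data(response)
```

Keeping the parser separate from the HTTP call makes it easy to test against canned lines without a running server.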
## Agent integration
```python
import asyncio

from api.odoo_rag_agent import OdooRagAgent


async def main() -> None:
    agent = OdooRagAgent(rag_url="http://localhost:8000")

    # Generic question (no module filter)
    result = await agent.ask("How do I configure NACHA payments?")

    # Module-scoped helpers
    result = await agent.ask_payroll("How do I generate a payslip batch?")
    result = await agent.ask_accounting("What is the chart of accounts?")
    result = await agent.ask_inventory("How does MTO work?")

    # Streaming: print tokens as they arrive
    async for token in agent.ask_stream("Explain the CRM pipeline"):
        print(token, end="", flush=True)

    # PeerBus message
    response = await agent.handle_peer_message({
        "action": "ask",
        "payload": {"question": "How do I set up taxes?", "module": "accounting"},
        "request_id": "req-001",
    })


# `await` is only valid inside a coroutine, so run everything through asyncio.
asyncio.run(main())
```
## Re-indexing
Odoo releases doc updates regularly. Re-index to stay current:
```bash
docker compose run --rm scraper
docker compose run --rm indexer python /app/indexer/indexer.py --reset
```
Or schedule a monthly cron job on the host (03:00 on the 1st of each month):
```cron
0 3 1 * * cd /opt/odoo18-rag && docker compose run --rm scraper && docker compose run --rm indexer python /app/indexer/indexer.py --reset
```
## Scraper options
```bash
# Single module only
docker compose run --rm scraper python /app/scraper/scraper.py --module accounting
# Quick test (first 50 pages)
docker compose run --rm scraper python /app/scraper/scraper.py --limit 50
```
## Environment variables
All settings are configurable through the `environment` section of `docker-compose.yml`:

| Variable | Default | Description |
|---|---|---|
| `OLLAMA_URL` | `http://miaai:11434` | Ollama endpoint |
| `QDRANT_URL` | `http://qdrant:6333` | Qdrant endpoint |
| `EMBED_MODEL` | `nomic-embed-text` | Embedding model |
| `GEN_MODEL` | `llama3.1` | Generation model |
| `COLLECTION_NAME` | `odoo18_docs` | Qdrant collection |
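
As an illustration of how a service might consume these variables, the sketch below reads each one with the default from the table. The `Settings` dataclass and `load_settings` function are hypothetical and not taken from the repository; only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    ollama_url: str
    qdrant_url: str
    embed_model: str
    gen_model: str
    collection_name: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from `env`, falling back to the documented defaults."""
    return Settings(
        ollama_url=env.get("OLLAMA_URL", "http://miaai:11434"),
        qdrant_url=env.get("QDRANT_URL", "http://qdrant:6333"),
        embed_model=env.get("EMBED_MODEL", "nomic-embed-text"),
        gen_model=env.get("GEN_MODEL", "llama3.1"),
        collection_name=env.get("COLLECTION_NAME", "odoo18_docs"),
    )
```

Passing the environment as a parameter keeps the loader easy to test: `load_settings({})` returns the defaults, and `load_settings({"GEN_MODEL": "mistral"})` overrides only the generation model.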