Routers calling /registry/agents raised AttributeError because
get_all() was not defined. Added method returning all registered
agents with active status, capabilities and instance flags.
The classifier was silently falling back to a clarification prompt every
time the LLM wrapped its JSON in markdown fences, prefixed it with
'json', or added surrounding prose. The bot then asked 'Could you
clarify what you need?' to every message regardless of clarity.
Now: strip code fences, slice to the first {...} block, and on parse
failure log the raw content (truncated) and treat the message as 'no
specialist agent' so the direct-answer fallback responds instead of
looping on clarification.
Previously when the LLM classified a message as needing no specialist
agent, the dispatcher built zero directives and _synthesize returned
'No agent responses received.' Greetings, follow-up clarifications,
and general questions all fell into this dead end.
Now when intent.agents is empty and no clarification is needed, the
master makes a second LLM call with the recent conversation as context
and answers directly. Updated master_system.txt to steer the classifier
toward agents=[] for chitchat instead of forcing a clarification loop.
The f-string only spanned the first fragment ('You don') so the
{chr(44).join(...)} placeholder leaked into chat output as literal
text. Build the message with plain string concat.
User messages were only saved inside _update_memory at the end of a
successful directive. The clarification and access-denied branches
returned early without ever calling it, so when a clarification turn
asked 'what do you mean?' and the user replied, the original question
was missing from context — the bot looked at a transcript of nothing
but its own clarifying questions and asked yet another.
Save the user message at the top of handle_message so every branch
includes it. Drop the now-duplicate write from _update_memory.
The prompt template contains a literal JSON example block ({"needs_clarification": ...})
which str.format() tried to interpret as format fields, raising KeyError on every
Discuss DM. Switch to .replace() so braces in the template are taken literally.
Without exc_info we only see the bare exception string, which has been
unhelpful for debugging Discuss DM failures (e.g. a KeyError whose
message is just a JSON key, with no clue where it was raised).
Odoo's bot model serialises user_id as a string (str(uid)) over the
HTTP boundary, but the asyncpg memory queries ($1) expect an integer.
This caused 'str object cannot be interpreted as an integer' on every
Discuss DM. Cast at the entry point so downstream stores get an int.
Fixes all errors reported in docker compose logs agent-service:
1. config.py: add ollama_max_concurrent, claude_timeout, claude_max_concurrent
fields so LLMRouter(config=settings) can read them without AttributeError.
2. main.py - LLM router: drop manual OllamaBackend/ClaudeBackend construction;
call LLMRouter(config=settings, pg_pool=pool) to match class signature.
Fixes: OllamaBackend.__init__() unexpected kwarg 'base_url'.
3. main.py - DB: add 5-attempt retry with 2s backoff and redacted DSN logging.
Fixes: connection refused race on startup before Postgres accepts connections.
4. main.py - AgentRegistry: call AgentRegistry() with no args (class takes none),
then await agent_registry.load_from_odoo(odoo) to populate active agents.
Fixes: AgentRegistry.__init__() unexpected kwarg 'odoo'.
5. main.py - PeerBus: pass registry=agent_registry at construction; register
specialist agents on agent_registry (not peer_bus, which has no register()).
peer_bus.py: make directive_id optional (default None) — bus is a singleton
at startup; directive_id is only needed per-request.
Fixes: PeerBus.__init__() missing positional args 'registry' and 'directive_id'.
6. main.py - MasterAgent: drop unexpected peer_bus= kwarg from constructor call.
Fixes: MasterAgent.__init__() unexpected kwarg 'peer_bus'.
7. mcp_router.py: pass NotificationOptions() instance instead of None.
Fixes: AttributeError 'NoneType' has no attribute 'tools_changed' (was applied
in running container but not committed; now committed).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>