fix: raise Ollama timeout to 300s, add model pre-warming, improve health check

- OllamaBackend enforces _MIN_TIMEOUT=300s (overrides OLLAMA_TIMEOUT env var)
- warm_model() background task loads activeblue-chat into VRAM at startup
- health/detailed reports "warming" vs "ok" via Ollama ps() API
- README updated with May 2026 changes and test coverage details
2026-05-20 05:03:15 +00:00
parent 20a69313d7
commit 564f1a9479
5 changed files with 72 additions and 6 deletions
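Reviewer note: the commit message describes two behaviours that the config.py hunk alone does not show, a timeout floor and a "warming" vs "ok" health status. A minimal sketch of how they could work is below. It assumes _MIN_TIMEOUT acts as a lower bound on the configured value and that the health check compares the configured model against the names Ollama reports as loaded. The helper names effective_timeout and health_status are hypothetical and are not taken from this commit.

```python
# Hypothetical sketch; not the code from this commit.
_MIN_TIMEOUT = 300  # seconds, the floor named in the commit message


def effective_timeout(configured: int) -> int:
    """Return the timeout to use, never lower than _MIN_TIMEOUT.

    Assumption: "enforces _MIN_TIMEOUT" means a smaller OLLAMA_TIMEOUT
    is raised to the floor, while a larger value is kept as configured.
    """
    return max(configured, _MIN_TIMEOUT)


def health_status(loaded_models: list[str], target: str) -> str:
    """Report "ok" if the target model is loaded, else "warming".

    loaded_models stands in for the model names returned by Ollama's
    ps() call. Names may carry a tag (for example "name:latest"), so
    the part before the first colon is compared as well.
    """
    for name in loaded_models:
        if name == target or name.split(":", 1)[0] == target:
            return "ok"
    return "warming"
```

Under these assumptions, effective_timeout(120) yields 300, so the old default of 120 would be lifted to the new floor even if it were still set through the environment.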
--- a/agent_service/config.py
+++ b/agent_service/config.py
@@ -14,7 +14,7 @@ class Settings(BaseSettings):
    # Ollama
    ollama_url: str = 'http://localhost:11434'
    ollama_model: str = 'activeblue-chat'
-    ollama_timeout: int = 120
+    ollama_timeout: int = 300
    ollama_max_concurrent: int = 2
    # Set to a vision-capable model (e.g. llama3.2-vision:11b) to use
    # vision OCR for receipt images instead of Tesseract.  Leave empty