Raise model num_ctx to 8192 to fix mid-call silence
Long calls overflowed the 4096-token window mid-conversation, forcing Ollama to truncate and re-evaluate the full context each turn, which produced multi-second stalls (dead air). Rebuilt activeblue-avc:latest with num_ctx 8192 (rollback tag activeblue-avc:pre-ctx8k). Combined with removing the 45-day calendar injection, this keeps long calls well under the window.

Doc: context row, Modelfile reference, and a root-cause note.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
14
CLAUDE.md
@@ -272,7 +272,7 @@ everything is a request staff confirm.
 | ID | `366a6cc15bb7` | Rebuilt clean 2026-06-23 |
 | Size | 4.9GB | Down from 8.7GB Q8_0 |
 | VRAM usage | ~4.5GB | Leaves 11.5GB headroom on RTX 5080 |
-| Context | 4096 tokens | Sufficient for any phone call |
+| Context | 8192 tokens | Raised from 4096 (2026-06-25) so long calls don't overflow mid-call — see note below |
 | Temperature | 0.3 | Low — maximizes JSON schema compliance |
 | Top-p | 0.9 | Standard |
 | Adapter | None | 44-pair LoRA adapter discarded |
@@ -285,7 +285,7 @@ FROM llama3.1:8b-instruct-q4_K_M
 PARAMETER stop "<|start_header_id|>"
 PARAMETER stop "<|end_header_id|>"
 PARAMETER stop "<|eot_id|>"
-PARAMETER num_ctx 4096
+PARAMETER num_ctx 8192
 PARAMETER temperature 0.3
 PARAMETER top_p 0.9
 
@@ -295,6 +295,16 @@ TEMPLATE "{{- range .Messages }}<|start_header_id|>{{ .Role }}<|end_header_id|>
 "
 ```
 
+### Why num_ctx 8192 (was 4096) — fixes mid-call silence
+
+Symptom: on longer calls AVA would go silent / stop replying partway through. Cause: the
+system prompt + a growing multi-turn transcript exceeded the 4096-token window mid-call, so
+Ollama truncated and re-evaluated the whole context every turn (cache miss) → multi-second
+stalls = dead air. The capture changes made it worse by briefly injecting a 45-day calendar
+(~600 tok/turn) — that injection was removed; raising num_ctx to 8192 gives long calls real
+headroom (RTX 5080 has the VRAM). Rebuild keeps the previous model as `activeblue-avc:pre-ctx8k`
+for rollback. Keep the live system prompt lean for the same reason.
+
 ### Why Q4_K_M not Q8_0
 
 Q8_0 consumed ~8.5GB VRAM for weights alone. Under telephony load this caused
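The root-cause note added in this diff can be sanity-checked with rough arithmetic. The sketch below is illustrative only and is not code from the repository: it assumes the prompt grows linearly (system prompt plus a fixed number of tokens per turn) and that any per-turn injection stays in the transcript. `SYSTEM_TOKENS` and `TURN_TOKENS` are hypothetical values chosen for the example; only the window sizes (4096, 8192) and the roughly 600 tok/turn calendar cost come from the note.

```python
# Rough context-budget estimate. Assumption: prompt size at turn n is
# system_tokens + n * tokens_per_turn (linear growth, nothing is trimmed).

SYSTEM_TOKENS = 1200    # hypothetical system prompt size, not from the source
TURN_TOKENS = 120       # hypothetical caller + assistant tokens per turn
CALENDAR_TOKENS = 600   # per-turn calendar injection, from the root-cause note


def first_overflow_turn(num_ctx: int, system_tokens: int, tokens_per_turn: int) -> int:
    """Return the first turn whose prompt no longer fits in num_ctx.

    Returns 0 if the system prompt alone already exceeds the window.
    """
    if tokens_per_turn <= 0:
        raise ValueError("tokens_per_turn must be positive")
    if system_tokens > num_ctx:
        return 0
    # Largest n that still fits is floor((num_ctx - system) / per_turn);
    # the turn after that is the first one to overflow.
    return (num_ctx - system_tokens) // tokens_per_turn + 1


scenarios = [
    ("4096, with calendar injection", 4096, TURN_TOKENS + CALENDAR_TOKENS),
    ("4096, injection removed", 4096, TURN_TOKENS),
    ("8192, injection removed", 8192, TURN_TOKENS),
]

for label, num_ctx, per_turn in scenarios:
    turn = first_overflow_turn(num_ctx, SYSTEM_TOKENS, per_turn)
    print(f"{label}: first overflow at turn {turn}")
```

Under these assumed sizes the injection dominates the budget, which matches the note's reasoning that removing it and doubling the window both push the overflow point well past the length of a typical call. Real numbers would need to come from measured token counts for the live system prompt and transcripts.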