Raise model num_ctx to 8192 to fix mid-call silence

Long calls overflowed the 4096-token context window mid-conversation, forcing Ollama to
truncate and re-evaluate the full context on every turn, which produced multi-second
stalls (dead air). Rebuilt activeblue-avc:latest with num_ctx 8192 (rollback tag
activeblue-avc:pre-ctx8k). Combined with removing the 45-day calendar injection,
this keeps long calls well under the window. Docs: updated the context row and the
Modelfile reference, and added a root-cause note.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tocmo0nlord
2026-06-25 03:53:41 +00:00
parent 08d9db4f09
commit b31f685d91

@@ -272,7 +272,7 @@ everything is a request staff confirm.
| ID | `366a6cc15bb7` | Rebuilt clean 2026-06-23 |
| Size | 4.9GB | Down from 8.7GB Q8_0 |
| VRAM usage | ~4.5GB | Leaves 11.5GB headroom on RTX 5080 |
| Context | 4096 tokens | Sufficient for any phone call |
| Context | 8192 tokens | Raised from 4096 (2026-06-25) so long calls don't overflow mid-call — see note below |
| Temperature | 0.3 | Low — maximizes JSON schema compliance |
| Top-p | 0.9 | Standard |
| Adapter | None | 44-pair LoRA adapter discarded |
@@ -285,7 +285,7 @@ FROM llama3.1:8b-instruct-q4_K_M
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER num_ctx 4096
PARAMETER num_ctx 8192
PARAMETER temperature 0.3
PARAMETER top_p 0.9
@@ -295,6 +295,16 @@ TEMPLATE "{{- range .Messages }}<|start_header_id|>{{ .Role }}<|end_header_id|>
"
```
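Raising `num_ctx` also grows the KV cache Ollama allocates next to the model weights. The sketch below estimates that cost. It is a back-of-envelope calculation, not a measurement, and it assumes Llama 3.1 8B's published shape (32 layers, 8 key/value heads under grouped-query attention, head dimension 128), an f16 cache at 2 bytes per element, and a single request slot. Real usage also includes compute buffers and changes with server settings such as `OLLAMA_NUM_PARALLEL` and `OLLAMA_KV_CACHE_TYPE`; the function name `kv_cache_bytes` is hypothetical.

```python
# Back-of-envelope KV cache size for a given context length.
# ASSUMPTIONS: Llama 3.1 8B shape (32 layers, 8 KV heads via GQA, head_dim 128),
# f16 cache (2 bytes per element), one parallel request slot.


def kv_cache_bytes(num_ctx: int,
                   n_layers: int = 32,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2,
                   n_parallel: int = 1) -> int:
    """Bytes needed to hold keys and values for num_ctx tokens."""
    # Factor 2 = one tensor for keys plus one for values, in every layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * num_ctx * n_parallel


GIB = 1024 ** 3

for ctx in (4096, 8192):
    print(f"num_ctx {ctx}: ~{kv_cache_bytes(ctx) / GIB:.2f} GiB KV cache")

extra = kv_cache_bytes(8192) - kv_cache_bytes(4096)
print(f"extra for 8192 vs 4096: ~{extra / GIB:.2f} GiB")
```

Under these assumptions the script reports about 0.50 GiB at 4096 and 1.00 GiB at 8192, so the change costs roughly another half GiB, small next to the ~11.5GB of headroom in the VRAM row.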
### Why num_ctx 8192 (was 4096) — fixes mid-call silence
Symptom: on longer calls AVA would go silent or stop replying partway through. Cause: the
system prompt plus a growing multi-turn transcript exceeded the 4096-token window mid-call, so
Ollama truncated the context and re-evaluated the whole of it on every turn (a prompt-cache
miss), producing multi-second stalls, i.e. dead air. The capture changes made this worse by
briefly injecting a 45-day calendar (~600 tokens per turn); that injection has been removed.
Raising num_ctx to 8192 gives long calls real headroom, and the RTX 5080 has VRAM to spare (see
the VRAM row above). The rebuild keeps the previous model as `activeblue-avc:pre-ctx8k` for
rollback. For the same reason, keep the live system prompt lean: it occupies part of the same
window on every turn.
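The cache-miss mechanism can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not Ollama's implementation: it assumes cached KV entries are reused only for the longest common prefix between the last evaluated prompt and the new one, and that an overflowing prompt is shortened by dropping the oldest turns while keeping the system prompt. Placeholder tuples stand in for token IDs, and the helper names and sizes (`simulate`, a 1500-token system prompt, 300 new tokens per turn, a 256-token reply reserve) are hypothetical.

```python
from itertools import chain

# Placeholder token: (message index, position). Stands in for a real token ID.
Token = tuple[int, int]


def message(idx: int, length: int) -> list[Token]:
    """Hypothetical message made of `length` distinct placeholder tokens."""
    return [(idx, pos) for pos in range(length)]


def build_prompt(system: list[Token], history: list[list[Token]],
                 num_ctx: int, reserve: int) -> list[Token]:
    """Keep the system prompt; drop the oldest turns until prompt + reply reserve fits."""
    kept = list(history)
    while kept and len(system) + sum(map(len, kept)) + reserve > num_ctx:
        kept.pop(0)  # simplified truncation: the oldest turn goes first
    return system + list(chain.from_iterable(kept))


def common_prefix_len(a: list[Token], b: list[Token]) -> int:
    """Number of leading tokens shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def tokens_to_evaluate(cached: list[Token], prompt: list[Token]) -> int:
    """Tokens to (re)process when only the shared prefix of the cache is reusable."""
    return len(prompt) - common_prefix_len(cached, prompt)


def simulate(num_ctx: int, turns: int, system_len: int = 1500,
             turn_len: int = 300, reserve: int = 256) -> list[int]:
    """Per-turn evaluation cost for a call of `turns` turns (hypothetical sizes)."""
    system = message(0, system_len)
    history: list[list[Token]] = []
    cached: list[Token] = []
    costs = []
    for t in range(1, turns + 1):
        # Each turn adds turn_len new tokens (caller utterance and reply folded together).
        history.append(message(t, turn_len))
        prompt = build_prompt(system, history, num_ctx, reserve)
        costs.append(tokens_to_evaluate(cached, prompt))
        cached = prompt  # simplification: the cache holds exactly the last prompt
    return costs


if __name__ == "__main__":
    # 4096 -> [1800, 300, 300, 300, 300, 300, 300, 2100, 2100, 2100]
    # 8192 -> [1800, 300, 300, 300, 300, 300, 300, 300, 300, 300]
    for ctx in (4096, 8192):
        print(ctx, simulate(num_ctx=ctx, turns=10))
```

With these made-up sizes, the 4096 window costs 300 tokens per turn until the eighth turn, when dropping the oldest turn breaks the shared prefix and every later turn re-evaluates 2100 tokens; at 8192 all ten turns stay at 300 after the first.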
### Why Q4_K_M not Q8_0
Q8_0 consumed ~8.5GB VRAM for weights alone. Under telephony load this caused