diff --git a/CLAUDE.md b/CLAUDE.md
index e23bc76..25d905c 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -272,7 +272,7 @@ everything is a request staff confirm.
 | ID | `366a6cc15bb7` | Rebuilt clean 2026-06-23 |
 | Size | 4.9GB | Down from 8.7GB Q8_0 |
 | VRAM usage | ~4.5GB | Leaves 11.5GB headroom on RTX 5080 |
-| Context | 4096 tokens | Sufficient for any phone call |
+| Context | 8192 tokens | Raised from 4096 (2026-06-25) so long calls don't overflow the window; see note below |
 | Temperature | 0.3 | Low — maximizes JSON schema compliance |
 | Top-p | 0.9 | Standard |
 | Adapter | None | 44-pair LoRA adapter discarded |
@@ -285,7 +285,7 @@ FROM llama3.1:8b-instruct-q4_K_M
 PARAMETER stop "<|start_header_id|>"
 PARAMETER stop "<|end_header_id|>"
 PARAMETER stop "<|eot_id|>"
-PARAMETER num_ctx 4096
+PARAMETER num_ctx 8192
 PARAMETER temperature 0.3
 PARAMETER top_p 0.9
 
@@ -295,6 +295,16 @@ TEMPLATE "{{- range .Messages }}<|start_header_id|>{{ .Role }}<|end_header_id|>
 "
 ```
 
+### Why num_ctx 8192 (was 4096) — fixes mid-call silence
+
+Symptom: on longer calls AVA went silent or stopped replying partway through. Cause: the system prompt
+plus a growing multi-turn transcript exceeded the 4096-token window mid-call, so Ollama truncated the
+context and re-evaluated all of it on every turn (cache miss). The resulting multi-second stalls were
+heard as dead air. The capture changes made this worse by briefly injecting a 45-day calendar
+(~600 tokens per turn); that injection has been removed. Raising num_ctx to 8192 gives long calls
+real headroom (the RTX 5080 has the VRAM). The rebuild keeps the previous model as
+`activeblue-avc:pre-ctx8k` for rollback. Keep the live system prompt lean for the same reason.
+
 ### Why Q4_K_M not Q8_0
 
 Q8_0 consumed ~8.5GB VRAM for weights alone. Under telephony load this caused