Runtime Reuse
TemporalStore makes prefix and KV-cache reuse context-aware.
LMCache-style systems and remote KV-cache layers can reuse model-runtime state. They know tokens, prefixes, blocks, and cache placement. TemporalStore knows application context: what changed, what expired, what the user can access, which sources are valid, which prompt sections are stable, and which memories must be refreshed before reuse.
Important boundary
TemporalStore is not a transformer KV-cache engine. It does not own GPU attention blocks, token-position layout, paged attention, prefix matching inside the runtime, or vLLM/SGLang scheduler internals. Its job is higher in the stack: decide which context sections are safe to reuse, which ones invalidate a prefix, and which metadata should be passed to LMCache or another remote cache layer.
That boundary matters commercially. Runtime cache vendors can improve prefill cost and latency, but enterprise AI teams still need a source of truth for whether reused context is correct, fresh, permissioned, and auditable. TemporalStore supplies that control plane.
LMCache owns
Model-runtime prefix reuse, KV-cache block movement, cache lookup, offload, and runtime-level cache hit behavior.
TemporalStore owns
Time-aware context, source freshness, memory validity, prompt-section stability, replay, and invalidation signals.
MatrixDB can own
Redis-compatible hot metadata: cache keys, session summaries, retrieval result lists, TTL state, and LMCache metadata at large scale.
MatrixKV can own
Low-volume truth: permissions, document versions, leases, approvals, and committed state that must be correct before cache reuse.
Example: legal matter prompt reuse
A legal copilot may reuse the same drafting instructions and citation format, but every matter has changing facts: redlines, approved clauses, client instructions, and document versions. Generic prefix reuse can accidentally preserve stale matter context. MatrixArk splits the prompt into reusable and invalidating sections.
    Legal prompt cache plan:
      stable_prefix:
        - firm drafting policy v12
        - citation format v4
        - tool contract v7
      matter_context:
        - agreement version = msa_v18
        - latest redline = redline_20260616_0944
        - client instruction = "do not accept uncapped liability"
        - valid_until = next_document_update
      invalidation event:
        type = document_version_changed
        old = msa_v18
        new = msa_v19
        action = reuse stable prefix, rebuild matter context
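
The invalidation step in this plan can be sketched in Python. This is a minimal illustration under stated assumptions, not MatrixArk's API: `CachePlan`, `apply_event`, and the dictionary keys are hypothetical names chosen to mirror the plan.

```python
from dataclasses import dataclass, field


@dataclass
class CachePlan:
    """Hypothetical cache plan: a stable prefix plus versioned matter context."""
    stable_prefix: list[str]
    matter_context: dict[str, str]
    rebuild: list[str] = field(default_factory=list)  # sections to regenerate


def apply_event(plan: CachePlan, event: dict[str, str]) -> CachePlan:
    """Mark matter context for rebuild when the tracked document version changes."""
    if (event["type"] == "document_version_changed"
            and event["old"] == plan.matter_context["agreement_version"]):
        # The stable prefix is untouched; only the matter context is stale.
        updated = dict(plan.matter_context, agreement_version=event["new"])
        return CachePlan(plan.stable_prefix, updated, rebuild=["matter_context"])
    return plan  # unrelated event: keep the existing plan


plan = CachePlan(
    stable_prefix=["firm drafting policy v12", "citation format v4", "tool contract v7"],
    matter_context={"agreement_version": "msa_v18",
                    "latest_redline": "redline_20260616_0944"},
)
event = {"type": "document_version_changed", "old": "msa_v18", "new": "msa_v19"}
new_plan = apply_event(plan, event)
print(new_plan.rebuild)  # ['matter_context']
```

The stable prefix list is carried over unchanged, so a runtime cache could keep serving it while only the matter context is rebuilt.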
Example: stable system prefix, changing user context
A support copilot may reuse the same product instructions and policy framing across many requests, but the customer timeline, open promise, entitlement, and permission state can change every minute. A generic remote KV-cache sees similar tokens; MatrixArk sees whether the underlying context is still valid.
    Prompt sections:
      S0_system_policy
        cache_policy = stable
        source_version = policy:v42
      S1_tool_contract
        cache_policy = stable
        source_version = tools:v9
      S2_customer_context
        cache_policy = time_sensitive
        context_pack_id = ctxpack_acme_20260616_1015
        valid_until = 2026-06-16T10:20:00Z
        invalidates_on = [ticket_update, refund_approval, permission_change]
      S3_latest_timeline
        cache_policy = volatile
        max_age_seconds = 30
LMCache can reuse the runtime state for stable sections. TemporalStore says whether S2_customer_context and S3_latest_timeline must be regenerated, and records why that decision was made.
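
A per-section freshness check along these lines might look like the sketch below. It is an assumption-laden illustration, not TemporalStore's implementation: the function `must_regenerate`, the reason strings, and the `built_at` timestamp are hypothetical, while the policy names and fields follow the section list above.

```python
from datetime import datetime, timezone


def must_regenerate(section: dict, now: datetime, built_at: datetime,
                    events: set[str]) -> tuple[bool, str]:
    """Return (regenerate?, reason) for one prompt section."""
    policy = section["cache_policy"]
    if policy == "stable":
        return False, "stable_section"
    if policy == "time_sensitive":
        # Any observed event listed in invalidates_on forces a rebuild.
        if events & set(section.get("invalidates_on", [])):
            return True, "invalidating_event"
        if now >= section["valid_until"]:
            return True, "valid_until_passed"
        return False, "within_valid_window"
    if policy == "volatile":
        age = (now - built_at).total_seconds()
        if age > section["max_age_seconds"]:
            return True, "max_age_exceeded"
        return False, "within_max_age"
    return True, "unknown_policy"  # fail closed on unrecognized policies


utc = timezone.utc
s2 = {"cache_policy": "time_sensitive",
      "valid_until": datetime(2026, 6, 16, 10, 20, tzinfo=utc),
      "invalidates_on": ["ticket_update", "refund_approval", "permission_change"]}
s3 = {"cache_policy": "volatile", "max_age_seconds": 30}
built_at = datetime(2026, 6, 16, 10, 15, tzinfo=utc)
now = datetime(2026, 6, 16, 10, 16, tzinfo=utc)

print(must_regenerate(s2, now, built_at, {"ticket_update"}))  # (True, 'invalidating_event')
print(must_regenerate(s3, now, built_at, set()))              # (True, 'max_age_exceeded')
```

Returning a reason string with every decision is what makes the choice recordable for later replay.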
How a request flows
- Request: tenant, session, entity, task
- MatrixArk planner: sections, versions, freshness, permissions
- TemporalStore: timeline, memory, validity, replay
- MatrixDB: hot cache metadata, retrieval lists, TTL keys
- LMCache: prefix/KV-cache lookup and reuse
- LLM runtime: prefill, decode, response
Where the Python end-to-end path plugs in
The latest MatrixArk Python runtime already produces the metadata a cache-aware serving layer needs. Retrieval extracts intent, time window, filters, and scope; stores query embeddings; searches TemporalStore-owned summary and chunk embeddings; filters events and resource chunks; builds a token-budgeted context pack; and writes a replay audit with selected records and decision notes.
    ContextPackAudit:
      selected_event_ids = [...]
      returned_tokens = 742
      decision_notes = [
        "extraction stage converted raw query into intent, time window, filters, and scope",
        "recall stage searched TemporalStore-owned summary and chunk embeddings",
        "filter stage enforced TemporalStore time and metadata filters",
        "pack stage enforced token budget and stored replay metadata"
      ]
That audit record is exactly what a runtime-cache policy needs. LMCache can reuse the model prefix or KV blocks; MatrixArk can explain which context was selected, which sections were stable, which chunks consumed tokens, and which source updates should invalidate reuse.
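
To make the pack stage concrete, here is a minimal sketch of token-budgeted packing that fills an audit record as it goes. It is not the MatrixArk runtime: the `pack` function, the greedy strategy, and the whitespace token count are assumptions for illustration, and only the three audit field names come from the record above.

```python
from dataclasses import dataclass, field


@dataclass
class ContextPackAudit:
    selected_event_ids: list[str] = field(default_factory=list)
    returned_tokens: int = 0
    decision_notes: list[str] = field(default_factory=list)


def pack(candidates: list[tuple[str, str]], token_budget: int) -> ContextPackAudit:
    """Greedily pack (event_id, text) pairs, in ranked order, under a token budget."""
    audit = ContextPackAudit()
    for event_id, text in candidates:
        cost = len(text.split())  # crude stand-in for a real tokenizer
        if audit.returned_tokens + cost > token_budget:
            audit.decision_notes.append(f"skipped {event_id}: over token budget")
            continue
        audit.selected_event_ids.append(event_id)
        audit.returned_tokens += cost
    audit.decision_notes.append("pack stage enforced token budget")
    return audit


audit = pack(
    [("evt_1", "refund promised by Friday"),
     ("evt_2", "customer escalated twice this week about billing"),
     ("evt_3", "plan upgraded")],
    token_budget=7,
)
print(audit.selected_event_ids, audit.returned_tokens)  # ['evt_1', 'evt_3'] 6
```

Skipped records are noted rather than silently dropped, so a replay can show what was left out of the pack and why.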
Why this is better than generic remote KV-cache alone
A remote KV-cache is powerful when the repeated prefix is truly safe to reuse. The hard part is deciding safety. In production agents, a prefix that looks identical at the token level can still be unsafe because a source document changed, a permission was revoked, a customer promise was fulfilled, or a memory was superseded. TemporalStore makes those invalidation reasons first-class.
| Generic remote KV-cache | TemporalStore + LMCache policy |
|---|---|
| Optimizes token/runtime reuse. | Optimizes reuse only after checking context freshness, permissions, source versions, and open commitments. |
| Cache key is often token prefix, model, tenant, or request shape. | Cache key includes context-pack id, source versions, valid-time window, policy version, and section stability. |
| Invalidation can be coarse: TTL, manual delete, prefix miss. | Invalidation can be semantic: ticket changed, approval expired, document version changed, memory superseded. |
| Hard to explain why a prefix was reused. | Replay audit can show selected sections, blocked sections, freshness checks, and cache-policy decision notes. |
| Risk of mixing durable app memory with runtime cache mechanics. | Runtime cache stays runtime cache; TemporalStore remains durable application memory and policy state. |
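
One way to read the cache-key row is that the key is derived from context metadata rather than from tokens alone. The sketch below assumes a simple hashing scheme; `context_cache_key`, its parameters, and the example values are hypothetical and do not describe an LMCache or TemporalStore interface.

```python
import hashlib
import json


def context_cache_key(tenant_id: str, model: str, context_pack_id: str,
                      source_versions: dict[str, str], valid_until: str,
                      policy_version: str) -> str:
    """Derive a deterministic key from context metadata (hypothetical scheme)."""
    payload = {
        "tenant_id": tenant_id,
        "model": model,
        "context_pack_id": context_pack_id,
        "source_versions": source_versions,
        "valid_until": valid_until,
        "policy_version": policy_version,
    }
    # sort_keys gives a canonical form, so dict ordering never changes the key.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


key_a = context_cache_key("acme", "llm-prod", "ctxpack_8f91",
                          {"policy": "policy:v42", "tools": "tools:v9"},
                          "2026-06-16T10:20:00Z", "cache-policy:v1")
key_b = context_cache_key("acme", "llm-prod", "ctxpack_8f91",
                          {"policy": "policy:v42", "tools": "tools:v10"},
                          "2026-06-16T10:20:00Z", "cache-policy:v1")
print(key_a != key_b)  # True: a source version bump yields a different key
```

Because a version bump changes the key, a stale entry is never looked up again, which complements explicit invalidation events.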
Cache-policy record
TemporalStore can write cache-policy events beside context-pack audits. MatrixDB can keep the hot metadata and Redis-compatible access pattern for LMCache integration, while TemporalStore remains the replayable source of why the cache decision was safe.
    CachePolicyEvent {
      tenant_id: "acme",
      session_id: "support_session_77",
      context_pack_id: "ctxpack_8f91",
      model: "llm-prod",
      prompt_sections: [
        { id: "S0_system_policy", reuse: "allow", source_version: "policy:v42" },
        { id: "S1_tool_contract", reuse: "allow", source_version: "tools:v9" },
        { id: "S2_customer_context", reuse: "deny", reason: "ticket_update_after_pack" },
        { id: "S3_latest_timeline", reuse: "deny", reason: "volatile_window_expired" }
      ],
      lmcache_hint: {
        reusable_prefix_until_section: "S1_tool_contract",
        invalidate_after_section: "S1_tool_contract",
        cache_namespace: "tenant:acme:model:llm-prod"
      }
    }
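
The `lmcache_hint` block can be derived mechanically from the per-section decisions. The following sketch assumes the hint is a plain dictionary shaped like the record above; `build_lmcache_hint` is a hypothetical helper, not an LMCache API. Prefix reuse is contiguous from the start of the prompt, so the sketch stops at the first denied section.

```python
from typing import Optional


def build_lmcache_hint(sections: list[dict], tenant_id: str, model: str) -> dict:
    """Find the last section of the contiguous reusable prefix (hypothetical helper)."""
    last_reusable: Optional[str] = None
    for section in sections:
        if section["reuse"] != "allow":
            break  # everything after the first denied section must be rebuilt
        last_reusable = section["id"]
    return {
        "reusable_prefix_until_section": last_reusable,
        "invalidate_after_section": last_reusable,
        "cache_namespace": f"tenant:{tenant_id}:model:{model}",
    }


sections = [
    {"id": "S0_system_policy", "reuse": "allow"},
    {"id": "S1_tool_contract", "reuse": "allow"},
    {"id": "S2_customer_context", "reuse": "deny"},
    {"id": "S3_latest_timeline", "reuse": "deny"},
]
hint = build_lmcache_hint(sections, "acme", "llm-prod")
print(hint["reusable_prefix_until_section"])  # S1_tool_contract
```

If the first section is denied, the helper returns `None` for both section fields, meaning no prefix is reusable.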
Where it helps most
- Support copilots that reuse product instructions but refresh customer timelines and promises.
- Legal copilots that reuse matter templates but invalidate source sections when document versions change.
- Security agents that reuse playbook prefixes but refresh incident timelines and containment state.
- Enterprise RAG where stable policy text can be cached, but permissions and source freshness must be checked per request.
- Vertical AI products that need lower prefill cost without risking stale memory or unauthorized context reuse.
Why this is stronger for production LLM context
Token savings with guardrails
Reuse stable instructions and source packs while refreshing volatile context before it corrupts the answer.
Better final quality
Keep stale memories, revoked permissions, and superseded source versions out of reused prefixes.
Clear ownership
LMCache owns runtime blocks; TemporalStore owns context validity; MatrixDB can serve hot metadata; MatrixKV can protect truth.
Auditable reuse
Replay why a prefix was reused, why a section was rebuilt, and which event invalidated the previous cache plan.
The product message
TemporalStore makes runtime reuse safer because it adds application meaning to cache decisions. LMCache can answer "can this prefix/KV state be reused technically?" TemporalStore helps MatrixArk answer "should this context be reused for this user, at this time, with these sources, permissions, and memories?"