TemporalStore for Prefix Reuse and LMCache Policy

Important boundary

TemporalStore is not a transformer KV-cache engine. It does not own GPU attention blocks, token-position layout, paged attention, prefix matching inside the runtime, or vLLM/SGLang scheduler internals. Its job is higher in the stack: decide which context sections are safe to reuse, which ones invalidate a prefix, and which metadata should be passed to LMCache or another remote cache layer.

That boundary matters commercially. Runtime cache vendors can improve prefill cost and latency, but enterprise AI teams still need a source of truth for whether reused context is correct, fresh, permissioned, and auditable. TemporalStore supplies that control plane.

LMCache owns

Model-runtime prefix reuse, KV-cache block movement, cache lookup, offload, and runtime-level cache hit behavior.

TemporalStore owns

Time-aware context, source freshness, memory validity, prompt-section stability, replay, and invalidation signals.

MatrixDB can own

Redis-compatible hot metadata: cache keys, session summaries, retrieval result lists, TTL state, and LMCache metadata at large scale.

MatrixKV can own

Low-volume truth: permissions, document versions, leases, approvals, and committed state that must be correct before cache reuse.

Example: legal matter prompt reuse

A legal copilot may reuse the same drafting instructions and citation format, but every matter has changing facts: redlines, approved clauses, client instructions, and document versions. Generic prefix reuse can accidentally preserve stale matter context. MatrixArk splits the prompt into reusable and invalidating sections.

Legal prompt cache plan:
  stable_prefix:
    - firm drafting policy v12
    - citation format v4
    - tool contract v7

  matter_context:
    - agreement version = msa_v18
    - latest redline = redline_20260616_0944
    - client instruction = "do not accept uncapped liability"
    - valid_until = next_document_update

  invalidation event:
    type = document_version_changed
    old = msa_v18
    new = msa_v19
    action = reuse stable prefix, rebuild matter context
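The invalidation rule in the legal plan can be expressed as a small decision function. This is a minimal sketch under stated assumptions. `LegalCachePlan`, `InvalidationEvent`, and `plan_after_event` are hypothetical names, not a TemporalStore API. The rule shown is only the one the plan states: a document version change keeps the stable prefix and rebuilds matter context.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class InvalidationEvent:
    type: str
    old: str
    new: str


@dataclass(frozen=True)
class LegalCachePlan:
    # Hypothetical shape mirroring the legal plan above; not a TemporalStore type.
    stable_prefix: Tuple[str, ...]
    agreement_version: str


def plan_after_event(plan: LegalCachePlan, event: InvalidationEvent) -> Dict[str, List[str]]:
    """Decide which sections survive an invalidation event.

    A document version change never touches the stable prefix (policy,
    citation format, tool contract). Matter context is rebuilt only when the
    event replaces the agreement version this plan was built from.
    """
    stale = (
        event.type == "document_version_changed"
        and event.old == plan.agreement_version
        and event.new != event.old
    )
    if stale:
        return {"reuse": ["stable_prefix"], "rebuild": ["matter_context"]}
    return {"reuse": ["stable_prefix", "matter_context"], "rebuild": []}


plan = LegalCachePlan(
    stable_prefix=("firm drafting policy v12", "citation format v4", "tool contract v7"),
    agreement_version="msa_v18",
)
event = InvalidationEvent(type="document_version_changed", old="msa_v18", new="msa_v19")
print(plan_after_event(plan, event))
# {'reuse': ['stable_prefix'], 'rebuild': ['matter_context']}
```

An event that names a different old version, such as one for an agreement this plan was never built from, leaves both sections reusable.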

Example: stable system prefix, changing user context

A support copilot may reuse the same product instructions and policy framing across many requests, but the customer timeline, open promise, entitlement, and permission state can change every minute. A generic remote KV-cache sees similar tokens; MatrixArk sees whether the underlying context is still valid.

Prompt sections:
  S0_system_policy
    cache_policy = stable
    source_version = policy:v42

  S1_tool_contract
    cache_policy = stable
    source_version = tools:v9

  S2_customer_context
    cache_policy = time_sensitive
    context_pack_id = ctxpack_acme_20260616_1015
    valid_until = 2026-06-16T10:20:00Z
    invalidates_on = [ticket_update, refund_approval, permission_change]

  S3_latest_timeline
    cache_policy = volatile
    max_age_seconds = 30

LMCache can reuse the runtime state for stable sections. TemporalStore says whether S2_customer_context and S3_latest_timeline must be regenerated, and records why that decision was made.
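The three section policies above can be turned into a per-section reuse check. This is a minimal sketch, assuming a hypothetical `Section` descriptor and a caller that supplies the set of events observed since the pack was built. The timestamps used for `now` and for S3's `built_at` are illustrative. Reason strings follow the ones used in the cache-policy record later on this page. Unknown policies fail closed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Optional, Set, Tuple


@dataclass(frozen=True)
class Section:
    # Hypothetical descriptor mirroring the section fields listed above.
    id: str
    cache_policy: str                       # "stable" | "time_sensitive" | "volatile"
    valid_until: Optional[datetime] = None
    invalidates_on: FrozenSet[str] = frozenset()
    max_age_seconds: Optional[int] = None
    built_at: Optional[datetime] = None


def reuse_decision(section: Section, now: datetime, events_since_build: Set[str]) -> Tuple[str, str]:
    """Return ("allow" | "deny", reason) for one prompt section."""
    if section.cache_policy == "stable":
        return "allow", "stable_section"
    if section.cache_policy == "time_sensitive":
        hits = section.invalidates_on & events_since_build
        if hits:
            # An invalidating event arrived after the pack was built.
            return "deny", f"{sorted(hits)[0]}_after_pack"
        if section.valid_until is None or now >= section.valid_until:
            return "deny", "valid_until_passed"
        return "allow", "within_valid_window"
    if section.cache_policy == "volatile":
        if section.built_at is None or section.max_age_seconds is None:
            return "deny", "volatile_age_unknown"
        if (now - section.built_at).total_seconds() > section.max_age_seconds:
            return "deny", "volatile_window_expired"
        return "allow", "volatile_within_max_age"
    # Unknown policies fail closed.
    return "deny", "unknown_cache_policy"


now = datetime(2026, 6, 16, 10, 16, tzinfo=timezone.utc)   # hypothetical request time
s2 = Section(
    "S2_customer_context", "time_sensitive",
    valid_until=datetime(2026, 6, 16, 10, 20, tzinfo=timezone.utc),
    invalidates_on=frozenset({"ticket_update", "refund_approval", "permission_change"}),
)
s3 = Section(
    "S3_latest_timeline", "volatile",
    max_age_seconds=30,
    built_at=datetime(2026, 6, 16, 10, 15, tzinfo=timezone.utc),  # hypothetical build time
)
print(reuse_decision(s2, now, {"ticket_update"}))  # ('deny', 'ticket_update_after_pack')
print(reuse_decision(s3, now, set()))              # ('deny', 'volatile_window_expired')
```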

How a request flows

User request: tenant, session, entity, task
  -> MatrixArk planner: sections, versions, freshness, permissions
  -> TemporalStore: timeline, memory, validity, replay
  -> MatrixDB: hot cache metadata, retrieval lists, TTL keys
  -> LMCache: prefix/KV-cache lookup and reuse
  -> LLM runtime: prefill, decode, response

1. App asks for a context pack.
2. TemporalStore checks freshness and source versions.
3. MatrixArk labels prompt sections as stable, time-sensitive, or volatile.
4. MatrixDB stores hot cache metadata and invalidation hints.
5. LMCache reuses safe prefix/KV-cache state.
6. The final answer writes feedback and cache invalidation events back.
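Steps 2, 4, and 6 of this flow can be sketched with plain dicts standing in for the stores. In this minimal sketch the dict of current source versions plays TemporalStore, `hot_metadata` plays MatrixDB, and `events` is the write-back log. `Stores`, `check_pack`, the `pack_status:` key prefix, and the `tools:v10` version bump are all hypothetical. None of them is a MatrixArk, TemporalStore, MatrixDB, or LMCache API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Stores:
    source_versions: Dict[str, str]                          # TemporalStore stand-in
    hot_metadata: Dict[str, dict] = field(default_factory=dict)  # MatrixDB stand-in
    events: List[dict] = field(default_factory=list)          # write-back log


def check_pack(request_id: str, pack: List[dict], stores: Stores) -> Dict[str, str]:
    """Step 2: compare each section's build version with the current version.

    Step 4: store the result as hot metadata keyed by request.
    Step 6: log an invalidation event for every section that went stale.
    """
    status: Dict[str, str] = {}
    for section in pack:
        current = stores.source_versions.get(section["source"])
        fresh = current == section["version"]
        status[section["id"]] = "fresh" if fresh else "stale"
        if not fresh:
            stores.events.append({
                "type": "source_version_changed",
                "request_id": request_id,
                "section": section["id"],
                "old": section["version"],
                "new": current,
            })
    stores.hot_metadata[f"pack_status:{request_id}"] = status
    return status


# Hypothetical state: the tool contract moved from tools:v9 to tools:v10.
stores = Stores(source_versions={"policy": "policy:v42", "tools": "tools:v10"})
pack = [
    {"id": "S0_system_policy", "source": "policy", "version": "policy:v42"},
    {"id": "S1_tool_contract", "source": "tools", "version": "tools:v9"},
]
print(check_pack("req_1", pack, stores))
# {'S0_system_policy': 'fresh', 'S1_tool_contract': 'stale'}
```

A real deployment would replace the dicts with store clients and persist the event log so replay can see it. The control flow, however, stays this simple: compare versions, record the status, and emit an event per stale section.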

Where the Python end-to-end path plugs in

The latest MatrixArk Python runtime already produces the metadata a cache-aware serving layer needs. Retrieval extracts intent, time window, filters, and scope; stores query embeddings; searches TemporalStore-owned summary and chunk embeddings; filters events and resource chunks; builds a token-budgeted context pack; and writes a replay audit with selected records and decision notes.

ContextPackAudit:
  selected_event_ids = [...]
  returned_tokens = 742
  decision_notes = [
    "extraction stage converted raw query into intent, time window, filters, and scope",
    "recall stage searched TemporalStore-owned summary and chunk embeddings",
    "filter stage enforced TemporalStore time and metadata filters",
    "pack stage enforced token budget and stored replay metadata"
  ]

That audit record supplies what a runtime-cache policy needs to decide reuse. LMCache can reuse the model prefix or KV blocks; MatrixArk can explain which context was selected, which sections were stable, which chunks consumed tokens, and which source updates should invalidate reuse.
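The pack stage described above can be approximated with a greedy fill that records its own audit. This is a minimal sketch, assuming candidates arrive from earlier recall and filter stages already scored and token-counted. It is not MatrixArk's implementation. Greedy-by-score is a simple heuristic and does not guarantee the best use of the budget. `Candidate`, `pack_stage`, and the `evt_*` ids are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    event_id: str
    score: float   # relevance from the recall stage (assumed)
    tokens: int    # token cost if included (assumed)


@dataclass
class ContextPackAudit:
    selected_event_ids: List[str] = field(default_factory=list)
    returned_tokens: int = 0
    decision_notes: List[str] = field(default_factory=list)


def pack_stage(candidates: List[Candidate], token_budget: int) -> ContextPackAudit:
    """Take the highest-scoring candidates that still fit the token budget."""
    audit = ContextPackAudit()
    for c in sorted(candidates, key=lambda c: c.score, reverse=True):
        remaining = token_budget - audit.returned_tokens
        if c.tokens > remaining:
            # Record the skip so replay can explain why this record was left out.
            audit.decision_notes.append(f"skipped {c.event_id}: needs {c.tokens} tokens, {remaining} left")
            continue
        audit.selected_event_ids.append(c.event_id)
        audit.returned_tokens += c.tokens
    audit.decision_notes.append(
        f"pack stage enforced token budget {token_budget}, used {audit.returned_tokens}"
    )
    return audit


cands = [Candidate("evt_a", 0.9, 400), Candidate("evt_b", 0.8, 500), Candidate("evt_c", 0.5, 300)]
audit = pack_stage(cands, token_budget=800)
print(audit.selected_event_ids, audit.returned_tokens)  # ['evt_a', 'evt_c'] 700
```

Skipped records are noted alongside selected ones, so a later replay can show both what consumed tokens and what was left out for budget reasons.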

Why this is better than generic remote KV-cache alone

A remote KV-cache is powerful when the repeated prefix is truly safe. The hard part is deciding safety. In production agents, the same token-looking prefix can be unsafe because a source document changed, a permission was revoked, a customer promise was fulfilled, or a memory was superseded. TemporalStore makes those invalidation reasons first-class.

Generic remote KV-cache	TemporalStore + LMCache policy
Optimizes token/runtime reuse.	Optimizes reuse only after checking context freshness, permissions, source versions, and open commitments.
Cache key is often token prefix, model, tenant, or request shape.	Cache key includes context-pack id, source versions, valid-time window, policy version, and section stability.
Invalidation can be coarse: TTL, manual delete, prefix miss.	Invalidation can be semantic: ticket changed, approval expired, document version changed, memory superseded.
Hard to explain why a prefix was reused.	Replay audit can show selected sections, blocked sections, freshness checks, and cache-policy decision notes.
Risk of mixing durable app memory with runtime cache mechanics.	Runtime cache stays runtime cache; TemporalStore remains durable application memory and policy state.
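A cache key that covers the fields in the right-hand column can be derived by hashing a canonical serialization. This is a minimal sketch. The function name, the `ctxkey:` prefix, the window start time, and the `cache_policy:v1` label are hypothetical. Only `json.dumps` and `hashlib.sha256` from the Python standard library are used. Any change to a source version, validity window, policy version, or section label produces a new key, so a stale entry is simply never looked up again.

```python
import hashlib
import json
from typing import Dict, Tuple


def context_cache_key(
    context_pack_id: str,
    source_versions: Dict[str, str],
    valid_window: Tuple[str, str],
    policy_version: str,
    section_stability: Dict[str, str],
) -> str:
    payload = {
        "context_pack_id": context_pack_id,
        "source_versions": source_versions,
        "valid_window": list(valid_window),
        "policy_version": policy_version,
        "section_stability": section_stability,
    }
    # sort_keys (applied recursively) and compact separators make the
    # serialization canonical, so dict ordering does not change the key.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "ctxkey:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = dict(
    context_pack_id="ctxpack_acme_20260616_1015",
    source_versions={"policy": "policy:v42", "tools": "tools:v9"},
    valid_window=("2026-06-16T10:15:00Z", "2026-06-16T10:20:00Z"),
    policy_version="cache_policy:v1",
    section_stability={"S0_system_policy": "stable", "S2_customer_context": "time_sensitive"},
)
k1 = context_cache_key(**base)
k2 = context_cache_key(**{**base, "source_versions": {"tools": "tools:v9", "policy": "policy:v42"}})
k3 = context_cache_key(**{**base, "source_versions": {"policy": "policy:v42", "tools": "tools:v10"}})
print(k1 == k2, k1 == k3)  # True False
```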

Cache-policy record

TemporalStore can write cache-policy events beside context-pack audits. MatrixDB can keep the hot metadata and Redis-compatible access pattern for LMCache integration, while TemporalStore remains the replayable source of why the cache decision was safe.

CachePolicyEvent {
  tenant_id: "acme",
  session_id: "support_session_77",
  context_pack_id: "ctxpack_8f91",
  model: "llm-prod",
  prompt_sections: [
    { id: "S0_system_policy", reuse: "allow", source_version: "policy:v42" },
    { id: "S1_tool_contract", reuse: "allow", source_version: "tools:v9" },
    { id: "S2_customer_context", reuse: "deny", reason: "ticket_update_after_pack" },
    { id: "S3_latest_timeline", reuse: "deny", reason: "volatile_window_expired" }
  ],
  lmcache_hint: {
    reusable_prefix_until_section: "S1_tool_contract",
    invalidate_after_section: "S1_tool_contract",
    cache_namespace: "tenant:acme:model:llm-prod"
  }
}
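The `lmcache_hint` block can be derived from the ordered `prompt_sections` list. This is a minimal sketch, assuming sections are listed in prompt order. `lmcache_hint` here is a hypothetical helper, not an LMCache API. Reuse stops at the first denied section. The reason is that a transformer's cached KV state for any token is computed over every token before it, so once a section is rebuilt, nothing after it can reuse earlier prefix state, even if that later section was marked "allow".

```python
from typing import Dict, List, Optional


def lmcache_hint(prompt_sections: List[dict], tenant_id: str, model: str) -> Dict[str, Optional[str]]:
    last_reusable: Optional[str] = None
    for section in prompt_sections:
        # Prefix reuse ends at the first section that is not allowed.
        if section["reuse"] != "allow":
            break
        last_reusable = section["id"]
    return {
        "reusable_prefix_until_section": last_reusable,
        "invalidate_after_section": last_reusable,
        "cache_namespace": f"tenant:{tenant_id}:model:{model}",
    }


sections = [
    {"id": "S0_system_policy", "reuse": "allow"},
    {"id": "S1_tool_contract", "reuse": "allow"},
    {"id": "S2_customer_context", "reuse": "deny"},
    {"id": "S3_latest_timeline", "reuse": "deny"},
]
hint = lmcache_hint(sections, tenant_id="acme", model="llm-prod")
print(hint["reusable_prefix_until_section"], hint["cache_namespace"])
# S1_tool_contract tenant:acme:model:llm-prod
```

If the very first section is denied, both boundaries come back as `None`, which a caller can treat as "no prefix reuse for this request".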

Where it helps most

Support copilots that reuse product instructions but refresh customer timelines and promises.
Legal copilots that reuse matter templates but invalidate source sections when document versions change.
Security agents that reuse playbook prefixes but refresh incident timelines and containment state.
Enterprise RAG where stable policy text can be cached, but permissions and source freshness must be checked per request.
Vertical AI products that need lower prefill cost without risking stale memory or unauthorized context reuse.

Why this is stronger for production LLM context

Token savings with guardrails

Reuse stable instructions and source packs while refreshing volatile context before it corrupts the answer.

Better final quality

Keep stale memories, revoked permissions, and superseded source versions out of reused prefixes.

Clear ownership

LMCache owns runtime blocks; TemporalStore owns context validity; MatrixDB can serve hot metadata; MatrixKV can protect truth.

Auditable reuse

Replay why a prefix was reused, why a section was rebuilt, and which event invalidated the previous cache plan.

The product message

TemporalStore makes runtime reuse safer because it adds application meaning to cache decisions. LMCache can answer "can this prefix/KV state be reused technically?" TemporalStore helps MatrixArk answer "should this context be reused for this user, at this time, with these sources, permissions, and memories?"

Short version: remote KV-cache saves compute when prefixes repeat; TemporalStore saves correctness when context changes.