TemporalStore: LLM Context Management and Prompt Serving

The short version

TemporalStore is MatrixArk's standalone time-aware context engine. It targets the gap left between vector databases, logs, prompt templates, wide-column databases, and online caches: teams need one online system for context state that changes fast and must be retrieved at prompt time.

LLM agent systems repeatedly ask questions such as: what did the user say recently, what tools already failed, which promise is still open, which memory was superseded, and what context should be placed into the next model call? Those questions need a serving layer that understands time, freshness, replay, and policy-aware context assembly.

The product thesis: serve LLM context timelines, prompt replay, memory deltas, counters, and long sequences directly, with durable state that remains replayable after the model response.

Why cache plus logs breaks down

A common agent architecture starts cleanly: prompts live in one place, vector search retrieves chunks, Redis stores summaries, logs hold tool traces, and application code performs permissions, freshness checks, and final context stuffing. That becomes fragile when every response needs session state, tool-call history, policy counters, source validity, user preferences, and committed actions in one context bundle.

Scattered context stack

Prompt templates → Vector search → Redis summaries → Tool logs → Permission checks → Custom stuffing logic → Model response

TemporalStore path

Context events and memory deltas → TemporalStore timelines → Fresh context pack query → Prompt, replay, and audit

What TemporalStore owns for context

The central design choice is to make the storage engine understand the online model. Instead of storing an opaque blob and forcing every caller to implement window logic, the engine exposes model-aware commands over entity-local state.

Session timelines

Store user turns, tool calls, retrieval events, source changes, and model-visible context over time.

Prompt replay

Reconstruct what the agent knew, retrieved, filtered, and committed before each model call.

Freshness counters

Track stale memory, repeated failures, open commitments, policy windows, and request-time validity signals.

Long context sequences

Serve recent ordered behavior, conversation, tool, and workflow events with limits, filters, and timestamps.

Memory governance

Keep superseded, unauthorized, low-confidence, or conflicting memories from entering the prompt.

Context pack state

Store memory metadata, retrieval feedback, source freshness, and state used to assemble LLM prompts.
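The memory governance rule in this list can be pictured as a request-time filter. The sketch below is a minimal illustration, not TemporalStore's implementation: the `Memory` fields, the `admissible` function, the visibility labels, and the 0.5 confidence threshold are all hypothetical names and values chosen for the example. It covers superseded, unauthorized, and low-confidence memories; conflict resolution is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Memory:
    # All field names are assumptions made for this sketch.
    memory_id: str
    text: str
    confidence: float
    visibility: str                      # hypothetical policy label
    superseded_by: Optional[str] = None  # id of the newer memory, if any


def admissible(memories, allowed_visibility, min_confidence=0.5):
    """Return the memories that may enter the prompt, preserving input order."""
    kept = []
    for m in memories:
        if m.superseded_by is not None:
            continue  # a newer memory replaced this one
        if m.visibility not in allowed_visibility:
            continue  # the caller is not authorized to see it
        if m.confidence < min_confidence:
            continue  # too uncertain to show the model
        kept.append(m)
    return kept


memories = [
    Memory("m1", "User prefers email", 0.9, "user", superseded_by="m2"),
    Memory("m2", "User prefers SMS", 0.9, "user"),
    Memory("m3", "Internal risk note", 0.8, "internal"),
    Memory("m4", "User may live in Oslo", 0.2, "user"),
]
kept_ids = [m.memory_id for m in admissible(memories, {"user"})]
print(kept_ids)
```

Only `m2` survives here: `m1` is superseded, `m3` is outside the caller's visibility set, and `m4` falls below the confidence threshold.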

Implementation direction: C++ core and Rust open source in July 2026

TemporalStore's performance-critical serving and storage core is designed with C++ in mind: low-latency data structures, memory layout control, cache behavior, and tight storage-engine integration matter on the hot path. The open-source TemporalStore track is written in Rust and is planned for release in July 2026. That gives the community-facing implementation safer concurrency, maintainable systems code, and a modern API surface for typed temporal updates.

Layer	Implementation direction	Why
Performance-critical serving core	C++	Precise control over latency, memory layout, cache locality, and high-QPS execution.
Open-source TemporalStore	Rust	Planned for July 2026, with safe concurrency, maintainable systems code, and a clear developer-facing implementation.
Public model	Shared concepts	Keep namespaces, tables, keys, typed updates, windows, retained records, and context reads consistent.

Before and after: context timelines and prompt replay

The biggest change is where context logic lives. Timelines, counters, memory deltas, and replay no longer need separate logs, summary jobs, cache keys, and service-side filters for every new prompt question. TemporalStore turns that spread into one online system for context serving.

Prompt context aggregation

Before (many services): tool events land in logs; summaries update elsewhere; vector retrieval runs alone; permissions checked in app code; cache key per context type; manual prompt stuffing; replay requires log joins; memory repair flow.

After (one system): typed context update; request-time freshness filters; replayable context pack.

Good for freshness counters, open commitments, stale-memory checks, and context-pack metadata.

Long context sequences

Before (many services): conversation log; tool trace store; retrieval feedback table; memory summary job; online cache blob; policy service filters it; prompt builder trims it; replay job for audits.

After (one system): append typed context row; filter by time, source, and policy; serve prompt-ready timeline.

Good for session timelines, tool-call history, retrieval feedback, memory deltas, and user behavior context.

SDK-facing architecture

TemporalStore is organized around a small serving API vocabulary: a Client or ProxyClient, namespace_name, table_name, key, typed values, sequence rows, filters, and bounded time windows. The metaserver manages routing and table policy, while data nodes run the serving workers that apply model-aware updates and answer reads.

Applications: agents, copilots, AI workspaces
SDK or proxy: Client, ProxyClient, Options
Metaserver: namespaces, tables, routing

Serving workers: typed records and models
Readable replicas: freshness-gated read fanout
Shared store: durable update streams and recovery

put_string: latest value state
append_context_rows: typed timeline appends
query_context_rows: filtered time-window reads

The public model is intentionally simple. Applications address state by namespace, table, and key. TemporalStore handles typed updates, read policy, replica freshness, and shared-store recovery behind that API boundary.

Write workflow

A write enters through the SDK or proxy with a namespace, table, key, and model-specific command. For latest values this can be put_string; for long context history it can be append_context_rows with typed rows such as timestamp, source, action type, tool name, confidence, and visibility policy.

1. Client sends namespace, table, key, and command
2. Routing policy selects the serving owner
3. Serving worker validates model limits
4. Typed record or sequence state is updated
5. Durable update stream records replayable state
6. Replicas become readable according to freshness policy

This is the important product point: applications do not need a separate streaming job for every freshness counter, filter, sequence, or context window. The serving engine keeps the typed state close to the request path and persists it through the configured storage mode.

append_context_rows(
  namespace_name = "agent_workspace",
  table_name = "session_timeline",
  key = user_id,
  rows = [{ timestamp, source, tool_name, confidence, visibility }]
)

query_context_rows(
  namespace_name = "agent_workspace",
  table_name = "session_timeline",
  key = user_id,
  start_ts = now - 30 minutes,
  end_ts = now,
  count = 100,
  filters = [{ field: "source", op: Equal, value: "tool_call" }]
)
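Steps 3 to 5 of the write workflow can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the real SDK or storage engine: `ToyTimelineStore`, `MAX_ROWS_PER_CALL`, `REQUIRED_FIELDS`, and the `replay` helper are hypothetical, and a Python list stands in for the durable update stream.

```python
from collections import defaultdict

MAX_ROWS_PER_CALL = 1000                  # hypothetical model limit
REQUIRED_FIELDS = {"timestamp", "source"}  # hypothetical required columns


class ToyTimelineStore:
    """In-memory stand-in for a serving worker plus its update stream."""

    def __init__(self):
        # (namespace_name, table_name, key) -> rows sorted by timestamp
        self.state = defaultdict(list)
        # Stand-in for the durable update stream: an append-only list.
        self.update_log = []

    def append_context_rows(self, namespace_name, table_name, key, rows):
        # Step 3: validate model limits before touching any state.
        if len(rows) > MAX_ROWS_PER_CALL:
            raise ValueError("too many rows in one call")
        for row in rows:
            missing = REQUIRED_FIELDS - set(row)
            if missing:
                raise ValueError(f"row is missing fields: {sorted(missing)}")
        entity = (namespace_name, table_name, key)
        # Step 4: update the typed sequence state for this entity only.
        self._apply(self.state, entity, rows)
        # Step 5: record the update so the state can be rebuilt later.
        self.update_log.append((entity, [dict(r) for r in rows]))
        return len(self.state[entity])

    @staticmethod
    def _apply(state, entity, rows):
        timeline = state[entity]
        timeline.extend(dict(r) for r in rows)
        timeline.sort(key=lambda r: r["timestamp"])  # stable sort

    def replay(self):
        """Rebuild all entity state from the update log alone."""
        rebuilt = defaultdict(list)
        for entity, rows in self.update_log:
            self._apply(rebuilt, entity, rows)
        return rebuilt


store = ToyTimelineStore()
store.append_context_rows(
    "agent_workspace", "session_timeline", "user-1",
    [{"timestamp": 20, "source": "tool_call", "tool_name": "search"}],
)
store.append_context_rows(
    "agent_workspace", "session_timeline", "user-1",
    [{"timestamp": 10, "source": "user_turn"}],
)
rebuilt = store.replay()
```

Because every accepted write is logged, `replay()` reproduces the same per-entity timelines, which is the property a recovering replica would rely on.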

Read workflow

Reads use the same logical vocabulary. get_string returns the latest value for a namespace, table, and key. query_context_rows returns a bounded window using start_ts, end_ts, count, and filter rules such as equal, not equal, greater than, and less than.

Query: key, window, count, filters
Serving worker: typed model execution
Readable replica: when freshness allows

Hot state: low-latency online reads
Shared-store recovery: durable retained state
Result: latest value or typed rows

The query is not a full-table scan. The request targets an entity key and asks for a bounded model operation over retained state. That shape fits high-cardinality online context state, where many users, sessions, documents, and workflows can update independently while each request asks for one entity or a small set of entities.
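A bounded read over one entity's retained rows might look like the sketch below. It assumes the rows for a key are kept sorted by timestamp, so a binary search finds the window without scanning the whole history. The operator names other than `Equal`, the inclusive window bounds, and the choice to keep the most recent rows when `count` is reached are assumptions for illustration, not documented TemporalStore behavior.

```python
import operator
from bisect import bisect_left, bisect_right

# Operator names beyond "Equal" are assumed spellings for this sketch.
OPS = {
    "Equal": operator.eq,
    "NotEqual": operator.ne,
    "GreaterThan": operator.gt,
    "LessThan": operator.lt,
}


def row_matches(row, filters):
    """A row matches when every filter holds; a missing field never matches."""
    for f in filters:
        if f["field"] not in row:
            return False
        if not OPS[f["op"]](row[f["field"]], f["value"]):
            return False
    return True


def query_context_rows(stamps, rows, start_ts, end_ts, count, filters=()):
    """stamps[i] is rows[i]["timestamp"]; both lists are sorted ascending."""
    lo = bisect_left(stamps, start_ts)   # first row with ts >= start_ts
    hi = bisect_right(stamps, end_ts)    # one past last row with ts <= end_ts
    matched = []
    # Walk newest-first so the count limit keeps the most recent rows.
    for i in range(hi - 1, lo - 1, -1):
        if row_matches(rows[i], filters):
            matched.append(rows[i])
            if len(matched) == count:
                break
    matched.reverse()                    # return in ascending time order
    return matched


rows = [
    {"timestamp": 100, "source": "user_turn"},
    {"timestamp": 200, "source": "tool_call", "tool_name": "search"},
    {"timestamp": 300, "source": "tool_call", "tool_name": "fetch"},
    {"timestamp": 400, "source": "retrieval"},
    {"timestamp": 500, "source": "tool_call", "tool_name": "search"},
]
stamps = [r["timestamp"] for r in rows]
tool_filter = [{"field": "source", "op": "Equal", "value": "tool_call"}]

window = query_context_rows(stamps, rows, 150, 450, 100, tool_filter)
latest_one = query_context_rows(stamps, rows, 150, 450, 1, tool_filter)
```

The work done per request is bounded by the window size and `count`, not by the length of the entity's full history or by the number of other keys in the table.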

SDK vocabulary

The portal should describe TemporalStore using the same words application developers see in the SDK and proxy API.

Term	Meaning	Why it matters
`namespace_name`	The top-level tenant or product boundary.	Teams can isolate context domains and operating policy.
`table_name`	The logical table for a data model.	Tables can choose the right storage mode, limits, and read policy.
`key`	The entity key, such as a user, device, session, item, or campaign.	High-cardinality state stays entity-local and request-time.
`ContextRow`	A typed timeline row with timestamp, source, policy, and context fields.	Agents can read recent behavior, tool events, retrieval feedback, and memory deltas directly.
`ContextFilter`	A field, operator, and value applied during context queries.	Applications can ask fresh context questions without precomputing every view.
`start_ts`, `end_ts`, `count`	The bounded query window and result limit.	Reads stay predictable even for long histories.
`pin_primary`	A read policy option for read-your-write behavior.	Correctness-sensitive paths can prefer the primary while read-heavy paths use replicas when fresh.

Read policy and recovery

TemporalStore supports primary-pinned reads when the application needs read-your-write behavior, and replica reads when freshness policy allows read fanout. Shared-store modes keep durable update streams and retained state available for recovery without exposing internal implementation details to clients.

Operational guardrail: expose replica freshness, table readiness, recovery progress, and SDK latency as product signals. Applications should know whether a read is primary-pinned, replica-served, or delayed by freshness policy.
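One way to picture the read policy is a small routing decision that also reports how the read was served. The sketch below is an assumption-heavy illustration: `Replica`, `applied_seq`, `max_lag`, and the fallback to the primary when no replica is fresh enough are all hypothetical. A real policy might instead wait or return an error when freshness cannot be met.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    applied_seq: int  # last update-stream sequence this replica has applied


def choose_read_target(primary_seq, replicas, pin_primary=False, max_lag=0):
    """Return (target_name, served_by) for one read request."""
    if pin_primary:
        # Read-your-write paths always go to the serving owner.
        return ("primary", "primary-pinned")
    fresh = [r for r in replicas if primary_seq - r.applied_seq <= max_lag]
    if fresh:
        # Prefer the most caught-up replica among those that pass the gate.
        best = max(fresh, key=lambda r: r.applied_seq)
        return (best.name, "replica-served")
    # Assumed fallback: no replica is fresh enough, so use the primary.
    return ("primary", "freshness-fallback")


replicas = [Replica("r1", applied_seq=95), Replica("r2", applied_seq=99)]

pinned = choose_read_target(100, replicas, pin_primary=True)
relaxed = choose_read_target(100, replicas, max_lag=2)
strict = choose_read_target(100, replicas, max_lag=0)
```

Returning the `served_by` label alongside the target is what lets the application know whether a read was primary-pinned, replica-served, or affected by the freshness policy.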

Client writes typed update → Serving owner applies model logic → Shared store persists replayable state → Replica catches up through recovery stream → Fresh replica becomes readable

Data model examples

The right model depends on the product question.

Use case	Entity key	Model	Example query
Purchase velocity	`user_id`	TemporalCounter	Count purchases in the last 5 minutes.
Failed login risk	`device_id`	TemporalAggregate with dimensions	Failed logins by country and method in the last 30 minutes.
Card testing	`card_id`	TemporalDistinct	Unique merchants touched in the last 24 hours.
Chargeback monitoring	`merchant_id`	TemporalAggregate	Chargebacks by channel in the last 7 days.
Frequency cap	`campaign_id + user_id`	Composite-key counter	Impressions in the last hour, day, or campaign window.
Ranking sequence	`user_id`	Sequence	Recent clicked items filtered by category and recency.
Agent context	`session_id`	Sequence plus counters	Recent tool calls, retrieval feedback, safety counters, and user preference deltas.
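As an illustration of the first row, a per-entity sliding-window counter can be sketched in a few lines. This is a toy stand-in for the TemporalCounter model named in the table, not its actual design: the class name, the retention parameter, and the half-open window `(now - window, now]` are assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowCounter:
    """Toy per-entity counter that answers 'how many events in the last N seconds'."""

    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self.events = deque()  # event timestamps, oldest first

    def add(self, ts):
        self.events.append(ts)
        self._expire(ts)

    def _expire(self, now):
        # Drop events that fell out of the retention period entirely.
        while self.events and self.events[0] <= now - self.retention:
            self.events.popleft()

    def count(self, now, window_seconds):
        self._expire(now)
        cutoff = now - window_seconds
        # Count events inside the half-open window (cutoff, now].
        return sum(1 for ts in self.events if cutoff < ts <= now)


# One counter per entity key, here keyed by a hypothetical user_id.
counters = {"user-1": SlidingWindowCounter(retention_seconds=600)}
for purchase_ts in (0, 100, 250, 290):
    counters["user-1"].add(purchase_ts)

last_5_min = counters["user-1"].count(now=300, window_seconds=300)
last_minute = counters["user-1"].count(now=300, window_seconds=60)
```

The same retained events answer both window sizes at request time, which is the property that avoids a separate precomputed aggregate per window.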

Why not just Redis or RocksDB?

Redis-style systems are excellent for simple strings, hashes, latest profiles, leaderboards, queues, and many cache workflows. MatrixDB is the MatrixArk product direction for serverless, high-QPS, eventually consistent KV workloads that need Redis-compatible migration, very large-scale storage, low-latency serving, and offline or nearline query access. TemporalStore is different: it tries to put temporal semantics inside the serving engine.

RocksDB is a powerful embedded LSM engine and often the right local persistence layer. The tradeoff is that hot update-heavy temporal data still travels through a generic KV path. For counters, windows, and long sequences that change many times per entity, write amplification can be much larger because it appears in multiple parts of the system: service-side blob rewrites, cache updates, LSM compaction, replay or repair logs, and downstream materialized context tables. TemporalStore's architecture introduces a purpose-built temporal storage layer for hot typed state, durable update streams, retained records, multi-layer cache, shared-store recovery, and replica-readable serving.
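The service-side blob rewrite term can be made concrete with simple arithmetic. The sketch below is a back-of-the-envelope model, not a measurement: it assumes a sequence that grows by one fixed-size row per update, is never trimmed, and is rewritten in full on every update, and it ignores compaction, cache updates, and replay logs. The function names and the sizes are hypothetical.

```python
def blob_rewrite_bytes(updates, row_bytes):
    # Update i rewrites a blob that holds i rows after the append.
    return sum(i * row_bytes for i in range(1, updates + 1))


def append_bytes(updates, row_bytes):
    # An append-style path writes only the new row each time.
    return updates * row_bytes


updates, row_bytes = 1000, 100
rewrite_total = blob_rewrite_bytes(updates, row_bytes)
append_total = append_bytes(updates, row_bytes)
ratio = rewrite_total / append_total

print(rewrite_total, append_total, ratio)
```

Under these assumptions the rewrite path writes `row_bytes * N * (N + 1) / 2` bytes for `N` updates, so the ratio to the append path is `(N + 1) / 2`. For 1,000 updates that is 500.5 times more bytes before any LSM compaction is counted.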

Generic RocksDB-backed path

Context code in services → Encode generic KV → Persist through LSM engine → Add cache, replay, repair, and jobs around it

TemporalStore path

Typed context command → Model-aware online state → Purpose-built temporal storage → Shared-store recovery and replica reads

System	Good at	Where TemporalStore differs
Redis-compatible cache	Fast general data structures and latest-value serving.	TemporalStore adds context windows, filters, long sequences, replayable memory state, shared-store recovery, and multi-layer cache behavior.
RocksDB-backed KV	Durable ordered local storage with mature LSM behavior, but hot temporal updates can amplify writes across blob rewrites, cache mutation, LSM compaction, replay, and materialization.	TemporalStore adds purpose-built temporal storage for typed timelines, counters, sequences, retained records, update streams, cache, recovery, and replica reads.
Feature store	Registry, training sets, materialization, lineage, offline/online consistency.	TemporalStore is primarily the prompt-time context engine; feature serving is a secondary workload on the same temporal model.
Stream processor	Known transformations, joins, durable event-time processing.	TemporalStore serves request-time entity windows when precomputing every window is too rigid.
Time-series database	Metric series, analytics queries, monitoring workloads.	TemporalStore is entity-serving-first, not dashboard-query-first.

Where feature platforms fit as a secondary workload

TemporalStore should lead with LLM context management: timelines, tool history, memory deltas, prompt replay, freshness counters, and request-time context packs. Feature platforms such as Feast, Chronon, Fennel, and Featureform remain useful for registry, training sets, lineage, and offline/online consistency. TemporalStore can sit beside them when a secondary feature-serving workload needs fresh temporal windows, counters, and sequences without a custom pipeline for every new online question. MatrixDB handles large-scale KV serving and exports, and MatrixKV handles transactional KV state, so no single product has to own every layer.

Prompt and feature registry: definitions, owners, lineage
Durable context store: retention, replay, audit
Stream or batch compute: durable transforms

TemporalStore: fresh temporal state
MatrixDB: serverless KV database, profiles, scans, exports
MatrixKV: transactional KV database, strong consistency, metadata

Prompt builder or rules engine: low-latency context decision
Monitoring: lag, errors, cache, storage
Export path: training and audit

Transformer KV caching is a related, not identical, problem

TemporalStore is not a GPU KV-cache manager. It does not replace the transformer KV cache used by vLLM, SGLang, TensorRT-LLM, or LMCache-style systems. The LLM runtime still needs tensor layout, prefix matching, GPU memory management, token-position lifecycle, and attention-cache APIs.

The integration point is remote cache orchestration. LMCache or a similar remote cache can reuse model prefixes and attention KV state, MatrixDB can expose Redis-compatible cache metadata and eligibility keys, while TemporalStore serves the fresh temporal context, policy counters, session timelines, and retrieval feedback that decide what should be sent to the runtime in the first place.

The overlap is structured context and state. Agent systems need recent conversation events, tool calls, retrieved-document metadata, memory freshness, safety counters, user preferences, and policy state. Vector databases are good at similarity search. TemporalStore is useful for temporal and structured context that should be filtered, counted, ordered, replayed, ranked, personalized, or expired with serving semantics.

Operating principles

TemporalStore is a serving engine, so product readiness depends on more than the data model. Operators need to understand table readiness, serving ownership, replica freshness, recovery progress, cache behavior, storage errors, and client retries without digging through application code.

The public product boundary is simple: application teams should see a stable context API, predictable latency, clear freshness semantics, and enough visibility to know when a context answer came from hot memory, warm cache, or durable state. Infrastructure teams should see ownership, recovery, and capacity signals that make failover and scaling deliberate rather than surprising.

Table readiness

Track serving owner, replica state, table readiness, and safe promotion status.

Replica freshness

Report catch-up lag, missing reads, retry latency, and time-to-visible on replicas.

Shared store and cache

Track durable update progress, retained state, cache usage, cache hit ratio, and cold reads.

Model correctness

Expose per-model request failures, encoding mismatches, query result shape, and guardrail tests.

Scaling and rollout principles

Scaling TemporalStore should be coordinated by the metadata and placement layer, not by independent data nodes guessing which serving work they own. New capacity should register, receive secondary assignments, catch up from durable state and replay, and then gradually take read traffic or primary ownership.

Capacity registers with metaserver → Planner assigns secondary replicas → Replica catches up from shared-store recovery state → Freshness gates pass → Routing table updates → Reads or primaries move gradually

Scale-down follows the reverse path: drain traffic, stop new primary assignments, move ownership away, wait for replacement replicas to become healthy, and then remove the node from the serving plan. The important product promise is controlled movement of state, not surprise loss of the online path.
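The scale-up and scale-down paths described here read naturally as a state machine with an explicit freshness gate. The sketch below is one possible encoding, assuming hypothetical state names and a hypothetical `replica_lag` measurement. It is not the planner's actual logic, only a way to show that each move is deliberate and that skipping a step is rejected.

```python
from enum import Enum


class NodeState(Enum):
    # State names are assumptions chosen to mirror the steps in the text.
    REGISTERED = "registered"
    SECONDARY = "secondary"
    CATCHING_UP = "catching_up"
    FRESH = "fresh"
    SERVING_READS = "serving_reads"
    PRIMARY = "primary"
    DRAINING = "draining"
    REMOVED = "removed"


S = NodeState

# Each state lists the only states it may move to next.
ALLOWED = {
    S.REGISTERED: {S.SECONDARY},
    S.SECONDARY: {S.CATCHING_UP},
    S.CATCHING_UP: {S.FRESH},
    S.FRESH: {S.SERVING_READS},
    S.SERVING_READS: {S.PRIMARY, S.DRAINING},
    S.PRIMARY: {S.DRAINING},
    S.DRAINING: {S.REMOVED},
    S.REMOVED: set(),
}


def advance(current, target, replica_lag=None, max_lag=0):
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal move {current.name} -> {target.name}")
    # Freshness gate: a replica is only marked fresh once its lag is known
    # and within the configured bound.
    if target is S.FRESH and (replica_lag is None or replica_lag > max_lag):
        raise ValueError("freshness gate has not passed")
    return target


state = S.REGISTERED
for step in (S.SECONDARY, S.CATCHING_UP):
    state = advance(state, step)
state = advance(state, S.FRESH, replica_lag=0)
state = advance(state, S.SERVING_READS)
```

A newly registered node cannot jump straight to `PRIMARY`, and a lagging replica cannot pass the gate, which matches the promise of controlled movement of state.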

Performance posture

TemporalStore should make performance claims through repeatable benchmarks, not one-off environment snapshots. The important measurements are high-cardinality write throughput, hot-read latency, warm-read cache behavior, cold-read recovery cost, replica catch-up time, and query cost for windows, filters, distinct state, and long sequences.

Signal	Why it matters	Product expectation
High-cardinality writes	Shows whether many sparse entities can update concurrently.	Serving workers absorb event volume without per-context pipelines.
Hot reads	Shows request-path latency for active entities.	Memory-resident state answers latest, window, filter, and sequence queries quickly.
Warm and cold reads	Shows behavior after eviction or less frequent access.	Shared-store recovery and cache refill keep retained context state queryable.
Replay and recovery	Shows whether replicas can become trustworthy after movement or restart.	Replicas catch up through shared-store state and durable update streams after checkpoints.

This keeps the public claim disciplined: TemporalStore is designed for high-performance online serving where latency, memory efficiency, and throughput matter. Benchmark numbers should be published only when they are repeatable, labeled, and tied to a clear workload.

The product boundary

TemporalStore should be honest about what it is. It is not a full feature platform, not a vector database, and not a transformer KV-cache runtime. It is time-aware LLM context memory: session timelines, tool events, memory deltas, prompt replay, freshness counters, long sequences, and replayable online state. MatrixDB handles serverless KV database serving, profiles, and offline-queryable state. MatrixKV handles canonical truth and workflow KV state.

Use TemporalStore when context depends on recent events, tool timelines, freshness, filters, replay, or long sequences.
Use MatrixDB when eventual consistency is acceptable and the workload is active profile, large hash/profile KV, hot-key cache, Redis-compatible migration, LMCache metadata, tenant-scale service state, high-QPS serving, very large-scale storage, scans, exports, or offline/nearline query over persisted KV data.
Use MatrixKV when the workload needs canonical truth and workflow KV state, timestamp coordination, metadata correctness, committed writes, or other correctness-critical KV state.
Use feature platforms for secondary feature registries, training sets, lineage, and backfills.