TemporalStore Deep Dive
Time-aware LLM context management for prompts, memory, and replay.
Modern LLM products need reliable prompt context: timelines, tool events, memory deltas, prompt replay, freshness counters, long sequences, and permissions-aware context assembly inside the request path.
The short version
TemporalStore is MatrixArk's standalone time-aware context engine. It targets the gap between vector databases, logs, prompt templates, wide-column databases, and online caches: teams need one online system for context state that changes fast and must be retrieved at prompt time.
LLM agent systems repeatedly ask questions such as: what did the user say recently, what tools already failed, which promise is still open, which memory was superseded, and what context should be placed into the next model call? Those questions need a serving layer that understands time, freshness, replay, and policy-aware context assembly.
Why cache plus logs breaks down
A common agent architecture starts cleanly: prompts live in one place, vector search retrieves chunks, Redis stores summaries, logs hold tool traces, and application code performs permissions, freshness checks, and final context stuffing. That becomes fragile when every response needs session state, tool-call history, policy counters, source validity, user preferences, and committed actions in one context bundle.
Figure: scattered context stack compared with the TemporalStore path.
What TemporalStore owns for context
The central design choice is to make the storage engine understand the online model. Instead of storing an opaque blob and forcing every caller to implement window logic, the engine exposes model-aware commands over entity-local state.
Session timelines
Store user turns, tool calls, retrieval events, source changes, and model-visible context over time.
Prompt replay
Reconstruct what the agent knew, retrieved, filtered, and committed before each model call.
Freshness counters
Track stale memory, repeated failures, open commitments, policy windows, and request-time validity signals.
Long context sequences
Serve recent ordered behavior, conversation, tool, and workflow events with limits, filters, and timestamps.
Memory governance
Keep superseded, unauthorized, low-confidence, or conflicting memories from entering the prompt.
Context pack state
Store memory metadata, retrieval feedback, source freshness, and state used to assemble LLM prompts.
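The memory governance rule above can be illustrated with a small filter. This is a minimal sketch rather than TemporalStore's implementation: `MemoryRecord`, `admissible_memories`, the field names, and the 0.5 confidence threshold are all hypothetical, and conflict detection is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemoryRecord:
    # Hypothetical record shape; field names are illustrative only.
    memory_id: str
    text: str
    confidence: float                    # 0.0 .. 1.0
    visibility: str                      # e.g. "user" or "internal"
    superseded_by: Optional[str] = None  # id of the newer memory, if any


def admissible_memories(records, allowed_visibility, min_confidence=0.5):
    """Return the memories that may enter the prompt, in input order."""
    kept = []
    for rec in records:
        if rec.superseded_by is not None:
            continue  # a newer memory replaced this one
        if rec.visibility not in allowed_visibility:
            continue  # caller is not authorized to see it
        if rec.confidence < min_confidence:
            continue  # too uncertain to show the model
        kept.append(rec)
    return kept


records = [
    MemoryRecord("m1", "Prefers email", 0.9, "user"),
    MemoryRecord("m2", "Old address", 0.9, "user", superseded_by="m3"),
    MemoryRecord("m3", "New address", 0.8, "user"),
    MemoryRecord("m4", "Internal risk note", 0.9, "internal"),
    MemoryRecord("m5", "Maybe likes jazz", 0.2, "user"),
]
prompt_memories = admissible_memories(records, allowed_visibility={"user"})
```

Only the current, authorized, sufficiently confident memories (`m1` and `m3` here) survive the filter.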
Implementation direction: C++ core and Rust open source in July 2026
TemporalStore's performance-critical serving and storage core is designed with C++ in mind: low-latency data structures, memory layout control, cache behavior, and tight storage-engine integration matter on the hot path. The open-source TemporalStore track is written in Rust and is planned for release in July 2026. That gives the community-facing implementation safer concurrency, maintainable systems code, and a modern API surface for typed temporal updates.
| Layer | Implementation direction | Why |
|---|---|---|
| Performance-critical serving core | C++ | Precise control over latency, memory layout, cache locality, and high-QPS execution. |
| Open-source TemporalStore | Rust | Planned for July 2026, with safe concurrency, maintainable systems code, and a clear developer-facing implementation. |
| Public model | Shared concepts | Keep namespaces, tables, keys, typed updates, windows, retained records, and context reads consistent. |
Before and after: context timelines and prompt replay
The biggest change is where context logic lives. Timelines, counters, memory deltas, and replay no longer need separate logs, summary jobs, cache keys, and service-side filters for every new prompt question. TemporalStore turns that spread into one online system for context serving.
Prompt context aggregation
Good for freshness counters, open commitments, stale-memory checks, and context-pack metadata.
Long context sequences
Good for session timelines, tool-call history, retrieval feedback, memory deltas, and user behavior context.
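Prompt replay over a session timeline can be sketched as a fold over time-ordered events. The code below is an illustration under assumptions, not the engine's replay mechanism: `TimelineEvent`, `replay_before`, and the event kinds are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimelineEvent:
    # Hypothetical event shape used only for this illustration.
    ts: int      # event time in milliseconds
    kind: str    # "user_turn", "tool_call", "memory_set", or "model_call"
    key: str     # memory key or tool name; empty when unused
    value: str


def replay_before(events, call_ts):
    """Rebuild what was visible strictly before the model call at call_ts."""
    visible = sorted((e for e in events if e.ts < call_ts), key=lambda e: e.ts)
    memory = {}
    for event in visible:
        if event.kind == "memory_set":
            memory[event.key] = event.value  # later writes supersede earlier ones
    history = [e for e in visible if e.kind != "memory_set"]
    return {"memory": memory, "history": history}


events = [
    TimelineEvent(100, "user_turn", "", "Book me a hotel"),
    TimelineEvent(110, "memory_set", "city", "Paris"),
    TimelineEvent(120, "model_call", "", "call-1"),
    TimelineEvent(130, "memory_set", "city", "Berlin"),
    TimelineEvent(140, "tool_call", "search_hotels", "ok"),
    TimelineEvent(150, "model_call", "", "call-2"),
]
first_call = replay_before(events, 120)
second_call = replay_before(events, 150)
```

The first call saw `city = Paris`; the second saw the superseding `Berlin` value plus the intervening tool call.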
SDK-facing architecture
TemporalStore is organized around a small serving API vocabulary: a
Client or ProxyClient, namespace_name,
table_name, key, typed values, sequence rows, filters,
and bounded time windows. The metaserver manages routing and table policy, while
data nodes run the serving workers that apply model-aware updates and answer reads.
Architecture overview (diagram labels):
- agents, copilots, AI workspaces
- SDK or proxy: Client, ProxyClient, Options
- Metaserver: namespaces, tables, routing
- typed records and models
- Readable replicas: freshness-gated read fanout
- Shared store: durable update streams and recovery

Commands shown in the diagram:
- put_string: latest value state
- append_context_rows: typed timeline appends
- query_context_rows: filtered time-window reads
The public model is intentionally simple. Applications address state by namespace, table, and key. TemporalStore handles typed updates, read policy, replica freshness, and shared-store recovery behind that API boundary.
Write workflow
A write enters through the SDK or proxy with a namespace, table, key, and model-specific
command. For latest values this can be put_string; for long context history
it can be append_context_rows with typed rows such as timestamp,
source, action type, tool name, confidence, and visibility policy.
This is the important product point: applications do not need a separate streaming job for every freshness counter, filter, sequence, or context window. The serving engine keeps the typed state close to the request path and persists it through the configured storage mode.
append_context_rows(
    namespace_name = "agent_workspace",
    table_name = "session_timeline",
    key = user_id,
    rows = [{ timestamp, source, tool_name, confidence, visibility }]
)

query_context_rows(
    namespace_name = "agent_workspace",
    table_name = "session_timeline",
    key = user_id,
    start_ts = now - 30 minutes,
    end_ts = now,
    count = 100,
    filters = [{ field: "source", op: Equal, value: "tool_call" }]
)
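To make the semantics of the two calls above concrete, here is an in-memory Python sketch. It is not the TemporalStore SDK: the class `InMemoryContextStore`, the `ContextFilter` dataclass layout, the operator names, the inclusive window bounds, and the newest-first result order are all assumptions made for illustration.

```python
import operator
from collections import defaultdict
from dataclasses import dataclass

# Operator names mirror the filter rules named in the text; the spelling is assumed.
OPS = {
    "Equal": operator.eq,
    "NotEqual": operator.ne,
    "GreaterThan": operator.gt,
    "LessThan": operator.lt,
}


@dataclass(frozen=True)
class ContextFilter:
    field: str
    op: str
    value: object


class InMemoryContextStore:
    """Toy stand-in for the serving engine: one sorted row list per entity."""

    def __init__(self):
        self._rows = defaultdict(list)

    def append_context_rows(self, namespace_name, table_name, key, rows):
        bucket = self._rows[(namespace_name, table_name, key)]
        bucket.extend(dict(row) for row in rows)
        bucket.sort(key=lambda row: row["timestamp"])  # keep time order

    def query_context_rows(self, namespace_name, table_name, key,
                           start_ts, end_ts, count, filters=()):
        matches = []
        if count <= 0:
            return matches
        bucket = self._rows.get((namespace_name, table_name, key), [])
        for row in reversed(bucket):  # newest first (assumed ordering)
            if not (start_ts <= row["timestamp"] <= end_ts):
                continue
            if all(f.field in row and OPS[f.op](row[f.field], f.value)
                   for f in filters):
                matches.append(row)
                if len(matches) == count:
                    break  # stop as soon as the result limit is reached
        return matches


store = InMemoryContextStore()
store.append_context_rows(
    "agent_workspace", "session_timeline", "user-1",
    [
        {"timestamp": 100, "source": "tool_call", "tool_name": "search"},
        {"timestamp": 200, "source": "user_turn"},
        {"timestamp": 300, "source": "tool_call", "tool_name": "fetch"},
    ],
)
recent_tool_calls = store.query_context_rows(
    "agent_workspace", "session_timeline", "user-1",
    start_ts=0, end_ts=400, count=10,
    filters=[ContextFilter("source", "Equal", "tool_call")],
)
```

The caller never rewrites a blob or maintains a separate view: it appends typed rows and asks a bounded question about one entity.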
Read workflow
Reads use the same logical vocabulary. get_string returns the latest value
for a namespace, table, and key. query_context_rows returns a bounded
window using start_ts, end_ts, count, and
filter rules such as equal, not equal, greater than, and less than.
Read path (diagram labels):
- key, window, count, filters
- Serving worker: typed model execution
- Readable replica: when freshness allows
- low-latency online reads
- Shared-store recovery: durable retained state
- Result: latest value or typed rows
The query is not a full-table scan. The request targets an entity key and asks for a bounded model operation over retained state. That shape fits high-cardinality online context state, where many users, sessions, documents, and workflows can update independently while each request asks for one entity or a small set of entities.
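One way to see why a keyed, bounded read avoids a table scan: if each entity keeps its timestamps in sorted order, the window can be located with two binary searches. The sketch below assumes that layout; `window_slice` and the `timelines` dictionary are hypothetical and say nothing about TemporalStore's actual storage format.

```python
from bisect import bisect_left, bisect_right


def window_slice(timestamps, start_ts, end_ts, count):
    """Index bounds (lo, hi) of the newest `count` entries in [start_ts, end_ts].

    `timestamps` must be sorted ascending. Locating the bounds is O(log n)
    in the length of one entity's history, independent of table size.
    """
    lo = bisect_left(timestamps, start_ts)
    hi = bisect_right(timestamps, end_ts)
    lo = max(lo, hi - count)  # keep only the newest `count` entries
    return lo, hi


# One sorted timestamp list per entity key (hypothetical layout).
timelines = {
    "session-1": [10, 20, 30, 40, 50],
    "session-2": [5, 15],
}

lo, hi = window_slice(timelines["session-1"], start_ts=15, end_ts=45, count=2)
newest_two = timelines["session-1"][lo:hi]
```

Only `session-1` is touched, and only the slice inside the window is read.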
SDK vocabulary
The portal should describe TemporalStore using the same words application developers see in the SDK and proxy API.
| Term | Meaning | Why it matters |
|---|---|---|
| namespace_name | The top-level tenant or product boundary. | Teams can isolate context domains and operating policy. |
| table_name | The logical table for a data model. | Tables can choose the right storage mode, limits, and read policy. |
| key | The entity key, such as a user, device, session, item, or campaign. | High-cardinality state stays entity-local and request-time. |
| ContextRow | A typed timeline row with timestamp, source, policy, and context fields. | Agents can read recent behavior, tool events, retrieval feedback, and memory deltas directly. |
| ContextFilter | A field, operator, and value applied during context queries. | Applications can ask fresh context questions without precomputing every view. |
| start_ts, end_ts, count | The bounded query window and result limit. | Reads stay predictable even for long histories. |
| pin_primary | A read policy option for read-your-write behavior. | Correctness-sensitive paths can prefer the primary while read-heavy paths use replicas when fresh. |
Read policy and recovery
TemporalStore supports primary-pinned reads when the application needs read-your-write behavior, and replica reads when freshness policy allows read fanout. Shared-store modes keep durable update streams and retained state available for recovery without exposing internal implementation details to clients.
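The choice between primary-pinned and replica reads can be sketched as a small routing function. Only `pin_primary` comes from the SDK vocabulary; `ReplicaStatus`, `choose_read_target`, the lag measurement, and the `max_staleness_ms` threshold are hypothetical stand-ins for whatever freshness policy a table actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaStatus:
    # Hypothetical view of a replica's catch-up state.
    name: str
    lag_ms: int  # how far the replica trails the primary


def choose_read_target(pin_primary, replicas, max_staleness_ms):
    """Return the name of the node that should serve this read."""
    if pin_primary:
        return "primary"  # read-your-write: never risk a stale replica
    fresh = [r for r in replicas if r.lag_ms <= max_staleness_ms]
    if not fresh:
        return "primary"  # no replica meets the freshness policy
    return min(fresh, key=lambda r: r.lag_ms).name


replicas = [ReplicaStatus("replica-a", 40), ReplicaStatus("replica-b", 900)]
pinned = choose_read_target(True, replicas, max_staleness_ms=100)
fanout = choose_read_target(False, replicas, max_staleness_ms=100)
strict = choose_read_target(False, replicas, max_staleness_ms=10)
```

Correctness-sensitive paths set `pin_primary`; read-heavy paths fall back to the primary only when no replica is fresh enough.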
Data model examples
The right model depends on the product question.
| Use case | Entity key | Model | Example query |
|---|---|---|---|
| Purchase velocity | user_id | TemporalCounter | Count purchases in the last 5 minutes. |
| Failed login risk | device_id | TemporalAggregate with dimensions | Failed logins by country and method in the last 30 minutes. |
| Card testing | card_id | TemporalDistinct | Unique merchants touched in the last 24 hours. |
| Chargeback monitoring | merchant_id | TemporalAggregate | Chargebacks by channel in the last 7 days. |
| Frequency cap | campaign_id + user_id | Composite-key counter | Impressions in the last hour, day, or campaign window. |
| Ranking sequence | user_id | Sequence | Recent clicked items filtered by category and recency. |
| Agent context | session_id | Sequence plus counters | Recent tool calls, retrieval feedback, safety counters, and user preference deltas. |
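Two of the models in the table, a windowed counter and a windowed distinct count, can be sketched in a few lines. These classes are illustrative assumptions, not the TemporalCounter or TemporalDistinct implementations; they assume events arrive in time order and that `now_ms` never moves backward.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events whose timestamp falls in (now - window, now]."""

    def __init__(self, window_ms):
        self.window_ms = window_ms
        self._events = deque()  # timestamps, oldest first

    def add(self, ts_ms):
        self._events.append(ts_ms)

    def count(self, now_ms):
        cutoff = now_ms - self.window_ms
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()  # evict expired events
        return len(self._events)


class SlidingWindowDistinct:
    """Count distinct values seen in (now - window, now]."""

    def __init__(self, window_ms):
        self.window_ms = window_ms
        self._last_seen = {}  # value -> most recent timestamp

    def add(self, value, ts_ms):
        previous = self._last_seen.get(value, ts_ms)
        self._last_seen[value] = max(previous, ts_ms)

    def count(self, now_ms):
        cutoff = now_ms - self.window_ms
        self._last_seen = {v: t for v, t in self._last_seen.items() if t > cutoff}
        return len(self._last_seen)


# Purchases in the last 5 minutes for one user_id.
purchases = SlidingWindowCounter(window_ms=5 * 60 * 1000)
for ts in (0, 100_000, 250_000):
    purchases.add(ts)
purchases_at_6_min = purchases.count(now_ms=360_000)

# Unique merchants touched by one card_id in a 1-second toy window.
merchants = SlidingWindowDistinct(window_ms=1000)
merchants.add("m1", 0)
merchants.add("m2", 500)
merchants.add("m1", 900)
merchants_at_1200 = merchants.count(now_ms=1200)
```

Each instance would live under one entity key, which is what keeps the state entity-local.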
Why not just Redis or RocksDB?
Redis-style systems are excellent for simple strings, hashes, latest profiles, leaderboards, queues, and many cache workflows. MatrixDB is the MatrixArk product direction for serverless, high-QPS, eventually consistent KV workloads that need Redis-compatible migration, very large-scale storage, low-latency serving, and offline or nearline query access. TemporalStore is different: it tries to put temporal semantics inside the serving engine.
RocksDB is a powerful embedded LSM engine and often the right local persistence layer. The tradeoff is that hot update-heavy temporal data still travels through a generic KV path. For counters, windows, and long sequences that change many times per entity, write amplification can be much larger because it appears in multiple parts of the system: service-side blob rewrites, cache updates, LSM compaction, replay or repair logs, and downstream materialized context tables. TemporalStore's architecture introduces a purpose-built temporal storage layer for hot typed state, durable update streams, retained records, multi-layer cache, shared-store recovery, and replica-readable serving.
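A back-of-the-envelope calculation shows how the service-side blob rewrite alone inflates writes. The numbers below are illustrative arithmetic, not measurements of RocksDB or TemporalStore: the row size and append count are made up, and LSM compaction, cache updates, and replay logs are deliberately excluded.

```python
def blob_rewrite_bytes(row_bytes, appends):
    """Bytes written when every append rewrites the whole serialized history.

    After the n-th append the blob holds n rows, so the total is
    row_bytes * (1 + 2 + ... + appends).
    """
    return sum(row_bytes * n for n in range(1, appends + 1))


def append_only_bytes(row_bytes, appends):
    """Bytes written when each append persists only the new row."""
    return row_bytes * appends


ROW_BYTES = 64   # hypothetical serialized row size
APPENDS = 1000   # hypothetical number of updates to one entity

rewritten = blob_rewrite_bytes(ROW_BYTES, APPENDS)
appended = append_only_bytes(ROW_BYTES, APPENDS)
amplification = rewritten / appended
```

Under these assumptions the blob path writes about 32 MB to store 64 KB of rows, roughly a 500x difference before any storage-engine overhead is counted.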
Figure: generic RocksDB-backed path compared with the TemporalStore path.
| System | Good at | Where TemporalStore differs |
|---|---|---|
| Redis-compatible cache | Fast general data structures and latest-value serving. | TemporalStore adds context windows, filters, long sequences, replayable memory state, shared-store recovery, and multi-layer cache behavior. |
| RocksDB-backed KV | Durable ordered local storage with mature LSM behavior, but hot temporal updates can amplify writes across blob rewrites, cache mutation, LSM compaction, replay, and materialization. | TemporalStore adds purpose-built temporal storage for typed timelines, counters, sequences, retained records, update streams, cache, recovery, and replica reads. |
| Feature store | Registry, training sets, materialization, lineage, offline/online consistency. | TemporalStore is primarily the prompt-time context engine; feature serving is a secondary workload on the same temporal model. |
| Stream processor | Known transformations, joins, durable event-time processing. | TemporalStore serves request-time entity windows when precomputing every window is too rigid. |
| Time-series database | Metric series, analytics queries, monitoring workloads. | TemporalStore is entity-serving-first, not dashboard-query-first. |
Where feature platforms fit as a secondary workload
TemporalStore should lead with LLM context management: timelines, tool history, memory deltas, prompt replay, freshness counters, and request-time context packs. Feature platforms such as Feast, Chronon, Fennel, and Featureform remain useful for registry, training sets, lineage, and offline/online consistency. TemporalStore can sit beside them when a secondary feature-serving workload needs fresh temporal windows, counters, and sequences without a custom pipeline for every new online question. MatrixDB handles large-scale KV serving and exports, while MatrixKV handles transactional KV state without forcing one product to own every layer.
Where each system fits (diagram labels):
- definitions, owners, lineage
- Durable context store: retention, replay, audit
- Stream or batch compute: durable transforms
- fresh temporal state
- MatrixDB: serverless KV database, profiles, scans, exports
- MatrixKV: transactional KV database, strong consistency, metadata
- low-latency context decision
- Monitoring: lag, errors, cache, storage
- Export path: training and audit
Transformer KV caching is a related, not identical, problem
TemporalStore is not a GPU KV-cache manager. It does not replace the transformer KV cache used by vLLM, SGLang, TensorRT-LLM, or LMCache-style systems. The LLM runtime still needs tensor layout, prefix matching, GPU memory management, token-position lifecycle, and attention-cache APIs.
The integration point is remote cache orchestration. LMCache or a similar remote cache can reuse model prefixes and attention KV state, MatrixDB can expose Redis-compatible cache metadata and eligibility keys, while TemporalStore serves the fresh temporal context, policy counters, session timelines, and retrieval feedback that decide what should be sent to the runtime in the first place.
The overlap is structured context and state. Agent systems need recent conversation events, tool calls, retrieved-document metadata, memory freshness, safety counters, user preferences, and policy state. Vector databases are good at similarity search. TemporalStore is useful for temporal and structured context that should be filtered, counted, ordered, replayed, ranked, personalized, or expired with serving semantics.
Operating principles
TemporalStore is a serving engine, so product readiness depends on more than the data model. Operators need to understand table readiness, serving ownership, replica freshness, recovery progress, cache behavior, storage errors, and client retries without digging through application code.
The public product boundary is simple: application teams should see a stable context API, predictable latency, clear freshness semantics, and enough visibility to know when a context answer came from hot memory, warm cache, or durable state. Infrastructure teams should see ownership, recovery, and capacity signals that make failover and scaling deliberate rather than surprising.
Table readiness
Track serving owner, replica state, table readiness, and safe promotion status.
Replica freshness
Report catch-up lag, missing reads, retry latency, and time-to-visible on replicas.
Shared store and cache
Track durable update progress, retained state, cache usage, cache hit ratio, and cold reads.
Model correctness
Expose per-model request failures, encoding mismatches, query result shape, and guardrail tests.
Scaling and rollout principles
Scaling TemporalStore should be coordinated by the metadata and placement layer, not by independent data nodes guessing which serving work they own. New capacity should register, receive secondary assignments, catch up from durable state and replay, and then gradually take read traffic or primary ownership.
Scale-down follows the reverse path: drain traffic, stop new primary assignments, move ownership away, wait for replacement replicas to become healthy, and then remove the node from the serving plan. The important product promise is controlled movement of state, not surprise loss of the online path.
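The scale-up and scale-down sequences described above can be sketched as two ordered plans that a placement layer advances one gated step at a time. The state names and the `advance` helper are hypothetical labels for the steps in the text, not TemporalStore's actual node states.

```python
# Ordered steps for adding capacity, as described in the text.
SCALE_UP = [
    "registered",
    "secondary_assigned",
    "catching_up",
    "serving_reads",
    "primary_eligible",
]

# Ordered steps for removing a node: the reverse movement of state.
SCALE_DOWN = [
    "draining",
    "no_new_primaries",
    "ownership_moved",
    "replacements_healthy",
    "removed",
]


def advance(plan, current, gate_passed):
    """Move a node to the next step only when its health gate passed.

    Steps are never skipped, and a failed gate leaves the node where it is,
    so ownership moves deliberately instead of all at once.
    """
    idx = plan.index(current)  # raises ValueError for an unknown state
    if idx == len(plan) - 1 or not gate_passed:
        return current
    return plan[idx + 1]


still_catching_up = advance(SCALE_UP, "catching_up", gate_passed=False)
now_serving = advance(SCALE_UP, "catching_up", gate_passed=True)
```

The coordinator, not the data node, calls `advance`, which matches the principle that nodes do not guess which serving work they own.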
Performance posture
TemporalStore should make performance claims through repeatable benchmarks, not one-off environment snapshots. The important measurements are high-cardinality write throughput, hot-read latency, warm-read cache behavior, cold-read recovery cost, replica catch-up time, and query cost for windows, filters, distinct state, and long sequences.
| Signal | Why it matters | Product expectation |
|---|---|---|
| High-cardinality writes | Shows whether many sparse entities can update concurrently. | Serving workers absorb event volume without per-context pipelines. |
| Hot reads | Shows request-path latency for active entities. | Memory-resident state answers latest, window, filter, and sequence queries quickly. |
| Warm and cold reads | Shows behavior after eviction or less frequent access. | Shared-store recovery and cache refill keep retained context state queryable. |
| Replay and recovery | Shows whether replicas can become trustworthy after movement or restart. | Replicas catch up through shared-store state and durable update streams after checkpoints. |
This keeps the public claim disciplined: TemporalStore is designed for high-performance online serving where latency, memory efficiency, and throughput matter. Benchmark numbers should be published only when they are repeatable, labeled, and tied to a clear workload.
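A repeatable, labeled benchmark result might be summarized as in the sketch below. The helper names and the record layout are assumptions, and the latency samples are synthetic placeholders rather than TemporalStore results.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


def summarize(label, workload, runs):
    """Report per-run percentiles so run-to-run variation stays visible.

    `runs` is a list of latency lists in milliseconds, one list per repeat.
    """
    return {
        "label": label,
        "workload": workload,
        "repeats": len(runs),
        "p50_ms": [percentile(run, 50) for run in runs],
        "p99_ms": [percentile(run, 99) for run in runs],
    }


# Synthetic placeholder samples: two repeats of 100 requests each.
run_a = list(range(1, 101))
run_b = list(range(2, 102))
report = summarize("hot-read", "window query, one entity key", [run_a, run_b])
```

Keeping the label, workload description, and every repeat in the record makes it obvious when a number came from a single environment snapshot.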
The product boundary
TemporalStore should be honest about what it is. It is not a full feature platform, not a vector database, and not a transformer KV-cache runtime. It is time-aware LLM context memory: session timelines, tool events, memory deltas, prompt replay, freshness counters, long sequences, and replayable online state. MatrixDB handles serverless KV database serving, profiles, and offline-queryable state. MatrixKV handles canonical truth and workflow KV state.
- Use TemporalStore when context depends on recent events, tool timelines, freshness, filters, replay, or long sequences.
- Use MatrixDB when eventual consistency is acceptable and the workload is active profile state, large hash/profile KV, hot-key cache, Redis-compatible migration, LMCache metadata, tenant-scale service state, high-QPS serving, very large-scale storage, scans, exports, or offline/nearline query over persisted KV data.
- Use MatrixKV when the workload needs canonical truth and workflow KV state, timestamp coordination, metadata correctness, committed writes, or other correctness-critical KV state.
- Use feature platforms for secondary feature registries, training sets, lineage, and backfills.