TemporalStore

Time-aware, low-latency memory for LLM context engineering.

TemporalStore is designed to cover most LLM context needs on its own: time-aware memory, temporal KV, latest KV, prompt replay, low-latency fetch and processing, multi-layer cache, and persistent storage. It stores what happened, when it happened, what changed, what the agent already tried, the latest values that matter now, and which context should be trusted, ignored, replayed, filtered, reused, or refreshed. LMCache-style systems handle prefix and model-runtime KV-cache reuse, while TemporalStore manages durable application memory and prompt context behind the MatrixArk context surface.

Simple customer schemas, strict serving models

TemporalStore is strongest when it gives vertical AI products a simple customer-facing context model and a strict internal serving model. Customers describe their business world: company, team, project, ticket, matter, claim, incident, approval, cost, policy, action, source object. MatrixArk compiles that into scope hashes, declared collections, timestamp sort keys, secondary indexes, freshness windows, and request-time query budgets.

Customer sees

Readable paths, object types, normal JSON records, common questions, and UI hints from the vertical AI harness.

MatrixArk compiles

Scope hashes, collection keys, equality-prefix indexes, time ranges, retention, limits, and context-pack sections.

TemporalStore serves

Bounded online reads for latest facts, recent sequences, open commitments, stale-memory checks, replay, and prompt-time context.

Other layers help

MatrixDB supports hot state and Redis-compatible metadata; MatrixKV protects committed truth; OLAP/HSAP handles broad scans and ad hoc analysis.

Design principle: customer-defined JSON is allowed at the edge, but request-time serving must be compiled into declared indexed access. TemporalStore should not become an unbounded JSON filter engine in the prompt path.
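The design principle above can be sketched in a few lines. This is a minimal illustration, not the TemporalStore or MatrixArk API: `scope_hash`, `Collection`, and `compile_query` are hypothetical names, and the 16-character SHA-256 prefix is an arbitrary choice. It shows a readable customer path being flattened into a scope key, and a request being rejected unless every filter was declared as indexed ahead of time.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass

# Hypothetical names throughout; this is not the TemporalStore or MatrixArk API.


def scope_hash(path: list[str]) -> str:
    """Flatten a readable customer path into a fixed-length scope key."""
    joined = "/".join(path)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


@dataclass(frozen=True)
class Collection:
    name: str
    indexed_fields: frozenset[str]  # equality filters declared up front
    max_limit: int                  # per-request row cap
    max_window_s: int               # widest time window allowed online


def compile_query(coll: Collection, path: list[str], filters: dict[str, str],
                  window_s: int, limit: int) -> dict:
    """Turn a customer-level request into a bounded, index-only plan."""
    undeclared = set(filters) - coll.indexed_fields
    if undeclared:
        # Arbitrary JSON filters never reach the prompt path.
        raise ValueError(f"undeclared filter fields: {sorted(undeclared)}")
    if window_s > coll.max_window_s:
        raise ValueError("time window exceeds the declared maximum")
    return {
        "scope": scope_hash(path),
        "collection": coll.name,
        "filters": dict(sorted(filters.items())),
        "window_s": window_s,
        "limit": min(limit, coll.max_limit),  # clamp to the collection cap
    }
```

A request that filters on an undeclared field fails at compile time instead of turning into a scan at serving time, which is the behavior the principle asks for.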

Where MatrixArk routes temporal context

MatrixArk routes time-aware LLM context to TemporalStore: session timelines, tool-call history, prompt replay, memory deltas, open commitments, stale-memory detection, freshness counters, temporal windows, latest per-entity KV, and long behavior sequences. That is the opportunity: most LLM stacks still treat time as logs or TTLs, not as a request-time context primitive.

Portfolio routing: if a value is needed for serving, latest context, temporal KV, time ordering, filtering, replay, or prompt-time context assembly, MatrixArk uses TemporalStore first. MatrixDB is added only for database-specific needs such as Redis-compatible online/offline/nearline KV, multi-tenancy, very large profile or summary KV, scans, exports, or tens of millions of QPS. MatrixKV is added only for low-volume permissions, ownership, approvals, leases, and workflow truth that need transactional correctness or strong consistency.
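The routing rule can be read as a small decision function. The sketch below is illustrative only: the `Engine` enum, the need labels, and `route` are hypothetical and do not describe how MatrixArk is actually configured. It encodes the stated order of preference: TemporalStore by default, with the other two engines chosen only for their specific needs.

```python
from enum import Enum


class Engine(Enum):
    TEMPORAL_STORE = "TemporalStore"
    MATRIX_DB = "MatrixDB"
    MATRIX_KV = "MatrixKV"


# Hypothetical need labels, taken from the categories named in the text.
TRANSACTIONAL_NEEDS = {"permission", "ownership", "approval", "lease", "workflow_truth"}
DATABASE_NEEDS = {"redis_compatible_kv", "large_profile_kv", "scan", "export"}


def route(need: str) -> Engine:
    """Pick an engine for one declared need; TemporalStore is the default."""
    if need in TRANSACTIONAL_NEEDS:
        return Engine.MATRIX_KV   # needs transactional correctness
    if need in DATABASE_NEEDS:
        return Engine.MATRIX_DB   # database-specific workload
    return Engine.TEMPORAL_STORE  # serving, replay, temporal context
```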

Product Advantages

Why TemporalStore for LLM context?

01

Context is temporal

Prompts depend on recent actions, open commitments, superseded memories, tool failures, and state transitions. TemporalStore keeps that timeline queryable so the prompt can say what to use, avoid, refresh, or replay.

02

Replayable agents

Store tool events, retrieved context, memory diffs, prompt packs, and committed actions so teams can debug and evaluate agent behavior later.

03

Low-latency serving

Serving workers, typed models, durable update streams, and shared-store recovery target low-latency ingestion and request-time context queries.

04

Freshness and counters

Windowed counters, recency features, rejection signals, and safety limits help decide which memories should enter the prompt now.

05

Token-efficient answers

Serve compact context packs with latest valid facts, recent deltas, blocked stale memories, and citations so models spend fewer tokens on noise.

06

Portfolio routing

MatrixArk can route time-aware memory to TemporalStore, hot and nearline KV to MatrixDB, and canonical facts to MatrixKV behind one product surface.

07

LMCache companion

TemporalStore works beside LMCache-style runtime reuse by separating stable prompt sections from volatile timeline, permission, source-version, and memory sections that must refresh.
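One way to picture the stable/volatile split is shown below. It is a sketch under assumptions: the section kinds, `split_sections`, and `prefix_key` are hypothetical names, and nothing here is LMCache's or TemporalStore's real interface. The idea is that only stable sections contribute to a reuse key, so a change in the timeline or memory sections does not invalidate the reusable prefix.

```python
import hashlib

# Section kinds the text describes as volatile; the labels are hypothetical.
VOLATILE_KINDS = {"timeline", "permission", "source_version", "memory"}


def split_sections(sections):
    """Split (kind, text) pairs into stable and volatile lists, keeping order."""
    stable = [(k, t) for k, t in sections if k not in VOLATILE_KINDS]
    volatile = [(k, t) for k, t in sections if k in VOLATILE_KINDS]
    return stable, volatile


def prefix_key(stable):
    """Hash only the stable sections, so volatile edits leave the key unchanged."""
    h = hashlib.sha256()
    for kind, text in stable:
        h.update(kind.encode("utf-8"))
        h.update(b"\x00")  # separator so adjacent fields cannot run together
        h.update(text.encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()
```

For prefix reuse to help, the stable sections would also need to be placed first in the assembled prompt, since a prefix cache can only match identical leading content.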

Why TemporalStore Is Different

It turns temporal memory into a low-latency serving primitive.

That is why TemporalStore is game-changing: the same system can record what happened, serve fresh context in the request path, replay what the model saw, and decide what memory should be trusted, ignored, reused, or refreshed. Most teams can build a demo with a vector database, prompt template, Redis cache, and logs. Customers feel the pain when the agent must answer with the right memory, at the right time, under the right permission, and then explain exactly why that context entered the prompt.

Better than raw logs

Logs are for inspection after failure. TemporalStore serves ordered timelines, windows, counters, and replay records during the request.

Better than cache-only memory

Cache keys are fast but fragile. TemporalStore gives durable, typed, queryable memory that can recover, replay, enforce freshness rules, and explain why context was used.

Better than vector-only RAG

Semantic similarity is only one signal. TemporalStore adds recency, sequence, open commitments, repeated failures, source freshness, time-valid behavior, and cache eligibility.

Better customer fit

Vertical AI builders can start with the Rust TemporalStore open-source release planned for July 2026, while MatrixArk keeps the broader state-engine choices behind the platform surface.

KV-cache control plane

Feed LMCache-style systems stable prompt sections, reusable source packs, Redis-compatible metadata keys, invalidation signals, and freshness decisions while keeping volatile memories outside the runtime cache.

From sequence features to custom temporal context models

The direction for TemporalStore is to extend sequence feature serving into customizable temporal context serving. Vertical AI products should be able to define their own hierarchy, typed records, indexes, filters, freshness rules, and serving guardrails, then query fresh context online without forcing normal LLM requests through offline aggregation.

Customer logical model

Company, team, project, matter, ticket, claim, or incident layers with collections such as approvals, costs, policies, tool history, and memory deltas.

Compiled serving model

Deep customer hierarchy is compiled into scope hashes, declared indexes, sort keys, and time shards so serving avoids expensive tree walking.

Bounded online queries

Only declared indexed filters, scoped time windows, limits, query budgets, and collection caps run in the request path.

Offline only when needed

Large scans, many filters, fuzzy matches, joins, group-bys, and multi-year analytics become summaries written back into TemporalStore.

Example: a finance assistant can model Company A, Platform Infra, Project 1, approvals, and costs. TemporalStore can answer "Can we buy another GPU batch?" by serving the latest active approval, current committed spend, remaining budget, stale approvals, and missing approval warnings directly from bounded online context queries.
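The finance example can be made concrete with a small sketch. Everything here is illustrative: `Approval`, `Cost`, `budget_context`, the field names, and the single `max_age_s` staleness rule are assumptions, not a TemporalStore schema. The inputs stand for rows that a bounded, scoped query has already returned for one project.

```python
from dataclasses import dataclass


@dataclass
class Approval:
    ts: int      # approval time, seconds
    amount: int  # approved budget
    status: str  # "active" or "revoked"


@dataclass
class Cost:
    ts: int
    amount: int  # committed spend


def budget_context(approvals, costs, now, max_age_s):
    """Assemble prompt-ready budget facts from already-bounded rows."""
    active = [a for a in approvals if a.status == "active"]
    latest = max(active, key=lambda a: a.ts, default=None)
    committed = sum(c.amount for c in costs)
    warnings = []
    if latest is None:
        warnings.append("missing_approval")
    elif now - latest.ts > max_age_s:
        warnings.append("stale_approval")
    remaining = None if latest is None else latest.amount - committed
    return {
        "latest_active_approval": latest,
        "committed_spend": committed,
        "remaining_budget": remaining,
        "warnings": warnings,
    }
```

The result is a compact set of facts and warnings that can be placed in the prompt, rather than a raw list of approval and cost records.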

Context-first architecture

Context API / SDK → Router → Serving worker → Temporal model → Durable event stream → Replayable store

TemporalStore keeps namespace, table, key, and model-aware temporal state close to the prompt-serving path. SDK writes append context rows, tool events, memory deltas, counters, and long sequences, while queries read bounded windows with filters for prompt-time assembly.

Serving Workflow

  1. Ingest a session event, tool call, memory diff, counter update, behavior append, or context record.
  2. Resolve namespace, table, key, and model.
  3. Apply updates through the typed model instead of service-side glue.
  4. Answer timeline, sequence, replay, window, filter, risk-signal, and context queries online.
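The four steps above can be sketched as a toy in-memory store. This is an assumption-heavy illustration: `Store`, `SequenceModel`, and their methods are hypothetical, and durability, sharding, and concurrency are all omitted. It only shows how a write is resolved to a typed model by namespace, table, and key, and how a read stays bounded by a time floor and a row limit.

```python
class SequenceModel:
    """Hypothetical typed model: an ordered timeline of (ts, payload) rows."""

    def __init__(self):
        self.rows = []

    def apply(self, ts, payload):
        self.rows.append((ts, payload))
        self.rows.sort(key=lambda r: r[0])  # keep time order for late arrivals

    def query(self, since, limit):
        recent = [r for r in self.rows if r[0] >= since]
        return recent[max(0, len(recent) - limit):]  # newest `limit` rows


class Store:
    def __init__(self, models):
        self.models = models  # (namespace, table) -> model class
        self.state = {}       # (namespace, table, key) -> model instance

    def ingest(self, namespace, table, key, ts, payload):
        model_cls = self.models[(namespace, table)]  # step 2: resolve the model
        slot = (namespace, table, key)
        if slot not in self.state:
            self.state[slot] = model_cls()
        self.state[slot].apply(ts, payload)          # step 3: typed update

    def read(self, namespace, table, key, since, limit):
        # step 4: bounded online read
        return self.state[(namespace, table, key)].query(since, limit)
```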

Scenario Architecture

One high-QPS serving path for prompt context, memory updates, and replay.

Ingest

Application events, stream consumers, SDK writes, and repair jobs all write to the same entity timeline or aggregate object at production serving scale.

Store

Serving workers update typed records, append durable update streams, and keep shared-store state ready for replay.

Serve

LLM systems request timelines, replay records, freshness counters, long behavior sequences, ad hoc filters, and prompt-ready context through one low-latency serving API.

Before and Now

TemporalStore replaces scattered logs, caches, and ad hoc prompt assembly with one context system.

Long context sequences

Before (many services): raw events; stream fanout; offline trim job; nearline transform job; recent-list cache; filter service; ranker join logic; recovery replay job.
After (one system): append typed context row; query by key, window, count, filter; serve prompt-ready timeline.

Agents can read recent behavior, tool history, and memory deltas without rebuilding a new context pipeline for every prompt question.

Context freshness aggregates

Before (many services): event stream; offline daily job; offline hourly job; stream job per metric; window materializer; cache key per window; prompt join service; backfill pipeline.
After (one system): update entity aggregate state; query fresh windows at request time; persist and recover one serving model.

Agents and policy systems get fresh high-cardinality context signals without locking every prompt question into a precomputed view.

Frequency caps

Before (many services): ad hoc counter service; Redis TTL keys; policy service; quota service; offline daily and hourly jobs; reconciliation job; cache repair script; more online on-call surfaces.
After (one system): one key for user, campaign, tenant; read caps by window and policy; persist state with multi-layer cache.

Frequency cap logic becomes a shared online data model instead of scattered TTL math across services and jobs.
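A frequency cap read by window can be sketched with a bucketed counter. This is a minimal illustration, not the TemporalCounter implementation: `BucketedCounter`, its methods, and the 60-second bucket size are hypothetical. One structure holds the counts, and the window and cap are chosen at read time instead of being baked into TTL keys.

```python
from collections import defaultdict


class BucketedCounter:
    """Hypothetical windowed counter keyed by (user, campaign, tenant)."""

    def __init__(self, bucket_s=60):
        self.bucket_s = bucket_s
        # key -> {bucket_start: count}
        self.buckets = defaultdict(lambda: defaultdict(int))

    def add(self, key, ts, n=1):
        self.buckets[key][ts - ts % self.bucket_s] += n

    def count(self, key, now, window_s):
        """Sum every bucket that overlaps [now - window_s, now]."""
        start = now - window_s
        per_key = self.buckets.get(key, {})
        return sum(
            c for b, c in per_key.items()
            if b + self.bucket_s > start and b <= now
        )

    def allow(self, key, now, window_s, cap):
        return self.count(key, now, window_s) < cap
```

Because whole buckets are summed, a bucket that only partly overlaps the window is counted in full. The count can therefore be slightly high, which errs on the strict side for a cap.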

Architecture Innovation

A purpose-built temporal storage layer, not just RocksDB behind an API.

RocksDB is excellent embedded storage, but TemporalStore is designed around online temporal data models: hot typed state, durable update streams, retained records, multi-layer cache, replica reads, and compute/storage disaggregation. For hot update-heavy temporal data, generic RocksDB-backed serving can create much larger write amplification across encoded blob rewrites, cache mutation, LSM compaction, replay logs, repair jobs, and downstream materializations; TemporalStore's storage path is built to keep typed deltas, retained records, cache refill, and recovery in one model-aware flow.

Generic RocksDB-backed serving

Application logic: feature semantics in services
Encode latest value or blob: model state outside storage
Write generic KV engine: LSM write path and compaction amplification
Cache, replay, repair jobs: extra write paths to operate

TemporalStore storage architecture

Typed SDK command: namespace, table, key, model
Hot temporal state: windows, counters, sequences, filters
Purpose-built durable state: update streams and retained records
Shared store plus replicas: cache, recovery, read fanout

Implementation Direction

C++ core performance, Rust TemporalStore open source in July 2026.

The performance-critical serving and storage implementation is designed around C++ for low-latency data structures, cache control, memory management, and high-QPS execution. The open-source TemporalStore track is written in Rust and is planned for release in July 2026, so the community-facing implementation can emphasize safety, maintainability, and a modern systems-programming developer experience.

C++ serving core

Use C++ where hot-path latency, memory layout, cache locality, and storage-engine integration matter most.

Rust open source

Open source the Rust TemporalStore track in July 2026, with safer concurrency and a clean systems API surface.

Shared model

Keep the public concepts consistent: namespaces, tables, typed updates, retained records, windows, and context reads.

LLM Context

Online memory and context features for LLM applications.

TemporalStore is not a transformer KV-cache runtime. It is the persistent online state layer around the model: the place to keep structured context, temporal memory, counters, retrieval metadata, and context signals that decide what should enter the next prompt, tool call, ranking step, or safety policy. It can integrate beside LMCache-style systems and remote cache layers that reuse model prefixes or attention KV state.

Session memory

Persist recent conversation turns, user actions, tool calls, and state transitions with time-aware retention and replay.

Context ranking signals

Serve freshness, frequency, recency, and interaction signals that help choose which memories or documents should enter the model context.

LMCache integration

Work beside LMCache or remote cache services for prefix reuse, model-runtime KV-cache reuse, repeated prompt segments, cache eligibility, and invalidation hints.

Tool-use timelines

Keep ordered tool results, errors, retries, and agent decisions as long context sequences for debugging and next-step planning.

Safety and policy counters

Track rate limits, abuse signals, topic frequency, user risk, and policy state with online windows and distinct counts.

Retrieval metadata

Store document impressions, clicks, feedback, source freshness, and retrieval history beside vector search instead of inside the vector index.

Personalization state

Serve user preference deltas, recent task history, account context, and behavior sequences for personalized agents and copilots.

Cloud-Native Operations

Public-cloud rollout across AWS, GCP, and Azure.

MatrixArk can provide TemporalStore as a managed public-cloud service on AWS, GCP, or Azure, with private deployment available for customer-controlled environments.

Authenticated console

Cluster health, deployment status, and diagnostics live behind customer access control.

Public-cloud delivery

Managed service delivery on AWS, GCP, and Azure, plus private cloud or on-prem when needed.

Private metrics

Metrics and logs stay on private networking or controlled observability integrations.

Data Models

Built for LLM context engineering first, with typed temporal models underneath.

TemporalCounter

Counts and sums over keyed time buckets for velocity, caps, and online policies.

TemporalAggregate

High-cardinality filtered sum, min, max, count, and model-specific rollups over recent event state and bucketed dimensions.

TemporalDistinct

Unique merchant, device, campaign, IP, or session counts inside time windows.

Sequence

Long user, item, session, and agent action sequences with filters, timestamps, and high-performance online reads.

Hash/Profile

Latest user, account, tenant, document, or session attributes beside temporal context.

Context

LLM and agent context, tool timelines, session memory, retrieval metadata, preference deltas, and safety/rate counters.
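To make one of these models concrete, here is a small exact sketch of a distinct count inside a time window, in the spirit of TemporalDistinct. The class name `WindowedDistinct` and its methods are hypothetical, and the source does not say whether TemporalStore counts exactly or approximately. The sketch keeps only the latest timestamp per value, which is enough when queries are always asked as of the current time.

```python
class WindowedDistinct:
    """Hypothetical exact distinct counter over a trailing time window."""

    def __init__(self):
        self.last_seen = {}  # key -> {value: latest timestamp}

    def observe(self, key, value, ts):
        seen = self.last_seen.setdefault(key, {})
        seen[value] = max(ts, seen.get(value, ts))

    def distinct(self, key, now, window_s):
        """Count values last seen in (now - window_s, now].

        Assumes `now` is not earlier than any observed timestamp.
        """
        seen = self.last_seen.get(key, {})
        return sum(1 for ts in seen.values() if now - window_s < ts <= now)
```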

Comparison

Alternative: Redis / Redis Enterprise
Good at: Fast cache, strings, hashes, modules, and ephemeral serving patterns
TemporalStore difference: TemporalStore adds typed timelines, filters, temporal windows, replayable prompt context, and durable shared-store recovery; MatrixDB keeps the Redis-compatible hot-state bridge.

Alternative: Vector database
Good at: Semantic retrieval over chunks and embeddings
TemporalStore difference: TemporalStore decides which memories, events, freshness signals, and timelines should enter the prompt now.

Alternative: Logs plus cache
Good at: Trace capture and fast temporary state
TemporalStore difference: Replayable context state, memory deltas, freshness counters, and prompt-ready timelines in one serving path.

Alternative: Feature store
Good at: Feature definitions, lineage, training sets, and online lookups
TemporalStore difference: TemporalStore focuses on LLM context timelines and prompt-time temporal state, with feature-serving patterns available as a secondary workload.

Alternative: Prompt management
Good at: Prompt templates, versions, evals, and test cases
TemporalStore difference: TemporalStore provides the live context substrate: memories, tool history, freshness, permissions, and replayable prompt inputs.

Alternative: LLM observability
Good at: Trace collection, cost tracking, latency, and debugging views
TemporalStore difference: TemporalStore governs context before the model call and stores enough state to replay why that context was selected.

Where TemporalStore fits

TemporalStore is the primary serving engine for time-aware LLM context engineering: timelines, memory deltas, tool events, temporal KV, latest KV, prompt replay, freshness counters, long sequences, persistence, and multi-layer caching. MatrixDB supplies the database layer only when teams need Redis-compatible online/offline/nearline KV, large profile or summary records, scans, exports, cheaper persisted storage, multi-tenancy, tens of millions of QPS, or familiar application APIs. MatrixKV supplies low-volume transactional KV when a permission, version, lease, approval, ownership record, or committed action must be correct.
