TemporalStore Storage Modes for LLM Context

The short version

TemporalStore is built for context state that changes at very different speeds. Some context is streaming-heavy: tool calls, user turns, retrieval feedback, Redis-compatible hot-state changes, and memory deltas update constantly, and Redis-compatible hot-state paths need predictable persistence and refill. Other context is batch-shaped: summaries and profile enrichments change less often. Durable prompt context sits in the middle: session history, open commitments, counters, and replay state need stronger persistence while still benefiting from shared storage and elastic serving compute.

The application API stays stable across those modes: developers still address context with namespace_name, table_name, and key, then use typed commands such as latest-value writes, sequence row appends, filtered time-window reads, and primary-pinned reads when they need read-your-write behavior.
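To make the addressing model concrete, here is a minimal in-memory sketch of the four command shapes named above. It is a mock under stated assumptions, not TemporalStore's client: the class `ContextClient` and the methods `put_latest`, `append_row`, `read_window`, and `get_latest` are hypothetical names chosen for illustration. Only the `(namespace_name, table_name, key)` address and the command categories come from the text.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical in-memory mock. The names ContextClient, put_latest, append_row,
# read_window, and get_latest are illustrative, not TemporalStore's real API.
Address = tuple[str, str, str]  # (namespace_name, table_name, key)


@dataclass
class ContextClient:
    latest: dict[Address, Any] = field(default_factory=dict)
    rows: dict[Address, list[tuple[int, Any]]] = field(default_factory=dict)

    def put_latest(self, ns: str, table: str, key: str, value: Any) -> None:
        # Latest-value write: replaces whatever was stored at this address.
        self.latest[(ns, table, key)] = value

    def append_row(self, ns: str, table: str, key: str, ts: int, row: Any) -> None:
        # Sequence append: keeps every row together with its event timestamp.
        self.rows.setdefault((ns, table, key), []).append((ts, row))

    def read_window(self, ns: str, table: str, key: str, start: int, end: int,
                    pred: Callable[[Any], bool] = lambda row: True) -> list[Any]:
        # Filtered time-window read: rows with start <= ts < end that match pred.
        return [row for ts, row in self.rows.get((ns, table, key), [])
                if start <= ts < end and pred(row)]

    def get_latest(self, ns: str, table: str, key: str,
                   pin_primary: bool = False) -> Any:
        # A real client would send pin_primary=True reads to the primary for
        # read-your-write; this single-copy mock ignores the flag.
        return self.latest.get((ns, table, key))


client = ContextClient()
client.put_latest("chat", "session_state", "user-42", {"turns": 3})
client.append_row("chat", "tool_events", "user-42", 100, {"tool": "search"})
client.append_row("chat", "tool_events", "user-42", 205, {"tool": "calc"})
print(client.read_window("chat", "tool_events", "user-42", 0, 200))  # prints [{'tool': 'search'}]
```

The point of the sketch is that none of these calls mention a storage mode: the same address and command shapes would apply whether the table is backed by async, Raft, or sync persistence.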

The design point: disaggregate compute and storage, then choose async shared-store mode, Raft replication, or sync shared-store mode based on update frequency, durability expectations, read fanout, and recovery needs.

Figure: coupled serving versus TemporalStore modes.
Before: one serving node owns compute, cache, and storage; scaling is coupled, recovery competes with traffic, and the failover budget is hard.
With TemporalStore modes: compute and storage are disaggregated; async, Raft, or sync durability is chosen to fit the workload; elastic serving workers separate QPS from retention.

Why separate compute and storage?

In a traditional serving store, the same nodes often own request execution, cache state, replication, and durable storage. That is simple at small scale, but it creates a hard coupling: adding read QPS can force storage reshards, adding ingestion QPS can pressure replicas, and recovery can compete with online traffic.

TemporalStore treats compute nodes as the hot online serving layer and shared storage as the durable substrate for update streams, checkpoints, retained records, and recovery. Compute can be scaled for ingestion and reads. Storage can be scaled for durability, retention, and replay. The same product can then support different persistence modes without rewriting the feature API or changing the application model.
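A small worked sizing model shows what "separate QPS from retention" means in practice. This is an illustrative sketch only: the per-worker capacities, record size, and retention window are made-up inputs, and the model ignores compaction, replication copies, and indexing overhead. What it demonstrates is that, once compute and storage are separated, doubling read QPS changes the worker count but leaves the retained-bytes estimate untouched.

```python
import math

# Hypothetical sizing model: per-worker capacities, record size, and
# retention are made-up inputs; compaction and replication overhead are ignored.


def serving_workers(read_qps: int, write_qps: int,
                    reads_per_worker: int, writes_per_worker: int) -> int:
    # Compute is sized from traffic alone: each worker's capacity is shared
    # between reads and writes, so sum the fractions of a worker they consume.
    return math.ceil(read_qps / reads_per_worker + write_qps / writes_per_worker)


def retained_bytes(write_qps: int, bytes_per_update: int,
                   retention_seconds: int) -> int:
    # Storage is sized from ingest rate and retention alone.
    return write_qps * bytes_per_update * retention_seconds


week = 7 * 24 * 3600
print(serving_workers(200_000, 20_000, 50_000, 10_000))  # 4 + 2 -> 6
print(serving_workers(400_000, 20_000, 50_000, 10_000))  # 8 + 2 -> 10
print(retained_bytes(20_000, 200, week))                 # same for both read rates
```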

Figure: TemporalStore architecture.
Layers: clients (feature, context, KV, and sequence APIs), serving workers (ingest, aggregate, cache, serve), and replicas (read fanout, recovery, failover).
Storage modes: async shared store (extreme hot-path QPS), Raft replication (primary fault tolerance), and sync shared store (durable online updates).

Use case one: streaming-heavy keys

Streaming use cases are the workloads where the same logical table receives a continuous flow of small updates: impressions, clicks, risk events, tool events, counters, frequency caps, user behavior sequences, and freshness-sensitive context. The KV or temporal object changes frequently, and applications need both extreme ingestion QPS and low-latency read QPS while the system is still absorbing new events.

For this pattern, TemporalStore can use async storage mode over a shared store. The serving worker accepts the update, applies it to the model-aware online state, serves hot reads, and persists update streams and retained records to shared storage asynchronously. Replicas can support reads, so read QPS scales horizontally instead of being pinned to one primary. Shared storage remains the durable recovery substrate, but the hot write path does not need to wait for every remote storage operation.
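The async write path can be sketched as "apply in memory, acknowledge, persist later". This is a simplified single-process illustration, not TemporalStore internals: `AsyncWorker`, its fields, and the plain dict standing in for shared storage are all hypothetical. The sequence counters make the durability gap measurable: `lag()` counts writes that were acknowledged but are not yet in shared storage, which is exactly the window a crash could expose if those updates were not also recoverable from an upstream stream.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AsyncWorker:
    # Hypothetical sketch of async shared-store mode; shared_store is a plain
    # dict standing in for remote storage.
    memory: dict[str, Any] = field(default_factory=dict)
    pending: deque = field(default_factory=deque)
    shared_store: dict[str, Any] = field(default_factory=dict)
    applied_seq: int = 0
    persisted_seq: int = 0

    def write(self, key: str, value: Any) -> int:
        self.applied_seq += 1
        self.memory[key] = value  # readable from hot state immediately
        self.pending.append((self.applied_seq, key, value))
        return self.applied_seq   # acknowledged without waiting on storage

    def flush(self, max_records: int = 100) -> None:
        # Background flusher: persist pending updates in sequence order.
        for _ in range(min(max_records, len(self.pending))):
            seq, key, value = self.pending.popleft()
            self.shared_store[key] = value
            self.persisted_seq = seq

    def lag(self) -> int:
        # Acknowledged updates not yet durable in shared storage.
        return self.applied_seq - self.persisted_seq


w = AsyncWorker()
for key, value in [("a", 1), ("b", 2), ("a", 3)]:
    w.write(key, value)
print(w.lag())                  # 3: all three writes acked, none persisted yet
w.flush(max_records=2)
print(w.lag(), w.shared_store)  # 1 {'a': 1, 'b': 2}
```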

Signal	Why async shared-store mode fits	Tradeoff to manage
Very high event ingestion	Compute absorbs writes first and flushes to shared storage asynchronously.	Publish clear lag, checkpoint, and replay metrics.
Hot online reads	All replicas can support reads from memory, cache, and persisted state.	Expose freshness semantics per context table.
Elastic read fanout	Compute replicas scale separately from the shared storage layer.	Keep placement and cache warmup visible.
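One way to expose freshness while still letting replicas serve reads is to route on sequence numbers. The sketch below assumes a scheme that the text does not specify: each replica reports the highest update sequence it has applied, and a client that wants read-your-write passes the sequence its own write was acknowledged with. `Replica`, `route_read`, and `min_seq` are hypothetical names; `pin_primary` mirrors the primary-pinned reads described earlier.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    applied_seq: int  # highest update sequence this replica has applied


def route_read(replicas: list[Replica], primary: Replica,
               min_seq: int, pin_primary: bool = False) -> Replica:
    # Explicit primary pinning always wins.
    if pin_primary:
        return primary
    # Any replica that has applied at least min_seq can serve this read
    # without hiding the caller's own acknowledged write.
    fresh = [r for r in replicas if r.applied_seq >= min_seq]
    if not fresh:
        return primary  # no replica is caught up yet
    # Prefer the freshest eligible replica; ties keep list order.
    return max(fresh, key=lambda r: r.applied_seq)


primary = Replica("primary", 10)
replicas = [Replica("r1", 7), Replica("r2", 9)]
print(route_read(replicas, primary, min_seq=8).name)   # r2
print(route_read(replicas, primary, min_seq=10).name)  # primary
```

A table whose freshness contract is looser could pass a lower `min_seq` (or zero) and spread reads across all replicas, which is the horizontal read scaling the async mode is meant to provide.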

Use case two: batch-shaped or colder KV

Not every key changes constantly. Some feature families are refreshed by batch jobs, daily imports, periodic model outputs, profile enrichments, or administrative updates. These KVs may be large and important, but they are not always hot-write objects. For this pattern, the priority shifts from maximum ingestion speed to primary fault tolerance, committed ownership, and predictable recovery.

Raft replication is a strong fit here. The primary can replicate committed updates through a Raft group so failover does not depend only on an async flush. The serving compute layer still remains separate from application clients and storage policy, while the replicated log protects the primary-owned state. This is useful when update volume is moderate, the state is important, and teams want a more traditional fault-tolerance boundary for less frequently updated KVs.
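The "committed" in "replicate committed updates" comes from Raft's majority rule: an entry is committed once it is stored on a majority of the group. The sketch below computes that majority index from each member's highest replicated log index. It is a simplification for illustration: real Raft also requires the entry to belong to the leader's current term before advancing the commit index, and handles elections and log repair, none of which is modeled here. The function names are hypothetical.

```python
def majority_commit_index(match_index: list[int]) -> int:
    """Highest log index stored on a majority of the group.

    match_index holds, for every member including the leader, the highest
    log index known to be replicated on that member. Term checks are omitted.
    """
    ranked = sorted(match_index, reverse=True)
    # With n members a majority is n // 2 + 1, so the entry at position
    # n // 2 of the descending list is present on at least that many members.
    return ranked[len(ranked) // 2]


def is_committed(write_index: int, match_index: list[int]) -> bool:
    # A primary would acknowledge the write only once this returns True.
    return majority_commit_index(match_index) >= write_index


print(majority_commit_index([5, 4, 2]))         # 4: on 2 of 3 members
print(majority_commit_index([9, 1, 8, 3, 7]))   # 7: on 3 of 5 members
print(is_committed(5, [5, 4, 2]))               # False: only the leader has 5
```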

Good fit: batch materialized context, slow-moving account or item attributes, profile enrichment, table metadata attached to serving state, and KVs where write latency can afford stronger replication before visibility.

Use case three: durable online context

Many production workloads sit between pure streaming and colder batch. LLM session memory, tool-use timelines, user preference deltas, retrieval feedback, policy counters, recent profile changes, and operational state may update often, but teams still want a stronger persistence point before acknowledging the write. They also want compute/storage disaggregation so read capacity and durable retention can scale independently.

For this pattern, TemporalStore can use sync shared-store mode. The serving worker applies the update and synchronously persists the required record state or update stream to shared storage before acknowledging success. Replicas can refill and serve from the shared storage substrate with clearer durability semantics than async mode, while avoiding the operational shape of a Raft group for every hot object family.
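The sync contract can be sketched in a few lines: the acknowledgement is returned only after the shared-store write succeeds, and a fresh replica can rebuild its hot state from shared storage alone. Everything here is a hypothetical stand-in (`SharedStore`, `SyncWorker`, `refill_replica`, and `UnavailableStore` are illustrative names). One design choice of this sketch: it persists before updating memory, so a failed persist leaves neither an acknowledgement nor unacknowledged state visible to readers.

```python
from typing import Any


class SharedStore:
    """In-memory stand-in for the shared storage layer (hypothetical)."""

    def __init__(self) -> None:
        self.records: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self.records[key] = value

    def scan(self) -> dict[str, Any]:
        return dict(self.records)


class UnavailableStore(SharedStore):
    """Simulates a shared store that rejects writes."""

    def put(self, key: str, value: Any) -> None:
        raise ConnectionError("shared store unavailable")


class SyncWorker:
    def __init__(self, store: SharedStore) -> None:
        self.store = store
        self.memory: dict[str, Any] = {}

    def write(self, key: str, value: Any) -> str:
        # Persist first: if put() raises, nothing is applied or acknowledged.
        self.store.put(key, value)
        self.memory[key] = value
        return "ack"

    def read(self, key: str) -> Any:
        return self.memory.get(key)


def refill_replica(store: SharedStore) -> SyncWorker:
    # A new or restarted replica rebuilds hot state from shared storage alone.
    replica = SyncWorker(store)
    replica.memory = store.scan()
    return replica


store = SharedStore()
primary = SyncWorker(store)
primary.write("session:42", {"open_commitments": 2})
print(refill_replica(store).read("session:42"))  # {'open_commitments': 2}
```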

Mode	Best use case	What compute/storage separation buys
Async shared store	Streaming updates, hot counters, long sequences, extreme ingestion QPS	Scale ingest and read replicas independently while shared storage handles replay and recovery.
Raft replication	Batch-shaped KVs, slower updates, primary fault tolerance	Keep a strong replicated primary path for important state without making every workload use it.
Sync shared store	Durable online context, profile deltas, session memory, policy state	Persist before acknowledge while still using shared storage as the disaggregated durability layer.

How to choose

Use async shared-store mode when ingestion QPS and read QPS dominate, keys update frequently, and replicas should all support reads.
Use Raft replication when keys are less frequently updated and the primary needs a clear replicated fault-tolerance path.
Use sync shared-store mode when writes need stronger durable acknowledgement but the workload still benefits from compute/storage disaggregation.
Keep the choice at the table or workload level so one product can serve streaming, batch, and context patterns together.
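The rules above can be condensed into a per-table heuristic. This is a sketch of one reading of the guidance, not a TemporalStore configuration API: `Mode`, `choose_mode`, the `updates_per_key_per_sec` measure, and the default threshold of one update per key per second are all assumptions. Frequently updated keys go to async mode unless writes need a durable acknowledgement, in which case they go to sync mode; less frequently updated keys go to Raft replication.

```python
from enum import Enum


class Mode(Enum):
    ASYNC_SHARED_STORE = "async"
    RAFT = "raft"
    SYNC_SHARED_STORE = "sync"


def choose_mode(updates_per_key_per_sec: float, needs_durable_ack: bool,
                hot_threshold: float = 1.0) -> Mode:
    # hot_threshold is a hypothetical cutoff between "frequently updated"
    # and "less frequently updated" keys; tune it per deployment.
    if updates_per_key_per_sec < hot_threshold:
        return Mode.RAFT  # batch-shaped or colder KVs
    if needs_durable_ack:
        return Mode.SYNC_SHARED_STORE  # durable online context
    return Mode.ASYNC_SHARED_STORE  # streaming-heavy keys


# The choice is recorded per (namespace_name, table_name), so one deployment
# can mix all three modes.
table_modes = {
    ("ads", "click_counters"): choose_mode(50.0, needs_durable_ack=False),
    ("crm", "profile_enrichment"): choose_mode(0.001, needs_durable_ack=False),
    ("chat", "session_memory"): choose_mode(5.0, needs_durable_ack=True),
}
for table, mode in table_modes.items():
    print(table, mode.value)
```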