Product Manual

MatrixArk API reference for context serving.

MatrixArk gives AI harnesses one product surface for context ingestion, retrieval, feedback, replay, and operations. The caller can stay simple: send raw messages, resources, tool results, final answers, lightweight scope hints, and token budgets. MatrixArk handles extraction, TemporalStore writes, tree traversal, freshness checks, packing, metrics, and replay.

Public API surface

Customers should not model TemporalStore schemas directly in v1. MatrixArk owns extraction, canonicalization, indexes, summaries, embeddings, timeout handling, and context-pack construction.

API | Required input | What MatrixArk does
ingest | tenant_id, text, optional hints | Extracts event fields, creates or reuses context nodes, writes events, indexes, summaries, and embeddings.
batch_ingest | List of ingest items | Runs the same idempotent write path for bulk signals, resources, tool outputs, or migrated history.
stream_ingest | Stream id, offset, payload | Accepts ordered agent or workflow streams while avoiding replayed offsets.
ingest_resource | tenant_id, raw URI, optional hints | Parses Markdown, TXT, or PDF, stores chunks and source refs, and writes L0/L1 summaries plus embeddings.
retrieve | tenant_id, raw query, max context tokens | Plans the query, embeds it, traverses TemporalStore, filters by time and metadata, and returns a token-budgeted ContextPack.
feedback | Context pack id or query id, final answer, accepted/rejected refs | Stores accepted memory, corrections, rejected refs, and confirmation signals for future retrieval.
audit / replay | Context pack id or query id | Returns the plan, selected refs, dropped refs, timeout notes, token counts, and decision trace.
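
The stream_ingest row says replayed offsets are avoided. A minimal sketch of one way that could be done, assuming an in-memory high-water mark per stream. StreamOffsetTracker and accept are hypothetical names for illustration, not part of the MatrixArk API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StreamOffsetTracker:
    """Hypothetical helper: remembers the highest accepted offset per stream."""

    last_offset: Dict[str, int] = field(default_factory=dict)

    def accept(self, stream_id: str, offset: int) -> bool:
        """Return True for a new offset and False for a replayed one."""
        previous = self.last_offset.get(stream_id, -1)
        if offset <= previous:
            # Already seen: skip the write so the ingest path stays idempotent.
            return False
        self.last_offset[stream_id] = offset
        return True


tracker = StreamOffsetTracker()
# Offsets 1 and 0 arrive a second time and are rejected.
results = [tracker.accept("agent-1", offset) for offset in (0, 1, 1, 2, 0)]
```

A real deployment would need the high-water mark to survive restarts, which this sketch does not attempt.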

Deployment modes: hook or standalone, cloud or on-premise

MatrixArk can run as an invisible hook inside an AI agent loop or as a standalone context service that any application calls directly. Both modes use the same APIs, ContextPack audit format, TemporalStore-backed memory, summaries, embeddings, and retrieval pipeline.

Mode | What the customer integrates | Best fit
Hook mode | Before-LLM query hook, optional tool/resource hooks, after-LLM feedback hook, and confirmation hook. | Cursor-like products, enterprise assistants, IDE agents, workflow copilots, and vertical AI harnesses.
Standalone mode | Direct calls to ingest, batch ingest, stream ingest, resource ingest, retrieve, feedback, audit, and replay. | Enterprises that want one central context infrastructure layer across many agents and apps.
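
To make hook mode concrete, here is a rough sketch of a before-LLM retrieve hook and an after-LLM feedback hook wrapped around a model call. The ContextService interface, the run_with_hooks function, the FakeService stand-in, and the response fields context and context_pack_id are all assumptions for illustration. The manual does not define a client SDK or a response schema.

```python
from typing import Callable, List, Protocol, Tuple


class ContextService(Protocol):
    """Assumed client interface; method names mirror the API table only."""

    def retrieve(self, request: dict) -> dict: ...
    def feedback(self, request: dict) -> dict: ...


def run_with_hooks(
    service: ContextService,
    llm: Callable[[str, str], str],
    tenant_id: str,
    query: str,
    max_context_tokens: int = 1800,
) -> str:
    # Before-LLM hook: ask for a token-budgeted context pack.
    pack = service.retrieve(
        {"tenant_id": tenant_id, "query": query, "max_context_tokens": max_context_tokens}
    )
    answer = llm(query, pack.get("context", ""))
    # After-LLM hook: report the final answer against the pack that was used.
    service.feedback(
        {"context_pack_id": pack.get("context_pack_id"), "final_answer": answer}
    )
    return answer


class FakeService:
    """In-memory stand-in so the sketch runs without a MatrixArk endpoint."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, dict]] = []

    def retrieve(self, request: dict) -> dict:
        self.calls.append(("retrieve", request))
        return {"context_pack_id": "cp_1", "context": "Alice approved the GPU request up to $80k."}

    def feedback(self, request: dict) -> dict:
        self.calls.append(("feedback", request))
        return {"status": "ok"}


service = FakeService()
answer = run_with_hooks(
    service,
    lambda query, context: f"Based on: {context}",
    "company_a",
    "Can we buy another GPU batch for Project 1?",
)
```

In standalone mode the same two calls would be made directly by the application instead of by a wrapper around the agent loop.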

Hook + cloud

Fastest integration for AI harness vendors. Hooks call a managed MatrixArk endpoint while the agent keeps its own local context.

Hook + on-premise

Best for sensitive enterprise agents. Hooks call MatrixArk inside the customer VPC/VNET or data center.

Standalone + cloud

One managed context API for many apps when cloud data residency and governance are acceptable.

Standalone + on-premise

Customer-controlled context, audit, model-provider config, and TemporalStore durability for strict governance.

The contract should not change across deployment shapes. Only auth, network boundary, model provider, durability mode, observability sink, and data-residency policy change.

Minimal request envelopes

MatrixArk runs its own extraction unless a trusted AI harness sends a first-pass query plan or extracted event with the request. Session ids are strongly recommended, but user-level scope can be used when a session id is unavailable.

Ingest envelope (raw messages):

{
  "tenant_id": "company_a",
  "messages": [
    {"role": "user", "content": "Alice approved the GPU request up to $80k."}
  ],
  "scope": {"user_id": "u_123", "session_id": "s_456", "team": "infra", "project": "project_1"},
  "metadata": {"source": "cursor_hook", "event_time_ms": 1781500000000}
}

Retrieve envelope:

{
  "tenant_id": "company_a",
  "query": "Can we buy another GPU batch for Project 1?",
  "scope": {"user_id": "u_123", "session_id": "s_456", "team": "infra", "project": "project_1"},
  "max_context_tokens": 1800,
  "hints": {"retrieval_timeout_ms": 5000}
}
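
A caller might assemble the retrieve envelope with a small helper like the one below. This is a sketch only: build_retrieve_request is a hypothetical function, and its validation rules are assumptions based on the required inputs listed for retrieve, not documented server behavior.

```python
import json
from typing import Optional


def build_retrieve_request(
    tenant_id: str,
    query: str,
    max_context_tokens: int,
    scope: Optional[dict] = None,
    timeout_ms: Optional[int] = None,
) -> dict:
    """Hypothetical helper that builds a retrieve envelope as a plain dict."""
    if not tenant_id or not query:
        raise ValueError("tenant_id and query are required")
    if max_context_tokens <= 0:
        raise ValueError("max_context_tokens must be positive")
    request = {
        "tenant_id": tenant_id,
        "query": query,
        "max_context_tokens": max_context_tokens,
    }
    if scope:
        request["scope"] = dict(scope)  # copy so the caller's dict is not shared
    if timeout_ms is not None:
        request["hints"] = {"retrieval_timeout_ms": timeout_ms}
    return request


request = build_retrieve_request(
    "company_a",
    "Can we buy another GPU batch for Project 1?",
    max_context_tokens=1800,
    scope={"user_id": "u_123", "session_id": "s_456", "team": "infra", "project": "project_1"},
    timeout_ms=5000,
)
payload = json.dumps(request)  # body to send to the retrieve endpoint
```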

Prior context and confirmation policy

MatrixArk resolves prior context before every ingest extraction. This lets agent hooks send simple messages while MatrixArk decides whether a short reply is a confirmation, correction, new event, or noise.

1. Audit first

If context_pack_id or query_id is present, MatrixArk loads the prior ContextPackAudit and selected refs.

2. Summary next

If no audit exists, MatrixArk looks for the scoped node/session L0 summary when hints can resolve a node path.

3. Recent window

If no summary exists, MatrixArk fetches up to 8 recent same-session or same-user events, capped at 4 KB.

4. Replayable result

The event stores prior_context_source, prior_ref_count, and prior_refs so replay can explain the extraction decision.

A short reply such as "yes", "correct", "approved", or "looks good" becomes a confirmation only when prior context exists. Without prior context, MatrixArk stores it as noise instead of inventing what the user confirmed.
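
The resolution order and the confirmation rule above can be sketched as follows. All function names and the source labels ("audit", "summary", "recent_window", "none") are assumptions, as is reading 4 KB as 4096 bytes and expecting recent events newest first. The sketch only separates confirmation from noise; classifying corrections and new events is left to the real extraction step.

```python
from typing import List, Optional, Tuple

CONFIRMATION_PHRASES = {"yes", "correct", "approved", "looks good"}
MAX_RECENT_EVENTS = 8
MAX_RECENT_BYTES = 4 * 1024  # assumption: "4 KB" read as 4096 bytes


def resolve_prior_context(
    audit_refs: Optional[List[str]],
    l0_summary: Optional[str],
    recent_events: List[str],
) -> Tuple[str, List[str]]:
    """Return (prior_context_source, prior_refs): audit, then summary, then recent window."""
    if audit_refs:
        return "audit", list(audit_refs)
    if l0_summary:
        return "summary", [l0_summary]
    window: List[str] = []
    used = 0
    for event in recent_events[:MAX_RECENT_EVENTS]:
        size = len(event.encode("utf-8"))
        if used + size > MAX_RECENT_BYTES:
            break  # stop once the byte cap would be exceeded
        window.append(event)
        used += size
    if window:
        return "recent_window", window
    return "none", []


def classify_short_reply(text: str, prior_refs: List[str]) -> str:
    """Treat a confirmation phrase as a confirmation only when prior refs exist."""
    normalized = text.strip().lower().rstrip(".!")
    if normalized in CONFIRMATION_PHRASES:
        return "confirmation" if prior_refs else "noise"
    return "other"  # corrections and new events need real extraction


source, refs = resolve_prior_context(None, None, ["Alice approved the GPU request up to $80k."])
with_context = classify_short_reply("Looks good.", refs)
without_context = classify_short_reply("yes", [])
```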

Retrieval timeout budget

MATRIXARK_RETRIEVAL_TIMEOUT_MS is an end-to-end context retrieval budget. It includes MatrixArk query understanding, optional LLM extraction, query embedding, TemporalStore tree traversal, event/resource filtering, temporal compression lookup, and context-pack construction.

Default

MATRIXARK_RETRIEVAL_TIMEOUT_MS=5000. This default is intentionally generous because OSS model planning and embeddings can be slower than TemporalStore reads.

Traversal sub-budget

TemporalStore traversal should normally complete within hundreds of milliseconds. It receives only the retrieval budget that remains after earlier stages have run.

Per-request override

Use hints["retrieval_timeout_ms"] or hints["context_retrieval_timeout_ms"] for heavy workflows or very strict latency paths.
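
One plausible way to combine the per-request hint with the environment default is shown below. The precedence (hint first, then environment variable, then 5000 ms) and the order in which the two hint keys are checked are assumptions, and resolve_retrieval_timeout_ms is a hypothetical name.

```python
import os
from typing import Mapping, Optional

DEFAULT_RETRIEVAL_TIMEOUT_MS = 5000
HINT_KEYS = ("retrieval_timeout_ms", "context_retrieval_timeout_ms")


def resolve_retrieval_timeout_ms(
    hints: Optional[Mapping[str, object]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Pick the retrieval budget: request hint, then environment, then default."""
    hints = hints or {}
    env = os.environ if env is None else env
    for key in HINT_KEYS:
        if key in hints:
            return int(hints[key])
    return int(env.get("MATRIXARK_RETRIEVAL_TIMEOUT_MS", DEFAULT_RETRIEVAL_TIMEOUT_MS))


strict = resolve_retrieval_timeout_ms({"retrieval_timeout_ms": 800}, env={})
from_env = resolve_retrieval_timeout_ms({}, env={"MATRIXARK_RETRIEVAL_TIMEOUT_MS": "3000"})
fallback = resolve_retrieval_timeout_ms(None, env={})
```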

Provider timeouts

Production model providers should still enforce their own network and inference timeouts. The Python MVP checks the budget between pipeline stages.

Fallback rule: if the deadline is reached, MatrixArk returns a normal replayable ContextPack using only context already fetched. It never fabricates context to fill the prompt.
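
The idea of checking the budget between pipeline stages can be illustrated with a deadline object that each stage consults. This is a simplified sketch, not the Python MVP itself: RetrievalBudget, run_pipeline, and the stage functions are hypothetical, and a fake clock is injected so the example is deterministic.

```python
import time
from typing import Callable, List, Tuple


class RetrievalBudget:
    """Tracks one end-to-end deadline on a monotonic clock."""

    def __init__(self, timeout_ms: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._deadline = clock() + timeout_ms / 1000.0

    def remaining_ms(self) -> float:
        return max(0.0, (self._deadline - self._clock()) * 1000.0)

    def expired(self) -> bool:
        return self.remaining_ms() <= 0.0


def run_pipeline(stages: List[Tuple[str, Callable[[float], List[str]]]], budget: RetrievalBudget) -> dict:
    """Run stages in order; on expiry, stop and keep only what was already fetched."""
    fetched: List[str] = []
    notes: List[str] = []
    for name, stage in stages:
        if budget.expired():
            notes.append(f"timeout before {name}")
            break
        # Each stage receives only the budget that is still left.
        fetched.extend(stage(budget.remaining_ms()))
    return {"items": fetched, "timeout_notes": notes}


fake_now = [0.0]  # seconds on the injected clock


def slow_stage(remaining_ms: float) -> List[str]:
    fake_now[0] += 6.0  # simulate a stage that takes 6 seconds
    return ["event:gpu_approval"]


def never_reached(remaining_ms: float) -> List[str]:
    return ["resource:chunk"]


budget = RetrievalBudget(5000, clock=lambda: fake_now[0])
pack = run_pipeline(
    [("query_understanding", slow_stage), ("tree_traversal", never_reached)], budget
)
```

Because the check happens only between stages, a single slow stage can still overrun the deadline, which is why provider-level timeouts remain necessary.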

Fallback content order

When retrieval times out, the response remains safe and auditable. The fallback pack uses partial content in this order, only if that content was already fetched before the cutoff.

Priority | Returned content | Reason
1 | Current ContextEvent records | Fresh timestamped facts are usually the most useful prompt evidence.
2 | L1 summaries | Compact overviews help the agent continue with lower token cost.
3 | Resource chunks | Exact source details are included only when already selected and token budget allows.
4 | Compressed cold windows | Older history remains available as summary evidence without stuffing raw history.
5 | Empty pack | If nothing safe was fetched, return no context instead of stale or unverified content.
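
The priority order above could be applied with a greedy pack over whatever was fetched before the cutoff. The sketch below assumes each fetched item carries a kind label and a precomputed token count. The kind strings, the item shape, and build_fallback_pack are hypothetical.

```python
from typing import Dict, List

# Assumed kind labels, listed in the fallback priority order from the table.
FALLBACK_PRIORITY = ["event", "l1_summary", "resource_chunk", "cold_window"]


def build_fallback_pack(fetched: List[Dict], max_context_tokens: int) -> Dict:
    """Select already-fetched items by priority until the token budget is used."""
    rank = {kind: position for position, kind in enumerate(FALLBACK_PRIORITY)}
    # Items of unknown kind are ignored rather than trusted.
    candidates = sorted(
        (item for item in fetched if item["kind"] in rank),
        key=lambda item: rank[item["kind"]],
    )
    selected: List[Dict] = []
    dropped: List[Dict] = []
    used = 0
    for item in candidates:
        if used + item["tokens"] <= max_context_tokens:
            selected.append(item)
            used += item["tokens"]
        else:
            dropped.append(item)  # kept for the audit record
    return {"selected": selected, "dropped": dropped, "token_count": used}


fetched = [
    {"kind": "resource_chunk", "text": "GPU vendor quote", "tokens": 900},
    {"kind": "event", "text": "Alice approved up to $80k", "tokens": 40},
    {"kind": "cold_window", "text": "2023 procurement history", "tokens": 1000},
    {"kind": "l1_summary", "text": "Project 1 budget overview", "tokens": 200},
]
pack = build_fallback_pack(fetched, max_context_tokens=1800)
empty_pack = build_fallback_pack([], max_context_tokens=1800)
```

When nothing was fetched, the result is an empty selection, which matches the last row of the table.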

Metrics and replay

MatrixArk emits dependency-free Prometheus text through service.prometheus_metrics(). The key histogram is matrixark_pipeline_stage_latency_ms with operation, stage, and status labels.

# Examples
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="query_understanding",status="ok"} 12
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="tree_traversal",status="ok"} 12
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="hard_timeout",status="timeout"} 1
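
For quick checks, the exposition text can be read with a few lines of standard-library Python. This is a deliberately simplified parser that handles only labeled sample lines like the ones above. It does not cover escaped quotes in label values or samples without labels, and stage_counts is a hypothetical helper name.

```python
import re
from typing import Dict, Tuple

SAMPLE_TEXT = """\
# Examples
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="query_understanding",status="ok"} 12
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="tree_traversal",status="ok"} 12
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="hard_timeout",status="timeout"} 1
"""

LINE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def stage_counts(text: str) -> Dict[Tuple[str, str], float]:
    """Map (stage, status) to the observation count for the latency histogram."""
    counts: Dict[Tuple[str, str], float] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        match = LINE_RE.match(line)
        if not match or match["name"] != "matrixark_pipeline_stage_latency_ms_count":
            continue
        labels = dict(LABEL_RE.findall(match["labels"]))
        counts[(labels.get("stage", ""), labels.get("status", ""))] = float(match["value"])
    return counts


counts = stage_counts(SAMPLE_TEXT)
timeouts = sum(value for (_, status), value in counts.items() if status == "timeout")
```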

Every returned ContextPack should be replayable. The audit record stores the query plan, candidate nodes, selected refs, token count, timeout notes, fallback notes, and the decision trace that explains why each piece of context was included in or left out of the final prompt.

Storage and durability modes

TemporalStore should default to async context serving for low latency, while supporting explicit sync durability for high-value enterprise facts. Customers can also choose shared-store mode or Raft mode.

Mode | Default use | Sync meaning
Shared-store mode | Serverless/cloud and lower operational burden | Wait for local durable or shared-store durable commit when requested.
Raft mode | On-prem, compliance, approvals, audit, and stricter replicated durability | Wait for quorum commit when replicated_durable is requested.

{
  "temporalstore": {
    "mode": "shared_store",
    "durability": "async_ack"
  }
}

{
  "temporalstore": {
    "mode": "raft",
    "durability": "replicated_durable",
    "raft": {"peers": ["ts-1:9010", "ts-2:9010", "ts-3:9010"]}
  }
}

Recommended production defaults

Use async ingestion for ordinary chat, tool, summary, and embedding events. Use sync or replicated durability for confirmed memory, approvals, policy facts, audit records, and compliance-sensitive events. Keep model/query-understanding timeout separate from the TemporalStore traversal target, and always inspect replay when debugging context quality.
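
These defaults could be encoded as a small policy function. The sketch below is illustrative only: the event-type labels and choose_durability are hypothetical, and "shared_store_durable" is a placeholder because the manual names only the async_ack and replicated_durable values.

```python
# Assumed labels for the high-value event classes named in the recommendation.
SYNC_EVENT_TYPES = frozenset(
    {"confirmed_memory", "approval", "policy_fact", "audit_record", "compliance"}
)
KNOWN_MODES = ("shared_store", "raft")


def choose_durability(event_type: str, mode: str) -> str:
    """Return a durability setting for one write under the recommended defaults."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown TemporalStore mode: {mode}")
    if event_type not in SYNC_EVENT_TYPES:
        return "async_ack"  # ordinary chat, tool, summary, and embedding events
    if mode == "raft":
        return "replicated_durable"  # wait for quorum commit
    return "shared_store_durable"  # placeholder name, not taken from the manual


chat_durability = choose_durability("chat", "shared_store")
approval_durability = choose_durability("approval", "raft")
```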
