Chat Memory
A semantic memory store that follows the workspace, not the chat. The model saves facts mid-conversation, recalls them via vector search, and every chat in the workspace inherits the same store. Manual saves from the UI work the same way.
Memories are scoped per workspace: all chats within the same workspace share one memory store. When a memory is saved, it is embedded as a vector and stored for relevance-based semantic search, so the system retrieves the most contextually relevant memories rather than relying on keyword matching or recency alone. A memory saved in one chat is automatically available to every other chat in the same workspace.
Memories can optionally be scoped to a workspace, which allows the x_workspace_search tool to surface them alongside document chunks and artifacts in a unified search. See Document Workspace for details on workspace-level memory management.
Key characteristics:
- Per-workspace scoping for memory storage and retrieval.
- Semantic vector search backed by pgvector. The embedding column is model-agnostic: its dimension is resolved at runtime from the configured embeddings endpoint (commonly 768, 1024, 3072, or 4096), and the schema does not pin a fixed size.
- Automatic deduplication of near-duplicate memories.
- Soft-delete support for safe memory removal.
- Both model-driven and user-driven memory creation.
- Unified workspace search via x_workspace_search surfaces memories alongside documents and artifacts.
- Background memory extraction triggered by specific dashboard events (deep_think, research, compaction) and by completed tool invocations.
How It Works
Chat memory operates through two complementary mechanisms: automatic (model-driven) and manual (user-driven).
Automatic Memory (Model-Driven)
During a conversation, the model can call two built-in tools to interact with the memory system:
- x_save_memory: The model stores a piece of knowledge (a fact, preference, or instruction) into the workspace's memory. The content is embedded as a vector and persisted to the database.
- x_recall_memory: The model queries the workspace memory store with a natural language query and receives the top matching memories ranked by semantic relevance.
The model decides autonomously when to save or recall memories based on the conversation context.
Background Memory Extraction
In addition to explicit tool calls, two independent paths run distillation in the background. Neither uses a context-percent gate.
- Tool-invocation auto-distill. Every completed server-tool invocation enqueues a pending row that the memory extraction worker drains asynchronously. Rows persisted by this path carry source_type: "model" (the value written by the x_save_memory tool implementation when the worker decides a memory should be saved).
- Post-turn dashboard distill. When a streaming turn ends with deep-think, research, or compaction output of at least 200 characters, the dashboard extracts candidate memories and persists them with source_type set to auto_deep_think, auto_research, or auto_compaction respectively (the prefix is literally auto_ followed by the source kind).
Note: The dashboard-side path emits an
x_memory.auto_saved SSE event when it persists a row. That
event is dashboard-only: it is not produced on the public
/v1/chat/completions or /v1/responses streams
that SDK consumers connect to. Programmatic clients should poll the
memory list endpoints instead.
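Because the auto-save event is not available to programmatic clients, one way to approximate it is to poll a list endpoint and diff the returned ids against those already seen. The following is a minimal sketch, not the product's own client: `fetch_memories` is a hypothetical callable that returns the current list of memory objects (how it authenticates and which list route it calls are left to the caller), and the `auto_` prefix filter only catches rows from the post-turn dashboard distill, since the tool-invocation path writes `source_type: "model"`.

```python
from typing import Callable, Iterable


def diff_new_memories(seen_ids: set[str], memories: Iterable[dict]) -> list[dict]:
    """Return memories whose id has not been seen yet and record them as seen."""
    fresh = []
    for mem in memories:
        if mem["id"] not in seen_ids:
            seen_ids.add(mem["id"])
            fresh.append(mem)
    return fresh


def poll_once(fetch_memories: Callable[[], list[dict]], seen_ids: set[str]) -> list[dict]:
    """One polling step: fetch the list, keep only unseen rows tagged auto_*.

    fetch_memories is a hypothetical caller-supplied function.
    """
    fresh = diff_new_memories(seen_ids, fetch_memories())
    return [m for m in fresh if m.get("source_type", "").startswith("auto_")]
```

Calling `poll_once` on a timer with the same `seen_ids` set yields each background-distilled memory once; the polling interval is a client-side choice.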
Manual Memory (User-Driven)
Users can click the "Remember this" button on any message in the chat
interface. This creates a memory from the message content, embedding it
and storing it in the same memory store that the model accesses. Manual
memories are tagged with source_type: user to distinguish
them from model-created memories.
Passive Injection (Summary)
At the start of each turn, the system runs a semantic search against the workspace memory store with the current user message as the query and folds the top matches into the system prompt. No user or model action is required. The mechanics, thresholds, and ordering live in the Passive Injection section below.
Built-in Tools
x_save_memory
Stores a piece of knowledge in the workspace's memory. The content is embedded as a vector and persisted. Near-duplicate content is automatically deduplicated: if a memory with very high semantic similarity already exists, the save is skipped and the existing memory is returned.
| Parameter | Type | Required | Description |
|---|---|---|---|
| content | string | Yes | The text content to store as a memory. |
| category | string | No | Free-form short label. Common values: preference, fact, instruction, decision. Arbitrary strings are accepted and stored verbatim; the system does not validate against an enum and does not infer a category when the field is omitted. |
Example Tool Call
{
  "name": "x_save_memory",
  "arguments": {
    "content": "User prefers responses in bullet-point format",
    "category": "preference"
  }
}
x_recall_memory
Queries the workspace's memory store using semantic search. Returns up to
limit matching memories (default 5, max 20) ranked by relevance
score.
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Natural language query to search memories against. |
| offset | integer | No | Number of results to skip from the start of the ranked list. Used for paging through recall results. |
| limit | integer | No | Maximum number of memories to return. Defaults to 5, capped at 20. |
Example Tool Call
{
  "name": "x_recall_memory",
  "arguments": {
    "query": "What formatting preferences does the user have?"
  }
}
Example Response (rendered markdown payload)
The recall tool returns a rendered markdown block to the model rather than a JSON object. Each entry exposes the memory id, content, category, and a relevance score; the source chat name is not included in the payload.
# Recalled memories
1. **mem_abc123** (preference, score 0.92)
User prefers responses in bullet-point format
2. **mem_def456** (instruction, score 0.85)
User wants concise answers, no longer than 3 paragraphs
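A client that runs its own tool loop may want the recalled entries as structured data. The sketch below parses a block laid out like the example above. It assumes the exact layout shown there (a numbered line with the bold id, then category and score in parentheses, followed by content lines); that layout is inferred from the example only and may differ or change, and `parse_recall_block` is a hypothetical helper name.

```python
import re

# Matches lines such as: 1. **mem_abc123** (preference, score 0.92)
ENTRY_RE = re.compile(
    r"^\d+\.\s+\*\*(?P<id>mem_\w+)\*\*\s+"
    r"\((?P<category>.*),\s*score\s+(?P<score>\d+(?:\.\d+)?)\)\s*$"
)


def parse_recall_block(text: str) -> list[dict]:
    """Turn the rendered markdown payload into a list of dicts (layout is assumed)."""
    entries: list[dict] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = ENTRY_RE.match(line)
        if match:
            entries.append({
                "id": match["id"],
                "category": match["category"],
                "score": float(match["score"]),
                "content": "",
            })
        elif entries and line:
            # Any non-empty line after an entry header is treated as its content.
            current = entries[-1]
            current["content"] = (current["content"] + "\n" + line).strip()
    return entries
```

Lines that appear before the first entry, such as the heading, are ignored.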
Memory Management API
REST endpoints for managing memories programmatically. All endpoints in this section operate within the scope of a single chat.
Dashboard-only surface. The
/chat/api/... routes documented in this section are served
by the dashboard frontend and authenticated with a browser session
cookie. They are not part of the OpenAI-compatible router API and do not
accept Authorization: Bearer sk-... credentials. An
unauthenticated request returns the login page as HTML, not JSON, so
clients that assume a JSON body will fail to parse the response.
For programmatic memory access from an SDK or API key, use the
x_save_memory and x_recall_memory tools through
/v1/chat/completions or /v1/responses, or call
the workspace memory routes through the router-proxied API surface.
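Since an unauthenticated request comes back as an HTML login page, a client can check the response before decoding it. The helper below is a minimal sketch under one assumption that this page does not state: that successful responses carry a Content-Type containing `application/json`. The names `parse_dashboard_json` and `NotAuthenticatedError` are hypothetical.

```python
import json


class NotAuthenticatedError(RuntimeError):
    """Raised when the dashboard appears to have answered with HTML, not JSON."""


def parse_dashboard_json(content_type: str, body: str) -> dict:
    """Decode a dashboard response, failing clearly on a non-JSON body.

    Assumption: JSON responses are labelled application/json.
    """
    if "application/json" not in content_type.lower():
        raise NotAuthenticatedError(
            f"expected JSON but got {content_type!r}; the session cookie may be missing or expired"
        )
    return json.loads(body)
```

With the `requests` library this would be called as `parse_dashboard_json(resp.headers.get("Content-Type", ""), resp.text)`.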
Create Memory from Message
POST /chat/api/{chatId}/memory
Creates a new memory in the chat. At least one of messageId or
content must be supplied; each field is optional on its own,
but the two cannot both be omitted. When the dashboard forwards the request to the
router it maps messageId onto the router-side
source_message_id field.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| messageId | string | Either this or content | The ID of the message to derive the memory from. Forwarded to the router as source_message_id. |
| content | string | Either this or messageId | Explicit text to persist. When omitted, the router derives the content from the referenced message. |
| category | string | No | Free-form category label (see x_save_memory for common values). |
| synthesize | boolean | No | When true, the router-side persist path runs a synthesis pass over the source content before storing the memory. Defaults to false. |
curl
curl -X POST https://api.xerotier.ai/chat/api/chat_abc123/memory \
-H "Cookie: session=your_session_token" \
-H "Content-Type: application/json" \
-d '{
"messageId": "msg_abc123",
"content": "The project deadline is March 15, 2026"
}'
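The "at least one of messageId or content" rule can be enforced on the client before the request is sent. This is a small sketch with a hypothetical helper name, `build_create_memory_body`; it mirrors the field table above, and the server remains the authority on what is accepted.

```python
from typing import Optional


def build_create_memory_body(
    message_id: Optional[str] = None,
    content: Optional[str] = None,
    category: Optional[str] = None,
    synthesize: bool = False,
) -> dict:
    """Build the JSON body for POST /chat/api/{chatId}/memory."""
    if not message_id and not content:
        raise ValueError("supply messageId, content, or both")
    body: dict = {}
    if message_id:
        body["messageId"] = message_id
    if content:
        body["content"] = content
    if category:
        body["category"] = category
    if synthesize:
        # Only sent when requested; the documented default is false.
        body["synthesize"] = True
    return body
```

The returned dict can be passed as the `json=` argument of `requests.post`.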
List Memories
GET /chat/api/{chatId}/memories
Returns active (non-deleted) memories whose source_chat_id
matches the path parameter, ordered by creation date descending. The
chat-scoped list returns the flat {memories: [...]} envelope
shown below and does not support cursor paging; for paginated workspace-wide
access use the workspace memory endpoints, which return an
{items, next_cursor, has_more} envelope instead.
curl
curl https://api.xerotier.ai/chat/api/chat_abc123/memories \
-H "Cookie: session=your_session_token"
Response
{
  "memories": [
    {
      "id": "mem_abc123",
      "content": "The project deadline is March 15, 2026",
      "summary": "Project deadline is March 15",
      "category": "fact",
      "source_type": "user",
      "created_at": "2026-03-07T10:30:00Z",
      "updated_at": "2026-03-07T10:30:00Z"
    }
  ]
}
Update Memory
PATCH /chat/api/{chatId}/memories/{memoryId}
Updates the content of an existing memory. The embedding is regenerated automatically to reflect the new content.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| content | string | Yes | The new text content for the memory. |
curl
curl -X PATCH https://api.xerotier.ai/chat/api/chat_abc123/memories/mem_abc123 \
-H "Cookie: session=your_session_token" \
-H "Content-Type: application/json" \
-d '{
"content": "The project deadline has been extended to April 1, 2026"
}'
Delete Memory
DELETE /chat/api/{chatId}/memories/{memoryId}
Soft-deletes a memory. The record remains in the database with a
deleted_at timestamp but is excluded from all queries
and semantic searches.
curl
curl -X DELETE https://api.xerotier.ai/chat/api/chat_abc123/memories/mem_abc123 \
-H "Cookie: session=your_session_token"
Returns HTTP 204 No Content on success.
Rerun Memory Extraction
POST /chat/api/{chatId}/memories/{memoryId}/rerun-extraction
Re-runs background extraction against the source message that produced the memory. The dashboard forwards the request to the router-side extraction handler, which regenerates the summary and (when enabled) the synthesised content for the existing memory row. Useful when the memory was first saved before a prompt or embedding-model change and the operator wants to refresh it without creating a new row.
curl -X POST https://api.xerotier.ai/chat/api/chat_abc123/memories/mem_abc123/rerun-extraction \
-H "Cookie: session=your_session_token"
Python (requests)
import requests

base = "https://api.xerotier.ai/chat/api"
chat_id = "chat_abc123"
cookies = {"session": "your_session_token"}

# Create a memory from a message
requests.post(f"{base}/{chat_id}/memory", cookies=cookies, json={
    "messageId": "msg_abc123",
    "content": "The project deadline is March 15, 2026"
})

# List all memories and print each one with its category
memories = requests.get(f"{base}/{chat_id}/memories", cookies=cookies).json()
for mem in memories["memories"]:
    print(f"[{mem['category']}] {mem['content']}")

# Update a memory (the embedding is regenerated server-side)
requests.patch(
    f"{base}/{chat_id}/memories/mem_abc123",
    cookies=cookies,
    json={"content": "Deadline extended to April 1, 2026"}
)

# Delete (soft-delete) a memory
requests.delete(f"{base}/{chat_id}/memories/mem_abc123", cookies=cookies)
Node.js (fetch)
const base = "https://api.xerotier.ai/chat/api";
const chatId = "chat_abc123";
const headers = {
  "Cookie": "session=your_session_token",
  "Content-Type": "application/json"
};

// Create a memory from a message
await fetch(`${base}/${chatId}/memory`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    messageId: "msg_abc123",
    content: "The project deadline is March 15, 2026"
  })
});

// List all memories
const memRes = await fetch(`${base}/${chatId}/memories`, { headers });
const memories = await memRes.json();
memories.memories.forEach(mem =>
  console.log(`[${mem.category}] ${mem.content}`)
);

// Update a memory
await fetch(`${base}/${chatId}/memories/mem_abc123`, {
  method: "PATCH",
  headers,
  body: JSON.stringify({
    content: "Deadline extended to April 1, 2026"
  })
});

// Delete a memory
await fetch(`${base}/${chatId}/memories/mem_abc123`, {
  method: "DELETE",
  headers
});
Workspace Memory API
Memories can also be managed at the workspace level. Workspace-scoped memories
aggregate across all chats within a workspace and are included in unified
workspace search results returned by the x_workspace_search tool.
The default workspace automatically includes memories from chats that have no explicit workspace assignment, so no memories are orphaned.
Note: Workspace memory endpoints use cursor-based
pagination. Pass the next_cursor value from a previous response
as the cursor query parameter to fetch the next page.
Create Workspace Memory
POST /chat/api/workspaces/{workspaceId}/memories
Creates a memory at the workspace scope without requiring a source chat. Useful for seeding workspace-wide knowledge or for ingesting memories from non-chat surfaces.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| content | string | Yes | The text content to persist as the memory. |
| category | string | No | Free-form category label (see x_save_memory for common values). |
curl -X POST "https://api.xerotier.ai/chat/api/workspaces/ws_abc123/memories" \
-H "Cookie: session=your_session_token" \
-H "Content-Type: application/json" \
-d '{"content": "Quarterly review meets every Thursday", "category": "fact"}'
List Workspace Memories
GET /chat/api/workspaces/{workspaceId}/memories
Returns non-deleted memories scoped to the workspace, sorted by creation
date descending. Accepts optional cursor and limit
query parameters for paging.
curl "https://api.xerotier.ai/chat/api/workspaces/ws_abc123/memories?limit=20" \
-H "Cookie: session=your_session_token"
Response
{
  "items": [
    {
      "id": "mem_abc123",
      "content": "The project deadline is March 15, 2026",
      "summary": "Project deadline is March 15",
      "category": "fact",
      "source_type": "user",
      "created_at": "2026-03-07T10:30:00Z"
    }
  ],
  "next_cursor": "eyJkYXRlIjoiMjAyNi0wMy0wN...",
  "has_more": true
}
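Walking every page of the workspace list comes down to following next_cursor until has_more is false. The generator below is a sketch of that loop. `fetch_page` is a hypothetical caller-supplied function that takes the cursor (or `None` for the first page), performs the GET with the cursor and limit query parameters, and returns the decoded `{items, next_cursor, has_more}` envelope.

```python
from typing import Callable, Iterator, Optional


def iter_workspace_memories(
    fetch_page: Callable[[Optional[str]], dict],
) -> Iterator[dict]:
    """Yield every memory across pages by following next_cursor."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("items", [])
        # Stop when the server reports no more pages or gives no cursor to follow.
        if not page.get("has_more") or not page.get("next_cursor"):
            break
        cursor = page["next_cursor"]
```

Keeping the HTTP call outside the generator leaves authentication and error handling to the caller and makes the paging logic easy to exercise with canned pages.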
Update Workspace Memory
PATCH /chat/api/workspaces/{workspaceId}/memories/{memoryId}
Updates the content of a memory within the workspace. The category
and source_type fields are not affected. Background-extracted
content is truncated to 2000 characters before persistence; the same cap is
a reasonable upper bound for client-supplied PATCH content.
curl -X PATCH "https://api.xerotier.ai/chat/api/workspaces/ws_abc123/memories/mem_abc123" \
-H "Cookie: session=your_session_token" \
-H "Content-Type: application/json" \
-d '{"content": "Updated memory content"}'
Delete Workspace Memory
DELETE /chat/api/workspaces/{workspaceId}/memories/{memoryId}
Soft-deletes a memory from the workspace. Returns HTTP 204 No Content on success.
curl -X DELETE "https://api.xerotier.ai/chat/api/workspaces/ws_abc123/memories/mem_abc123" \
-H "Cookie: session=your_session_token"
For full workspace management including workspace CRUD, file uploads, and unified search, see the Document Workspace documentation.
Passive Injection
Passive injection is the automatic process by which relevant memories are included in the model's context at the start of each turn. It requires no action from the user or the model and happens transparently during context assembly.
How It Works
- The user sends a message in the chat.
- During context assembly, the retriever runs a cosine-similarity search (pgvector <=> operator) against the chat_memories table for the workspace, using the current user message as the query.
- Results below the configured similarity threshold are discarded. The retriever and analyst services apply this threshold during workspace context retrieval so that only contextually relevant memories survive.
- Surviving memories are folded into the workspace context that the analyst hands to the model. If nothing clears the threshold, no memory context is added.
The exact wording, ordering, and cap applied to the injected block are implementation details of the retriever / analyst pipeline and may change between releases. This page intentionally does not pin a literal prompt template or a fixed memory count.
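To make the flow above concrete, here is an illustrative in-memory sketch of the three steps: score, filter by threshold, and fold into the prompt. It is not the retriever's implementation. The threshold of 0.5, the cap of 5, the "Relevant memories:" wording, and all function names are placeholders chosen for the example, and the real search runs in the database rather than in Python.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_memories(query_vec, memories, threshold=0.5, cap=5):
    """Keep memories at or above the threshold, most similar first.

    threshold and cap are placeholder values, not documented defaults.
    """
    scored = [(cosine_similarity(query_vec, m["embedding"]), m) for m in memories]
    kept = [(score, m) for score, m in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in kept[:cap]]


def fold_into_prompt(system_prompt: str, selected) -> str:
    """Append surviving memories to the prompt; add nothing if none survive."""
    if not selected:
        return system_prompt
    lines = "\n".join(f"- {m['content']}" for m in selected)
    return f"{system_prompt}\n\nRelevant memories:\n{lines}"
```

When no memory clears the threshold, `fold_into_prompt` returns the prompt unchanged, matching the behaviour described in the last step.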
Chat UI
The chat interface provides several UI elements for interacting with the memory system.
Remember This Button
Each message in the chat displays a "Remember this" button in its action bar. Clicking it creates a memory from the message content. The button provides visual feedback (a brief highlight) to confirm the memory was saved.
Memories Sidebar Panel
The toolbar includes a toggle button to open the Memories sidebar panel. This panel displays all active memories for the current workspace in a scrollable list.
- Each memory entry shows its content, category, source type, and creation date.
- Memories can be edited inline: click the edit icon, modify the text, and save.
- Memories can be deleted via the delete icon on each entry.
- A memory count badge on the toolbar toggle shows the total number of active memories.
Memory Count Badge
The toolbar toggle button displays a small badge showing the count of active memories for the workspace. The badge refreshes when the Memories sidebar is opened or after a memory create / delete action; exact update cadence depends on the dashboard build.
Data Model
Each memory belongs to a workspace and optionally tracks the source chat that created it. Memories may also be linked to the specific message that triggered their creation. All memories within a workspace are searchable from any chat assigned to that workspace.
Memory Object
| Field | Type | Description |
|---|---|---|
| id | string | Memory identifier in mem_xxx format (24 random alphanumeric characters after the prefix). |
| content | string | The text content of the memory. |
| summary | string (nullable) | Short description generated by the background-extraction paths (dashboard distill and the router-side memory extraction worker). Memories saved directly through the x_save_memory tool leave this field null. No persisted length cap is enforced; the extraction prompts simply target a brief description. |
| category | string (nullable) | Free-form category label written by the caller. Common values include preference, fact, instruction, and decision, but any string is accepted. |
| source_type | string | Origin of the row. Known values include model (saved via the x_save_memory tool, including writes performed by the router-side extraction worker), user (the "Remember this" button or an explicit dashboard create), and auto_deep_think / auto_research / auto_compaction (post-turn dashboard distill). The dashboard tags background-distilled rows with the literal auto_ prefix followed by the source kind. |
| source_chat_id | string (nullable) | ID of the chat session that produced this memory, if applicable. Used for provenance tracking in recall results. |
| workspace_id | string (nullable) | Workspace that scopes this memory. Memories are stored and searched at the workspace level, making them available to all chats in the workspace. |
| created_at | string | ISO 8601 timestamp when the memory was created. |
| updated_at | string | ISO 8601 timestamp when the memory was last updated. |
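For clients that prefer typed objects over raw dictionaries, the fields above map onto a small dataclass. This is a sketch, not an official SDK type: the class and helper names are hypothetical, nullable fields default to `None`, and `updated_at` is treated as optional because the workspace list example omits it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is rewritten for older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass(frozen=True)
class Memory:
    id: str
    content: str
    source_type: str
    created_at: datetime
    updated_at: Optional[datetime] = None
    summary: Optional[str] = None
    category: Optional[str] = None
    source_chat_id: Optional[str] = None
    workspace_id: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: dict) -> "Memory":
        """Build a Memory from one decoded JSON object."""
        updated = raw.get("updated_at")
        return cls(
            id=raw["id"],
            content=raw["content"],
            source_type=raw["source_type"],
            created_at=parse_timestamp(raw["created_at"]),
            updated_at=parse_timestamp(updated) if updated else None,
            summary=raw.get("summary"),
            category=raw.get("category"),
            source_chat_id=raw.get("source_chat_id"),
            workspace_id=raw.get("workspace_id"),
        )

    @property
    def is_background_distilled(self) -> bool:
        """True for rows tagged by the post-turn dashboard distill (auto_ prefix)."""
        return self.source_type.startswith("auto_")
```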
Embedding Details
Each memory is represented as a vector embedding. The embedding is model-agnostic: its dimension is resolved at runtime from the configured embeddings endpoint, so 768-, 1024-, 3072-, or 4096-dim embedders are all supported. The embedding is populated asynchronously after the memory record is saved.
Similarity Search
Memory retrieval uses cosine distance. Results are ranked with the most similar memory first (smallest cosine distance). A minimum similarity threshold is applied so that only contextually relevant memories are returned; memories below the threshold are excluded from context injection even if they are the closest matches.
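For reference, cosine distance is one minus cosine similarity, so ranking by smallest distance and ranking by largest similarity are the same ordering. In the expression below, $\mathbf{q}$ is the query embedding and $\mathbf{m}$ a memory embedding; $\tau$ is a symbol introduced here for the minimum similarity threshold, whose value this page does not specify.

```latex
d_{\cos}(\mathbf{q}, \mathbf{m})
  = 1 - \frac{\mathbf{q} \cdot \mathbf{m}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{m} \rVert},
\qquad
\text{keep } \mathbf{m} \iff 1 - d_{\cos}(\mathbf{q}, \mathbf{m}) \ge \tau
```

The distance ranges from 0 (same direction) to 2 (opposite direction), so a similarity threshold of $\tau$ corresponds to a distance cutoff of $1 - \tau$.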
Deduplication
Before saving a new memory, the system checks whether any existing memory in the same workspace has a cosine similarity above a high-confidence threshold. If a near-duplicate is found, the save is skipped and the existing memory is returned. This prevents the same fact from being stored multiple times with slightly different wording.
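The deduplication check can be pictured as a similarity scan before insert. The sketch below works on an in-memory list and is an illustration only: the 0.95 threshold is a placeholder (the page says only "high-confidence"), the function names are hypothetical, and the real check runs against the workspace's stored vectors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def save_with_dedup(store: list, content: str, embedding: Sequence[float],
                    threshold: float = 0.95):
    """Return (memory, created). Skip the save if a near-duplicate already exists.

    threshold is a placeholder value, not the system's configured one.
    """
    for existing in store:
        if cosine_similarity(existing["embedding"], embedding) >= threshold:
            return existing, False  # near-duplicate: hand back the existing row
    record = {"content": content, "embedding": list(embedding)}
    store.append(record)
    return record, True
```

Returning the existing record together with a `created` flag mirrors the documented behaviour in which a skipped save still yields a memory to the caller.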
Embedding Model
Embeddings are generated using the embeddings endpoint configured for the project. The embedding model determines the quality of semantic search: models that produce more semantically rich vectors tend to yield more accurate memory retrieval, and a higher dimension alone does not guarantee better results. If no embeddings endpoint is configured or available, memory operations that require embedding generation degrade gracefully and memory text is stored without a vector.