Chat Memory - Xerotier

Memories are scoped per workspace: all chats within the same workspace share one memory store. When a memory is saved, it is embedded as a vector and stored for semantic search, so the system retrieves the most contextually relevant memories rather than relying on keyword matching or recency alone. A memory saved in one chat is automatically available to every other chat in the same workspace.

Memories can optionally be scoped to a workspace, which allows the x_workspace_search tool to surface them alongside document chunks and artifacts in a unified search. See Document Workspace for details on workspace-level memory management.

Key characteristics:

Per-workspace scoping for memory storage and retrieval.
Semantic vector search backed by pgvector. The embedding column is model-agnostic: its dimension is resolved at runtime from the configured embeddings endpoint (commonly 768, 1024, 3072, or 4096), and the schema does not pin a fixed size.
Automatic deduplication of near-duplicate memories.
Soft-delete support for safe memory removal.
Both model-driven and user-driven memory creation.
Unified workspace search via x_workspace_search surfaces memories alongside documents and artifacts.
Background memory extraction triggered by specific dashboard events (deep_think, research, compaction) and by completed tool invocations.

How It Works

Chat memory operates through two complementary mechanisms: automatic (model-driven) and manual (user-driven).

Automatic Memory (Model-Driven)

During a conversation, the model can call two built-in tools to interact with the memory system:

x_save_memory: The model stores a piece of knowledge (a fact, preference, or instruction) in the workspace's memory. The content is embedded as a vector and persisted to the database.
x_recall_memory: The model queries the workspace memory store with a natural language query and receives the top matching memories ranked by semantic relevance.

The model decides autonomously when to save or recall memories based on the conversation context.

Background Memory Extraction

In addition to explicit tool calls, two independent paths run distillation in the background. Neither uses a context-percent gate.

Tool-invocation auto-distill. Every completed server-tool invocation enqueues a pending row that the memory extraction worker drains asynchronously. Rows persisted by this path carry source_type: "model" (the value written by the x_save_memory tool implementation when the worker decides a memory should be saved).
Post-turn dashboard distill. When a streaming turn ends with deep-think, research, or compaction output of at least 200 characters, the dashboard extracts candidate memories and persists them with source_type set to auto_deep_think, auto_research, or auto_compaction respectively (the prefix is literally auto_ followed by the source kind).
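
The tagging rule for the post-turn dashboard path can be sketched as below. This is a minimal illustration and not the dashboard's actual code: the function name `distill_source_type` and the constant names are hypothetical, and only the 200-character minimum and the `auto_` prefix come from the description above.

```python
from typing import Optional

# Values taken from the description above; the names are hypothetical.
MIN_DISTILL_CHARS = 200
DISTILL_KINDS = {"deep_think", "research", "compaction"}


def distill_source_type(kind: str, output: str) -> Optional[str]:
    """Return the source_type for a post-turn distill, or None if the turn does not qualify."""
    if kind not in DISTILL_KINDS:
        return None  # only the three listed output kinds trigger distillation
    if len(output) < MIN_DISTILL_CHARS:
        return None  # output shorter than 200 characters is skipped
    return f"auto_{kind}"  # literal "auto_" prefix followed by the source kind
```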

Note: The dashboard-side path emits an x_memory.auto_saved SSE event when it persists a row. That event is dashboard-only: it is not produced on the public /v1/chat/completions or /v1/responses streams that SDK consumers connect to. Programmatic clients should poll the memory list endpoints instead.
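
A polling client needs to tell new rows from ones it has already seen. The helper below is one possible sketch of that bookkeeping. The name `unseen_memories` is hypothetical, and it assumes each listed memory is a dict with an `id` key, as in the list responses shown later on this page.

```python
from typing import Iterable, List, Set


def unseen_memories(seen_ids: Set[str], memories: Iterable[dict]) -> List[dict]:
    """Return memories whose id has not been seen yet, and record them as seen."""
    fresh = [m for m in memories if m["id"] not in seen_ids]
    seen_ids.update(m["id"] for m in fresh)
    return fresh
```

Calling the helper once per poll with the same `seen_ids` set yields only the rows added since the previous poll.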

Manual Memory (User-Driven)

Users can click the "Remember this" button on any message in the chat interface. This creates a memory from the message content, embedding it and storing it in the same memory store that the model accesses. Manual memories are tagged with source_type: user to distinguish them from model-created memories.

Passive Injection (Summary)

At the start of each turn, the system runs a semantic search against the workspace memory store with the current user message as the query and folds the top matches into the system prompt. No user or model action is required. The mechanics, thresholds, and ordering live in the Passive Injection section below.

Built-in Tools

x_save_memory

Stores a piece of knowledge in the workspace's memory. The content is embedded as a vector and persisted. Near-duplicate content is automatically deduplicated: if a memory with very high semantic similarity already exists, the save is skipped and the existing memory is returned.

Parameter	Type	Required	Description
content	string	Yes	The text content to store as a memory.
category	string	No	Free-form short label. Common values: `preference`, `fact`, `instruction`, `decision`. Arbitrary strings are accepted and stored verbatim; the system does not validate against an enum and does not infer a category when the field is omitted.

Example Tool Call

JSON

{
  "name": "x_save_memory",
  "arguments": {
    "content": "User prefers responses in bullet-point format",
    "category": "preference"
  }
}
                

x_recall_memory

Queries the workspace's memory store using semantic search. Returns up to limit matching memories (default 5, max 20) ranked by relevance score.

Parameter	Type	Required	Description
query	string	Yes	Natural language query to search memories against.
offset	integer	No	Number of results to skip from the start of the ranked list. Used for paging through recall results.
limit	integer	No	Maximum number of memories to return. Defaults to 5, capped at 20.
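
A caller that builds recall arguments itself may want to normalize them first. The sketch below assumes an `offset` default of 0 and a lower bound of 1 for `limit`. Neither is stated on this page; only the default of 5 and the cap of 20 are documented. The function name is hypothetical.

```python
DEFAULT_LIMIT = 5  # documented default
MAX_LIMIT = 20     # documented cap


def normalize_recall_args(query: str, offset: int = 0, limit: int = DEFAULT_LIMIT) -> dict:
    """Build an x_recall_memory arguments object with limit clamped to the documented cap."""
    if not query.strip():
        raise ValueError("query must not be empty")
    return {
        "query": query,
        "offset": max(0, offset),                # assumption: negative offsets are treated as 0
        "limit": max(1, min(limit, MAX_LIMIT)),  # assumption: at least one result is requested
    }
```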

Example Tool Call

JSON

{
  "name": "x_recall_memory",
  "arguments": {
    "query": "What formatting preferences does the user have?"
  }
}
                

Example Response (rendered markdown payload)

The recall tool returns a rendered markdown block to the model rather than a JSON object. Each entry exposes the memory id, content, category, and a relevance score; the source chat name is not included in the payload.

Markdown

# Recalled memories

1. **mem_abc123** (preference, score 0.92)
   User prefers responses in bullet-point format

2. **mem_def456** (instruction, score 0.85)
   User wants concise answers, no longer than 3 paragraphs
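
If a client needs structured data from a payload like the one above, it could parse the block as follows. This sketch assumes the layout of the example (a numbered line with the id, category, and score, followed by indented content lines). The rendered format is not documented as a stable contract, and `parse_recall_markdown` is a hypothetical helper.

```python
import re

# Matches lines such as: 1. **mem_abc123** (preference, score 0.92)
ENTRY_RE = re.compile(
    r"^\d+\.\s+\*\*(?P<id>[^*]+)\*\*\s+\((?P<category>[^,]+),\s+score\s+(?P<score>[0-9.]+)\)$"
)


def parse_recall_markdown(payload: str) -> list:
    """Parse the rendered recall block into a list of dicts."""
    entries = []
    current = None
    for raw in payload.splitlines():
        line = raw.strip()
        match = ENTRY_RE.match(line)
        if match:
            current = {
                "id": match["id"],
                "category": match["category"],
                "score": float(match["score"]),
                "content": "",
            }
            entries.append(current)
        elif current is not None and line:
            # Any non-empty line after an entry header is part of its content.
            current["content"] = (current["content"] + " " + line).strip()
    return entries
```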
                

Memory Management API

REST endpoints for managing memories programmatically. Every endpoint in this section operates within the scope of a single chat.

Dashboard-only surface. The /chat/api/... routes documented in this section are served by the dashboard frontend and authenticated with a browser session cookie. They are not part of the OpenAI-compatible router API and do not accept Authorization: Bearer sk-... credentials. An unauthenticated request returns the login page as HTML, not JSON, so clients that assume a JSON body will fail to parse the response. For programmatic memory access from an SDK or API key, use the x_save_memory and x_recall_memory tools through /v1/chat/completions or /v1/responses, or call the workspace memory routes through the router-proxied API surface.
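
Because an unauthenticated request yields an HTML login page, a client can check the response's content type before parsing. The following is a minimal sketch: the exception and function names are hypothetical, and it assumes successful responses carry an `application/json` content type, which this page does not state explicitly.

```python
import json
from typing import Optional


class NotAuthenticated(Exception):
    """Raised when the dashboard answers with something other than JSON."""


def parse_dashboard_response(content_type: Optional[str], body: str) -> dict:
    """Parse a dashboard response body, failing clearly on an HTML login page."""
    if "application/json" not in (content_type or "").lower():
        raise NotAuthenticated(
            f"expected JSON but got {content_type!r}; "
            "the session cookie may be missing or expired"
        )
    return json.loads(body)
```

With the `requests` library, the two arguments would come from `response.headers.get("Content-Type")` and `response.text`.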

Create Memory from Message

POST /chat/api/{chatId}/memory

Creates a new memory in the chat. At least one of messageId or content must be supplied: each field is optional on its own, but the two cannot both be omitted. When the dashboard forwards the request to the router, it maps messageId onto the router-side source_message_id field.

Request Body

Field	Type	Required	Description
messageId	string	Either this or `content`	The ID of the message to derive the memory from. Forwarded to the router as `source_message_id`.
content	string	Either this or `messageId`	Explicit text to persist. When omitted, the router derives the content from the referenced message.
category	string	No	Free-form category label (see `x_save_memory` for common values).
synthesize	boolean	No	When `true`, the router-side persist path runs a synthesis pass over the source content before storing the memory. Defaults to `false`.
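
The "at least one of messageId or content" rule and the field mapping can be sketched as follows. Both function names are hypothetical. `to_router_fields` only illustrates the documented `messageId` to `source_message_id` rename and assumes the other fields pass through unchanged, which this page does not confirm.

```python
from typing import Optional


def build_create_memory_body(
    message_id: Optional[str] = None,
    content: Optional[str] = None,
    category: Optional[str] = None,
    synthesize: bool = False,
) -> dict:
    """Build the request body, enforcing that messageId and content are not both omitted."""
    if not message_id and not content:
        raise ValueError("supply messageId, content, or both")
    body = {}
    if message_id:
        body["messageId"] = message_id
    if content:
        body["content"] = content
    if category:
        body["category"] = category
    if synthesize:
        body["synthesize"] = True  # omitted otherwise, since it defaults to false
    return body


def to_router_fields(body: dict) -> dict:
    """Illustrate the dashboard-to-router rename of messageId."""
    forwarded = dict(body)
    if "messageId" in forwarded:
        forwarded["source_message_id"] = forwarded.pop("messageId")
    return forwarded
```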

curl

curl -X POST https://api.xerotier.ai/chat/api/chat_abc123/memory \
  -H "Cookie: session=your_session_token" \
  -H "Content-Type: application/json" \
  -d '{
    "messageId": "msg_abc123",
    "content": "The project deadline is March 15, 2026"
  }'
                

List Memories

GET /chat/api/{chatId}/memories

Returns active (non-deleted) memories whose source_chat_id matches the path parameter, ordered by creation date descending. The chat-scoped list returns the flat {memories: [...]} envelope shown below and does not support cursor paging; for paginated workspace-wide access use the workspace memory endpoints, which return an {items, next_cursor, has_more} envelope instead.

curl

curl https://api.xerotier.ai/chat/api/chat_abc123/memories \
  -H "Cookie: session=your_session_token"
                

Response

JSON

{
  "memories": [
    {
      "id": "mem_abc123",
      "content": "The project deadline is March 15, 2026",
      "summary": "Project deadline is March 15",
      "category": "fact",
      "source_type": "user",
      "created_at": "2026-03-07T10:30:00Z",
      "updated_at": "2026-03-07T10:30:00Z"
    }
  ]
}
                

Update Memory

PATCH /chat/api/{chatId}/memories/{memoryId}

Updates the content of an existing memory. The embedding is regenerated automatically to reflect the new content.

Request Body

Field	Type	Required	Description
content	string	Yes	The new text content for the memory.

curl

curl -X PATCH https://api.xerotier.ai/chat/api/chat_abc123/memories/mem_abc123 \
  -H "Cookie: session=your_session_token" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "The project deadline has been extended to April 1, 2026"
  }'
                

Delete Memory

DELETE /chat/api/{chatId}/memories/{memoryId}

Soft-deletes a memory. The record remains in the database with a deleted_at timestamp but is excluded from all queries and semantic searches.

curl

curl -X DELETE https://api.xerotier.ai/chat/api/chat_abc123/memories/mem_abc123 \
  -H "Cookie: session=your_session_token"
                

Returns HTTP 204 No Content on success.
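
Soft deletion, as described above, amounts to stamping the row and filtering it out of reads. The sketch below models that on plain dicts for illustration only. The helper names are hypothetical, and the real behavior lives in the database layer.

```python
from datetime import datetime, timezone
from typing import List


def soft_delete(memory: dict) -> None:
    """Mark a memory as deleted without removing the record."""
    memory["deleted_at"] = datetime.now(timezone.utc).isoformat()


def active_memories(memories: List[dict]) -> List[dict]:
    """Return only the rows that list and search queries would still see."""
    return [m for m in memories if m.get("deleted_at") is None]
```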

Rerun Memory Extraction

POST /chat/api/{chatId}/memories/{memoryId}/rerun-extraction

Re-runs background extraction against the source message that produced the memory. The dashboard forwards the request to the router-side extraction handler, which regenerates the summary and (when enabled) the synthesised content for the existing memory row. Useful when the memory was first saved before a prompt or embedding-model change and the operator wants to refresh it without creating a new row.

curl

curl -X POST https://api.xerotier.ai/chat/api/chat_abc123/memories/mem_abc123/rerun-extraction \
  -H "Cookie: session=your_session_token"
                

Python (requests)

Python

import requests

base = "https://api.xerotier.ai/chat/api"
chat_id = "chat_abc123"
cookies = {"session": "your_session_token"}

# Create a memory from a message
requests.post(f"{base}/{chat_id}/memory", cookies=cookies, json={
    "messageId": "msg_abc123",
    "content": "The project deadline is March 15, 2026"
})

# List all memories
memories = requests.get(f"{base}/{chat_id}/memories", cookies=cookies).json()
for mem in memories["memories"]:
    print(f"[{mem['category']}] {mem['content']}")

# Update a memory
requests.patch(
    f"{base}/{chat_id}/memories/mem_abc123",
    cookies=cookies,
    json={"content": "Deadline extended to April 1, 2026"}
)

# Delete a memory
requests.delete(f"{base}/{chat_id}/memories/mem_abc123", cookies=cookies)
                

Node.js (fetch)

JavaScript

const base = "https://api.xerotier.ai/chat/api";
const chatId = "chat_abc123";
const headers = {
    "Cookie": "session=your_session_token",
    "Content-Type": "application/json"
};

// Create a memory from a message
await fetch(`${base}/${chatId}/memory`, {
    method: "POST",
    headers,
    body: JSON.stringify({
        messageId: "msg_abc123",
        content: "The project deadline is March 15, 2026"
    })
});

// List all memories
const memRes = await fetch(`${base}/${chatId}/memories`, { headers });
const memories = await memRes.json();
memories.memories.forEach(mem =>
    console.log(`[${mem.category}] ${mem.content}`)
);

// Update a memory
await fetch(`${base}/${chatId}/memories/mem_abc123`, {
    method: "PATCH",
    headers,
    body: JSON.stringify({
        content: "Deadline extended to April 1, 2026"
    })
});

// Delete a memory
await fetch(`${base}/${chatId}/memories/mem_abc123`, {
    method: "DELETE",
    headers
});
                

Workspace Memory API

Memories can also be managed at the workspace level. Workspace-scoped memories aggregate across all chats within a workspace and are included in unified workspace search results returned by the x_workspace_search tool.

The default workspace automatically includes memories from chats that have no explicit workspace assignment, so no memories are orphaned.

Note: Workspace memory endpoints use cursor-based pagination. Pass the next_cursor value from a previous response as the cursor query parameter to fetch the next page.

Create Workspace Memory

POST /chat/api/workspaces/{workspaceId}/memories

Creates a memory at the workspace scope without requiring a source chat. Useful for seeding workspace-wide knowledge or for ingesting memories from non-chat surfaces.

Request Body

Field	Type	Required	Description
content	string	Yes	The text content to persist as the memory.
category	string	No	Free-form category label (see `x_save_memory` for common values).

curl

curl -X POST "https://api.xerotier.ai/chat/api/workspaces/ws_abc123/memories" \
  -H "Cookie: session=your_session_token" \
  -H "Content-Type: application/json" \
  -d '{"content": "Quarterly review meets every Thursday", "category": "fact"}'
                

List Workspace Memories

GET /chat/api/workspaces/{workspaceId}/memories

Returns non-deleted memories scoped to the workspace, sorted by creation date descending. Accepts optional cursor and limit query parameters for paging.

curl

curl "https://api.xerotier.ai/chat/api/workspaces/ws_abc123/memories?limit=20" \
  -H "Cookie: session=your_session_token"
                

Response

JSON

{
  "items": [
    {
      "id": "mem_abc123",
      "content": "The project deadline is March 15, 2026",
      "summary": "Project deadline is March 15",
      "category": "fact",
      "source_type": "user",
      "created_at": "2026-03-07T10:30:00Z"
    }
  ],
  "next_cursor": "eyJkYXRlIjoiMjAyNi0wMy0wN...",
  "has_more": true
}
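
Walking every page of this envelope can be sketched with a small generator. The `fetch_page` callable is a hypothetical stand-in for whatever performs the HTTP request: it receives the cursor (`None` for the first page) and returns the decoded `{items, next_cursor, has_more}` object.

```python
from typing import Callable, Iterator, Optional


def iter_workspace_memories(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every memory across pages until has_more is false."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        if not page.get("has_more"):
            break
        cursor = page.get("next_cursor")
        if cursor is None:
            break  # defensive: has_more without a cursor would otherwise loop forever
```

In practice `fetch_page` would issue the GET request shown above, adding the cursor as a query parameter when it is not `None`.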
                

Update Workspace Memory

PATCH /chat/api/workspaces/{workspaceId}/memories/{memoryId}

Updates the content of a memory within the workspace. The category and source_type fields are not affected. Background-extracted content is truncated to 2000 characters before persistence; the same cap is a reasonable upper bound for client-supplied PATCH content.

curl

curl -X PATCH "https://api.xerotier.ai/chat/api/workspaces/ws_abc123/memories/mem_abc123" \
  -H "Cookie: session=your_session_token" \
  -H "Content-Type: application/json" \
  -d '{"content": "Updated memory content"}'
                

Delete Workspace Memory

DELETE /chat/api/workspaces/{workspaceId}/memories/{memoryId}

Soft-deletes a memory from the workspace. Returns HTTP 204 No Content on success.

curl

curl -X DELETE "https://api.xerotier.ai/chat/api/workspaces/ws_abc123/memories/mem_abc123" \
  -H "Cookie: session=your_session_token"
                

For full workspace management including workspace CRUD, file uploads, and unified search, see the Document Workspace documentation.

Passive Injection

Passive injection is the automatic process by which relevant memories are included in the model's context at the start of each turn. This requires no action from the user or the model; it happens transparently during context assembly.

How It Works

The user sends a message in the chat.
During context assembly, the retriever runs a cosine-similarity search (pgvector <=> operator) against the chat_memories table for the workspace, using the current user message as the query.
Results below the configured similarity threshold are discarded. The retriever and analyst services apply this threshold during workspace context retrieval so that only contextually relevant memories survive.
Surviving memories are folded into the workspace context that the analyst hands to the model. If nothing clears the threshold, no memory context is added.

The exact wording, ordering, and cap applied to the injected block are implementation details of the retriever / analyst pipeline and may change between releases. This page intentionally does not pin a literal prompt template or a fixed memory count.
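
The filtering step can still be illustrated in the abstract. The sketch below ranks candidate memories by cosine similarity and drops those under a threshold. It is pure Python rather than the pgvector query the retriever runs, and the `threshold` and `cap` parameters are hypothetical inputs, since this page does not specify their values.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_memories(
    query_vec: Sequence[float],
    memories: List[Tuple[str, Sequence[float]]],
    threshold: float,
    cap: int,
) -> List[str]:
    """Return memory contents at or above the threshold, most similar first."""
    scored = [(cosine_similarity(query_vec, emb), content) for content, emb in memories]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [content for _, content in kept[:cap]]
```

An empty result corresponds to the case above where no memory context is added.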

Chat UI

The chat interface provides several UI elements for interacting with the memory system.

Remember This Button

Each message in the chat displays a "Remember this" button in its action bar. Clicking it creates a memory from the message content. The button provides visual feedback (a brief highlight) to confirm the memory was saved.

Memories Sidebar Panel

The toolbar includes a toggle button to open the Memories sidebar panel. This panel displays all active memories for the current workspace in a scrollable list.

Each memory entry shows its content, category, source type, and creation date.
Memories can be edited inline: click the edit icon, modify the text, and save.
Memories can be deleted via the delete icon on each entry.
A memory count badge on the toolbar toggle shows the total number of active memories.

Memory Count Badge

The toolbar toggle button displays a small badge showing the count of active memories for the workspace. The badge refreshes when the Memories sidebar is opened or after a memory create / delete action; exact update cadence depends on the dashboard build.

Data Model

Each memory belongs to a workspace and optionally tracks the source chat that created it. Memories may also be linked to the specific message that triggered their creation. All memories within a workspace are searchable from any chat assigned to that workspace.

Memory Object

Field	Type	Description
`id`	string	Memory identifier in `mem_xxx` format (24 random alphanumeric characters after the prefix).
`content`	string	The text content of the memory.
`summary`	string (nullable)	Short description generated by the background-extraction paths (dashboard distill and the router-side memory extraction worker). Memories saved directly through the `x_save_memory` tool leave this field `null`. No persisted length cap is enforced; the extraction prompts simply target a brief description.
`category`	string (nullable)	Free-form category label written by the caller. Common values include `preference`, `fact`, `instruction`, and `decision`, but any string is accepted.
`source_type`	string	Origin of the row. Known values include `model` (saved via the `x_save_memory` tool, including writes performed by the router-side extraction worker), `user` (the "Remember this" button or an explicit dashboard create), and `auto_deep_think` / `auto_research` / `auto_compaction` (post-turn dashboard distill). The dashboard tags background-distilled rows with the literal `auto_` prefix followed by the source kind.
`source_chat_id`	string (nullable)	ID of the chat session that produced this memory, if applicable. Used for provenance tracking in recall results.
`workspace_id`	string (nullable)	Workspace that scopes this memory. Memories are stored and searched at the workspace level, making them available to all chats in the workspace.
`created_at`	string	ISO 8601 timestamp when the memory was created.
`updated_at`	string	ISO 8601 timestamp when the memory was last updated.
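
For client code, the fields above could be mirrored in a small typed container. This is a sketch, not an official SDK type. `updated_at` is treated as optional here only because the workspace list example on this page omits it, and unknown keys are ignored as a defensive assumption.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Memory:
    id: str
    content: str
    source_type: str
    created_at: str
    updated_at: Optional[str] = None  # absent in the workspace list example
    summary: Optional[str] = None
    category: Optional[str] = None
    source_chat_id: Optional[str] = None
    workspace_id: Optional[str] = None

    @classmethod
    def from_dict(cls, data: dict) -> "Memory":
        """Build a Memory from a decoded JSON object, ignoring unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})
```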

Embedding Details

Each memory is represented as a vector embedding. The embedding is model-agnostic: its dimension is resolved at runtime from the configured embeddings endpoint, so 768-, 1024-, 3072-, or 4096-dim embedders are all supported. The embedding is populated asynchronously after the memory record is saved.

Similarity Search

Memory retrieval uses cosine distance. Results are ranked with the most similar memory first (smallest cosine distance). A minimum similarity threshold is applied so that only contextually relevant memories are returned; memories below the threshold are excluded from context injection even if they are the closest matches.
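
For reference, the relationship between cosine distance and cosine similarity can be written as below, assuming the standard definition of cosine distance (the one pgvector's `<=>` operator computes). The symbols are notation introduced here: q is the query embedding, m is a memory embedding, and τ is the similarity threshold, whose value this page does not specify. Whether the comparison is inclusive is also an assumption.

```latex
d(q, m) = 1 - \frac{q \cdot m}{\lVert q \rVert \, \lVert m \rVert},
\qquad
\text{keep } m \iff 1 - d(q, m) \ge \tau
```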

Deduplication

Before saving a new memory, the system checks whether any existing memory in the same workspace has a cosine similarity above a high-confidence threshold. If a near-duplicate is found, the save is skipped and the existing memory is returned. This prevents the same fact from being stored multiple times with slightly different wording.
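
The check described above can be sketched as follows. The threshold value of 0.95 is purely illustrative because this page does not state the real one, and the in-memory list stands in for the workspace's stored rows. `save_with_dedup` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

DEDUP_THRESHOLD = 0.95  # hypothetical value; the actual threshold is not documented here


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def save_with_dedup(
    store: List[dict],
    content: str,
    embedding: Sequence[float],
    threshold: float = DEDUP_THRESHOLD,
) -> Tuple[dict, bool]:
    """Return (memory, created). If a near-duplicate exists, return it and skip the save."""
    for existing in store:
        if cosine_similarity(existing["embedding"], embedding) > threshold:
            return existing, False
    record = {"content": content, "embedding": list(embedding)}
    store.append(record)
    return record, True
```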

Embedding Model

Embeddings are generated using the embeddings endpoint configured for the project. The embedding model determines the quality of semantic search; models producing higher-dimensional or more semantically rich vectors will yield more accurate memory retrieval. If no embeddings endpoint is configured or available, memory operations that require embedding generation degrade gracefully and memory text is stored without a vector.