Chat Memory
Persistent per-chat knowledge with semantic recall and automatic context injection.
Overview
Chat memory allows conversations to accumulate knowledge over time. The model can save and recall facts, preferences, and instructions as the conversation progresses. Users can also manually create memories from any message in the chat interface.
Memories are scoped per-chat -- each conversation maintains its own isolated memory store. When memories are saved, they are embedded as vectors and stored for relevance-based semantic search. This means the system retrieves the most contextually relevant memories rather than relying on keyword matching or recency alone.
Key characteristics:
- Per-chat scoping -- memories do not leak between conversations.
- Semantic vector search for relevance-based recall.
- Automatic deduplication of near-duplicate memories.
- Soft-delete support for safe memory removal.
- Both model-driven and user-driven memory creation.
How It Works
Chat memory operates through two complementary mechanisms: automatic (model-driven) and manual (user-driven).
Automatic Memory (Model-Driven)
During a conversation, the model can call two built-in tools to interact with the memory system:
- save_memory -- The model stores a piece of knowledge (a fact, preference, or instruction) into the chat's memory. The content is embedded as a vector and persisted to the database.
- recall_memory -- The model queries the memory store with a natural language query and receives the top matching memories ranked by semantic relevance.
The model decides autonomously when to save or recall memories based on the conversation context.
Manual Memory (User-Driven)
Users can click the "Remember this" button on any message in the chat
interface. This creates a memory from the message content, embedding it
and storing it in the same memory store that the model accesses. Manual
memories are tagged with source_type: user to distinguish
them from model-created memories.
Passive Injection
At the start of each turn, the system automatically performs a semantic search against the chat's memory store using the current user message as the query. The top relevant memories are injected into the system prompt as contextual knowledge. This happens transparently -- no user or model action is required. See the Passive Injection section for details.
Built-in Tools
save_memory
Stores a piece of knowledge in the chat's memory. The content is embedded as a vector and persisted. Near-duplicate content is automatically deduplicated -- if a memory with very high semantic similarity already exists, the save is skipped and the existing memory is returned.
| Parameter | Type | Required | Description |
|---|---|---|---|
| content | string | Yes | The text content to store as a memory. |
| category | string | No | Category for the memory. One of: preference, fact, instruction. When omitted, the system infers the category from the content. |
Example Tool Call
{
  "name": "save_memory",
  "arguments": {
    "content": "User prefers responses in bullet-point format",
    "category": "preference"
  }
}
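The deduplication behavior described above can be sketched as follows. This is a minimal illustration, not the service's implementation: the 0.95 threshold, the in-memory list used as a store, and the helper names `cosine_similarity` and `save_with_dedup` are all assumptions, since the documentation only says "very high semantic similarity". The embeddings are passed in as plain lists of floats.

```python
import math

# Assumed value: the documentation does not state the actual threshold.
DEDUP_THRESHOLD = 0.95


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def save_with_dedup(store, content, embedding, threshold=DEDUP_THRESHOLD):
    """Return (memory, created). The save is skipped if a near-duplicate exists."""
    for existing in store:
        if cosine_similarity(existing["embedding"], embedding) >= threshold:
            # Near-duplicate found: return the existing memory unchanged.
            return existing, False
    memory = {
        "id": f"mem_{len(store) + 1}",  # hypothetical ID scheme
        "content": content,
        "embedding": embedding,
    }
    store.append(memory)
    return memory, True


# Toy vectors stand in for real embeddings.
store = []
first, created_first = save_with_dedup(store, "User prefers bullet points", [1.0, 0.0, 0.2])
dup, created_dup = save_with_dedup(store, "User likes bullet-point answers", [0.99, 0.01, 0.21])
other, created_other = save_with_dedup(store, "Deadline is March 15, 2026", [0.0, 1.0, 0.0])
```

In this sketch the second save returns the first memory because the two toy vectors are nearly parallel, while the third is stored as a new entry.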
recall_memory
Queries the chat's memory store using semantic search. Returns the top 5 matching memories ranked by relevance score.
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Natural language query to search memories against. |
Example Tool Call
{
  "name": "recall_memory",
  "arguments": {
    "query": "What formatting preferences does the user have?"
  }
}
Example Response
{
  "memories": [
    {
      "id": "mem_abc123",
      "content": "User prefers responses in bullet-point format",
      "category": "preference",
      "relevance": 0.92,
      "created_at": "2026-03-07T10:30:00Z"
    },
    {
      "id": "mem_def456",
      "content": "User wants concise answers, no longer than 3 paragraphs",
      "category": "instruction",
      "relevance": 0.85,
      "created_at": "2026-03-07T10:25:00Z"
    }
  ]
}
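To make the "top 5 ranked by relevance" behavior concrete, here is a rough sketch of how such a ranking could be computed. It assumes cosine similarity as the relevance score (the Passive Injection section mentions cosine similarity; whether recall_memory uses the same metric is an assumption), and the function name `recall` and the in-memory list of memories are hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall(memories, query_embedding, top_k=5):
    """Score every memory against the query and return the top_k, best first."""
    scored = []
    for mem in memories:
        score = cosine_similarity(mem["embedding"], query_embedding)
        scored.append({"id": mem["id"], "content": mem["content"], "relevance": score})
    # heapq.nlargest returns its results sorted from largest to smallest.
    return heapq.nlargest(top_k, scored, key=lambda m: m["relevance"])
```

A production system would typically delegate this ranking to a vector index rather than scanning every memory; the linear scan here is only for illustration.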
Memory Management API
REST endpoints for managing memories programmatically. All endpoints require authentication and operate within the scope of a single chat.
Create Memory from Message
POST /chat/api/{chatId}/memory
Creates a new memory from an existing message in the chat.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| messageId | string | Yes | The ID of the message to create a memory from. |
| content | string | Yes | The text content to store as the memory. |
curl
curl -X POST https://api.xerotier.ai/chat/api/550e8400-e29b-41d4-a716-446655440000/memory \
-H "Cookie: session=your_session_token" \
-H "Content-Type: application/json" \
-d '{
"messageId": "msg_abc123",
"content": "The project deadline is March 15, 2026"
}'
List Memories
GET /chat/api/{chatId}/memories
Returns all active (non-deleted) memories for the specified chat, ordered by creation date descending.
curl
curl https://api.xerotier.ai/chat/api/550e8400-e29b-41d4-a716-446655440000/memories \
-H "Cookie: session=your_session_token"
Response
{
  "memories": [
    {
      "id": "mem_abc123",
      "content": "The project deadline is March 15, 2026",
      "category": "fact",
      "source_type": "user",
      "created_at": "2026-03-07T10:30:00Z",
      "updated_at": "2026-03-07T10:30:00Z"
    }
  ]
}
Update Memory
PATCH /chat/api/{chatId}/memories/{memoryId}
Updates the content of an existing memory. The embedding is regenerated automatically to reflect the new content.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| content | string | Yes | The new text content for the memory. |
curl
curl -X PATCH https://api.xerotier.ai/chat/api/550e8400-e29b-41d4-a716-446655440000/memories/mem_abc123 \
-H "Cookie: session=your_session_token" \
-H "Content-Type: application/json" \
-d '{
"content": "The project deadline has been extended to April 1, 2026"
}'
Delete Memory
DELETE /chat/api/{chatId}/memories/{memoryId}
Soft-deletes a memory. The record remains in the database with a
deleted_at timestamp but is excluded from all queries
and semantic searches.
curl
curl -X DELETE https://api.xerotier.ai/chat/api/550e8400-e29b-41d4-a716-446655440000/memories/mem_abc123 \
-H "Cookie: session=your_session_token"
Returns HTTP 204 No Content on success.
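The soft-delete semantics can be illustrated with a small in-memory sketch. The helper names `soft_delete` and `active_memories` are hypothetical, and the records are plain dictionaries rather than database rows. Sorting relies on the assumption that all created_at values use the same UTC ISO 8601 format, so they compare correctly as strings.

```python
from datetime import datetime, timezone


def soft_delete(memories, memory_id, now=None):
    """Set deleted_at on an active memory. Returns True if one was found."""
    now = now or datetime.now(timezone.utc)
    for mem in memories:
        if mem["id"] == memory_id and mem.get("deleted_at") is None:
            mem["deleted_at"] = now.isoformat()
            return True
    return False


def active_memories(memories):
    """Non-deleted memories, newest first, mirroring the list endpoint."""
    active = [m for m in memories if m.get("deleted_at") is None]
    return sorted(active, key=lambda m: m["created_at"], reverse=True)


records = [
    {"id": "mem_abc123", "content": "Deadline is March 15, 2026",
     "created_at": "2026-03-07T10:30:00Z", "deleted_at": None},
    {"id": "mem_def456", "content": "User wants concise answers",
     "created_at": "2026-03-07T10:25:00Z", "deleted_at": None},
]
soft_delete(records, "mem_abc123")
remaining = active_memories(records)  # only mem_def456 is still visible
```

The deleted record stays in `records` with its timestamp set, which is what makes the removal reversible at the storage level.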
Python (requests)
import requests
base = "https://api.xerotier.ai/chat/api"
chat_id = "550e8400-e29b-41d4-a716-446655440000"
cookies = {"session": "your_session_token"}
# Create a memory from a message
requests.post(f"{base}/{chat_id}/memory", cookies=cookies, json={
    "messageId": "msg_abc123",
    "content": "The project deadline is March 15, 2026"
})
# List all memories
memories = requests.get(f"{base}/{chat_id}/memories", cookies=cookies).json()
for mem in memories["memories"]:
    print(f"[{mem['category']}] {mem['content']}")
# Update a memory (the embedding is regenerated server-side)
requests.patch(
    f"{base}/{chat_id}/memories/mem_abc123",
    cookies=cookies,
    json={"content": "Deadline extended to April 1, 2026"}
)
# Delete a memory (soft delete; returns HTTP 204 on success)
requests.delete(f"{base}/{chat_id}/memories/mem_abc123", cookies=cookies)
Node.js (fetch)
const base = "https://api.xerotier.ai/chat/api";
const chatId = "550e8400-e29b-41d4-a716-446655440000";
const headers = {
  "Cookie": "session=your_session_token",
  "Content-Type": "application/json"
};
// Create a memory from a message
await fetch(`${base}/${chatId}/memory`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    messageId: "msg_abc123",
    content: "The project deadline is March 15, 2026"
  })
});
// List all memories
const memRes = await fetch(`${base}/${chatId}/memories`, { headers });
const memories = await memRes.json();
memories.memories.forEach(mem =>
  console.log(`[${mem.category}] ${mem.content}`)
);
// Update a memory
await fetch(`${base}/${chatId}/memories/mem_abc123`, {
  method: "PATCH",
  headers,
  body: JSON.stringify({
    content: "Deadline extended to April 1, 2026"
  })
});
// Delete a memory
await fetch(`${base}/${chatId}/memories/mem_abc123`, {
  method: "DELETE",
  headers
});
Passive Injection
Passive injection is the automatic process by which relevant memories are included in the model's context at the start of each turn. This requires no action from the user or the model -- it happens transparently during context assembly.
How It Works
- The user sends a message in the chat.
- During context assembly, the system takes the current user message and runs a semantic search (cosine similarity) against the chat_memories table for that chat.
- The top 3-5 most relevant memories (above a minimum similarity threshold) are selected.
- These memories are injected into the system prompt as a "Known facts about this conversation" section, formatted as a numbered list.
- The model receives the enriched context and can use the injected memories to produce more informed responses.
Injected Context Example
The following shows how injected memories appear in the system prompt that the model receives:
Known facts about this conversation:
1. User prefers responses in bullet-point format (preference)
2. The project deadline is March 15, 2026 (fact)
3. Always include code examples in Python (instruction)
The number of injected memories varies between 3 and 5 depending on how many memories exceed the relevance threshold. If no memories are sufficiently relevant, the section is omitted entirely.
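The selection and formatting steps can be sketched roughly as below. The function name `build_memory_section`, the 0.75 minimum relevance, and the cap of 5 are assumptions for illustration; the documentation does not publish the actual threshold. The input is a list of memories that have already been scored against the current user message.

```python
# Assumed values: the real threshold and cap are not documented.
MIN_RELEVANCE = 0.75
MAX_INJECTED = 5


def build_memory_section(scored_memories, min_relevance=MIN_RELEVANCE,
                         max_items=MAX_INJECTED):
    """Format the most relevant memories as a system prompt section.

    Returns an empty string when nothing clears the threshold, so the
    caller can omit the section entirely.
    """
    relevant = [m for m in scored_memories if m["relevance"] >= min_relevance]
    relevant.sort(key=lambda m: m["relevance"], reverse=True)
    relevant = relevant[:max_items]
    if not relevant:
        return ""
    lines = ["Known facts about this conversation:"]
    for index, mem in enumerate(relevant, start=1):
        category = mem.get("category")
        suffix = f" ({category})" if category else ""
        lines.append(f"{index}. {mem['content']}{suffix}")
    return "\n".join(lines)


scored = [
    {"content": "User prefers responses in bullet-point format",
     "category": "preference", "relevance": 0.92},
    {"content": "The project deadline is March 15, 2026",
     "category": "fact", "relevance": 0.81},
    {"content": "Unrelated note", "category": None, "relevance": 0.30},
]
section = build_memory_section(scored)
```

Here `section` contains the heading plus the two memories above the assumed threshold; the low-scoring note is left out.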
Chat UI
The chat interface provides several UI elements for interacting with the memory system.
Remember This Button
Each message in the chat displays a "Remember this" button in its action bar. Clicking it creates a memory from the message content. The button provides visual feedback (a brief highlight) to confirm the memory was saved.
Memories Sidebar Panel
The toolbar includes a toggle button to open the Memories sidebar panel. This panel displays all active memories for the current chat in a scrollable list.
- Each memory entry shows its content, category, source type, and creation date.
- Memories can be edited inline -- click the edit icon, modify the text, and save.
- Memories can be deleted via the delete icon on each entry.
- A memory count badge on the toolbar toggle shows the total number of active memories.
Memory Count Badge
The toolbar toggle button displays a small badge showing the current count of active memories for the chat. This updates in real time as memories are created or deleted, whether by the model or the user.
Data Model
Each memory is associated with a single chat and optionally linked to the message that created it.
Memory Object
| Field | Type | Description |
|---|---|---|
| id | string | Memory identifier in mem_xxx format. |
| content | string | The text content of the memory. |
| category | string (nullable) | Optional category: preference, fact, or instruction. |
| source_type | string | How the memory was created: model (via save_memory tool) or user (via "Remember this" button). |
| created_at | string | ISO 8601 timestamp when the memory was created. |
| updated_at | string | ISO 8601 timestamp when the memory was last updated. |
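For client code, the memory object could be mirrored with a small typed structure. The following is one possible sketch: the class name `Memory` and the helpers `parse_timestamp` and `memory_from_json` are hypothetical, and only the field names come from the table above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Memory:
    id: str
    content: str
    category: Optional[str]  # preference, fact, instruction, or None
    source_type: str         # "model" or "user"
    created_at: datetime
    updated_at: datetime


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as 2026-03-07T10:30:00Z."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so it is normalized to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def memory_from_json(data):
    """Build a Memory from one entry of the list endpoint's response."""
    return Memory(
        id=data["id"],
        content=data["content"],
        category=data.get("category"),
        source_type=data["source_type"],
        created_at=parse_timestamp(data["created_at"]),
        updated_at=parse_timestamp(data["updated_at"]),
    )


example = memory_from_json({
    "id": "mem_abc123",
    "content": "The project deadline is March 15, 2026",
    "category": "fact",
    "source_type": "user",
    "created_at": "2026-03-07T10:30:00Z",
    "updated_at": "2026-03-07T10:30:00Z",
})
```

Using `data.get("category")` keeps the sketch tolerant of a missing or null category, in line with the nullable type in the table.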