xeroctl chat

Send chat completion requests to a Xerotier endpoint. Supports single-shot, streaming, and interactive multi-turn modes. See the xeroctl CLI hub for installation and global options.

Overview

The chat command tests chat completions against any endpoint configured in your project. It wraps the OpenAI-compatible /v1/chat/completions API and supports all standard sampling parameters.

An endpoint slug (--endpoint) is always required. A user message (--message or -m) is required unless --interactive is specified.

Note: The --model flag is informational pass-through. The actual model used is determined by the endpoint configuration on the server.

Single-Shot Mode

Send a single message and receive a complete response:

bash
xeroctl chat --endpoint my-endpoint --message "Hello, how are you?"

With a System Message

bash
xeroctl chat --endpoint my-endpoint \
  --system "You are a helpful coding assistant." \
  --message "Write a Python function to reverse a string"

System Prompt From a File

Load the system prompt from a file using --from-file:

bash
xeroctl chat --endpoint my-endpoint \
  --from-file ./system-prompt.txt \
  --message "What is the policy on refunds?"
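If the prompt file does not exist yet, a heredoc is a quick way to create one. The filename and prompt text below are placeholders for illustration, not part of xeroctl:

```shell
# Create a reusable system prompt file; the contents are illustrative.
cat > system-prompt.txt <<'EOF'
You are a support assistant for Acme Corp.
Answer strictly from documented policy. If you are unsure, say so.
EOF
```

The resulting file can then be passed with --from-file as shown above.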

With Sampling Parameters

bash
xeroctl chat --endpoint my-endpoint \
  --message "Generate three creative product names" \
  --max-tokens 200 \
  --temperature 0.9 \
  --top-p 0.95

Show Token Usage

Append --show-usage to print token counts and the response ID after the output:

bash
xeroctl chat --endpoint my-endpoint \
  --message "Summarize quantum entanglement" \
  --show-usage

Store the Completion

Pass --store to request server-side storage so the completion can be retrieved later via xeroctl completions:

bash
xeroctl chat --endpoint my-endpoint \
  --message "Draft a release announcement" \
  --store

JSON Output

Use -o json to receive the raw API response object:

bash
xeroctl chat --endpoint my-endpoint --message "Hello" -o json
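The raw object can be post-processed with standard tools such as jq. The field paths below assume the OpenAI-compatible response shape mentioned in the Overview (choices, message, usage); verify them against your endpoint's actual output:

```shell
# Save the raw response, then extract individual fields with jq.
# Field paths assume an OpenAI-compatible schema; adjust if yours differs.
xeroctl chat --endpoint my-endpoint --message "Hello" -o json > response.json

jq -r '.choices[0].message.content' response.json   # assistant text
jq -r '.usage.total_tokens' response.json           # total token count
```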

Streaming Mode

Add --stream to receive tokens as they are generated. A spinner appears while waiting for the first token. A throughput summary is printed after the response completes.

bash
xeroctl chat --endpoint my-endpoint \
  --message "Tell me a story about a robot learning to paint" \
  --stream

Example output footer (printed to stderr, dim):

Output
-- 312 tokens in 4.2s (74.3 tok/s)

Combine --stream with --show-usage to additionally print prompt/completion token breakdowns:

bash
xeroctl chat --endpoint my-endpoint \
  --message "Explain transformers" \
  --stream \
  --show-usage

Interactive Mode

Use --interactive to start a multi-turn chat session. The full conversation history is maintained client-side and sent with each request. All sampling parameters apply, and responses are always streamed. Press Ctrl+D or type /quit to end the session.

bash
xeroctl chat --endpoint my-endpoint --interactive

With a Preset System Message

bash
xeroctl chat --endpoint my-endpoint \
  --interactive \
  --system "You are a SQL expert. Keep answers concise."

With a System Prompt File

bash
xeroctl chat --endpoint my-endpoint \
  --interactive \
  --from-file ./persona.txt

Interactive mode always streams responses. A status bar at the prompt shows the endpoint slug, optional model name, cumulative token counts, and message count:

Status bar example
endpoint: my-endpoint | model: llama-3 | tokens: 148/532 | msgs: 6

The line editor supports history (Up/Down arrows), tab-completion for slash commands, and standard readline shortcuts.

All Options

Option Type Description
--endpoint <slug> string Endpoint slug to send the request to. Required.
-m, --message <text> string User message to send. Required unless --interactive is specified.
--system <text> string System message to set context for the model.
--from-file <path> string Read the system prompt from a file. Takes precedence over --system when both are specified.
--model <name> string Model name passed through to the API. Informational only -- the endpoint configuration determines the actual model used.
--max-tokens <n> integer Maximum number of tokens to generate. Defaults to 4096 in interactive mode when not specified.
--temperature <f> float Sampling temperature between 0.0 and 2.0. Higher values produce more varied output.
--top-p <f> float Nucleus sampling threshold between 0.0 and 1.0.
--frequency-penalty <f> float Frequency penalty between -2.0 and 2.0. Reduces repetition of already-used tokens.
--presence-penalty <f> float Presence penalty between -2.0 and 2.0. Encourages the model to discuss new topics.
--seed <n> integer Random seed for reproducible output.
--stream flag Stream the response token by token. A throughput summary is printed after completion.
--store flag Request server-side storage of the completion for later retrieval.
--show-usage flag Print detailed token usage (prompt, completion, total, cached, reasoning) after the response.
--interactive flag Start an interactive multi-turn chat session. Conversation history is maintained client-side.

Interactive Commands

The following slash commands are available during an interactive session:

Command Description
/help Show available commands and keyboard shortcuts.
/clear Clear conversation history while keeping the system message.
/system <msg> Set or replace the system message mid-session.
/save <file> Save the current conversation history to a JSON file.
/load <file> Load a previously saved conversation from a JSON file.
/tokens Show cumulative token usage and message count for the session.
/quit End the session and print total token usage. Also triggered by Ctrl+D.
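Files written by /save can be checked from the shell afterwards. The snippet below only assumes the file is valid JSON; the exact schema of a saved conversation is not specified here:

```shell
# After running `/save chat.json` inside a session, confirm the
# file parses as JSON. The message schema itself is not assumed.
jq . chat.json > /dev/null 2>&1 && echo "valid JSON" || echo "missing or invalid"
```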

Keyboard Shortcuts

Shortcut Action
Up / Down Recall previous inputs from history.
Tab Auto-complete slash commands.
Ctrl+A / Ctrl+E Jump to start or end of line.
Ctrl+U / Ctrl+K Clear text before or after the cursor.
Ctrl+W Delete the previous word.
Ctrl+D End the session (EOF).

Examples

Quick Endpoint Smoke Test

bash
xeroctl chat --endpoint my-endpoint --message "What is 2+2?"

Deterministic Output With a Seed

bash
xeroctl chat --endpoint my-endpoint \
  --message "Pick a random number between 1 and 10" \
  --seed 42 \
  --temperature 0.0
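Whether seeded output is actually reproducible depends on the backing server; a quick check is to run the same request twice and diff the results. A sketch, assuming nothing changes server-side between runs:

```shell
# Issue the same seeded, zero-temperature request twice and compare.
# Identical files suggest the endpoint honors --seed.
xeroctl chat --endpoint my-endpoint \
  --message "Pick a random number between 1 and 10" \
  --seed 42 --temperature 0.0 > run1.txt
xeroctl chat --endpoint my-endpoint \
  --message "Pick a random number between 1 and 10" \
  --seed 42 --temperature 0.0 > run2.txt
diff run1.txt run2.txt && echo "reproducible" || echo "outputs differ"
```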

Stream a Long Generation

bash
xeroctl chat --endpoint my-endpoint \
  --message "Write a 500-word essay on renewable energy" \
  --max-tokens 700 \
  --stream

Batch-Test Multiple Endpoints

bash
#!/bin/bash
for ep in endpoint-prod endpoint-staging endpoint-dev; do
  echo "=== $ep ==="
  xeroctl chat --endpoint "$ep" \
    --message "Respond with the word OK only." \
    --show-usage
done

Use a Persona File for Interactive Sessions

bash
echo "You are a terse senior engineer. Answer only in bullet points." > persona.txt xeroctl chat --endpoint my-endpoint --interactive --from-file persona.txt