# xeroctl chat
Send chat completion requests to a Xerotier endpoint. Supports single-shot, streaming, and interactive multi-turn modes. See the xeroctl CLI hub for installation and global options.
## Overview
The `chat` command tests chat completions against any endpoint configured
in your project. It wraps the OpenAI-compatible `/v1/chat/completions` API
and supports all standard sampling parameters.
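Because the API is OpenAI-compatible, the request body the command constructs is a standard chat-completions payload. A sketch of that body (the exact field set xeroctl sends is an assumption), inspected with `jq`:

```shell
# Sketch of the request body sent to /v1/chat/completions. The schema is
# the standard OpenAI-compatible one; the exact fields xeroctl includes
# are an assumption, not verified against its source.
payload='{
  "messages": [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Hello, how are you?"}
  ],
  "max_tokens": 200
}'

# Validate the JSON and pull out the user message:
echo "$payload" | jq -r '.messages[-1].content'
```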
An endpoint slug (`--endpoint`) is always required. A user message
(`--message` or `-m`) is required unless `--interactive` is specified.
**Note:** The `--model` flag is an informational pass-through. The actual model
used is determined by the endpoint configuration on the server.
## Single-Shot Mode

Send a single message and receive a complete response:

```shell
xeroctl chat --endpoint my-endpoint --message "Hello, how are you?"
```
### With a System Message

```shell
xeroctl chat --endpoint my-endpoint \
  --system "You are a helpful coding assistant." \
  --message "Write a Python function to reverse a string"
```
### System Prompt From a File

Load the system prompt from a file using `--from-file`:

```shell
xeroctl chat --endpoint my-endpoint \
  --from-file ./system-prompt.txt \
  --message "What is the policy on refunds?"
```
### With Sampling Parameters

```shell
xeroctl chat --endpoint my-endpoint \
  --message "Generate three creative product names" \
  --max-tokens 200 \
  --temperature 0.9 \
  --top-p 0.95
```
### Show Token Usage

Append `--show-usage` to print token counts and the response ID after the output:

```shell
xeroctl chat --endpoint my-endpoint \
  --message "Summarize quantum entanglement" \
  --show-usage
```
### Store the Completion

Pass `--store` to request server-side storage so the completion can be retrieved later via `xeroctl completions`:

```shell
xeroctl chat --endpoint my-endpoint \
  --message "Draft a release announcement" \
  --store
```
### JSON Output

Use `-o json` to receive the raw API response object:

```shell
xeroctl chat --endpoint my-endpoint --message "Hello" -o json
```
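Since the response follows the OpenAI-compatible schema, the assistant's text can be extracted from the raw object with `jq`; the `choices[0].message.content` path is the standard layout and assumed here:

```shell
# Extract just the assistant text from the raw JSON response.
# Assumes the standard OpenAI-compatible layout: choices[0].message.content.
xeroctl chat --endpoint my-endpoint --message "Hello" -o json \
  | jq -r '.choices[0].message.content'
```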
## Streaming Mode

Add `--stream` to receive tokens as they are generated. A spinner
appears while waiting for the first token, and a throughput summary is printed
after the response completes.

```shell
xeroctl chat --endpoint my-endpoint \
  --message "Tell me a story about a robot learning to paint" \
  --stream
```
Example output footer (printed to stderr, dim):
```
-- 312 tokens in 4.2s (74.3 tok/s)
```
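The tok/s figure is simply tokens divided by wall-clock time; for the footer above:

```shell
# 312 tokens over 4.2 seconds:
awk 'BEGIN { printf "%.1f tok/s\n", 312 / 4.2 }'
# → 74.3 tok/s
```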
Combine `--stream` with `--show-usage` to additionally print prompt/completion token breakdowns:

```shell
xeroctl chat --endpoint my-endpoint \
  --message "Explain transformers" \
  --stream \
  --show-usage
```
## Interactive Mode

Use `--interactive` to start a multi-turn chat session. The full
conversation history is maintained client-side and sent with each request.
All sampling parameters apply. Press Ctrl+D or type
`/quit` to end the session.

```shell
xeroctl chat --endpoint my-endpoint --interactive
```
### With a Preset System Message

```shell
xeroctl chat --endpoint my-endpoint \
  --interactive \
  --system "You are a SQL expert. Keep answers concise."
```
### With a System Prompt File

```shell
xeroctl chat --endpoint my-endpoint \
  --interactive \
  --from-file ./persona.txt
```
Interactive mode always streams responses. A status bar at the prompt shows the endpoint slug, optional model name, cumulative token counts, and message count:
```
endpoint: my-endpoint | model: llama-3 | tokens: 148/532 | msgs: 6
```
The line editor supports history (Up/Down arrows), tab-completion for slash commands, and standard readline shortcuts.
## All Options

| Option | Type | Description |
|---|---|---|
| `--endpoint <slug>` (required) | string | Endpoint slug to send the request to. |
| `-m, --message <text>` | string | User message to send. Required unless `--interactive` is specified. |
| `--system <text>` | string | System message to set context for the model. |
| `--from-file <path>` | string | Read the system prompt from a file. Takes precedence over `--system` when both are specified. |
| `--model <name>` | string | Model name passed through to the API. Informational only; the endpoint configuration determines the actual model used. |
| `--max-tokens <n>` | integer | Maximum number of tokens to generate. Defaults to 4096 in interactive mode when not specified. |
| `--temperature <f>` | float | Sampling temperature between 0.0 and 2.0. Higher values produce more varied output. |
| `--top-p <f>` | float | Nucleus sampling threshold between 0.0 and 1.0. |
| `--frequency-penalty <f>` | float | Frequency penalty between -2.0 and 2.0. Reduces repetition of already-used tokens. |
| `--presence-penalty <f>` | float | Presence penalty between -2.0 and 2.0. Encourages the model to discuss new topics. |
| `--seed <n>` | integer | Random seed for reproducible output. |
| `--stream` | flag | Stream the response token by token. A throughput summary is printed after completion. |
| `--store` | flag | Request server-side storage of the completion for later retrieval. |
| `--show-usage` | flag | Print detailed token usage (prompt, completion, total, cached, reasoning) after the response. |
| `--interactive` | flag | Start an interactive multi-turn chat session. Conversation history is maintained client-side. |
## Interactive Commands
The following slash commands are available during an interactive session:
| Command | Description |
|---|---|
| `/help` | Show available commands and keyboard shortcuts. |
| `/clear` | Clear conversation history while keeping the system message. |
| `/system <msg>` | Set or replace the system message mid-session. |
| `/save <file>` | Save the current conversation history to a JSON file. |
| `/load <file>` | Load a previously saved conversation from a JSON file. |
| `/tokens` | Show cumulative token usage and message count for the session. |
| `/quit` | End the session and print total token usage. Also triggered by Ctrl+D. |
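The on-disk format written by `/save` is not specified in this reference. Assuming it is a plain JSON array of role/content messages (an assumption, not documented behavior), a saved conversation can be inspected with `jq`:

```shell
# A sample conversation in the assumed /save format -- an array of
# role/content messages. This shape is an assumption, not documented.
cat > conversation.json <<'EOF'
[
  {"role": "system", "content": "You are a SQL expert."},
  {"role": "user", "content": "Explain JOIN types."},
  {"role": "assistant", "content": "INNER, LEFT, RIGHT, FULL OUTER."}
]
EOF

# Print the transcript:
jq -r '.[] | "\(.role): \(.content)"' conversation.json

# Count user turns:
jq '[.[] | select(.role == "user")] | length' conversation.json
```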
## Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| Up / Down | Recall previous inputs from history. |
| Tab | Auto-complete slash commands. |
| Ctrl+A / Ctrl+E | Jump to start or end of line. |
| Ctrl+U / Ctrl+K | Clear text before or after the cursor. |
| Ctrl+W | Delete the previous word. |
| Ctrl+D | End the session (EOF). |
## Examples

### Quick Endpoint Smoke Test

```shell
xeroctl chat --endpoint my-endpoint --message "What is 2+2?"
```
### Deterministic Output With a Seed

```shell
xeroctl chat --endpoint my-endpoint \
  --message "Pick a random number between 1 and 10" \
  --seed 42 \
  --temperature 0.0
```
### Stream a Long Generation

```shell
xeroctl chat --endpoint my-endpoint \
  --message "Write a 500-word essay on renewable energy" \
  --max-tokens 700 \
  --stream
```
### Batch-Test Multiple Endpoints

```shell
#!/bin/bash
for ep in endpoint-prod endpoint-staging endpoint-dev; do
  echo "=== $ep ==="
  xeroctl chat --endpoint "$ep" \
    --message "Respond with the word OK only." \
    --show-usage
done
```
### Use a Persona File for Interactive Sessions

```shell
echo "You are a terse senior engineer. Answer only in bullet points." > persona.txt
xeroctl chat --endpoint my-endpoint --interactive --from-file persona.txt
```