TokenRouter API
An OpenAI-compatible gateway to frontier open models. One key, one base URL — works with Codex CLI, OpenCode, the OpenAI SDKs and plain HTTP.
Quick Start
- Create an API key at My Account → API Keys.
- Point your client at https://tokenrouter.me/v1 with your key as a Bearer token.
- Pick a model from the table below. Model IDs are exact and case-sensitive.
curl https://tokenrouter.me/v1/chat/completions \
-H "Authorization: Bearer YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "kimi-k2p6", "messages": [{"role": "user", "content": "Hello!"}]}'
Base URL & Authentication
| Parameter | Value |
|---|---|
| Base URL | https://tokenrouter.me/v1 |
| Auth header | Authorization: Bearer YOUR_KEY |
| Content type | Content-Type: application/json |
| Endpoints | /v1/models · /v1/chat/completions · /v1/responses |
| Streaming | SSE — pass "stream": true |
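The values in the table above map directly onto an HTTP request. The following is a minimal sketch using only the Python standard library; the helper name `build_chat_request` is hypothetical, and the function only constructs the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://tokenrouter.me/v1"  # base URL from the table above


def build_chat_request(api_key, model, messages, stream=False):
    """Build (but do not send) a POST request for /chat/completions.

    build_chat_request is a hypothetical helper name, not part of any SDK.
    """
    body = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "sk-YOUR_KEY", "kimi-k2p6", [{"role": "user", "content": "Hello!"}]
)
print(request.full_url)
print(request.get_method())
```

To actually send it, the request object can be passed to `urllib.request.urlopen`.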
Available Models
Use the exact model ID from the first column. Limits below are the real upstream limits — keep your prompt + completion within the context window.
| Model ID | Name | Context | Max output* | Tools | Vision | Best for |
|---|---|---|---|---|---|---|
| kimi-k2p6 | Kimi K2.6 | 256K (262,144) | 65,536 | ✓ | ✓ | Agentic coding, long-horizon tasks |
| kimi-k2p5 | Kimi K2.5 | 256K (262,144) | 65,536 | ✓ | ✓ | Fast agentic coding |
| deepseek-v4-pro | DeepSeek V4 Pro | 1M | 65,536 | ✓ | — | Deep reasoning, huge codebases |
| deepseek-v4-flash | DeepSeek V4 Flash | 1M | 65,536 | ✓ | — | Fast & cheapest, long context |
| qwen3p6-plus | Qwen 3.6 Plus | 256K (262,144) | 65,536 | ✓ | ✓ | Multilingual, general purpose |
| glm-5p1 | GLM 5.1 | 200K (202,752) | 65,536 | ✓ | — | Agentic engineering |
| gpt-oss-120b | GPT-OSS 120B | 128K (131,072) | 32,768 | ✓ | — | Open-source OpenAI model |
| minimax-m2p7 | MiniMax M2.7 | 196K (196,608) | 65,536 | ✓ | — | Agent harnesses, budget tasks |
* Recommended max_tokens ceiling for a single completion. The gateway clamps oversized values automatically.
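Although the gateway clamps oversized values, it can help to budget tokens on the client as well. The sketch below is one possible way to do that, assuming you already have an estimate of the prompt size in tokens; `MODEL_LIMITS` and `safe_max_tokens` are hypothetical names, the limits are copied from the table above, and 1M is taken as 1,000,000 to match the Codex profile values.

```python
# Limits copied from the models table: (context window, recommended max output).
MODEL_LIMITS = {
    "kimi-k2p6": (262_144, 65_536),
    "kimi-k2p5": (262_144, 65_536),
    "deepseek-v4-pro": (1_000_000, 65_536),
    "deepseek-v4-flash": (1_000_000, 65_536),
    "qwen3p6-plus": (262_144, 65_536),
    "glm-5p1": (202_752, 65_536),
    "gpt-oss-120b": (131_072, 32_768),
    "minimax-m2p7": (196_608, 65_536),
}


def safe_max_tokens(model, prompt_tokens, requested):
    """Return a max_tokens value that respects both the output ceiling and
    the context left after the prompt. prompt_tokens is the caller's estimate.
    """
    context, max_output = MODEL_LIMITS[model]
    remaining = context - prompt_tokens
    if remaining <= 0:
        raise ValueError(f"prompt does not fit in the {context}-token window")
    return min(requested, max_output, remaining)


# 131,072 - 120,000 leaves 11,072 tokens, which is below the 32,768 ceiling.
print(safe_max_tokens("gpt-oss-120b", 120_000, 50_000))  # 11072
```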
Use the exact model ID: kimi-k2p6, not Kimi K2.6 or kimi-2.6. To list the models available to your key, call GET /v1/models.
Codex CLI (wire_api: responses)
TokenRouter supports the Responses API, which Codex CLI uses natively.
1. Set your API key
# Windows (PowerShell) — then restart your terminal
setx TOKENROUTER_API_KEY "sk-YOUR_KEY"
# Linux / macOS — add to ~/.bashrc or ~/.zshrc
export TOKENROUTER_API_KEY="sk-YOUR_KEY"
2. Configure ~/.codex/config.toml
model_provider = "tokenrouter"
model = "kimi-k2p6"
model_context_window = 262144
model_max_output_tokens = 65536
model_reasoning_effort = "high"
disable_response_storage = true
[model_providers.tokenrouter]
name = "TokenRouter"
base_url = "https://tokenrouter.me/v1"
env_key = "TOKENROUTER_API_KEY"
wire_api = "responses"
3. Optional: a profile per model
Add profiles and switch with codex --profile deepseek:
[profiles.kimi]
model = "kimi-k2p6"
model_context_window = 262144
model_max_output_tokens = 65536
[profiles.kimi-fast]
model = "kimi-k2p5"
model_context_window = 262144
model_max_output_tokens = 65536
[profiles.deepseek]
model = "deepseek-v4-pro"
model_context_window = 1000000
model_max_output_tokens = 65536
[profiles.deepseek-flash]
model = "deepseek-v4-flash"
model_context_window = 1000000
model_max_output_tokens = 65536
[profiles.qwen]
model = "qwen3p6-plus"
model_context_window = 262144
model_max_output_tokens = 65536
[profiles.glm]
model = "glm-5p1"
model_context_window = 202752
model_max_output_tokens = 65536
[profiles.gpt-oss]
model = "gpt-oss-120b"
model_context_window = 131072
model_max_output_tokens = 32768
[profiles.minimax]
model = "minimax-m2p7"
model_context_window = 196608
model_max_output_tokens = 65536
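Because every profile repeats the same three keys, the blocks above can also be generated rather than typed by hand. This is only a sketch: `PROFILES` and `render_profile` are hypothetical names, and the values are copied from the profiles listed above.

```python
# (profile name, model ID, context window, max output tokens)
PROFILES = [
    ("kimi", "kimi-k2p6", 262144, 65536),
    ("kimi-fast", "kimi-k2p5", 262144, 65536),
    ("deepseek", "deepseek-v4-pro", 1000000, 65536),
    ("deepseek-flash", "deepseek-v4-flash", 1000000, 65536),
    ("qwen", "qwen3p6-plus", 262144, 65536),
    ("glm", "glm-5p1", 202752, 65536),
    ("gpt-oss", "gpt-oss-120b", 131072, 32768),
    ("minimax", "minimax-m2p7", 196608, 65536),
]


def render_profile(name, model, context, max_output):
    """Render one [profiles.<name>] block as TOML text."""
    return (
        f"[profiles.{name}]\n"
        f'model = "{model}"\n'
        f"model_context_window = {context}\n"
        f"model_max_output_tokens = {max_output}\n"
    )


config_fragment = "\n".join(render_profile(*profile) for profile in PROFILES)
print(config_fragment)
```

The printed text can be pasted below the provider block in ~/.codex/config.toml.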
As an alternative to env_key, you can set requires_openai_auth = true in the provider block and put {"OPENAI_API_KEY": "sk-YOUR_KEY"} into ~/.codex/auth.json. The env_key method above is recommended.
OpenCode
Config file: ~/.config/opencode/opencode.json (Windows: %USERPROFILE%\.config\opencode\opencode.json). Create it if it doesn't exist.
Set TOKENROUTER_API_KEY in your environment (see step 1 above), or replace {env:TOKENROUTER_API_KEY} with the key itself.
{
  "$schema": "https://opencode.ai/config.json",
  "model": "tokenrouter/kimi-k2p6",
  "provider": {
    "tokenrouter": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "TokenRouter",
      "options": {
        "baseURL": "https://tokenrouter.me/v1",
        "apiKey": "{env:TOKENROUTER_API_KEY}"
      },
      "models": {
        "kimi-k2p6": {
          "name": "Kimi K2.6 (256K)",
          "limit": { "context": 262144, "output": 65536 },
          "tool_call": true, "reasoning": true, "attachment": true
        },
        "kimi-k2p5": {
          "name": "Kimi K2.5 (256K)",
          "limit": { "context": 262144, "output": 65536 },
          "tool_call": true, "reasoning": true, "attachment": true
        },
        "deepseek-v4-pro": {
          "name": "DeepSeek V4 Pro (1M)",
          "limit": { "context": 1000000, "output": 65536 },
          "tool_call": true, "reasoning": true
        },
        "deepseek-v4-flash": {
          "name": "DeepSeek V4 Flash (1M)",
          "limit": { "context": 1000000, "output": 65536 },
          "tool_call": true, "reasoning": true
        },
        "qwen3p6-plus": {
          "name": "Qwen 3.6 Plus (256K)",
          "limit": { "context": 262144, "output": 65536 },
          "tool_call": true, "reasoning": true, "attachment": true
        },
        "glm-5p1": {
          "name": "GLM 5.1 (200K)",
          "limit": { "context": 202752, "output": 65536 },
          "tool_call": true, "reasoning": true
        },
        "gpt-oss-120b": {
          "name": "GPT-OSS 120B (128K)",
          "limit": { "context": 131072, "output": 32768 },
          "tool_call": true, "reasoning": true
        },
        "minimax-m2p7": {
          "name": "MiniMax M2.7 (196K)",
          "limit": { "context": 196608, "output": 65536 },
          "tool_call": true, "reasoning": true
        }
      }
    }
  }
}
Switch models inside OpenCode with the /models command. The model key (e.g. kimi-k2p6) is what gets sent to the API; name is display-only.
cURL
Linux / macOS / Git Bash
export TOKENROUTER_API_KEY="sk-YOUR_KEY"
curl https://tokenrouter.me/v1/chat/completions \
-H "Authorization: Bearer $TOKENROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2p6",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
"stream": true
}'
Windows PowerShell
Note: use curl.exe (plain curl is an alias for Invoke-WebRequest).
$env:TOKENROUTER_API_KEY = "sk-YOUR_KEY"
curl.exe https://tokenrouter.me/v1/chat/completions `
-H "Authorization: Bearer $env:TOKENROUTER_API_KEY" `
-H "Content-Type: application/json" `
-d '{\"model\":\"kimi-k2p6\",\"messages\":[{\"role\":\"user\",\"content\":\"Hello!\"}]}'
Python
pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://tokenrouter.me/v1",
    api_key="sk-YOUR_KEY",
)

# Non-streaming
response = client.chat.completions.create(
    model="kimi-k2p6",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
print(f"Tokens: {response.usage.prompt_tokens} in / {response.usage.completion_tokens} out")

# Streaming
stream = client.chat.completions.create(
    model="kimi-k2p6",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    # Skip chunks that carry no choices or no text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
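The guard in the streaming loop can be factored into a small helper and exercised without a network call. The sketch below uses stand-in objects that mimic only the attributes the loop reads (`choices`, `delta`, `content`); `collect_text` and `fake_chunk` are hypothetical names for illustration.

```python
from types import SimpleNamespace


def collect_text(chunks):
    """Concatenate delta.content from streamed chunks, skipping chunks that
    have an empty choices list or no text."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # metadata-only chunk, nothing to index
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
    return "".join(parts)


def fake_chunk(text=None, empty=False):
    """Build a stand-in for an SDK chunk; for illustration only."""
    if empty:
        return SimpleNamespace(choices=[])
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hel"), fake_chunk(empty=True), fake_chunk(None), fake_chunk("lo!")]
print(collect_text(demo))  # Hello!
```

With the real SDK, `collect_text(stream)` would consume the stream object returned by `client.chat.completions.create(..., stream=True)`.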
Check that chunk.choices is non-empty before indexing; some streaming chunks carry only metadata.
Node.js / TypeScript
npm install openai
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://tokenrouter.me/v1',
  apiKey: 'sk-YOUR_KEY',
});

const stream = await client.chat.completions.create({
  model: 'kimi-k2p6',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
});

for await (const chunk of stream) {
  // Optional chaining covers chunks with an empty choices array.
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
Responses API
POST /v1/responses implements the OpenAI Responses API, including SSE streaming and reasoning.effort. This is what Codex CLI uses under the hood.
curl https://tokenrouter.me/v1/responses \
-H "Authorization: Bearer $TOKENROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2p6",
"input": "Hello!",
"reasoning": {"effort": "high"},
"stream": true
}'
Streamed events follow the standard sequence: response.created → response.reasoning_summary_text.delta → response.output_text.delta → response.completed.
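If you consume the stream over plain HTTP, the event sequence above can be handled with a small SSE parser. The sketch below assumes each event arrives as an `event:` line followed by a JSON `data:` line whose `delta` field holds the text fragment, as in OpenAI's Responses streaming format; the exact payloads from the gateway may differ, and `iter_sse_events` and `output_text` are hypothetical names.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data_dict) pairs from an iterable of SSE text lines."""
    event_name, data_lines = None, []

    def flush():
        payload = "\n".join(data_lines)
        if payload and payload != "[DONE]":
            data = json.loads(payload)
            return (event_name or data.get("type"), data)
        return None

    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates one event
            event = flush()
            if event:
                yield event
            event_name, data_lines = None, []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip())
    event = flush()  # handle a final event with no trailing blank line
    if event:
        yield event


def output_text(lines):
    """Join the text fragments carried by response.output_text.delta events."""
    return "".join(
        data.get("delta", "")
        for name, data in iter_sse_events(lines)
        if name == "response.output_text.delta"
    )


sample = [
    "event: response.created", 'data: {"type": "response.created"}', "",
    "event: response.reasoning_summary_text.delta",
    'data: {"type": "response.reasoning_summary_text.delta", "delta": "thinking"}', "",
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "delta": "Hel"}', "",
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "delta": "lo!"}', "",
    "event: response.completed", 'data: {"type": "response.completed"}', "",
]
print(output_text(sample))  # Hello!
```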
Supported Parameters
| Parameter | Notes |
|---|---|
| model | Required. Exact ID from the models table. |
| messages | Required (Chat Completions). Array of {role, content}. |
| stream | true for SSE streaming. Default false. |
| max_tokens | Completion cap. Values above the model limit are clamped. |
| temperature, top_p | Standard sampling controls. |
| tools, tool_choice | Function calling, supported by all listed models. |
| response_format | JSON mode, where the upstream model supports it. |
| reasoning | Responses API: {"effort": "high"}; effort may be low, medium, or high. |
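As an illustration of how these parameters combine, the sketch below builds a Chat Completions request body with one tool definition in the OpenAI function-calling schema. The tool `get_weather` and the helper `build_tool_request` are hypothetical and exist only for this example.

```python
import json


def build_tool_request(model, user_text):
    """Return a Chat Completions body that offers the model one tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": 1024,
        "temperature": 0.2,
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool name
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }


payload = build_tool_request("kimi-k2p6", "What's the weather in Oslo?")
print(json.dumps(payload, indent=2))
```

With the OpenAI Python SDK, the same dictionary can be passed as keyword arguments: `client.chat.completions.create(**payload)`.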
Response Format
Responses follow the OpenAI schema. Two TokenRouter-specific details:
1. Upstream model names. The model field in responses contains the full upstream path (e.g. accounts/fireworks/models/kimi-k2p6) even though you send the short ID. This is normal.
2. Reasoning content. Reasoning models return their thinking in message.reasoning_content (or delta.reasoning_content chunks when streaming). The final answer is always in content — read that field for the actual reply.
{
"id": "chatcmpl-...",
"object": "chat.completion",
"model": "accounts/fireworks/models/kimi-k2p6",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "OK",
"reasoning_content": "The user asked me to..."
},
"finish_reason": "stop"
}],
"usage": { "prompt_tokens": 6, "completion_tokens": 28, "total_tokens": 34 }
}
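Both details can be handled in a few lines when parsing a response. The following sketch works on the sample payload above as a plain dictionary; `parse_completion` is a hypothetical helper name, and recovering the short ID by taking the last path segment is an assumption based on the example path shown.

```python
SAMPLE = {
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "model": "accounts/fireworks/models/kimi-k2p6",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "OK",
                "reasoning_content": "The user asked me to...",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 6, "completion_tokens": 28, "total_tokens": 34},
}


def parse_completion(response):
    """Extract the answer, optional reasoning, and short model ID."""
    message = response["choices"][0]["message"]
    return {
        # Last path segment of the upstream name (assumed to be the short ID).
        "model_id": response["model"].rsplit("/", 1)[-1],
        "answer": message["content"],
        # Not every model returns reasoning, so use .get().
        "reasoning": message.get("reasoning_content"),
        "total_tokens": response["usage"]["total_tokens"],
    }


parsed = parse_completion(SAMPLE)
print(parsed["model_id"], "->", parsed["answer"])  # kimi-k2p6 -> OK
```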
Troubleshooting
| Symptom | Fix |
|---|---|
| 401 INVALID_API_KEY | Wrong or missing key. Check the Authorization: Bearer header and your key on the API Keys page. |
| Model not found | Model IDs are exact & case-sensitive. Verify with GET /v1/models. |
| max_tokens ... cannot be greater than max_model_len | Prompt + max_tokens exceeds the context window. Lower max_tokens or shorten the prompt. |
| Crash on empty streaming chunks | Some chunks have an empty choices array. Check that choices is non-empty before indexing. |
| cURL fails on Windows | Use curl.exe in PowerShell and escape JSON quotes as \". |
| Codex: "unexpected status 404" | Make sure base_url is https://tokenrouter.me/v1 and wire_api = "responses". |
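For the "Model not found" case, a client can suggest the closest valid ID before sending a request. This is a sketch using `difflib` from the standard library; `KNOWN_IDS` is hard-coded from the models table here, whereas in practice the list would come from GET /v1/models, and `suggest_model_id` is a hypothetical helper name.

```python
import difflib

# Model IDs copied from the models table.
KNOWN_IDS = [
    "kimi-k2p6", "kimi-k2p5", "deepseek-v4-pro", "deepseek-v4-flash",
    "qwen3p6-plus", "glm-5p1", "gpt-oss-120b", "minimax-m2p7",
]


def suggest_model_id(requested, available=None):
    """Return the exact ID if valid, else the closest known ID, else None."""
    available = KNOWN_IDS if available is None else available
    if requested in available:
        return requested
    # Normalize common mistakes: display-name casing and spaces.
    normalized = requested.strip().lower().replace(" ", "-")
    if normalized in available:
        return normalized
    matches = difflib.get_close_matches(normalized, available, n=1, cutoff=0.6)
    return matches[0] if matches else None


for candidate in ["kimi-k2p6", "Kimi K2.6", "kimi-2.6", "zzz"]:
    print(candidate, "->", suggest_model_id(candidate))
```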
Support
Check your API Keys and Available Channels pages first — channels show which models your key can access. Billing is per token; your balance is shown in the dashboard header.