TokenRouter API

An OpenAI-compatible gateway to frontier open models. One key, one base URL — works with Codex CLI, OpenCode, the OpenAI SDKs and plain HTTP.

Quick Start

  1. Create an API key at My Account → API Keys.
  2. Point your client at https://tokenrouter.me/v1 with your key as a Bearer token.
  3. Pick a model from the table below; model IDs are exact and case-sensitive.

curl https://tokenrouter.me/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "kimi-k2p6", "messages": [{"role": "user", "content": "Hello!"}]}'
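For a dependency-free Python version of the same call, here is a minimal sketch that uses only the standard library. It assumes you pass the key in yourself (for example, from the TOKENROUTER_API_KEY environment variable used in the setup steps below). build_chat_request and send are illustrative helper names, not part of any SDK.

```python
import json
import urllib.request

BASE_URL = "https://tokenrouter.me/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # key sent as a Bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: send(build_chat_request(key, "kimi-k2p6", "Hello!")) returns the decoded response. The reply text is under choices[0].message.content.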

Base URL & Authentication

Parameter | Value
Base URL | https://tokenrouter.me/v1
Auth header | Authorization: Bearer YOUR_KEY
Content type | Content-Type: application/json
Endpoints | /v1/models · /v1/chat/completions · /v1/responses
Streaming | SSE; pass "stream": true

Both the Chat Completions API and the Responses API are supported, so tools that require either one (e.g. Codex CLI) work out of the box.

Available Models

Use the exact model ID from the first column. Limits below are the real upstream limits — keep your prompt + completion within the context window.

Model ID | Name | Context | Max output* | Tools | Vision | Best for
kimi-k2p6 | Kimi K2.6 | 256K (262,144) | 65,536 | Yes | Yes | Agentic coding, long-horizon tasks
kimi-k2p5 | Kimi K2.5 | 256K (262,144) | 65,536 | Yes | Yes | Fast agentic coding
deepseek-v4-pro | DeepSeek V4 Pro | 1M (1,000,000) | 65,536 | Yes | No | Deep reasoning, huge codebases
deepseek-v4-flash | DeepSeek V4 Flash | 1M (1,000,000) | 65,536 | Yes | No | Fast & cheapest, long context
qwen3p6-plus | Qwen 3.6 Plus | 256K (262,144) | 65,536 | Yes | Yes | Multilingual, general purpose
glm-5p1 | GLM 5.1 | 200K (202,752) | 65,536 | Yes | No | Agentic engineering
gpt-oss-120b | GPT-OSS 120B | 128K (131,072) | 32,768 | Yes | No | Open-source OpenAI model
minimax-m2p7 | MiniMax M2.7 | 196K (196,608) | 65,536 | Yes | No | Agent harnesses, budget tasks

* Recommended max_tokens ceiling for a single completion. The gateway clamps oversized values automatically.
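The exact clamping rule is not spelled out here, so the sketch below is only a client-side guard. It assumes a request must satisfy both limits from the table: max_tokens must be at most the model's max output, and prompt tokens plus max_tokens must fit in the context window. The limits dictionary copies the table; safe_max_tokens is a hypothetical helper, and prompt_tokens must come from your own estimate or from an earlier response's usage.prompt_tokens.

```python
# (context window, recommended max output) copied from the models table above
MODEL_LIMITS = {
    "kimi-k2p6": (262_144, 65_536),
    "kimi-k2p5": (262_144, 65_536),
    "deepseek-v4-pro": (1_000_000, 65_536),
    "deepseek-v4-flash": (1_000_000, 65_536),
    "qwen3p6-plus": (262_144, 65_536),
    "glm-5p1": (202_752, 65_536),
    "gpt-oss-120b": (131_072, 32_768),
    "minimax-m2p7": (196_608, 65_536),
}


def safe_max_tokens(model: str, prompt_tokens: int, requested: int) -> int:
    """Largest max_tokens that respects both the output cap and the remaining context."""
    context, max_output = MODEL_LIMITS[model]  # KeyError for unknown or mistyped IDs
    room = context - prompt_tokens
    if room <= 0:
        raise ValueError(f"prompt ({prompt_tokens} tokens) already fills the {context}-token window")
    return min(requested, max_output, room)
```

For example, with gpt-oss-120b and a 120,000-token prompt only 11,072 tokens remain, so asking for 32,768 would exceed the context window.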

Model IDs are case-sensitive. Use kimi-k2p6, not Kimi K2.6 or kimi-2.6. List models available to your key: GET /v1/models.
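To check programmatically which IDs your key can use, a minimal standard-library sketch follows. It assumes GET /v1/models returns the usual OpenAI list shape, {"object": "list", "data": [{"id": ...}, ...]}; fetch_model_ids and parse_model_ids are illustrative names.

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style list response (assumed shape)."""
    return sorted(item["id"] for item in payload.get("data", []))


def fetch_model_ids(api_key: str, base_url: str = "https://tokenrouter.me/v1") -> list[str]:
    """Call GET /v1/models with the Bearer key and return the sorted IDs."""
    req = urllib.request.Request(
        f"{base_url}/models", headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_model_ids(json.loads(resp.read().decode("utf-8")))
```

Because IDs are case-sensitive, an exact membership test such as "kimi-k2p6" in fetch_model_ids(key) catches a typo before the real request fails.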

Codex CLI (wire_api: responses)

TokenRouter supports the Responses API, which Codex CLI uses natively.

1. Set your API key

# Windows (PowerShell) — then restart your terminal
setx TOKENROUTER_API_KEY "sk-YOUR_KEY"

# Linux / macOS — add to ~/.bashrc or ~/.zshrc
export TOKENROUTER_API_KEY="sk-YOUR_KEY"

2. Configure ~/.codex/config.toml

model_provider = "tokenrouter"
model = "kimi-k2p6"
model_context_window = 262144
model_max_output_tokens = 65536
model_reasoning_effort = "high"
disable_response_storage = true

[model_providers.tokenrouter]
name = "TokenRouter"
base_url = "https://tokenrouter.me/v1"
env_key = "TOKENROUTER_API_KEY"
wire_api = "responses"

3. Optional: a profile per model

Add profiles and switch with codex --profile deepseek:

[profiles.kimi]
model = "kimi-k2p6"
model_context_window = 262144
model_max_output_tokens = 65536

[profiles.kimi-fast]
model = "kimi-k2p5"
model_context_window = 262144
model_max_output_tokens = 65536

[profiles.deepseek]
model = "deepseek-v4-pro"
model_context_window = 1000000
model_max_output_tokens = 65536

[profiles.deepseek-flash]
model = "deepseek-v4-flash"
model_context_window = 1000000
model_max_output_tokens = 65536

[profiles.qwen]
model = "qwen3p6-plus"
model_context_window = 262144
model_max_output_tokens = 65536

[profiles.glm]
model = "glm-5p1"
model_context_window = 202752
model_max_output_tokens = 65536

[profiles.gpt-oss]
model = "gpt-oss-120b"
model_context_window = 131072
model_max_output_tokens = 32768

[profiles.minimax]
model = "minimax-m2p7"
model_context_window = 196608
model_max_output_tokens = 65536

Alternative auth: instead of env_key, you can set requires_openai_auth = true in the provider block and put {"OPENAI_API_KEY": "sk-YOUR_KEY"} into ~/.codex/auth.json. The env_key method above is recommended.
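Writing eight near-identical profiles by hand invites typos in the limits. As one option, the sketch below generates the same [profiles.*] blocks from a single table. The dictionary mirrors the models table above, and render_profiles is an illustrative helper, not a Codex feature; paste its output into ~/.codex/config.toml.

```python
# profile name -> (model ID, context window, max output), copied from the models table
PROFILES = {
    "kimi": ("kimi-k2p6", 262144, 65536),
    "kimi-fast": ("kimi-k2p5", 262144, 65536),
    "deepseek": ("deepseek-v4-pro", 1000000, 65536),
    "deepseek-flash": ("deepseek-v4-flash", 1000000, 65536),
    "qwen": ("qwen3p6-plus", 262144, 65536),
    "glm": ("glm-5p1", 202752, 65536),
    "gpt-oss": ("gpt-oss-120b", 131072, 32768),
    "minimax": ("minimax-m2p7", 196608, 65536),
}


def render_profiles(profiles: dict[str, tuple[str, int, int]]) -> str:
    """Render one Codex [profiles.<name>] TOML block per entry."""
    blocks = []
    for name, (model, context, max_output) in profiles.items():
        blocks.append(
            f"[profiles.{name}]\n"
            f'model = "{model}"\n'
            f"model_context_window = {context}\n"
            f"model_max_output_tokens = {max_output}\n"
        )
    return "\n".join(blocks)


print(render_profiles(PROFILES))
```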

OpenCode

Config file: ~/.config/opencode/opencode.json (Windows: %USERPROFILE%\.config\opencode\opencode.json). Create it if it doesn't exist.

Set TOKENROUTER_API_KEY in your environment (see step 1 of the Codex CLI section), or replace {env:TOKENROUTER_API_KEY} with the key itself.

{
  "$schema": "https://opencode.ai/config.json",
  "model": "tokenrouter/kimi-k2p6",
  "provider": {
    "tokenrouter": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "TokenRouter",
      "options": {
        "baseURL": "https://tokenrouter.me/v1",
        "apiKey": "{env:TOKENROUTER_API_KEY}"
      },
      "models": {
        "kimi-k2p6": {
          "name": "Kimi K2.6 (256K)",
          "limit": { "context": 262144, "output": 65536 },
          "tool_call": true, "reasoning": true, "attachment": true
        },
        "kimi-k2p5": {
          "name": "Kimi K2.5 (256K)",
          "limit": { "context": 262144, "output": 65536 },
          "tool_call": true, "reasoning": true, "attachment": true
        },
        "deepseek-v4-pro": {
          "name": "DeepSeek V4 Pro (1M)",
          "limit": { "context": 1000000, "output": 65536 },
          "tool_call": true, "reasoning": true
        },
        "deepseek-v4-flash": {
          "name": "DeepSeek V4 Flash (1M)",
          "limit": { "context": 1000000, "output": 65536 },
          "tool_call": true, "reasoning": true
        },
        "qwen3p6-plus": {
          "name": "Qwen 3.6 Plus (256K)",
          "limit": { "context": 262144, "output": 65536 },
          "tool_call": true, "reasoning": true, "attachment": true
        },
        "glm-5p1": {
          "name": "GLM 5.1 (200K)",
          "limit": { "context": 202752, "output": 65536 },
          "tool_call": true, "reasoning": true
        },
        "gpt-oss-120b": {
          "name": "GPT-OSS 120B (128K)",
          "limit": { "context": 131072, "output": 32768 },
          "tool_call": true, "reasoning": true
        },
        "minimax-m2p7": {
          "name": "MiniMax M2.7 (196K)",
          "limit": { "context": 196608, "output": 65536 },
          "tool_call": true, "reasoning": true
        }
      }
    }
  }
}

Switch models inside OpenCode with the /models command. The model key (e.g. kimi-k2p6) is what gets sent to the API; name is display-only.
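Because only the model key reaches the API, a mistyped key fails even when its display name looks right. Below is a minimal sketch that loads opencode.json and reports model keys missing from a set of valid IDs (for example, the IDs returned by GET /v1/models). check_opencode_models is a hypothetical helper; it assumes plain JSON without comments and does not resolve {env:...} substitutions.

```python
import json


def check_opencode_models(config_text: str, available_ids: set[str], provider: str = "tokenrouter") -> list[str]:
    """Report unknown model keys and an undefined default model for one provider."""
    config = json.loads(config_text)
    models = config.get("provider", {}).get(provider, {}).get("models", {})
    problems = [f"unknown model key: {key}" for key in models if key not in available_ids]
    default = config.get("model", "")
    prefix = f"{provider}/"
    if default.startswith(prefix) and default[len(prefix):] not in models:
        problems.append(f"default model {default} is not defined under provider.{provider}.models")
    return problems
```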

cURL

Linux / macOS / Git Bash

export TOKENROUTER_API_KEY="sk-YOUR_KEY"

curl https://tokenrouter.me/v1/chat/completions \
  -H "Authorization: Bearer $TOKENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2p6",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "stream": true
  }'
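With "stream": true the reply arrives as Server-Sent Events. Here is a minimal sketch of turning those lines into text. It assumes the standard OpenAI convention of one JSON chunk per data: line and a final data: [DONE] sentinel, which this page does not state explicitly. iter_content is an illustrative name; it also skips chunks whose choices array is empty, as noted in Troubleshooting.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield delta.content strings from OpenAI-style chat-completion SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, ": comment" lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel (OpenAI convention)
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:  # metadata-only chunks have an empty choices array
            content = choices[0].get("delta", {}).get("content")
            if content:
                yield content
```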

Windows PowerShell

Note: use curl.exe, because in Windows PowerShell plain curl is an alias for Invoke-WebRequest.

$env:TOKENROUTER_API_KEY = "sk-YOUR_KEY"

curl.exe https://tokenrouter.me/v1/chat/completions `
  -H "Authorization: Bearer $env:TOKENROUTER_API_KEY" `
  -H "Content-Type: application/json" `
  -d '{\"model\":\"kimi-k2p6\",\"messages\":[{\"role\":\"user\",\"content\":\"Hello!\"}]}'
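If escaping quotes by hand gets error-prone, one alternative is to keep the JSON in a file and pass it with curl's --data-binary @file syntax, which avoids PowerShell's quoting rules. The Python snippet below only writes a correctly escaped request body; body.json is an arbitrary file name.

```python
import json
from pathlib import Path

# Example request body; json.dumps takes care of quote and backslash escaping
body = {
    "model": "kimi-k2p6",
    "messages": [{"role": "user", "content": 'Reply with "OK"'}],
}
Path("body.json").write_text(json.dumps(body), encoding="utf-8")
```

Then run the curl.exe command above with --data-binary '@body.json' in place of the -d argument. The single quotes make PowerShell pass the @ through literally.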

Python

pip install openai

from openai import OpenAI

client = OpenAI(
    base_url="https://tokenrouter.me/v1",
    api_key="sk-YOUR_KEY",
)

# Non-streaming
response = client.chat.completions.create(
    model="kimi-k2p6",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
print(f"Tokens: {response.usage.prompt_tokens} in / {response.usage.completion_tokens} out")

# Streaming
stream = client.chat.completions.create(
    model="kimi-k2p6",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Always check that chunk.choices is non-empty before indexing: some streaming chunks carry only metadata.
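Building on the streaming loop above, the sketch below accumulates the full answer and, separately, any reasoning text. reasoning_content is a TokenRouter field (see Response Format below), not part of the OpenAI SDK's typed schema, so it is read with getattr. This assumes the SDK keeps unknown response fields as attributes. collect_stream is an illustrative name.

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any]) -> tuple[str, str]:
    """Return (reasoning, answer) accumulated from chat-completion stream chunks."""
    reasoning_parts: list[str] = []
    answer_parts: list[str] = []
    for chunk in chunks:
        if not chunk.choices:  # metadata-only chunk, nothing to index
            continue
        delta = chunk.choices[0].delta
        # Non-standard field: assumed to be exposed as an attribute when present
        thinking = getattr(delta, "reasoning_content", None)
        if thinking:
            reasoning_parts.append(thinking)
        if delta.content:
            answer_parts.append(delta.content)
    return "".join(reasoning_parts), "".join(answer_parts)
```

Usage: reasoning, answer = collect_stream(stream), where stream comes from the streaming call above.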

Node.js / TypeScript

npm install openai

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://tokenrouter.me/v1',
  apiKey: 'sk-YOUR_KEY',
});

const stream = await client.chat.completions.create({
  model: 'kimi-k2p6',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Responses API

POST /v1/responses implements the OpenAI Responses API, including SSE streaming and reasoning.effort. This is what Codex CLI uses under the hood.

curl https://tokenrouter.me/v1/responses \
  -H "Authorization: Bearer $TOKENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2p6",
    "input": "Hello!",
    "reasoning": {"effort": "high"},
    "stream": true
  }'

Streamed events follow the standard sequence: response.created → response.reasoning_summary_text.delta → response.output_text.delta → response.completed.
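Here is a sketch of consuming that event stream. It assumes each data: payload is JSON with a type field that matches the event name and a delta string on the *.delta events, as in OpenAI's Responses API streaming format. Only the four event types listed above are handled, and others are ignored. split_response_events is a hypothetical helper.

```python
import json
from typing import Iterable


def split_response_events(lines: Iterable[str]) -> dict[str, str]:
    """Accumulate reasoning-summary and output text from Responses API SSE lines."""
    result = {"reasoning": "", "output": "", "status": "in_progress"}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators; the JSON carries the type
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # defensive: not expected on this endpoint
            break
        event = json.loads(payload)
        kind = event.get("type")
        if kind == "response.reasoning_summary_text.delta":
            result["reasoning"] += event.get("delta", "")
        elif kind == "response.output_text.delta":
            result["output"] += event.get("delta", "")
        elif kind == "response.completed":
            result["status"] = "completed"
    return result
```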

Supported Parameters

Parameter | Notes
model | Required. Exact ID from the models table.
messages | Required (Chat Completions). Array of {role, content}.
stream | true for SSE streaming. Default false.
max_tokens | Completion cap. Values above the model limit are clamped.
temperature, top_p | Standard sampling controls.
tools, tool_choice | Function calling; supported by all listed models.
response_format | JSON mode, where the upstream model supports it.
reasoning | Responses API: {"effort": E}, where E is "low", "medium" or "high".
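Since tools and tool_choice are supported, here is a sketch of the usual OpenAI function-calling round trip on the client side: a tool definition in the standard {"type": "function", ...} shape, and a dispatcher that runs the tool_calls in a response message. get_weather is a made-up example tool. The code assumes TokenRouter returns tool calls in the standard OpenAI shape, with function.arguments as a JSON string.

```python
import json
from typing import Any, Callable

# Hypothetical example tool; send it as "tools": [WEATHER_TOOL] in the request body
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub implementation for illustration only


HANDLERS: dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_tool_calls(message: dict[str, Any]) -> list[dict[str, str]]:
    """Execute each tool call in an assistant message and build role="tool" replies."""
    replies = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"] or "{}")  # arguments arrive as a JSON string
        output = HANDLERS[fn["name"]](**args)
        replies.append({"role": "tool", "tool_call_id": call["id"], "content": output})
    return replies
```

Append the assistant message and these replies to messages, then call /v1/chat/completions again to get the final answer.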

Response Format

Responses follow the OpenAI schema. Two TokenRouter-specific details:

1. Upstream model names. The model field in responses contains the full upstream path (e.g. accounts/fireworks/models/kimi-k2p6) even though you send the short ID. This is normal.

2. Reasoning content. Reasoning models return their thinking in message.reasoning_content (or delta.reasoning_content chunks when streaming). The final answer is always in content — read that field for the actual reply.

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "accounts/fireworks/models/kimi-k2p6",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "OK",
      "reasoning_content": "The user asked me to..."
    },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 6, "completion_tokens": 28, "total_tokens": 34 }
}
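Both details can be handled in a few lines. This sketch works on the decoded JSON body (a dict) rather than on SDK objects. short_model_id assumes the short ID is the last path segment of the upstream name. That holds for the example above but is not guaranteed for every model. Both function names are illustrative.

```python
from typing import Any


def short_model_id(upstream: str) -> str:
    """Strip an upstream path prefix, assuming the ID is the last path segment."""
    return upstream.rsplit("/", 1)[-1]


def read_reply(response: dict[str, Any]) -> dict[str, Any]:
    """Pull the answer, optional reasoning and token usage out of a chat completion."""
    message = response["choices"][0]["message"]
    usage = response.get("usage", {})
    return {
        "model": short_model_id(response.get("model", "")),
        "answer": message.get("content") or "",  # the actual reply
        "reasoning": message.get("reasoning_content"),  # None when the field is absent
        "tokens": (usage.get("prompt_tokens"), usage.get("completion_tokens")),
    }
```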

Troubleshooting

Symptom | Fix
401 INVALID_API_KEY | Wrong or missing key. Check the Authorization: Bearer header and your key on the API Keys page.
Model not found | Model IDs are exact & case-sensitive. Verify with GET /v1/models.
max_tokens ... cannot be greater than max_model_len | Prompt + max_tokens exceeds the context window. Lower max_tokens or shorten the prompt.
Crash while streaming (empty choices) | Some chunks have an empty choices array; guard before indexing.
cURL fails on Windows | Use curl.exe in PowerShell and escape JSON quotes as \".
Codex: "unexpected status 404" | Make sure base_url is https://tokenrouter.me/v1 and wire_api = "responses".

Support

Check your API Keys and Available Channels pages first — channels show which models your key can access. Billing is per token; your balance is shown in the dashboard header.