TokenRouter API

An OpenAI-compatible gateway to frontier open models. One key, one base URL — works with Codex CLI, OpenCode, the OpenAI SDKs and plain HTTP.

Quick Start

1. Create an API key at My Account → API Keys.
2. Point your client at https://tokenrouter.me/v1 with your key as a Bearer token.
3. Pick a model from the table below; model IDs are exact and case-sensitive.

curl https://tokenrouter.me/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "kimi-k2p6", "messages": [{"role": "user", "content": "Hello!"}]}'

Base URL & Authentication

Parameter	Value
Base URL	`https://tokenrouter.me/v1`
Auth header	`Authorization: Bearer YOUR_KEY`
Content type	`Content-Type: application/json`
Endpoints	`/v1/models` · `/v1/chat/completions` · `/v1/responses`
Streaming	SSE — pass `"stream": true`

Both the Chat Completions API and the Responses API are supported, so tools that require either one (e.g. Codex CLI) work out of the box.
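For clients without an OpenAI SDK, the table above maps directly onto a plain HTTP request. The following is a minimal sketch using only the Python standard library. The helper names `build_request` and `list_model_ids` are illustrative, and it assumes `/v1/models` returns the usual OpenAI-style `{"data": [{"id": ...}]}` list, which this page does not spell out.

```python
import json
import urllib.request

BASE_URL = "https://tokenrouter.me/v1"


def build_request(path, api_key, payload=None):
    """Build a GET (no payload) or POST (JSON payload) request for the gateway."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="GET" if payload is None else "POST",
    )


def list_model_ids(api_key):
    """Return the model IDs visible to this key (performs a network call).

    Assumes an OpenAI-style body: {"data": [{"id": "..."}, ...]}.
    """
    with urllib.request.urlopen(build_request("/models", api_key)) as resp:
        body = json.load(resp)
    return [item["id"] for item in body.get("data", [])]
```

Calling `list_model_ids(os.environ["TOKENROUTER_API_KEY"])` (after `import os`) should then list the IDs your key can use, provided the response shape matches the assumption above.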

Available Models

Use the exact model ID from the first column. Limits below are the real upstream limits — keep your prompt + completion within the context window.

Model ID	Name	Context	Max output*	Tools	Vision	Best for
`kimi-k2p7-code`	Kimi K2.7 Code	256K (262,144)	65,536	✓	✓	Agentic coding, code generation
`kimi-k2p7-code-fast`	Kimi K2.7 Code Fast	256K (262,144)	65,536	✓	✓	Fast agentic coding, code generation
`kimi-k2p6`	Kimi K2.6	256K (262,144)	65,536	✓	✓	Agentic coding, long-horizon tasks
`kimi-k2p5`	Kimi K2.5	256K (262,144)	65,536	✓	✓	Fast agentic coding
`deepseek-v4-pro`	DeepSeek V4 Pro	1M (1,000,000)	65,536	✓	—	Deep reasoning, huge codebases
`deepseek-v4-flash`	DeepSeek V4 Flash	1M (1,000,000)	65,536	✓	—	Fast & cheapest, long context
`qwen3p7-plus`	Qwen 3.7 Plus	256K (262,144)	65,536	✓	✓	Multilingual, general purpose
`qwen3p6-plus`	Qwen 3.6 Plus	256K (262,144)	65,536	✓	✓	Multilingual, general purpose
`glm-5p1`	GLM 5.1	200K (202,752)	65,536	✓	—	Agentic engineering
`glm-5p1-fast`	GLM 5.1 Fast	200K (202,752)	65,536	✓	—	Fast agentic engineering
`gpt-oss-120b`	GPT-OSS 120B	128K (131,072)	32,768	✓	—	Open-source OpenAI model
`minimax-m3`	MiniMax M3	512K (524,288)	65,536	✓	✓	Long context, agent harnesses
`minimax-m2p7`	MiniMax M2.7	196K (196,608)	65,536	✓	—	Agent harnesses, budget tasks

* Recommended max_tokens ceiling for a single completion. The gateway clamps oversized values automatically.

Model IDs are case-sensitive. Use kimi-k2p6, not Kimi K2.6 or kimi-2.6. List models available to your key: GET /v1/models.
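The budgeting rule above (prompt plus completion must fit in the context window, with `max_tokens` capped at the per-model ceiling) can be checked client-side before a request is sent. This is a rough sketch: the `LIMITS` dict copies three rows from the table, `fit_max_tokens` is a hypothetical helper, and the prompt token count is assumed to come from whatever tokenizer estimate you already have. The gateway's own clamping may behave differently.

```python
# (context window, recommended max output) copied from the models table.
LIMITS = {
    "kimi-k2p6": (262_144, 65_536),
    "gpt-oss-120b": (131_072, 32_768),
    "deepseek-v4-pro": (1_000_000, 65_536),
}


def fit_max_tokens(model, prompt_tokens, requested):
    """Return a max_tokens value that respects both limits for `model`.

    Raises ValueError if the prompt alone already fills the context window.
    """
    context, max_output = LIMITS[model]  # KeyError for unknown or mis-cased IDs
    room = context - prompt_tokens
    if room <= 0:
        raise ValueError(
            f"prompt ({prompt_tokens} tokens) exceeds {context}-token window"
        )
    return min(requested, max_output, room)
```

For example, a 130,000-token prompt on `gpt-oss-120b` leaves only 1,072 tokens for the completion, whatever value was requested.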

Codex CLI (wire_api: responses)

TokenRouter supports the Responses API, which Codex CLI uses natively.

1. Set your API key

# Windows (PowerShell) — then restart your terminal
setx TOKENROUTER_API_KEY "sk-YOUR_KEY"

# Linux / macOS — add to ~/.bashrc or ~/.zshrc
export TOKENROUTER_API_KEY="sk-YOUR_KEY"

2. Configure `~/.codex/config.toml`

model_provider = "tokenrouter"
model = "kimi-k2p6"
model_context_window = 262144
model_max_output_tokens = 65536
model_reasoning_effort = "high"
disable_response_storage = true

[model_providers.tokenrouter]
name = "TokenRouter"
base_url = "https://tokenrouter.me/v1"
env_key = "TOKENROUTER_API_KEY"
wire_api = "responses"

3. Optional: a profile per model

Add profiles and switch with codex --profile deepseek:

[profiles.kimi-code]
model = "kimi-k2p7-code"
model_context_window = 262144
model_max_output_tokens = 65536

[profiles.kimi-code-fast]
model = "kimi-k2p7-code-fast"
model_context_window = 262144
model_max_output_tokens = 65536

[profiles.kimi]
model = "kimi-k2p6"
model_context_window = 262144
model_max_output_tokens = 65536

[profiles.kimi-fast]
model = "kimi-k2p5"
model_context_window = 262144
model_max_output_tokens = 65536

[profiles.deepseek]
model = "deepseek-v4-pro"
model_context_window = 1000000
model_max_output_tokens = 65536

[profiles.deepseek-flash]
model = "deepseek-v4-flash"
model_context_window = 1000000
model_max_output_tokens = 65536

[profiles.qwen-plus]
model = "qwen3p7-plus"
model_context_window = 262144
model_max_output_tokens = 65536

[profiles.qwen]
model = "qwen3p6-plus"
model_context_window = 262144
model_max_output_tokens = 65536

[profiles.glm]
model = "glm-5p1"
model_context_window = 202752
model_max_output_tokens = 65536

[profiles.glm-fast]
model = "glm-5p1-fast"
model_context_window = 202752
model_max_output_tokens = 65536

[profiles.gpt-oss]
model = "gpt-oss-120b"
model_context_window = 131072
model_max_output_tokens = 32768

[profiles.minimax-m3]
model = "minimax-m3"
model_context_window = 524288
model_max_output_tokens = 65536

[profiles.minimax]
model = "minimax-m2p7"
model_context_window = 196608
model_max_output_tokens = 65536

Alternative auth: instead of env_key, you can set requires_openai_auth = true in the provider block and put {"OPENAI_API_KEY": "sk-YOUR_KEY"} into ~/.codex/auth.json. The env_key method above is recommended.
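Since every profile in step 3 differs only in three values, the blocks can be generated rather than typed by hand. The sketch below is one way to do that in Python. `render_profile`, `render_all` and the `MODELS` dict are illustrative names, and the output is plain text meant to be pasted into `~/.codex/config.toml`.

```python
# profile name -> (model ID, context window, max output), from the models table.
MODELS = {
    "kimi": ("kimi-k2p6", 262144, 65536),
    "deepseek": ("deepseek-v4-pro", 1000000, 65536),
    "gpt-oss": ("gpt-oss-120b", 131072, 32768),
}


def render_profile(name, model, context, max_output):
    """Render one [profiles.<name>] block in the layout used in step 3."""
    return (
        f"[profiles.{name}]\n"
        f'model = "{model}"\n'
        f"model_context_window = {context}\n"
        f"model_max_output_tokens = {max_output}\n"
    )


def render_all(models):
    """Join all profile blocks with a blank line between them."""
    return "\n".join(render_profile(name, *spec) for name, spec in models.items())


print(render_all(MODELS))
```

Keeping the limits in one dict makes it harder for a profile to drift away from the values in the models table.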

OpenCode

Config file: ~/.config/opencode/opencode.json (Windows: %USERPROFILE%\.config\opencode\opencode.json). Create it if it doesn't exist.

Set TOKENROUTER_API_KEY in your environment (see step 1 above), or replace {env:TOKENROUTER_API_KEY} with the key itself.

{
  "$schema": "https://opencode.ai/config.json",
  "model": "tokenrouter/kimi-k2p6",
  "provider": {
    "tokenrouter": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "TokenRouter",
      "options": {
        "baseURL": "https://tokenrouter.me/v1",
        "apiKey": "{env:TOKENROUTER_API_KEY}"
      },
      "models": {
        "kimi-k2p7-code": {
          "name": "Kimi K2.7 Code (256K)",
          "limit": { "context": 262144, "output": 65536 },
          "tool_call": true, "reasoning": true, "attachment": true
        },
        "kimi-k2p7-code-fast": {
          "name": "Kimi K2.7 Code Fast (256K)",
          "limit": { "context": 262144, "output": 65536 },
          "tool_call": true, "reasoning": true, "attachment": true
        },
        "kimi-k2p6": {
          "name": "Kimi K2.6 (256K)",
          "limit": { "context": 262144, "output": 65536 },
          "tool_call": true, "reasoning": true, "attachment": true
        },
        "kimi-k2p5": {
          "name": "Kimi K2.5 (256K)",
          "limit": { "context": 262144, "output": 65536 },
          "tool_call": true, "reasoning": true, "attachment": true
        },
        "deepseek-v4-pro": {
          "name": "DeepSeek V4 Pro (1M)",
          "limit": { "context": 1000000, "output": 65536 },
          "tool_call": true, "reasoning": true
        },
        "deepseek-v4-flash": {
          "name": "DeepSeek V4 Flash (1M)",
          "limit": { "context": 1000000, "output": 65536 },
          "tool_call": true, "reasoning": true
        },
        "qwen3p7-plus": {
          "name": "Qwen 3.7 Plus (256K)",
          "limit": { "context": 262144, "output": 65536 },
          "tool_call": true, "reasoning": true, "attachment": true
        },
        "qwen3p6-plus": {
          "name": "Qwen 3.6 Plus (256K)",
          "limit": { "context": 262144, "output": 65536 },
          "tool_call": true, "reasoning": true, "attachment": true
        },
        "glm-5p1": {
          "name": "GLM 5.1 (200K)",
          "limit": { "context": 202752, "output": 65536 },
          "tool_call": true, "reasoning": true
        },
        "glm-5p1-fast": {
          "name": "GLM 5.1 Fast (200K)",
          "limit": { "context": 202752, "output": 65536 },
          "tool_call": true, "reasoning": true
        },
        "gpt-oss-120b": {
          "name": "GPT-OSS 120B (128K)",
          "limit": { "context": 131072, "output": 32768 },
          "tool_call": true, "reasoning": true
        },
        "minimax-m3": {
          "name": "MiniMax M3 (512K)",
          "limit": { "context": 524288, "output": 65536 },
          "tool_call": true, "reasoning": true, "attachment": true
        },
        "minimax-m2p7": {
          "name": "MiniMax M2.7 (196K)",
          "limit": { "context": 196608, "output": 65536 },
          "tool_call": true, "reasoning": true
        }
      }
    }
  }
}

Switch models inside OpenCode with the /models command. The model key (e.g. kimi-k2p6) is what gets sent to the API; name is display-only.
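A mistyped limit in `opencode.json` is easy to miss among thirteen near-identical entries. The following sketch loads a config shaped like the one above and reports entries whose `limit` disagrees with an expected table. `EXPECTED` and `find_limit_mismatches` are hypothetical names, and only the `provider.tokenrouter.models` layout shown above is assumed.

```python
import json

# Expected (context, output) per model key; extend with the rest of the table.
EXPECTED = {
    "kimi-k2p6": (262144, 65536),
    "glm-5p1": (202752, 65536),
    "gpt-oss-120b": (131072, 32768),
}


def find_limit_mismatches(config, expected=EXPECTED):
    """Return human-readable messages for missing models or wrong limits."""
    models = config.get("provider", {}).get("tokenrouter", {}).get("models", {})
    problems = []
    for model_id, (context, output) in expected.items():
        entry = models.get(model_id)
        if entry is None:
            problems.append(f"{model_id}: missing from config")
            continue
        limit = entry.get("limit", {})
        if (limit.get("context"), limit.get("output")) != (context, output):
            problems.append(f"{model_id}: limit {limit} != {context}/{output}")
    return problems


# Hand-written sample with one wrong limit and one missing model.
sample = json.loads("""
{"provider": {"tokenrouter": {"models": {
  "kimi-k2p6": {"limit": {"context": 262144, "output": 65536}},
  "glm-5p1": {"limit": {"context": 200000, "output": 65536}}
}}}}
""")
for line in find_limit_mismatches(sample):
    print(line)
```

To check a real file, replace `sample` with `json.load(open(path))` for your own config path.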

cURL

Linux / macOS / Git Bash

export TOKENROUTER_API_KEY="sk-YOUR_KEY"

curl https://tokenrouter.me/v1/chat/completions \
  -H "Authorization: Bearer $TOKENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2p6",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "stream": true
  }'

Windows PowerShell

Note: use curl.exe. In Windows PowerShell 5.1, plain curl is an alias for Invoke-WebRequest.

$env:TOKENROUTER_API_KEY = "sk-YOUR_KEY"

curl.exe https://tokenrouter.me/v1/chat/completions `
  -H "Authorization: Bearer $env:TOKENROUTER_API_KEY" `
  -H "Content-Type: application/json" `
  -d '{\"model\":\"kimi-k2p6\",\"messages\":[{\"role\":\"user\",\"content\":\"Hello!\"}]}'

Python

pip install openai

from openai import OpenAI

client = OpenAI(
    base_url="https://tokenrouter.me/v1",
    api_key="sk-YOUR_KEY",
)

# Non-streaming
response = client.chat.completions.create(
    model="kimi-k2p6",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
print(f"Tokens: {response.usage.prompt_tokens} in / {response.usage.completion_tokens} out")

# Streaming
stream = client.chat.completions.create(
    model="kimi-k2p6",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Always check chunk.choices is non-empty before indexing — some streaming chunks carry only metadata.

Node.js / TypeScript

npm install openai

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://tokenrouter.me/v1',
  apiKey: 'sk-YOUR_KEY',
});

const stream = await client.chat.completions.create({
  model: 'kimi-k2p6',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Responses API

POST /v1/responses implements the OpenAI Responses API, including SSE streaming and reasoning.effort. This is what Codex CLI uses under the hood.

curl https://tokenrouter.me/v1/responses \
  -H "Authorization: Bearer $TOKENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2p6",
    "input": "Hello!",
    "reasoning": {"effort": "high"},
    "stream": true
  }'

Streamed events follow the standard sequence: response.created → response.reasoning_summary_text.delta → response.output_text.delta → response.completed.
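To make the event sequence concrete, here is a small sketch that parses raw SSE lines and concatenates the `response.output_text.delta` payloads into the final text. It assumes each event's `data:` line is a JSON object with a `type` field and, for delta events, a `delta` string, as in the OpenAI Responses API. `collect_output_text` is an illustrative helper, and the sample lines are hand-written rather than captured from TokenRouter.

```python
import json


def collect_output_text(lines):
    """Concatenate output-text deltas from an iterable of SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines, comments and blank separators
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # tolerate a Chat Completions-style terminator if one is sent
        event = json.loads(data)
        if event.get("type") == "response.output_text.delta":
            parts.append(event.get("delta", ""))
    return "".join(parts)


sample = [
    'event: response.created',
    'data: {"type": "response.created"}',
    '',
    'data: {"type": "response.reasoning_summary_text.delta", "delta": "thinking"}',
    '',
    'data: {"type": "response.output_text.delta", "delta": "Hel"}',
    'data: {"type": "response.output_text.delta", "delta": "lo!"}',
    '',
    'data: {"type": "response.completed"}',
]
print(collect_output_text(sample))  # prints: Hello!
```

Reasoning-summary deltas are deliberately ignored here; collect them in a second list if you want to display the model's thinking.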

Supported Parameters

Parameter	Notes
`model`	Required. Exact ID from the models table.
`messages`	Required (Chat Completions). Array of `{role, content}`.
`stream`	`true` for SSE streaming. Default `false`.
`max_tokens`	Completion cap. Values above the model limit are clamped.
`temperature`, `top_p`	Standard sampling controls.
`tools`, `tool_choice`	Function calling — supported by all listed models.
`response_format`	JSON mode, where the upstream model supports it.
`reasoning`	Responses API: `{"effort": "low" | "medium" | "high"}`.
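As an illustration of how these parameters combine, the sketch below assembles a Chat Completions request body with one function tool. The tool schema follows the OpenAI function-calling format. The `get_weather` tool and the `build_chat_payload` helper are made up for the example, and the sampling values are arbitrary.

```python
import json


def build_chat_payload(model, user_text, tools=None, max_tokens=1024, stream=False):
    """Assemble a Chat Completions request body from the parameters above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
        "temperature": 0.2,
        "stream": stream,
    }
    if tools:
        payload["tools"] = tools
        payload["tool_choice"] = "auto"  # let the model decide whether to call a tool
    return payload


# Hypothetical tool definition in the OpenAI function-calling schema.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = build_chat_payload("kimi-k2p6", "Weather in Oslo?", tools=[weather_tool])
print(json.dumps(body, indent=2))
```

The resulting dict can be passed as the JSON body of a `POST /v1/chat/completions` request, or unpacked into `client.chat.completions.create(**body)` with the OpenAI SDK.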

Response Format

Responses follow the OpenAI schema. Two TokenRouter-specific details:

1. Model field. The model field in a response may contain a fully qualified provider path rather than the short ID you sent (e.g. kimi-k2p6). This is normal. If you need to know which model answered, keep the short ID from your own request instead of parsing this field.

2. Reasoning content. Reasoning models return their thinking in message.reasoning_content (or delta.reasoning_content chunks when streaming). The final answer is always in content — read that field for the actual reply.

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "kimi-k2p6",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "OK",
      "reasoning_content": "The user asked me to..."
    },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 6, "completion_tokens": 28, "total_tokens": 34 }
}
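The two details above suggest a small amount of defensive parsing on the client. The following sketch reads the answer from `content`, treats `reasoning_content` as optional, and accumulates streaming deltas the same way. It operates on plain dicts (for example the result of `json.loads`), and the helper names are illustrative.

```python
def split_message(response):
    """Return (answer, reasoning) from a non-streaming chat completion dict."""
    message = response["choices"][0]["message"]
    # reasoning_content is only present for reasoning models, so default to "".
    return message.get("content") or "", message.get("reasoning_content") or ""


def accumulate_deltas(chunks):
    """Return (answer, reasoning) assembled from streaming chunk dicts."""
    answer, reasoning = [], []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # metadata-only chunk with no choices
        delta = choices[0].get("delta") or {}
        answer.append(delta.get("content") or "")
        reasoning.append(delta.get("reasoning_content") or "")
    return "".join(answer), "".join(reasoning)


response = {
    "model": "kimi-k2p6",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "OK",
                    "reasoning_content": "The user asked me to..."},
        "finish_reason": "stop",
    }],
}
print(split_message(response)[0])  # prints: OK
```

Neither helper looks at the `model` field, which sidesteps the provider-path issue described in point 1.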

Troubleshooting

Symptom	Fix
`401 INVALID_API_KEY`	Wrong or missing key. Check the `Authorization: Bearer` header and your key on the API Keys page.
Model not found	Model IDs are exact & case-sensitive. Verify with `GET /v1/models`.
`max_tokens ... cannot be greater than max_model_len`	Prompt + `max_tokens` exceeds the context window. Lower `max_tokens` or shorten the prompt.
Crash while streaming (empty `choices`)	Some chunks have an empty `choices` array; guard before indexing.
cURL fails on Windows	Use `curl.exe` in PowerShell and escape JSON quotes as `\"`.
Codex: "unexpected status 404"	Make sure `base_url` is `https://tokenrouter.me/v1` and `wire_api = "responses"`.
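The table can also be folded into client code so that failures surface with an actionable hint. A minimal sketch, assuming errors arrive as an HTTP status plus a text body containing the strings shown in the Symptom column. The exact error payload is not documented here, so the hypothetical `hint_for_error` helper matches loosely on substrings.

```python
def hint_for_error(status, body=""):
    """Map an HTTP status and error body to a hint from the troubleshooting table."""
    text = body.lower()
    if status == 401:
        return "Check the Authorization: Bearer header and your key."
    if "max_model_len" in text:
        return "Lower max_tokens or shorten the prompt."
    if "model" in text and "not found" in text:
        return "Model IDs are exact and case-sensitive; verify with GET /v1/models."
    if status == 404:
        return "Check that the base URL ends in /v1 and the endpoint path is correct."
    return "No known fix; inspect the response body."
```

A caller might log `hint_for_error(resp.status, resp_text)` next to the raw error so the fix is visible without opening this page.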

Support

Check your API Keys and Available Channels pages first — channels show which models your key can access. Billing is per token; your balance is shown in the dashboard header.

Stay in touch on Telegram:

Channel: @tokenrouter_me — news & updates
Support: @tokenrouter_support (Rus / Eng)