Chaos and Order

Overview
1. Why You Need a Gateway Layer
2. Core Concepts
3. Getting a Key and Your First curl Request
4. OpenAI SDK Drop-In
- Python
- TypeScript
5. Choosing Models and Routing
6. Streaming / Tool Calling / Structured Output
7. Attribution Headers and Checking Usage
8. Practical Tips and Caveats
References

Overview

Once you start working with more than one LLM, you hit the same problem quickly. You want to mix models from OpenAI, Anthropic, Google, and Meta, but each provider ships its own SDK, its own authentication, its own billing, and its own request schema. Swapping a model means rewriting client code, and if one provider goes down, your service goes down with it.

OpenRouter consolidates this. A single endpoint and a single API key give you access to more than 300 models (by some counts closer to 500) from 60+ providers, through an interface compatible with the OpenAI API. So if you already use the OpenAI SDK, you can call Claude, Gemini, or Llama by changing only base_url and api_key.

This guide covers why you would use OpenRouter as a gateway, what the core concepts are, and how to actually wire it up — step by step from getting a key through routing, fallbacks, and streaming.

1. Why You Need a Gateway Layer

An LLM application looks fine with a single model at first. But once it reaches production, a few recurring needs appear.

- Different tasks want different models. You want cheap models for summarization and frontier models for hard reasoning.
- You do not want to be locked into one provider. If prices rise or an outage hits, you need to be able to move to another model.
- You want billing and usage in one place. Opening a separate dashboard per provider is inefficient.

A gateway moves these concerns out of your application code. The client always talks to the same endpoint, and switching models comes down to changing a single string (the model field). The gateway takes care of provider routing, fallbacks, and billing aggregation.

Direct-call approach	OpenRouter gateway
Different SDK/auth/schema per provider	One endpoint, one key, OpenAI-compatible schema
Code changes to swap a model	Swap only the `model` string
Separate billing account per provider	Unified billing on one credit balance
Roll your own fallback on failure	Automatic fallback via `models` array / `provider` routing

2. Core Concepts

One Endpoint

Every call goes to a single base URL.

https://openrouter.ai/api/v1

Chat completions are requested at https://openrouter.ai/api/v1/chat/completions. Authentication uses a standard Bearer token header.

Authorization: Bearer YOUR_OPENROUTER_API_KEY

Model IDs

Models are specified as strings in the provider/model form. Common examples:

Model ID	Description
`openai/gpt-4o`	OpenAI GPT-4o
`anthropic/claude-3.5-sonnet`	Anthropic Claude 3.5 Sonnet
`google/gemini-2.0-flash-exp`	Google Gemini 2.0 Flash (experimental)
`meta-llama/llama-3.3-70b-instruct`	Meta Llama 3.3 70B Instruct
`openrouter/auto`	Auto-router — OpenRouter picks a model for the request

Free variants end in :free. The full catalog with per-model pricing and context length is on the models page, and programmatically you can query the /api/v1/models endpoint.

Credits and Free Tier

The free tier gives you roughly 50 free requests per day across 25+ free models (whose IDs end in :free). Paid usage is credit-based and pay-as-you-go, billed per token at the price of the model that actually runs. Topping up credits carries a small fee (about 5.5%), but that fee applies to the purchase; the cost of each call is still just the per-token price of the model that served it.
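As a worked example of per-token billing, the sketch below computes the cost of one call from its token counts. The prices are made-up placeholders, not OpenRouter's actual rates, and `call_cost` is a hypothetical helper; real per-model prices are listed on the models page. Prices are passed as strings and handled with `Decimal` because per-token prices are tiny fractions where float rounding gets noisy.

```python
from decimal import Decimal


def call_cost(prompt_tokens: int, completion_tokens: int,
              prompt_price: str, completion_price: str) -> Decimal:
    """Cost in USD of one call, given per-token prices as decimal strings.

    Hypothetical helper: billing is tokens x per-token price of the model that ran.
    """
    return (prompt_tokens * Decimal(prompt_price)
            + completion_tokens * Decimal(completion_price))


# Placeholder prices (not real OpenRouter rates):
# $2.50 per million prompt tokens, $10 per million completion tokens.
cost = call_cost(1_000, 500, "0.0000025", "0.00001")
print(cost)
```

With these placeholder prices, 1,000 prompt tokens and 500 completion tokens come to $0.0075 (0.0025 + 0.005).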

3. Getting a Key and Your First curl Request

First, create an API key at openrouter.ai/keys. It is convenient to put it in an environment variable.

export OPENROUTER_API_KEY="sk-or-your-key-here"

Now send the simplest possible request with curl.

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "openai/gpt-4o", "messages": [{"role":"user","content":"Hello"}] }'

The response comes back in the same shape as OpenAI's chat completions. To switch models, change only the model value to anthropic/claude-3.5-sonnet or google/gemini-2.0-flash-exp. The rest of the request stays the same.
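For reference, here is the same request built with Python's standard library only, as a minimal sketch. `build_chat_request` and `send_chat` are hypothetical helper names, and the key is read from the `OPENROUTER_API_KEY` variable exported above. Switching models means passing a different `model` string and nothing else.

```python
import json
import os
import urllib.request

CHAT_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build the same POST the curl example sends; only `model` varies per model."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat(model: str, prompt: str) -> str:
    """Send the request and return the reply text (makes a network call)."""
    req = build_chat_request(model, prompt, os.environ["OPENROUTER_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # The response has the same shape as OpenAI chat completions.
    return data["choices"][0]["message"]["content"]
```

For example, `send_chat("anthropic/claude-3.5-sonnet", "Hello")` sends the same request with only the model swapped.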

4. OpenAI SDK Drop-In

The practical meaning of "OpenRouter is OpenAI-compatible" is that you keep using the official OpenAI SDK and just point it at OpenRouter.

Python

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],  # exported in section 3
)

completion = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[
        {"role": "user", "content": "Explain in one sentence why a gateway is useful"},
    ],
)

print(completion.choices[0].message.content)

In the OpenAI(...) constructor, you only change base_url and api_key to the OpenRouter values; after that, calling client.chat.completions.create(...) works exactly as usual. Read the response body from completion.choices[0].message.content.

TypeScript

TypeScript is the same idea. Give the openai npm client a baseURL and an apiKey.

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
});

const completion = await client.chat.completions.create({
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Hello' }],
});

console.log(completion.choices[0].message.content);

You can also skip the SDK and POST with plain fetch. As long as the endpoint and headers match, it behaves identically.

const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
  }),
});

const data = await res.json();
console.log(data.choices[0].message.content);

5. Choosing Models and Routing

Provider Routing Override

Some models are offered by several providers. In that case OpenRouter picks a provider automatically based on price, latency, and availability. To override that choice, add a provider object to the request. For example, you can set provider priority (order), whether falling back to other providers is allowed (allow_fallbacks), and the sort criterion (sort).

{
  "model": "meta-llama/llama-3.3-70b-instruct",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "sort": "throughput",
    "allow_fallbacks": true
  }
}
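From the OpenAI Python SDK, fields that are not part of OpenAI's own schema, such as `provider`, can be passed through the SDK's `extra_body` argument. A minimal sketch, assuming `client` is the OpenRouter-configured `OpenAI` client from section 4; `provider_prefs` and `ask_with_provider_prefs` are hypothetical helpers, and the set of accepted `sort` values in `VALID_SORTS` is an assumption to check against OpenRouter's provider routing docs.

```python
from typing import Optional

# Assumed set of sort criteria; verify against the provider routing docs.
VALID_SORTS = {"price", "throughput", "latency"}


def provider_prefs(order: Optional[list[str]] = None,
                   sort: Optional[str] = None,
                   allow_fallbacks: bool = True) -> dict:
    """Build the `provider` object for an OpenRouter request (hypothetical helper)."""
    prefs: dict = {"allow_fallbacks": allow_fallbacks}
    if order:
        prefs["order"] = list(order)  # providers to try first, in priority order
    if sort is not None:
        if sort not in VALID_SORTS:
            raise ValueError(f"unexpected sort value: {sort!r}")
        prefs["sort"] = sort
    return prefs


def ask_with_provider_prefs(client, model: str, prompt: str, prefs: dict) -> str:
    # extra_body merges additional top-level fields into the JSON request body.
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        extra_body={"provider": prefs},
    )
    return completion.choices[0].message.content
```

For instance, `ask_with_provider_prefs(client, "meta-llama/llama-3.3-70b-instruct", "Hello", provider_prefs(sort="throughput"))` sends the same provider object as the JSON example above.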

Model-Level Fallback — the `models` Array

Separately from provider routing, you can specify a fallback that tries several models in order. Pass a models array and OpenRouter tries the models in order, starting with the first.

{
  "models": [
    "openai/gpt-4o",
    "anthropic/claude-3.5-sonnet",
    "meta-llama/llama-3.3-70b-instruct"
  ],
  "messages": [{ "role": "user", "content": "Hello" }]
}

Fallbacks trigger on rate limits, downtime, context-length overflow, or moderation errors. The important part is billing. Even if you list several models, you are billed only for the tokens of the model that actually ran and produced the response. Models that were tried and failed cost nothing.
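From the OpenAI Python SDK, the `models` list can be sent through the `extra_body` argument, since the SDK itself has no such parameter. The SDK still requires `model`, so this sketch passes the primary model as `model` and the fallbacks in `models`; treat that split as an assumption to confirm against OpenRouter's model routing docs. The `model` field of the returned completion should name the model that actually answered, which is the one you are billed for. `fallback_kwargs` and `ask_with_fallback` are hypothetical helpers, and `client` is the OpenRouter-configured client from section 4.

```python
def fallback_kwargs(models: list[str], messages: list[dict]) -> dict:
    """Arguments for client.chat.completions.create with a model fallback chain."""
    if not models:
        raise ValueError("need at least one model")
    primary, *fallbacks = models
    kwargs = {"model": primary, "messages": messages}
    if fallbacks:
        # Not an OpenAI parameter, so it goes into the raw request body.
        kwargs["extra_body"] = {"models": fallbacks}
    return kwargs


def ask_with_fallback(client, models: list[str], prompt: str) -> tuple[str, str]:
    """Return (model that answered, reply text)."""
    kwargs = fallback_kwargs(models, [{"role": "user", "content": prompt}])
    completion = client.chat.completions.create(**kwargs)
    return completion.model, completion.choices[0].message.content
```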

Auto-Router — `openrouter/auto`

When you would rather not pick a model yourself, set the model value to openrouter/auto. OpenRouter inspects the request and routes it to a model it judges suitable. This is useful for early prototyping, or whenever you just need "a model that works."
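Because the auto-router decides which model runs, and billing follows that model, it can help to record what it picked. A minimal sketch, assuming the response's `model` field names the chosen model; `AutoRouterLog` is a hypothetical wrapper around the OpenRouter-configured `client` from section 4.

```python
from collections import Counter

AUTO_MODEL = "openrouter/auto"


class AutoRouterLog:
    """Tracks which concrete models the auto-router picked (hypothetical helper)."""

    def __init__(self, client):
        self.client = client
        self.picked: Counter = Counter()

    def ask(self, prompt: str) -> str:
        completion = self.client.chat.completions.create(
            model=AUTO_MODEL,
            messages=[{"role": "user", "content": prompt}],
        )
        # Assumed: `model` in the response names the model the router chose.
        self.picked[completion.model] += 1
        return completion.choices[0].message.content
```

After a batch of calls, `log.picked.most_common()` shows which models, and therefore which prices, your traffic actually landed on.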

6. Streaming / Tool Calling / Structured Output

All three of these features work the same way as in the OpenAI API.

Streaming

Pass stream=True (or stream: true in TypeScript) to receive the response in chunks.

stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
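If you also need the complete text once streaming finishes, for logging or caching, you can accumulate the deltas as they arrive. A minimal sketch that takes the `stream` object from the snippet above, or any iterable of chunks with the same `choices[0].delta.content` shape; `collect_stream` is a hypothetical name. The empty-`choices` guard is defensive, since some chunks (such as a final usage chunk, when one is requested) may carry no choices.

```python
def collect_stream(stream, echo: bool = False) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # e.g. a trailing chunk with no choices
        delta = chunk.choices[0].delta.content
        if delta:  # role-only and finish chunks have content None
            parts.append(delta)
            if echo:
                print(delta, end="", flush=True)
    return "".join(parts)
```

Calling `collect_stream(stream, echo=True)` prints the text live, as the loop above does, and also returns the full reply.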

Tool / Function Calling

Tool (function) calling uses the OpenAI schema as-is. Put function definitions in a tools array, handle the tool_calls the model returns, then feed the results back as messages.

{
  "model": "openai/gpt-4o",
  "messages": [{ "role": "user", "content": "What is the weather in Seoul?" }],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ]
}
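The round trip in Python might look like the sketch below. It assumes the OpenAI SDK's response shape, where `choices[0].message.tool_calls` holds calls with an `id`, a `function.name`, and JSON-encoded `function.arguments`. The `get_weather` body is a hypothetical stub, and `run_tool_calls` is a hypothetical helper.

```python
import json
from typing import Callable


def get_weather(city: str) -> dict:
    # Hypothetical stub; a real implementation would query a weather service.
    return {"city": city, "condition": "sunny", "temp_c": 21}


# Maps the function names declared in `tools` to local Python callables.
TOOLS: dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def run_tool_calls(message) -> list[dict]:
    """Execute each tool call in an assistant message and build `tool` messages."""
    results = []
    for call in message.tool_calls or []:
        func = TOOLS[call.function.name]
        args = json.loads(call.function.arguments or "{}")
        results.append({
            "role": "tool",
            "tool_call_id": call.id,  # ties the result to the model's request
            "content": json.dumps(func(**args)),
        })
    return results
```

After the first `create` call, append the assistant message itself to `messages`, then the dicts returned by `run_tool_calls`, and call `client.chat.completions.create` again with the same `tools` so the model can write its final answer from the tool results.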

Structured Output

JSON-schema-based structured output is also supported. Set response_format to type json_schema and supply a schema, and on models that support it the response is constrained to that schema.

{
  "model": "openai/gpt-4o",
  "messages": [{ "role": "user", "content": "Make one sample user" }],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "user",
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "age": { "type": "integer" }
        },
        "required": ["name", "age"]
      }
    }
  }
}
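On the Python side, the same `response_format` can be passed straight to `client.chat.completions.create`, and the reply content is a JSON string to parse. Since structured-output support varies by model, a light check on the parsed result is cheap insurance. This is a hand-written check of the two fields, not a full JSON Schema validator; `user_response_format` and `parse_user` are hypothetical helpers.

```python
import json

USER_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}


def user_response_format() -> dict:
    """The response_format object from the JSON example, as a Python dict."""
    return {
        "type": "json_schema",
        "json_schema": {"name": "user", "schema": USER_SCHEMA},
    }


def parse_user(content: str) -> dict:
    """Parse the reply and check the two required fields (not a full validator)."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("missing or non-string 'name'")
    age = data.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("missing or non-integer 'age'")
    return data
```

For example, pass `response_format=user_response_format()` to `client.chat.completions.create(...)`, then call `parse_user(completion.choices[0].message.content)`.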

7. Attribution Headers and Checking Usage

OpenRouter supports optional attribution headers that feed the app leaderboards on openrouter.ai (which apps use which models most). They are not required; setting them lets your app appear on those leaderboards.

- HTTP-Referer: your site URL
- X-Title: your app name

In the OpenAI SDK you pass these as default headers. Python uses default_headers, JavaScript uses defaultHeaders.

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
    default_headers={
        "HTTP-Referer": "https://your-site.example",
        "X-Title": "My App",
    },
)

const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
  defaultHeaders: {
    'HTTP-Referer': 'https://your-site.example',
    'X-Title': 'My App',
  },
});

You can list available models and their metadata via the /api/v1/models endpoint.

curl https://openrouter.ai/api/v1/models \
  -H "Authorization: Bearer $OPENROUTER_API_KEY"
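To work with the catalog programmatically, for example to find the free variants, you can fetch and filter it. A minimal sketch with the standard library; it assumes the response is a JSON object whose `data` list holds one entry per model with an `id` field, so check the API reference for the full set of fields. `fetch_models` and `free_model_ids` are hypothetical helpers.

```python
import json
import urllib.request

MODELS_URL = "https://openrouter.ai/api/v1/models"


def fetch_models(api_key: str) -> dict:
    """GET /api/v1/models and return the parsed JSON body (network call)."""
    req = urllib.request.Request(MODELS_URL, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def free_model_ids(payload: dict) -> list[str]:
    """Sorted IDs of the :free variants, assuming models are listed under "data"."""
    return sorted(
        m["id"] for m in payload.get("data", [])
        if m.get("id", "").endswith(":free")
    )
```

With the key from section 3, `free_model_ids(fetch_models(os.environ["OPENROUTER_API_KEY"]))` would return the free model IDs.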

8. Practical Tips and Caveats

- Cost always follows the per-token price of the model that actually ran. When you use openrouter/auto or a models fallback, the price depends on which model ends up serving the request.
- :free models cost nothing but have low rate limits. If you lean on them for production traffic, you will hit those limits quickly; they are better suited to prototyping or low-load work.
- Keep your API key server-side. A key shipped in a browser bundle or other client code can be read by anyone who loads it and then spent against your credit balance. If you must call from the frontend, route requests through your own backend, which holds the key and calls OpenRouter.
- Because the response schema stays OpenAI-compatible across models, you do not need to rewrite parsing code per model. Support for tool calling and structured output does differ by model, though, so when you switch to a new model, verify that the features you rely on actually work.
- Fallback arrays improve reliability, but an expensive model at the tail can spike your cost during an outage. Order your fallbacks with unit price in mind, not just availability.
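The last tip can be made concrete: since any entry in a fallback chain might end up serving a request, the chain's worst-case cost for a given request size is that of its most expensive entry. A sketch with placeholder model names and prices (not real OpenRouter models or rates); `request_cost` and `worst_case_cost` are hypothetical helpers.

```python
from decimal import Decimal

# Hypothetical per-token USD prices: (prompt, completion). Not real models or rates.
PRICES = {
    "cheap/model": (Decimal("0.0000002"), Decimal("0.0000006")),
    "mid/model": (Decimal("0.000001"), Decimal("0.000003")),
    "premium/model": (Decimal("0.000015"), Decimal("0.000075")),
}


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Cost of one request if `model` serves it."""
    prompt_price, completion_price = PRICES[model]
    return prompt_tokens * prompt_price + completion_tokens * completion_price


def worst_case_cost(chain: list[str], prompt_tokens: int,
                    completion_tokens: int) -> tuple[str, Decimal]:
    """Most expensive model in a fallback chain for a request of this size."""
    return max(
        ((m, request_cost(m, prompt_tokens, completion_tokens)) for m in chain),
        key=lambda pair: pair[1],
    )
```

With these placeholder prices and a request of 1,000 prompt and 1,000 completion tokens, `worst_case_cost(["cheap/model", "premium/model"], 1000, 1000)` flags `premium/model` at $0.09, more than 100 times the $0.0008 the primary would cost.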