AI EngineerGuide

Response Caching in OpenRouter

by Ashik Nesin

OpenRouter now lets you cache responses across all models.

Just add the X-OpenRouter-Cache: true header and it’ll cache chat completion, response, message, or embedding requests.

The first call hits the AI provider; subsequent identical requests return the cached response, and you won’t get billed for them.

If you’re using an LLM in a test environment, local development, or other similar setups, you can enable this feature so you don’t end up paying for those repeated calls.

How to use it?

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-OpenRouter-Cache: true" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}]
  }'
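The same request can be sketched in Python. This is a minimal sketch, assuming the X-OpenRouter-Cache header works as described above; `build_cached_request` is my own helper name, not part of any OpenRouter SDK:

```python
import json
import os

# Hypothetical helper (not from OpenRouter's docs): builds an OpenRouter
# chat-completion request with response caching opted in via the
# X-OpenRouter-Cache header.
def build_cached_request(model: str, prompt: str) -> tuple[str, dict, bytes]:
    url = "https://openrouter.ai/api/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
        # Opt this request in to OpenRouter's response cache.
        "X-OpenRouter-Cache": "true",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, headers, body

url, headers, body = build_cached_request(
    "google/gemini-2.5-flash", "Why is the sky blue?"
)
```

Send the result with any HTTP client (e.g. `urllib.request` or `requests`); repeating the same call should return the cached response without another provider charge.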

