Models & Pricing
HPP Router exposes models from multiple providers behind one OpenAI-compatible API. Billing is token-based and deducted from your prepaid quota.
Listing models
Use the OpenAI-compatible models endpoint to discover what's available, including the virtual hpprouter/auto model:
- cURL
curl https://router.hpp.io/llm/v1/models \
-H "apikey: $HPPROUTER_API_KEY"
The response is an OpenAI-style list. Each model may include a pricing object:
{
  "object": "list",
  "data": [
    {
      "id": "openai/gpt-5",
      "object": "model",
      "owned_by": "openai",
      "pricing": {
        "input": 1.25,
        "output": 10,
        "cache_write": null,
        "cache_read": null
      }
    }
  ]
}
| Field | Meaning |
|---|---|
| id | Model identifier; use this as the request model. |
| owned_by | Provider that owns the model. |
| pricing.input | Price per 1M input tokens (USD). |
| pricing.output | Price per 1M output tokens (USD). |
| pricing.cache_write / pricing.cache_read | Cache pricing per 1M tokens (USD), when applicable (may be null). |
The models list is the source of truth for what is currently enabled. The examples below are illustrative pricing taken from the gateway documentation and may change.
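As a rough sketch of how the list response might be consumed, the Python snippet below builds a lookup of per-1M-token rates keyed by model id. It works on an already-decoded copy of the sample response above, so no network call is made. The helper name pricing_by_model is hypothetical, and skipping models that carry no pricing object is an assumption based on the note that pricing may be absent.

```python
import json

# Sample body copied from the example response above.
SAMPLE = """
{
  "object": "list",
  "data": [
    {
      "id": "openai/gpt-5",
      "object": "model",
      "owned_by": "openai",
      "pricing": {"input": 1.25, "output": 10,
                  "cache_write": null, "cache_read": null}
    }
  ]
}
"""


def pricing_by_model(models_response: dict) -> dict:
    """Map model id -> pricing dict (hypothetical helper).

    Models without a pricing object are skipped, since the docs say
    each model *may* include one.
    """
    table = {}
    for model in models_response.get("data", []):
        pricing = model.get("pricing")
        if pricing:
            table[model["id"]] = pricing
    return table


rates = pricing_by_model(json.loads(SAMPLE))
print(rates["openai/gpt-5"]["input"], rates["openai/gpt-5"]["output"])
```

Individual fields such as cache_write can still be null inside a pricing object, so callers should check for None before using them.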
Model identifiers
Specify a model as provider/model, or use the virtual smart-routing model:
| Example model | Description |
|---|---|
| hpprouter/auto | Smart routing: the gateway resolves an actual model per request. |
| openai/gpt-5 | OpenAI GPT-5. |
| openai/gpt-4o | OpenAI GPT-4o (vision-capable). |
| anthropic/claude-sonnet-4 | Anthropic Claude Sonnet 4. |
| moonshotai/kimi-k2.6 | Moonshot Kimi. |
| ollama/gpt-oss:120b | Local Ollama model (billed at $0). |
| ollama/solidity-master:2 | Solidity fine-tuned model on a dedicated Ollama backend (billed at $0). |
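To illustrate the provider/model convention, here is a minimal Python sketch that splits an identifier at its first slash. The function name split_model_id is hypothetical and not part of any HPP Router SDK. Returning None as the provider for an id without a slash is an assumption made for this example.

```python
def split_model_id(model_id: str):
    """Split 'provider/model' into (provider, model).

    Only the first '/' is treated as the separator, so tags such as
    ':120b' stay attached to the model name. Ids without a '/' yield
    (None, model_id); that fallback is an assumption of this sketch.
    """
    provider, sep, name = model_id.partition("/")
    if not sep:
        return None, model_id
    return provider, name


for example in ("hpprouter/auto", "ollama/gpt-oss:120b"):
    print(example, "->", split_model_id(example))
```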
How billing works
- Pricing is per million tokens, split into input (prompt) and output (completion) rates.
- Cost for a request ≈ (prompt_tokens × input_rate + completion_tokens × output_rate) / 1,000,000.
- The usage block in each response reports the token counts used for billing.
- Local models (e.g. ollama/*) are tracked at $0 cost, but token usage is still logged.
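The cost formula above can be sketched in Python as follows. The function name estimate_cost_usd and the token counts are made up for illustration, and the rates are the example openai/gpt-5 rates from this page. Cache pricing is ignored, so treat the result as an estimate rather than the exact amount the gateway deducts.

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      input_rate: float, output_rate: float) -> float:
    """Approximate request cost in USD.

    Rates are USD per 1M tokens, as reported in the models list.
    Cache read/write pricing is not modelled here.
    """
    return (prompt_tokens * input_rate
            + completion_tokens * output_rate) / 1_000_000


# Hypothetical usage block, shaped like the token counts in the formula.
usage = {"prompt_tokens": 1200, "completion_tokens": 350}

cost = estimate_cost_usd(usage["prompt_tokens"], usage["completion_tokens"],
                         input_rate=1.25, output_rate=10)
print(f"${cost:.4f}")  # (1200 * 1.25 + 350 * 10) / 1,000,000 = $0.0050
```

With $0 rates, as for ollama/* models, the same function returns 0 regardless of token counts.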
Example rates
| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| openai/gpt-5 | $1.25 | $10 |
| gpt-image-1 (image generation) | $10 | $40 |
See Image Generation for how image quality affects output-token usage.
Billing with smart routing
When you request hpprouter/auto, billing uses the resolved model's pricing — not a price for auto itself. The resolved model is returned in the X-HPP-Router-Resolved-Model response header and recorded in your usage logs. See Smart Routing for details.
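A minimal Python sketch of client-side cost tracking under smart routing: read the resolved model from the response headers, then look up that model's rates. The helper names and the rates table are hypothetical. Falling back to the requested model when the header is missing is an assumption of this example, not documented gateway behavior.

```python
RESOLVED_HEADER = "X-HPP-Router-Resolved-Model"


def resolved_model(requested: str, headers: dict) -> str:
    """Return the model named in the resolved-model header.

    Header names are compared case-insensitively. If the header is
    absent, the requested model is returned (assumption of this sketch).
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get(RESOLVED_HEADER.lower(), requested)


def billing_rates(requested: str, headers: dict, rates: dict):
    """Look up per-1M rates for the model that was actually billed."""
    return rates.get(resolved_model(requested, headers))


# Hypothetical response headers and a rates table built from the models list.
headers = {"x-hpp-router-resolved-model": "openai/gpt-5"}
rates = {"openai/gpt-5": {"input": 1.25, "output": 10}}

print(resolved_model("hpprouter/auto", headers))
print(billing_rates("hpprouter/auto", headers, rates))
```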
Checking your quota and usage
- GET /api/quota-check: remaining quota for the authenticated consumer.
- GET /api/usage: usage summary (requests, total tokens, total cost, remaining quota).
See Quota & Usage.
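The sketch below shows one way to prepare these two requests in Python using only the standard library. It assumes, without confirmation from this page, that both paths are served from the same host as the models endpoint (https://router.hpp.io) and accept the same apikey header. The helper name build_get is hypothetical. The requests are only constructed, not sent, because the response format is described on the Quota & Usage page rather than here.

```python
import os
import urllib.request

# Assumption: same host as the models endpoint shown earlier.
BASE_URL = "https://router.hpp.io"


def build_get(path: str, api_key: str,
              base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request."""
    return urllib.request.Request(
        base_url + path,
        headers={"apikey": api_key},
        method="GET",
    )


api_key = os.environ.get("HPPROUTER_API_KEY", "")
quota_req = build_get("/api/quota-check", api_key)
usage_req = build_get("/api/usage", api_key)

# To send one of them: urllib.request.urlopen(quota_req)
print(quota_req.get_method(), quota_req.full_url)
print(usage_req.get_method(), usage_req.full_url)
```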