Chat Completions

The chat completions endpoint is the core of HPP Router. It is OpenAI-compatible, so request and response shapes match the OpenAI Chat Completions API.

POST https://router.hpp.io/llm/v1/chat/completions

Basic request

curl -X POST https://router.hpp.io/llm/v1/chat/completions \
  -H "apikey: $HPPROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Hello!" }
    ],
    "max_completion_tokens": 100
  }'
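
For readers calling the endpoint from Python, the curl request above can be approximated with the standard library alone. This is a minimal sketch rather than an official client: the names `build_chat_request` and `send` are illustrative, and only the URL, the `apikey` header, and the body fields are taken from this page.

```python
import json
import urllib.request

ENDPOINT = "https://router.hpp.io/llm/v1/chat/completions"


def build_chat_request(api_key, model, messages, max_completion_tokens=100):
    """Build (but do not send) the same POST request as the curl example."""
    payload = {
        "model": model,
        "messages": messages,
        "max_completion_tokens": max_completion_tokens,
    }
    # urllib normalizes header capitalization ("apikey" is stored as "Apikey");
    # HTTP header names are case-insensitive, so the meaning is unchanged.
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"apikey": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `send(build_chat_request(key, "openai/gpt-5", messages))` with `key` read from the `HPPROUTER_API_KEY` environment variable would perform the same call as the curl command. Keeping construction separate from sending makes the request easy to inspect in tests without network access.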

Request fields

| Field | Type | Notes |
| --- | --- | --- |
| model | string | Required. provider/model (e.g. openai/gpt-5) or hpprouter/auto. |
| messages | array | Required. Chat messages with role and content. |
| stream | boolean | Stream tokens as Server-Sent Events. See Streaming. |
| max_tokens | integer | Maximum tokens to generate. |
| max_completion_tokens | integer | Maximum completion tokens (newer OpenAI-style field). |
| temperature | number | Sampling temperature. |
| stream_options | object | Streaming options passed through to the provider. |

Additional provider-specific fields are passed through to the upstream model.
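
One way to assemble a body from these fields is sketched below. The helper `build_payload` is hypothetical; it simply omits optional fields that are unset and merges any extra keyword arguments as provider-specific fields. The `top_p` field mentioned in the usage note is only an example of such a pass-through field and is not listed on this page.

```python
def build_payload(model, messages, *, stream=None, max_tokens=None,
                  max_completion_tokens=None, temperature=None,
                  stream_options=None, **provider_fields):
    """Assemble a request body, leaving out optional fields that are unset."""
    payload = {"model": model, "messages": messages}
    optional = {
        "stream": stream,
        "max_tokens": max_tokens,
        "max_completion_tokens": max_completion_tokens,
        "temperature": temperature,
        "stream_options": stream_options,
    }
    # Only send what the caller actually set, so provider defaults apply otherwise.
    payload.update({k: v for k, v in optional.items() if v is not None})
    # Remaining keyword arguments stand in for provider-specific fields,
    # which the page says are passed through to the upstream model.
    payload.update(provider_fields)
    return payload
```

For example, `build_payload("hpprouter/auto", messages, temperature=0.2, top_p=0.9)` yields a body containing only `model`, `messages`, `temperature`, and `top_p`.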

Message roles

role is one of system, user, assistant, or tool. content is either a string or an array of content parts (used for vision and other multimodal input).
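
The two content shapes can be illustrated as follows. This sketch assumes the content parts follow the OpenAI convention (`{"type": "text", ...}` and `{"type": "image_url", ...}`), which this page implies but does not spell out. The image URL is a placeholder, and `check_messages` is a hypothetical helper that checks only the rules stated above.

```python
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}

EXAMPLE_MESSAGES = [
    # Plain string content.
    {"role": "system", "content": "You are a helpful assistant."},
    # Array-of-parts content (assumed OpenAI-style part shapes).
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    },
]


def check_messages(messages):
    """Raise ValueError on an unknown role or a content value of the wrong shape."""
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {i}: unknown role {role!r}")
        content = msg.get("content")
        is_text = isinstance(content, str)
        is_parts = isinstance(content, list) and all(
            isinstance(part, dict) for part in content
        )
        if not (is_text or is_parts):
            raise ValueError(
                f"message {i}: content must be a string or a list of parts"
            )
    return messages
```

A client-side check like this catches a mistyped role before the request is sent. The gateway and upstream provider remain the authority on what is actually accepted.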

Response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1732700000,
  "model": "openai/gpt-5",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hi there!" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 12,
    "total_tokens": 20
  }
}

The usage block drives billing. When you use hpprouter/auto, check the X-HPP-Router-Resolved-Model header to see which model was billed. See Smart Routing.
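
Reading the reply text, token counts, and billed model out of a response might look like the sketch below. The helper `summarize_completion` is hypothetical. Treating the header as a response header, and falling back to the body's `model` field when it is absent, are assumptions made for illustration, not behavior documented here.

```python
RESOLVED_MODEL_HEADER = "X-HPP-Router-Resolved-Model"


def summarize_completion(body, headers=None):
    """Extract the reply text, finish reason, token usage, and billed model."""
    # Header names are case-insensitive, so compare them in lower case.
    lowered = {k.lower(): v for k, v in (headers or {}).items()}
    choice = body["choices"][0]
    usage = body.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # Assumption: fall back to the body's model when the header is missing.
        "billed_model": lowered.get(RESOLVED_MODEL_HEADER.lower(), body.get("model")),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }
```

With the sample response above, the summary reports 8 prompt tokens, 12 completion tokens, and 20 total tokens.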

Choosing a model

  • Pass an explicit provider/model for deterministic routing.
  • Pass hpprouter/auto to let the gateway pick a cost-appropriate model per request.

Errors

Requests are checked against your quota before being forwarded. Insufficient quota returns 429; auth failures return 401. See Errors.
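
A client can separate the two documented failure cases as sketched below. The exception classes and function names are illustrative, and only the 401 and 429 meanings come from this page. Other status codes are re-raised unchanged because their meaning is not covered here.

```python
import json
import urllib.error
import urllib.request


class AuthFailed(Exception):
    """The gateway rejected the API key (HTTP 401)."""


class QuotaExceeded(Exception):
    """The request was refused for insufficient quota (HTTP 429)."""


def error_for_status(status):
    """Map a documented status code to an exception, or None if not covered."""
    if status == 401:
        return AuthFailed("authentication failed; check the apikey header")
    if status == 429:
        return QuotaExceeded("insufficient quota for this request")
    return None


def send_checked(request, opener=urllib.request.urlopen):
    """Send a request, translating 401 and 429 into specific exceptions."""
    try:
        with opener(request) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as exc:
        mapped = error_for_status(exc.code)
        if mapped is None:
            raise  # not a status this page documents; surface it as-is
        raise mapped from exc
```

Because 429 here signals insufficient quota rather than a transient rate limit, this sketch does not retry automatically. Whether a retry makes sense depends on details covered in Errors.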