Vision / Multimodal

Vision-capable models accept images alongside text in the same chat completion request. HPP Router uses the OpenAI image_url content-part format. For Anthropic routes, the gateway converts image_url parts into native image blocks before calling the provider.
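To illustrate the kind of conversion described above, here is a minimal Python sketch of how an image_url part could map onto an Anthropic-style image block. The helper name to_anthropic_image_block is hypothetical, and the gateway's actual implementation is not shown in this documentation. The target shapes follow Anthropic's published image block format (base64 and url sources). This sketch drops the detail field, on the assumption that it has no direct counterpart in that format.

```python
import re

# Matches data URLs of the form data:<media_type>;base64,<payload>.
_DATA_URL = re.compile(
    r"^data:(?P<media_type>[\w.+-]+/[\w.+-]+);base64,(?P<data>.+)$",
    re.DOTALL,
)


def to_anthropic_image_block(part: dict) -> dict:
    """Hypothetical helper: map an OpenAI-style image_url part to an
    Anthropic-style image block. Not the gateway's real code."""
    if part.get("type") != "image_url":
        raise ValueError("expected an image_url content part")

    url = part["image_url"]["url"]
    match = _DATA_URL.match(url)
    if match:
        # Data URL: split it into media type and base64 payload.
        return {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": match["media_type"],
                "data": match["data"],
            },
        }
    # Anything else is treated as a remote URL and passed through.
    return {"type": "image", "source": {"type": "url", "url": url}}
```

You never need to perform this conversion yourself; clients send image_url parts regardless of the route.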

Request

Use an array of content parts in a message, mixing text and image_url:

curl -X POST https://router.hpp.io/llm/v1/chat/completions \
  -H "apikey: $HPPROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,BASE64_PNG",
            "detail": "auto"
          }
        }
      ]
    }],
    "max_tokens": 256
  }'
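
For readers who prefer Python, the same request can be sketched with the standard library alone. The function names build_vision_request and send_request are hypothetical helpers introduced for illustration; the endpoint, headers, and payload mirror the curl example. The response is returned as parsed JSON without assuming anything about its fields.

```python
import json
import urllib.request

ENDPOINT = "https://router.hpp.io/llm/v1/chat/completions"


def build_vision_request(api_key: str, image_url: str, prompt: str) -> urllib.request.Request:
    """Hypothetical helper: build the POST request from the curl example."""
    payload = {
        "model": "openai/gpt-4o",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url, "detail": "auto"}},
            ],
        }],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"apikey": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send_request(req: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A typical call would read the key from the HPPROUTER_API_KEY environment variable, pass it to build_vision_request together with a data URL and a prompt, and hand the result to send_request.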

Content parts

Part type    Fields
text         text: the text segment.
image_url    image_url.url (a data URL or HTTP URL) and optional image_url.detail.

You can include multiple image_url parts in a single message.
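
As a rough illustration of a message with several images, the following Python sketch assembles one text part followed by any number of image_url parts. The helper name build_user_message and the example.com URL are placeholders introduced here, not part of the HPP Router API.

```python
from __future__ import annotations


def build_user_message(prompt: str, image_urls: list[str], detail: str = "auto") -> dict:
    """Hypothetical helper: one text part, then one image_url part per URL."""
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        # Each URL may be a data URL or an HTTP URL.
        content.append({"type": "image_url", "image_url": {"url": url, "detail": detail}})
    return {"role": "user", "content": content}


message = build_user_message(
    "What differs between these two images?",
    ["https://example.com/before.png", "data:image/png;base64,BASE64_PNG"],
)
```

The resulting dictionary can be placed directly in the messages array of a chat completion request.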

Using a data URL

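When embedding image bytes directly, use a data URL of the form data:image/png;base64,<BASE64>. In the request example above, replace the BASE64_PNG placeholder with the base64-encoded bytes of your image, and set the media type (image/png here) to match the image's actual format.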
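
A data URL can be produced with a few lines of Python. This is a minimal sketch; the function names bytes_to_data_url and file_to_data_url are hypothetical, and the media type for files is guessed from the file extension rather than from the file contents.

```python
import base64
import mimetypes


def bytes_to_data_url(data: bytes, media_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def file_to_data_url(path: str) -> str:
    """Read an image file and encode it, guessing the media type from its extension."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type is None or not media_type.startswith("image/"):
        raise ValueError(f"cannot determine an image media type for {path!r}")
    with open(path, "rb") as f:
        return bytes_to_data_url(f.read(), media_type)
```

The returned string goes into image_url.url unchanged.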

With smart routing

You can use hpprouter/auto with multimodal content. Routing follows the same basket/tier rules, and the token estimate includes the image parts. See Smart Routing.

Important distinctions

  • Vision is not image generation. This endpoint is for understanding images in a chat. To generate images, use Image Generation (gpt-image-1 at /v1/images/generations).
  • Use a vision-capable model, for example openai/gpt-4o.

Provider limits

Provider attachment limits apply (for example, OpenAI ~20 MB per image, Anthropic ~5 MB per image). The gateway does not resize attachments, so send images that are already within the target provider's limits.
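
Because the gateway does not resize images, a client-side size check before sending can save a failed request. The Python sketch below rests on several assumptions: the limits are the approximate figures quoted above, 1 MB is taken as 1,000,000 bytes, the provider keys are hypothetical labels, and the check compares the base64-encoded size (roughly 4/3 of the raw size) because this page does not say whether limits apply to raw or encoded bytes.

```python
# Approximate per-image limits quoted above; treat them as rough guidance only.
APPROX_LIMIT_BYTES = {
    "openai": 20_000_000,
    "anthropic": 5_000_000,
}


def base64_length(raw_size: int) -> int:
    """Length in characters of the padded base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


def fits_provider_limit(raw_size: int, provider: str) -> bool:
    """Conservative check: compare the base64-encoded size against the limit."""
    return base64_length(raw_size) <= APPROX_LIMIT_BYTES[provider]
```

Under these assumptions, a 4,000,000-byte image would pass the check for the openai key but fail for anthropic, since its encoded form exceeds 5,000,000 characters.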