Vision / Multimodal

Vision-capable models accept images alongside text in the same chat completion request. HPP Router uses the OpenAI image_url content-part format. For Anthropic routes, the gateway converts image_url parts into native image blocks before calling the provider.
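To make the Anthropic conversion concrete, here is a minimal sketch of how an OpenAI-style image_url part could map onto an Anthropic image block. It is not the gateway's actual code, which is not shown on this page. The function name to_anthropic_image_block is hypothetical; the target block shape follows Anthropic's Messages API (base64 and URL image sources).

```python
def to_anthropic_image_block(part: dict) -> dict:
    """Illustrative mapping of an image_url content part to an Anthropic image block."""
    url = part["image_url"]["url"]
    if url.startswith("data:"):
        # data:<media type>;base64,<payload>
        header, _, data = url.partition(",")
        media_type = header[len("data:"):].split(";")[0]
        return {
            "type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": data},
        }
    # Plain HTTP(S) URLs map to a URL image source.
    return {"type": "image", "source": {"type": "url", "url": url}}


block = to_anthropic_image_block(
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo="}}
)
```

Callers do not need to perform this step themselves: requests always use the image_url format described below.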

Request

Use an array of content parts in a message, mixing text and image_url:

cURL
curl -X POST https://router.hpp.io/llm/v1/chat/completions \
  -H "apikey: $HPPROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,BASE64_PNG",
            "detail": "auto"
          }
        }
      ]
    }],
    "max_tokens": 256
  }'

Content parts

Part type	Fields
`text`	`text` — the text segment.
`image_url`	`image_url.url` (a data URL or HTTP URL) and optional `image_url.detail`.

You can include multiple image_url parts in a single message.
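As a rough illustration of a message with several images, the sketch below builds one user message with a text part followed by one image_url part per image. The helper name build_vision_message and the example.com URL are placeholders, not part of the HPP Router API.

```python
from typing import Any


def build_vision_message(text: str, image_urls: list[str], detail: str = "auto") -> dict[str, Any]:
    """Return a user message: one text part, then one image_url part per URL."""
    parts: list[dict[str, Any]] = [{"type": "text", "text": text}]
    for url in image_urls:
        parts.append({"type": "image_url", "image_url": {"url": url, "detail": detail}})
    return {"role": "user", "content": parts}


message = build_vision_message(
    "Compare these two images.",
    ["https://example.com/a.png", "data:image/png;base64,BASE64_PNG"],
)
```

The resulting dictionary can be placed directly in the messages array of the request body.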

Using a data URL

When embedding image bytes directly, use a data URL of the form data:image/png;base64,<BASE64>. In the request example above, replace the BASE64_PNG placeholder with the base64-encoded image bytes.
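One way to produce such a data URL is with Python's standard base64 module. This is a minimal sketch: to_data_url is a hypothetical helper, and the eight-byte PNG signature stands in for real image bytes read from a file.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# In practice, read the bytes from disk, e.g. pathlib.Path("photo.png").read_bytes().
png_signature = b"\x89PNG\r\n\x1a\n"
data_url = to_data_url(png_signature)
print(data_url)  # data:image/png;base64,iVBORw0KGgo=
```

The returned string goes in image_url.url.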

With smart routing

You can use hpprouter/auto with multimodal content. Routing follows the same basket/tier rules, and the token estimate includes the image parts. See Smart Routing.
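For illustration, the sketch below builds the same kind of request as the cURL example, but with hpprouter/auto as the model. It uses only the standard library and constructs the request without sending it. build_auto_request is a hypothetical helper, and YOUR_API_KEY is a placeholder.

```python
import json
import urllib.request


def build_auto_request(image_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multimodal chat completion request using smart routing."""
    payload = {
        "model": "hpprouter/auto",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": image_url, "detail": "auto"}},
            ],
        }],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        "https://router.hpp.io/llm/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"apikey": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_auto_request("data:image/png;base64,BASE64_PNG", api_key="YOUR_API_KEY")
# To send it, pass the request to urllib.request.urlopen(request).
```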

Important distinctions

Vision is not image generation. This endpoint is for understanding images in a chat. To generate images, use Image Generation (gpt-image-1 at /v1/images/generations).
Use a vision-capable model, for example openai/gpt-4o.

Provider limits

Provider attachment limits apply (for example, OpenAI ~20 MB per image, Anthropic ~5 MB per image). The gateway does not resize attachments, so send images that are already within the provider's limits.
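Because the gateway does not resize images, a client-side check before sending can catch oversized attachments early. The sketch below uses the approximate figures quoted above. It assumes that "MB" means 1024 * 1024 bytes and that the limit applies to the raw image bytes; neither is confirmed here. The dictionary keys and the function name are hypothetical.

```python
# Approximate per-image limits from the text above (assumed to be MiB of raw bytes).
APPROX_LIMIT_BYTES = {
    "openai": 20 * 1024 * 1024,
    "anthropic": 5 * 1024 * 1024,
}


def fits_provider_limit(image_bytes: bytes, provider: str) -> bool:
    """Return True if the raw image size is within the approximate provider limit."""
    return len(image_bytes) <= APPROX_LIMIT_BYTES[provider]


def base64_length(raw_size: int) -> int:
    """Length in characters of the padded base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


six_mib = bytes(6 * 1024 * 1024)
print(fits_provider_limit(six_mib, "openai"))     # True
print(fits_provider_limit(six_mib, "anthropic"))  # False
```

Base64 encoding grows the payload by roughly a third, which base64_length makes explicit. If a provider measures the encoded size instead, compare against that value.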
