Skip to main content
POST https://api.runinfra.ai/v1/responses
RunInfra /v1/responses is a chat-completions compatibility adapter for LLM and vision-language deployments. The gateway converts supported input and instructions fields into chat messages, forwards the request through the same serving path as /v1/chat/completions, then wraps the result in a Responses-shaped envelope.
This endpoint does not implement full OpenAI Responses state, include, reasoning, tool, conversation-item, or background-job semantics. Use /v1/chat/completions when your app needs tool calls today.

Minimal request

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("RUNINFRA_BASE_URL", "https://api.runinfra.ai/v1"),
    api_key=os.environ["RUNINFRA_API_KEY"],
)

response = client.responses.create(
    model=os.environ["RUNINFRA_MODEL"],
    input="Write a one-sentence deployment health check.",
)

print(response.output_text)

Streaming

Set stream: true to receive server-sent events. The stream uses OpenAI-shaped Responses event names for text deltas and terminal completion events when the selected deployment supports streaming.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("RUNINFRA_BASE_URL", "https://api.runinfra.ai/v1"),
    api_key=os.environ["RUNINFRA_API_KEY"],
)

stream = client.responses.create(
    model=os.environ["RUNINFRA_MODEL"],
    input="Count to five.",
    stream=True,
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

Supported request fields

model
string
required
Model id returned by GET /v1/models, or default for a pipeline-scoped snippet.
input
string | object[]
required
Prompt text, or an array of supported Responses input message objects.
instructions
string
System-level instruction text. The adapter maps this into a system message.
stream
boolean
default:"false"
Return a server-sent event stream instead of one JSON response.
max_output_tokens
integer
Maximum generated output tokens. The adapter maps this to the serving backend’s chat completion token limit.
temperature
number
Sampling temperature for compatible LLM deployments.
top_p
number
Nucleus sampling cutoff for compatible LLM deployments.
response_format
object
Structured output format passed through to the chat-completions serving path. Support depends on the selected deployment.
tools
object[]
OpenAI chat-completions tool definitions accepted as pass-through input for compatible deployments. This adapter does not execute hosted tools or manage a stateful tool loop.
tool_choice
string | object
Tool-selection preference passed through to compatible chat-completions deployments.

Not shipped on this adapter

  • Stateful response retrieval or deletion.
  • include, reasoning, hosted tools, conversation items, file search, web search, computer use, and background jobs.
  • Stateful tool execution or hosted tool orchestration. Use Chat completions for production tool loops.

Retry semantics

The native RunInfra SDK treats non-streaming responses.create() as replay-safe only when you provide an idempotency key. Streaming Responses requests are sent once because a partial stream may already have reached your app.

Next steps

Chat completions

The canonical endpoint for tools, structured output, and streaming chat.

RunInfra SDK

Native request IDs, typed errors, idempotency helpers, and streaming wrappers.