omniclaw-skills/apis/sub2api/gpt-image-2.en.md
2026-04-24 01:12:20 -07:00

GPT Image 2 API Guide

This guide describes how to call gpt-image-2 through sub2api or any OpenAI-compatible gateway.

Default examples use:

BASE_URL=https://claude.omniclaw.store/v1
API_KEY=<sub2api API key generated from the /keys page>

Do not use ChatGPT OAuth tokens from .codex/auth.json as API keys.

Quick Summary

  • Direct image generation: call POST /v1/images/generations with model: "gpt-image-2".
  • Image editing: call POST /v1/images/edits with multipart image[] files and an optional mask.
  • Agent/Codex workflows: keep the main model as a text/agent model such as gpt-5.5, then call image generation through the Responses API image_generation tool.
  • Do not use gpt-image-2 as the Codex main model.
  • gpt-image-2 normally returns base64 image data at data[0].b64_json.
  • 3840x2160 4K output works but is high-latency and high-cost; use 180-300 second timeouts for production.

Official Capability Summary

gpt-image-2 is an image generation and editing model with text input, image input, and image output support.

Model aliases:

gpt-image-2
gpt-image-2-2026-04-21

Supported API surfaces:

/v1/images/generations
/v1/images/edits
/v1/responses   # via image_generation tool

Authentication

export BASE_URL="https://claude.omniclaw.store/v1"
export API_KEY="sk-..."

JSON requests require:

Authorization: Bearer $API_KEY
Content-Type: application/json

For multipart image edits, do not set Content-Type by hand. Let curl -F or the SDK set it, so that the header carries the correct multipart boundary.
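
As a rough illustration of the header rules above, the sketch below builds (but does not send) a JSON request using only the Python standard library. The helper name build_json_request and the sk-example key are hypothetical placeholders; the base URL is the default used in this guide.

```python
import json
import urllib.request


def build_json_request(base_url: str, api_key: str, path: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying the two headers JSON endpoints require."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical usage: nothing is sent over the network here.
req = build_json_request(
    "https://claude.omniclaw.store/v1",
    "sk-example",  # placeholder, not a real key
    "/images/generations",
    {"model": "gpt-image-2", "prompt": "test"},
)
```

Passing the request to urllib.request.urlopen with an explicit timeout would actually send it.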

Image Generation

Minimal Request

curl -sS "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A compact Apple-style dashboard UI, clean white background",
    "size": "1024x1024",
    "quality": "medium",
    "output_format": "png",
    "n": 1
  }' > image.json

Decode the response:

jq -r '.data[0].b64_json' image.json | base64 --decode > image.png

4K Request

curl -sS "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  --max-time 300 \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A modern product poster, cinematic lighting, premium realistic photography",
    "size": "3840x2160",
    "quality": "medium",
    "output_format": "png",
    "n": 1
  }' > image-4k.json

Production recommendation: validate prompts at 1024x1024 or 1536x1024 first, then rerun the request at 3840x2160. Combining 4K with quality: high can be slow and expensive.

Generation Parameters

| Parameter | Type | Recommended value | Notes |
| --- | --- | --- | --- |
| model | string | gpt-image-2 | Required. The snapshot gpt-image-2-2026-04-21 is also valid. |
| prompt | string | detailed natural language | Required. Include subject, environment, camera, style, lighting, and constraints. |
| n | number | 1 | Number of images. Prefer single-image requests for retry and billing attribution. |
| size | string | 1024x1024, 1536x1024, 3840x2160 | Flexible sizes are supported when they satisfy the model constraints. |
| quality | string | low, medium, high, auto | Use low for drafts, medium for normal output, high for final assets. |
| output_format | string | png, jpeg, webp | Default is usually png; use jpeg for latency-sensitive outputs. |
| output_compression | number | 0-100 | Only applies to jpeg and webp. |
| background | string | auto, opaque | gpt-image-2 currently does not support transparent. |
| moderation | string | auto, low | Adjusts filtering level but does not bypass safety policy. |
| stream | boolean | false | Enables SSE image streaming. |
| partial_images | number | 0-3 | Streaming only; partial images increase output token cost. |
| user | string | end-user ID | Useful for audit and abuse monitoring. |
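
The parameter notes above imply a few combinations that are worth rejecting before a request leaves the client. The following is a minimal sketch of such a check; build_generation_payload is a hypothetical helper, and the rules it enforces are only those stated in the table, so the gateway may accept or reject more than this.

```python
def build_generation_payload(prompt: str, **options) -> dict:
    """Assemble a request body and reject combinations the table rules out."""
    payload = {"model": "gpt-image-2", "prompt": prompt, "n": 1, **options}

    fmt = payload.get("output_format", "png")
    if fmt not in ("png", "jpeg", "webp"):
        raise ValueError(f"unsupported output_format: {fmt}")

    if "output_compression" in payload:
        # Compression only applies to the lossy formats.
        if fmt not in ("jpeg", "webp"):
            raise ValueError("output_compression only applies to jpeg and webp")
        if not 0 <= payload["output_compression"] <= 100:
            raise ValueError("output_compression must be 0-100")

    if payload.get("background") == "transparent":
        raise ValueError("gpt-image-2 does not support a transparent background")

    if payload.get("quality", "auto") not in ("low", "medium", "high", "auto"):
        raise ValueError("quality must be low, medium, high, or auto")

    partial = payload.get("partial_images", 0)
    if not 0 <= partial <= 3:
        raise ValueError("partial_images must be 0-3")
    if partial and not payload.get("stream"):
        raise ValueError("partial_images is streaming only; set stream to true")

    return payload
```

For example, build_generation_payload("a poster", output_format="jpeg", output_compression=80) returns a body ready to serialize, while passing background="transparent" raises ValueError locally instead of costing a round trip.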

Size Constraints

size can be auto or a valid widthxheight value:

  • Maximum edge length is 3840px.
  • Width and height must both be multiples of 16px.
  • Long edge to short edge ratio must be at most 3:1.
  • Total pixels must be between 655,360 and 8,294,400.

Common values:

1024x1024
1536x1024
1024x1536
2048x2048
2048x1152
3840x2160
2160x3840
auto

Treat outputs larger than 2560x1440 as experimental high-pixel workloads with higher latency, higher cost, and higher failure probability.

Response Shape

Typical response:

{
  "created": 1770000000,
  "background": "opaque",
  "data": [
    {
      "b64_json": "...",
      "revised_prompt": "..."
    }
  ],
  "model": "gpt-image-2",
  "output_format": "png",
  "quality": "medium",
  "size": "1024x1024",
  "usage": {
    "input_tokens": 43,
    "input_tokens_details": {
      "image_tokens": 0,
      "text_tokens": 43
    },
    "output_tokens": 196,
    "output_tokens_details": {
      "image_tokens": 196,
      "text_tokens": 0
    },
    "total_tokens": 239
  }
}

Production systems should store:

  • model
  • size
  • quality
  • output_format
  • usage.total_tokens
  • usage.input_tokens
  • usage.output_tokens
  • latency
  • upstream account, group, user, and key identifiers
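
One way to capture these fields is to flatten each response into a single record at the call site. The sketch below assumes the response has already been parsed into a dict with the shape shown above; usage_record and the identifier keyword names are hypothetical.

```python
def usage_record(response: dict, latency_seconds: float, **identifiers) -> dict:
    """Flatten the fields worth storing from an Images API response."""
    usage = response.get("usage") or {}
    return {
        "model": response.get("model"),
        "size": response.get("size"),
        "quality": response.get("quality"),
        "output_format": response.get("output_format"),
        "total_tokens": usage.get("total_tokens"),
        "input_tokens": usage.get("input_tokens"),
        "output_tokens": usage.get("output_tokens"),
        "latency_seconds": round(latency_seconds, 3),
        # Caller-supplied identifiers, e.g. account, group, user, key.
        **identifiers,
    }
```

The base64 image data is deliberately left out of the record so that logs stay small.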

Image Editing

Single-image Edit

curl -sS "$BASE_URL/images/edits" \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=gpt-image-2" \
  -F "image[]=@input.png" \
  -F "prompt=Replace the sofa with a minimalist white lounge chair" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  -F "output_format=png" \
  > edit.json

Masked Local Edit

curl -sS "$BASE_URL/images/edits" \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=gpt-image-2" \
  -F "image[]=@input.png" \
  -F "mask=@mask.png" \
  -F "prompt=Change only the transparent masked region into a glass button" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  > edit-mask.json

Mask requirements:

  • image and mask must have the same format and dimensions.
  • Files must be under 50MB.
  • mask must include an alpha channel.
  • Do not pass input_fidelity for gpt-image-2; the model processes image inputs at high fidelity by default.
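
A pre-flight check for the first three requirements can be sketched with the standard library alone, assuming both files are PNGs. The helper names are hypothetical, "50MB" is interpreted here as 50 * 1024 * 1024 bytes, and the alpha test only recognizes PNG color types 4 and 6, so a palette PNG that carries transparency in a tRNS chunk would be reported as lacking alpha.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data: bytes) -> tuple[int, int, int]:
    """Return (width, height, color_type) from the IHDR chunk of a PNG."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height, data[25]


def check_mask(image: bytes, mask: bytes, max_bytes: int = 50 * 1024 * 1024) -> list[str]:
    """Return a list of problems; an empty list means the pair looks usable."""
    problems = []
    if len(image) >= max_bytes or len(mask) >= max_bytes:
        problems.append("file is not under 50MB")
    image_w, image_h, _ = read_png_header(image)
    mask_w, mask_h, mask_color = read_png_header(mask)
    if (image_w, image_h) != (mask_w, mask_h):
        problems.append("image and mask dimensions differ")
    if mask_color not in (4, 6):  # 4 = grayscale+alpha, 6 = RGBA
        problems.append("mask has no alpha channel")
    return problems
```

Typical use would be check_mask(open("input.png", "rb").read(), open("mask.png", "rb").read()) before building the multipart request.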

Responses API With image_generation

Use this when an agent should reason about the task before generating an image. The main model should be a text/agent model, such as gpt-5.5.

curl -sS "$BASE_URL/responses" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "input": "Generate a clean product poster for an AI proxy service.",
    "tools": [
      {
        "type": "image_generation",
        "quality": "medium",
        "size": "1536x1024",
        "output_format": "png"
      }
    ]
  }' > response-image.json

Important:

  • model is the main reasoning model, not gpt-image-2.
  • The image_generation tool performs the image work.
  • sub2api may inject the image tool for official Codex clients, but application calls should pass it explicitly.
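
To pull the generated image out of response-image.json, a sketch along the following lines may help. It assumes the gateway passes through the Responses API output shape in which the image arrives as an output item of type image_generation_call with the base64 data in its result field; verify this against a real response before relying on it. extract_images is a hypothetical helper.

```python
import base64


def extract_images(response: dict) -> list[bytes]:
    """Collect decoded image bytes from image_generation_call output items."""
    images = []
    for item in response.get("output", []):
        # Assumed shape: {"type": "image_generation_call", "result": "<base64>"}
        if item.get("type") == "image_generation_call" and item.get("result"):
            images.append(base64.b64decode(item["result"]))
    return images
```

Loading the saved file with json.load and writing extract_images(response)[0] to disk in binary mode yields the image, provided the list is not empty.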

Streaming Images

The Images API supports SSE streaming:

curl -N "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A futuristic city skyline at sunrise",
    "stream": true,
    "partial_images": 2,
    "size": "1536x1024",
    "quality": "medium"
  }'

Events:

image_generation.partial_image
image_generation.completed

partial_images can be 0-3. Each partial image adds output token cost.
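
A client needs to split the SSE stream into events before it can use them. The sketch below parses already-decoded text lines into JSON payloads. It assumes each event's data line is a JSON object whose type field matches the event names above and which carries the image in b64_json; those payload fields are an assumption, not something this guide specifies. iter_sse_events is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON payload per SSE event from decoded text lines."""
    data_lines: list[str] = []

    def flush() -> Iterator[dict]:
        payload = "\n".join(data_lines)
        data_lines.clear()
        if payload and payload != "[DONE]":
            yield json.loads(payload)

    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            data_lines.append(line[5:].lstrip(" "))
        elif line == "":
            # A blank line terminates the current event.
            yield from flush()
        # Other fields (event:, id:, comments) are ignored in this sketch.
    # Handle a stream that ends without a trailing blank line.
    yield from flush()
```

A consumer would typically show each image_generation.partial_image payload as a preview and treat the image_generation.completed payload as the final result.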

SDK Examples

Node.js

import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.API_KEY,
  baseURL: process.env.BASE_URL ?? "https://claude.omniclaw.store/v1",
});

const result = await client.images.generate({
  model: "gpt-image-2",
  prompt: "A premium product poster for an AI service",
  size: "1536x1024",
  quality: "medium",
  output_format: "png",
  n: 1,
});

const b64 = result.data?.[0]?.b64_json;
if (!b64) throw new Error("No image returned");
fs.writeFileSync("image.png", Buffer.from(b64, "base64"));

Python

import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["API_KEY"],
    base_url=os.environ.get("BASE_URL", "https://claude.omniclaw.store/v1"),
)

result = client.images.generate(
    model="gpt-image-2",
    prompt="A premium product poster for an AI service",
    size="1536x1024",
    quality="medium",
    output_format="png",
    n=1,
)

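# Fail loudly if the gateway returned no image data.
b64 = result.data[0].b64_json if result.data else None
if not b64:
    raise RuntimeError("No image returned")
with open("image.png", "wb") as f:
    f.write(base64.b64decode(b64))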

Production Dispatch

  • Routing: prefer plus/team/pro OpenAI OAuth accounts for image workloads.
  • Timeout: use 120 seconds for normal images and 300 seconds for 4K.
  • Retry: retry only transient network failures and 502/503/504 responses, and keep retry counts low.
  • Concurrency: 4K output produces many image tokens; use low per-account concurrency. Standard 1024 images can use higher concurrency.
  • Billing: record usage and charge based on input and output tokens. 4K can produce far more output tokens than 1024 images.
  • Latency: use jpeg and quality: low for drafts or latency-sensitive previews.
  • Fallback: if 4K/high fails, retry 4K/medium; if that still fails, generate 1536x1024/medium and upscale separately.
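
The timeout, retry, and fallback rules above can be combined into one small dispatch loop. This is a sketch under stated assumptions: send is a hypothetical caller-supplied function taking (payload, timeout) and returning (status_code, body), and the choice to step down the ladder once retries are exhausted is one reading of the fallback rule, not a prescribed implementation.

```python
RETRYABLE_STATUS = {502, 503, 504}

# Fallback ladder from the list above; the last rung is meant to be upscaled separately.
FALLBACK_LADDER = [
    {"size": "3840x2160", "quality": "high"},
    {"size": "3840x2160", "quality": "medium"},
    {"size": "1536x1024", "quality": "medium"},
]


def timeout_for(size: str) -> int:
    """120 seconds for normal images, 300 seconds for 4K."""
    return 300 if size in ("3840x2160", "2160x3840") else 120


def generate_with_fallback(send, prompt: str, max_retries: int = 1):
    """Try each rung of the ladder, retrying only transient failures."""
    for step in FALLBACK_LADDER:
        payload = {"model": "gpt-image-2", "prompt": prompt, **step}
        for _attempt in range(max_retries + 1):
            try:
                status, body = send(payload, timeout_for(step["size"]))
            except (ConnectionError, TimeoutError):
                continue  # transient network failure: retry this rung
            if status == 200:
                return body
            if status not in RETRYABLE_STATUS:
                # 400/401/429 and similar will not improve with a retry.
                raise RuntimeError(f"non-retryable status {status}")
        # Retries exhausted on this rung; fall through to the next one.
    raise RuntimeError("all fallback steps failed")
```

With max_retries=1 the worst case is two attempts per rung, which keeps the total wait bounded even when every 4K attempt runs to its 300 second timeout.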

Common Errors

| Symptom | Likely cause | Action |
| --- | --- | --- |
| 401 INVALID_API_KEY | Key is not a sub2api key or is disabled/deleted | Generate a new key from /keys |
| 400 invalid_request_error | Incompatible params such as transparent background or invalid size | Check size, background, and quality |
| 429 usage_limit_reached | Upstream account usage window hit | Switch plus/team/pro account or wait for reset |
| 502 Upstream request failed | Upstream did not return image data, network failed, or content was refused | Inspect server logs, simplify prompt, lower quality or size |
| Request takes over 2 minutes | High pixels or complex prompt | Increase timeout, use streaming, or test lower resolution first |
| /v1/models does not show gpt-image-2 | Codex/text model list is not the Images API capability list | Call /v1/images/generations directly |

Safety Boundary

Filter clearly disallowed content before sending requests, especially:

  • Sexualized minors or young-looking subjects
  • Non-consensual sexual content, coercion, or sexual violence
  • Explicit nudity or graphic sexual activity
  • Illegal, hateful, or extreme violent content

For safe romantic scenes, explicitly constrain prompts with terms such as adult, non-explicit, no nudity, and fully clothed.