zsb/omniclaw-skills

boris 789bb38e69 feat: initialize OmniClaw skills registry

2026-04-24 01:12:20 -07:00

11 KiB

GPT Image 2 API Guide

This guide describes how to call gpt-image-2 through sub2api or any OpenAI-compatible gateway.

Default examples use:

BASE_URL=https://claude.omniclaw.store/v1
API_KEY=<sub2api API key generated from the /keys page>

Do not use ChatGPT OAuth tokens from .codex/auth.json as API keys.

Quick Summary

Direct image generation: call POST /v1/images/generations with model: "gpt-image-2".
Image editing: call POST /v1/images/edits with multipart image[] files and an optional mask.
Agent/Codex workflows: keep the main model as a text/agent model such as gpt-5.5, then call image generation through the Responses API image_generation tool.
Do not use gpt-image-2 as the Codex main model.
gpt-image-2 normally returns base64 image data at data[0].b64_json.
3840x2160 4K output works but is high-latency and high-cost; use 180-300 second timeouts for production.

Official Capability Summary

gpt-image-2 is an image generation and editing model with text input, image input, and image output support.

Model aliases:

gpt-image-2
gpt-image-2-2026-04-21

Supported API surfaces:

/v1/images/generations
/v1/images/edits
/v1/responses   # via image_generation tool

Authentication

export BASE_URL="https://claude.omniclaw.store/v1"
export API_KEY="sk-..."

JSON requests require:

Authorization: Bearer $API_KEY
Content-Type: application/json

For multipart image edits, do not set Content-Type by hand; let curl -F or the SDK set it so that the multipart boundary is included.

Image Generation

Minimal Request

curl -sS "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A compact Apple-style dashboard UI, clean white background",
    "size": "1024x1024",
    "quality": "medium",
    "output_format": "png",
    "n": 1
  }' > image.json

Decode the response:

jq -r '.data[0].b64_json' image.json | base64 --decode > image.png
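
The same decoding step can be done without jq. The following is a minimal Python sketch, assuming the response was saved to a file as in the curl example above; the helper names `extract_image_bytes` and `save_image` are illustrative and not part of any SDK.

```python
import base64
import json
from pathlib import Path


def extract_image_bytes(response: dict, index: int = 0) -> bytes:
    """Return the decoded image stored at data[index].b64_json."""
    data = response.get("data") or []
    if index >= len(data):
        raise ValueError(f"response has no image at index {index}")
    b64 = data[index].get("b64_json")
    if not b64:
        raise ValueError("image entry has no b64_json field")
    return base64.b64decode(b64)


def save_image(json_path: str, out_path: str) -> int:
    """Decode the first image of a saved response file; return bytes written."""
    response = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return Path(out_path).write_bytes(extract_image_bytes(response))
```

Calling `save_image("image.json", "image.png")` mirrors the jq pipeline, but raises a clear error when the gateway returned an error body instead of image data.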

4K Request

curl -sS "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  --max-time 300 \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A modern product poster, cinematic lighting, premium realistic photography",
    "size": "3840x2160",
    "quality": "medium",
    "output_format": "png",
    "n": 1
  }' > image-4k.json

Production recommendation: first validate prompts at 1024x1024 or 1536x1024, then rerun the final prompt at 3840x2160. Combining 4K with quality: high can be slow and expensive.

Generation Parameters

| Parameter | Type | Recommended value | Notes |
| --- | --- | --- | --- |
| `model` | string | `gpt-image-2` | Required. The snapshot `gpt-image-2-2026-04-21` is also valid. |
| `prompt` | string | detailed natural language | Required. Include subject, environment, camera, style, lighting, and constraints. |
| `n` | number | `1` | Number of images. Prefer single-image requests for retry and billing attribution. |
| `size` | string | `1024x1024`, `1536x1024`, `3840x2160` | Flexible sizes are supported when they satisfy the model constraints. |
| `quality` | string | `low`, `medium`, `high`, `auto` | Use `low` for drafts, `medium` for normal output, `high` for final assets. |
| `output_format` | string | `png`, `jpeg`, `webp` | Default is usually `png`; use `jpeg` for latency-sensitive outputs. |
| `output_compression` | number | `0-100` | Only applies to `jpeg` and `webp`. |
| `background` | string | `auto`, `opaque` | `gpt-image-2` currently does not support `transparent`. |
| `moderation` | string | `auto`, `low` | Adjusts filtering level but does not bypass safety policy. |
| `stream` | boolean | `false` | Set to `true` to enable SSE image streaming. |
| `partial_images` | number | `0-3` | Streaming only; partial images increase output token cost. |
| `user` | string | end-user ID | Useful for audit and abuse monitoring. |
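
Several of the parameter rules above can be enforced before a request leaves the client. The sketch below is one possible way to do that; `build_generation_payload` is a hypothetical helper, and the checks only encode what the table states (no `transparent` background, `output_compression` only for `jpeg`/`webp`, `partial_images` only when streaming).

```python
ALLOWED_QUALITY = {"low", "medium", "high", "auto"}
ALLOWED_FORMATS = {"png", "jpeg", "webp"}
ALLOWED_BACKGROUND = {"auto", "opaque"}  # "transparent" is not supported


def build_generation_payload(prompt, size="1024x1024", quality="medium",
                             output_format="png", output_compression=None,
                             background=None, stream=False,
                             partial_images=None):
    """Build a JSON body for /images/generations, rejecting bad combinations."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if quality not in ALLOWED_QUALITY:
        raise ValueError(f"unsupported quality: {quality!r}")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")

    payload = {
        "model": "gpt-image-2",
        "prompt": prompt,
        "size": size,
        "quality": quality,
        "output_format": output_format,
        "n": 1,  # single-image requests simplify retry and billing attribution
    }
    if output_compression is not None:
        if output_format == "png":
            raise ValueError("output_compression only applies to jpeg and webp")
        if not 0 <= output_compression <= 100:
            raise ValueError("output_compression must be in 0-100")
        payload["output_compression"] = output_compression
    if background is not None:
        if background not in ALLOWED_BACKGROUND:
            raise ValueError(f"unsupported background: {background!r}")
        payload["background"] = background
    if partial_images is not None:
        if not stream:
            raise ValueError("partial_images requires stream=True")
        if not 0 <= partial_images <= 3:
            raise ValueError("partial_images must be in 0-3")
        payload["partial_images"] = partial_images
    if stream:
        payload["stream"] = True
    return payload
```

Failing fast on the client avoids spending a round trip on a request that would come back as `400 invalid_request_error`.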

Size Constraints

size can be auto or a WIDTHxHEIGHT value (for example 1536x1024) that meets all of these constraints:

Maximum edge length is 3840px.
Width and height must both be multiples of 16px.
Long edge to short edge ratio must be at most 3:1.
Total pixels must be between 655,360 and 8,294,400.

Treat outputs larger than 2560x1440 as experimental high-pixel workloads with higher latency, higher cost, and higher failure probability.
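
The size rules above are simple enough to check client-side before sending a request. This is a minimal sketch that encodes the four listed constraints; the function name `size_problems` is illustrative.

```python
MAX_EDGE = 3840
EDGE_MULTIPLE = 16
MAX_RATIO = 3
MIN_PIXELS = 655_360
MAX_PIXELS = 8_294_400


def size_problems(size: str) -> list[str]:
    """Return the list of violated constraints; an empty list means valid."""
    if size == "auto":
        return []
    try:
        width_text, height_text = size.lower().split("x")
        width, height = int(width_text), int(height_text)
    except ValueError:
        return [f"expected 'auto' or WIDTHxHEIGHT, got {size!r}"]
    if width <= 0 or height <= 0:
        return ["width and height must be positive"]

    problems = []
    long_edge, short_edge = max(width, height), min(width, height)
    if long_edge > MAX_EDGE:
        problems.append(f"edge {long_edge}px exceeds {MAX_EDGE}px")
    if width % EDGE_MULTIPLE or height % EDGE_MULTIPLE:
        problems.append(f"width and height must be multiples of {EDGE_MULTIPLE}px")
    if long_edge > MAX_RATIO * short_edge:
        problems.append(f"aspect ratio exceeds {MAX_RATIO}:1")
    if not MIN_PIXELS <= width * height <= MAX_PIXELS:
        problems.append(f"total pixels must be in {MIN_PIXELS}-{MAX_PIXELS}")
    return problems
```

For example, `size_problems("3840x2160")` returns an empty list because 3840 x 2160 is exactly 8,294,400 pixels, while `size_problems("512x512")` reports the pixel-count rule because 262,144 is below the minimum.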

Response Shape

Typical response:

{
  "created": 1770000000,
  "background": "opaque",
  "data": [
    {
      "b64_json": "...",
      "revised_prompt": "..."
    }
  ],
  "model": "gpt-image-2",
  "output_format": "png",
  "quality": "medium",
  "size": "1024x1024",
  "usage": {
    "input_tokens": 43,
    "input_tokens_details": {
      "image_tokens": 0,
      "text_tokens": 43
    },
    "output_tokens": 196,
    "output_tokens_details": {
      "image_tokens": 196,
      "text_tokens": 0
    },
    "total_tokens": 239
  }
}

Production systems should store:

model
size
quality
output_format
usage.total_tokens
usage.input_tokens
usage.output_tokens
latency
upstream account, group, user, and key identifiers
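
One way to capture these fields is a small record type filled from the response body plus request-side context. The sketch below assumes the response shape shown above; the class name, function name, and the four identifier arguments are hypothetical and should be adapted to the gateway's own schema.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ImageUsageRecord:
    model: str
    size: str
    quality: str
    output_format: str
    input_tokens: int
    output_tokens: int
    total_tokens: int
    latency_seconds: float
    account_id: str
    group_id: str
    user_id: str
    key_id: str


def build_usage_record(response: dict, latency_seconds: float, *,
                       account_id: str, group_id: str,
                       user_id: str, key_id: str) -> ImageUsageRecord:
    """Collect billing and audit fields from one image response."""
    usage = response.get("usage") or {}
    return ImageUsageRecord(
        model=response.get("model", ""),
        size=response.get("size", ""),
        quality=response.get("quality", ""),
        output_format=response.get("output_format", ""),
        input_tokens=int(usage.get("input_tokens", 0)),
        output_tokens=int(usage.get("output_tokens", 0)),
        total_tokens=int(usage.get("total_tokens", 0)),
        latency_seconds=latency_seconds,
        account_id=account_id,
        group_id=group_id,
        user_id=user_id,
        key_id=key_id,
    )
```

`asdict(record)` gives a plain dictionary that can be written to a log line or a database row.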

Image Editing

Single-image Edit

curl -sS "$BASE_URL/images/edits" \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=gpt-image-2" \
  -F "image[]=@input.png" \
  -F "prompt=Replace the sofa with a minimalist white lounge chair" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  -F "output_format=png" \
  > edit.json

Masked Local Edit

curl -sS "$BASE_URL/images/edits" \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=gpt-image-2" \
  -F "image[]=@input.png" \
  -F "mask=@mask.png" \
  -F "prompt=Change only the transparent masked region into a glass button" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  > edit-mask.json

Mask requirements:

image and mask must have the same format and dimensions.
Files must be under 50MB.
mask must include an alpha channel.
Do not pass input_fidelity for gpt-image-2; the model processes image inputs at high fidelity by default.
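
The dimension, size, and alpha-channel requirements can be checked locally before uploading. The sketch below uses only the standard library and reads the PNG header directly, so it assumes both files are PNG. It treats "50MB" as 50 MiB, which is an assumption, and it reports palette PNGs that carry transparency in a tRNS chunk as lacking an alpha channel. The function names are illustrative.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: "50MB" read as 50 MiB
ALPHA_COLOR_TYPES = (4, 6)  # grayscale+alpha, RGBA


def png_info(data: bytes) -> tuple[int, int, bool]:
    """Return (width, height, has_alpha_channel) from a PNG IHDR header."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, color_type in ALPHA_COLOR_TYPES


def mask_problems(image: bytes, mask: bytes) -> list[str]:
    """Return violated mask requirements; an empty list means the pair is OK."""
    problems = []
    for name, blob in (("image", image), ("mask", mask)):
        if len(blob) >= MAX_FILE_BYTES:
            problems.append(f"{name} is not under 50MB")
    try:
        image_w, image_h, _ = png_info(image)
        mask_w, mask_h, mask_has_alpha = png_info(mask)
    except ValueError as exc:
        return problems + [str(exc)]
    if (image_w, image_h) != (mask_w, mask_h):
        problems.append("image and mask dimensions differ")
    if not mask_has_alpha:
        problems.append("mask has no alpha channel")
    return problems
```

A typical call is `mask_problems(open("input.png", "rb").read(), open("mask.png", "rb").read())`.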

Responses API With `image_generation`

Use this when an agent should reason about the task before generating an image. The main model should be a text/agent model, such as gpt-5.5.

curl -sS "$BASE_URL/responses" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "input": "Generate a clean product poster for an AI proxy service.",
    "tools": [
      {
        "type": "image_generation",
        "quality": "medium",
        "size": "1536x1024",
        "output_format": "png"
      }
    ]
  }' > response-image.json

Important:

model is the main reasoning model, not gpt-image-2.
The image_generation tool performs the image work.
sub2api may inject the image tool for official Codex clients, but application calls should pass it explicitly.
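
With the Responses API the image does not arrive under data[0].b64_json. The sketch below assumes the gateway passes through the upstream response shape, in which generated images appear as `output` items of type `image_generation_call` with base64 data in `result`; verify this against a real response-image.json before relying on it. The function name is illustrative.

```python
import base64


def images_from_response(response: dict) -> list[bytes]:
    """Decode every image produced by image_generation tool calls."""
    images = []
    for item in response.get("output") or []:
        # Text and reasoning items are skipped; only tool results carry images.
        if item.get("type") == "image_generation_call" and item.get("result"):
            images.append(base64.b64decode(item["result"]))
    return images
```

An empty list usually means the main model answered in text without invoking the tool, which is worth logging separately from a transport failure.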

Streaming Images

The Images API supports SSE streaming:

curl -N "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A futuristic city skyline at sunrise",
    "stream": true,
    "partial_images": 2,
    "size": "1536x1024",
    "quality": "medium"
  }'

Events:

image_generation.partial_image
image_generation.completed

partial_images can be 0-3. Each partial image adds output token cost.
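
A client has to split the SSE stream into events before it can use them. The following is a minimal parser sketch, assuming each event carries one JSON object in its `data:` field and that the stream may end with a `[DONE]` sentinel; both are assumptions about the gateway's output, and the function name is illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_image_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one JSON payload per SSE event, tagged with its event type."""
    event_name = None
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].removeprefix(" "))
        elif line == "" and data_parts:
            # A blank line terminates the current event.
            body = "\n".join(data_parts)
            data_parts = []
            if body != "[DONE]":
                payload = json.loads(body)
                # Fall back to the SSE event name if the JSON has no "type".
                payload.setdefault("type", event_name)
                yield payload
            event_name = None
```

Feed it the decoded lines of the HTTP response body, then branch on `payload["type"]`: show `image_generation.partial_image` payloads as previews and keep the `image_generation.completed` payload as the final image.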

SDK Examples

Node.js

// Run as an ES module (.mjs or "type": "module") so that top-level await works.
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.API_KEY,
  baseURL: process.env.BASE_URL ?? "https://claude.omniclaw.store/v1",
});

const result = await client.images.generate({
  model: "gpt-image-2",
  prompt: "A premium product poster for an AI service",
  size: "1536x1024",
  quality: "medium",
  output_format: "png",
  n: 1,
});

const b64 = result.data?.[0]?.b64_json;
if (!b64) throw new Error("No image returned");
fs.writeFileSync("image.png", Buffer.from(b64, "base64"));

Python

import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["API_KEY"],
    base_url=os.environ.get("BASE_URL", "https://claude.omniclaw.store/v1"),
)

result = client.images.generate(
    model="gpt-image-2",
    prompt="A premium product poster for an AI service",
    size="1536x1024",
    quality="medium",
    output_format="png",
    n=1,
)

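# Guard against an empty result before decoding, as in the Node.js example.
if not result.data or not result.data[0].b64_json:
    raise RuntimeError("No image returned")

with open("image.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))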

Production Dispatch

Routing: prefer plus/team/pro OpenAI OAuth accounts for image workloads.
Timeout: use 120 seconds for normal images and 300 seconds for 4K.
Retry: retry only transient network failures and 502/503/504 responses, and keep retry counts low.
Concurrency: 4K output produces many image tokens; use low per-account concurrency. Standard 1024 images can use higher concurrency.
Billing: record usage and charge based on input and output tokens. 4K can produce far more output tokens than 1024 images.
Latency: use jpeg and quality: low for drafts or latency-sensitive previews.
Fallback: if 4K/high fails, retry 4K/medium; if that still fails, generate 1536x1024/medium and upscale separately.
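
The timeout, retry, and fallback rules above can be expressed as three small policy functions. This is a sketch under stated assumptions: the function names are hypothetical, and the default of two attempts is only one reading of "low retry counts".

```python
from typing import Optional

RETRYABLE_STATUS = {502, 503, 504}
SIZE_4K = "3840x2160"


def timeout_seconds(size: str) -> int:
    """120 seconds for normal images, 300 seconds for 4K."""
    return 300 if size == SIZE_4K else 120


def should_retry(status: Optional[int], attempt: int, max_attempts: int = 2) -> bool:
    """status=None stands for a transient network failure with no HTTP response.

    attempt is the number of tries already made for this request.
    """
    if attempt >= max_attempts:
        return False
    return status is None or status in RETRYABLE_STATUS


def fallback_chain(size: str, quality: str) -> list[tuple[str, str]]:
    """Return the (size, quality) attempts to make, in order."""
    chain = [(size, quality)]
    if size == SIZE_4K and quality == "high":
        chain.append((SIZE_4K, "medium"))
    if size == SIZE_4K:
        # Last resort: generate smaller and upscale in a separate step.
        chain.append(("1536x1024", "medium"))
    return chain
```

A dispatcher would walk `fallback_chain(...)` in order, applying `timeout_seconds` and `should_retry` to each step, and flag results from the last step for separate upscaling.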

Common Errors

| Symptom | Likely cause | Action |
| --- | --- | --- |
| `401 INVALID_API_KEY` | Key is not a sub2api key or is disabled/deleted | Generate a new key from `/keys` |
| `400 invalid_request_error` | Incompatible params such as transparent background or invalid size | Check `size`, `background`, and `quality` |
| `429 usage_limit_reached` | Upstream account usage window hit | Switch to another plus/team/pro account or wait for reset |
| `502 Upstream request failed` | Upstream did not return image data, network failed, or content was refused | Inspect server logs, simplify prompt, lower quality or size |
| Request takes over 2 minutes | High pixel count or complex prompt | Increase timeout, use streaming, or test lower resolution first |
| `/v1/models` does not show `gpt-image-2` | Codex/text model list is not the Images API capability list | Call `/v1/images/generations` directly |

Safety Boundary

Filter clearly disallowed content before sending requests, especially:

Sexualized minors or young-looking subjects
Non-consensual sexual content, coercion, or sexual violence
Explicit nudity or graphic sexual activity
Illegal, hateful, or extreme violent content

For safe romantic scenes, explicitly constrain prompts with terms such as adult, non-explicit, no nudity, and fully clothed.
