
boris 789bb38e69 feat: initialize OmniClaw skills registry

2026-04-24 01:12:20 -07:00

10 KiB


GPT Image 2 API Usage Guide

This document describes how to call gpt-image-2 through the sub2api OpenAI-compatible gateway. The examples assume:

BASE_URL=https://claude.omniclaw.store/v1
API_KEY=<sub2api key generated on the /keys page>

Do not use the ChatGPT OAuth token from .codex/auth.json as an API key.

Quick summary

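Generate images directly: use POST /v1/images/generations with model set to gpt-image-2.
Edit images: use POST /v1/images/edits and upload image[] plus an optional mask as multipart form data.
Agent/Codex scenarios: keep gpt-5.5 as the main model and reach image capabilities through the Responses API image_generation tool; do not set the Codex main model to gpt-image-2.
gpt-image-2 returns base64-encoded image data, usually at data[0].b64_json.
3840x2160 (4K) works, but it is a high-pixel-count, long-running case; production calls should use a 180-300 second timeout.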

Summary of official capabilities

gpt-image-2 is an image generation and editing model that accepts text and image input and produces image output. Model alias and snapshot:

gpt-image-2
gpt-image-2-2026-04-21

Supported endpoints:

/v1/images/generations
/v1/images/edits
/v1/responses   # via the image_generation tool

Official references:

Authentication

export BASE_URL="https://claude.omniclaw.store/v1"
export API_KEY="sk-..."

Every JSON request carries these headers:

Authorization: Bearer $API_KEY
Content-Type: application/json

For the multipart edit endpoint, Content-Type is set automatically by curl -F or the SDK.

Generating images

Minimal request

curl -sS "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A compact Apple-style dashboard UI, clean white background",
    "size": "1024x1024",
    "quality": "medium",
    "output_format": "png",
    "n": 1
  }' > image.json

Decode:

jq -r '.data[0].b64_json' image.json | base64 --decode > image.png
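If jq is not available, the same decode step can be done with the Python standard library. A minimal sketch, assuming the response was saved to image.json as above; `decode_first_image` is a hypothetical helper name:

```python
import base64
import json
from pathlib import Path


def decode_first_image(json_path: str, out_path: str) -> int:
    """Write data[0].b64_json from a saved Image API response to out_path.

    Returns the number of bytes written.
    """
    payload = json.loads(Path(json_path).read_text(encoding="utf-8"))
    data = payload.get("data") or []
    if not data or not data[0].get("b64_json"):
        # Error bodies and text-only refusals carry no image payload.
        raise ValueError(f"no b64_json in {json_path}: {payload.get('error')}")
    image_bytes = base64.b64decode(data[0]["b64_json"])
    Path(out_path).write_bytes(image_bytes)
    return len(image_bytes)
```

Call it as `decode_first_image("image.json", "image.png")`. Raising on a missing `b64_json` surfaces error bodies instead of silently writing an empty file.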

4K request

curl -sS "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  --max-time 300 \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A modern product poster, cinematic lighting, premium realistic photography",
    "size": "3840x2160",
    "quality": "medium",
    "output_format": "png",
    "n": 1
  }' > image-4k.json

Production tip: 4K with quality high is slow and expensive. Validate the prompt at 1024x1024 or 1536x1024 first, then scale up to 3840x2160.

Generation parameters

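Parameter	Type	Suggested values	Notes
`model`	string	`gpt-image-2`	Required. The snapshot `gpt-image-2-2026-04-21` can also be used.
`prompt`	string	Detailed natural language	Required. Spell out the subject, setting, camera/framing, style, and constraints.
`n`	number	`1`	Number of images. In production, prefer one image per request and schedule requests concurrently; this simplifies retries and billing.
`size`	string	`1024x1024`, `1536x1024`, `3840x2160`	`gpt-image-2` supports flexible sizes; see the size constraints below.
`quality`	string	`low`, `medium`, `high`, `auto`	Use `low` for drafts, `medium` for routine work, and `high` for final images.
`output_format`	string	`png`, `jpeg`, `webp`	Defaults to `png`. Prefer `jpeg` when latency matters.
`output_compression`	number	`0-100`	Only meaningful for `jpeg`/`webp`.
`background`	string	`auto`, `opaque`	`gpt-image-2` does not currently support `transparent`.
`moderation`	string	`auto`, `low`	Controls how strictly image generation is filtered; the content policy still applies.
`stream`	boolean	`false`	Enables SSE streaming of image events.
`partial_images`	number	`0-3`	Returns partial images while streaming; increases output token cost.
`user`	string	User ID	End-user identifier for auditing and abuse monitoring.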
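The rules in this table can be enforced client-side before a request is sent, so invalid combinations fail fast instead of coming back as `400`. The sketch below is a hypothetical helper, not part of any SDK; it encodes only the rules stated in the table (allowed `quality`, `output_format` and `background` values, `output_compression` only for jpeg/webp, `partial_images` 0-3 when streaming) and always sets `n` to 1, following the scheduling advice.

```python
from typing import Any, Optional

QUALITIES = {"low", "medium", "high", "auto"}
FORMATS = {"png", "jpeg", "webp"}
BACKGROUNDS = {"auto", "opaque"}  # gpt-image-2 rejects "transparent"


def build_generation_body(
    prompt: str,
    size: str = "1024x1024",
    quality: str = "medium",
    output_format: str = "png",
    output_compression: Optional[int] = None,
    background: str = "auto",
    stream: bool = False,
    partial_images: int = 0,
    user: Optional[str] = None,
) -> dict[str, Any]:
    """Return a JSON body for POST /images/generations, or raise ValueError."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if quality not in QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")
    if output_format not in FORMATS:
        raise ValueError(f"unsupported output_format: {output_format}")
    if background not in BACKGROUNDS:
        raise ValueError(f"unsupported background for gpt-image-2: {background}")
    body: dict[str, Any] = {
        "model": "gpt-image-2",
        "prompt": prompt,
        "size": size,
        "quality": quality,
        "output_format": output_format,
        "background": background,
        "n": 1,  # one image per request keeps retries and billing simple
    }
    if output_compression is not None:
        if output_format == "png":
            raise ValueError("output_compression only applies to jpeg/webp")
        if not 0 <= output_compression <= 100:
            raise ValueError("output_compression must be 0-100")
        body["output_compression"] = output_compression
    if stream:
        if not 0 <= partial_images <= 3:
            raise ValueError("partial_images must be 0-3")
        body["stream"] = True
        body["partial_images"] = partial_images
    if user is not None:
        body["user"] = user
    return body
```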

Size constraints

For gpt-image-2, size can be auto or any WIDTHxHEIGHT that meets these constraints:

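The longest edge is at most 3840px
Width and height are both multiples of 16px
The ratio of long edge to short edge is at most 3:1
Total pixel count is between 655,360 and 8,294,400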
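The four constraints translate directly into a pre-flight check. A minimal sketch; `is_valid_size` is a hypothetical name, and the check only mirrors the rules listed here, so the upstream service remains the final authority:

```python
def is_valid_size(size: str) -> bool:
    """Check a gpt-image-2 size string against the documented constraints."""
    if size == "auto":
        return True
    try:
        w_str, h_str = size.lower().split("x")
        width, height = int(w_str), int(h_str)
    except ValueError:
        return False  # not of the form WIDTHxHEIGHT
    if width <= 0 or height <= 0:
        return False
    long_edge, short_edge = max(width, height), min(width, height)
    pixels = width * height
    return (
        long_edge <= 3840
        and width % 16 == 0
        and height % 16 == 0
        and long_edge <= 3 * short_edge  # aspect ratio at most 3:1
        and 655_360 <= pixels <= 8_294_400
    )
```

Note that 3840x2160 sits exactly at the 8,294,400-pixel ceiling, so 4K is the largest landscape size the rules allow.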

Common sizes:

1024x1024    # square, usually the fastest
1536x1024    # landscape
1024x1536    # portrait
2048x2048    # 2K square
2048x1152    # 2K landscape
3840x2160    # 4K landscape
2160x3840    # 4K portrait
auto

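Outputs larger than 2560x1440 should generally be treated as experimental high-pixel-count cases: higher latency, higher cost, and a higher failure rate.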

Response structure

Typical response:

{
  "created": 1770000000,
  "background": "opaque",
  "data": [
    {
      "b64_json": "...",
      "revised_prompt": "..."
    }
  ],
  "model": "gpt-image-2",
  "output_format": "png",
  "quality": "medium",
  "size": "1024x1024",
  "usage": {
    "input_tokens": 43,
    "input_tokens_details": {
      "image_tokens": 0,
      "text_tokens": 43
    },
    "output_tokens": 196,
    "output_tokens_details": {
      "image_tokens": 196,
      "text_tokens": 0
    },
    "total_tokens": 239
  }
}

The application should persist:

model
size
quality
output_format
usage.total_tokens
usage.input_tokens
usage.output_tokens
request latency
upstream account / group / user / key
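A sketch of turning one parsed generation response into a persistence record, assuming the response shape shown above. The dataclass and its routing fields (`account`, `group`, `user`, `key_id`) are hypothetical names; map them onto whatever the gateway actually tracks.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ImageUsageRecord:
    model: str
    size: str
    quality: str
    output_format: str
    input_tokens: int
    output_tokens: int
    total_tokens: int
    latency_ms: int
    # Hypothetical routing fields for the upstream account / group / user / key.
    account: str = ""
    group: str = ""
    user: str = ""
    key_id: str = ""


def make_usage_record(response: dict[str, Any], latency_s: float, **routing: str) -> ImageUsageRecord:
    """Build a record from a parsed /images/generations response body."""
    usage = response.get("usage") or {}
    return ImageUsageRecord(
        model=response.get("model", ""),
        size=response.get("size", ""),
        quality=response.get("quality", ""),
        output_format=response.get("output_format", ""),
        input_tokens=int(usage.get("input_tokens", 0)),
        output_tokens=int(usage.get("output_tokens", 0)),
        total_tokens=int(usage.get("total_tokens", 0)),
        latency_ms=round(latency_s * 1000),
        **routing,  # unknown routing keys raise TypeError
    )
```

Measure `latency_s` with `time.monotonic()` around the request, and use `dataclasses.asdict(record)` to get a plain dict for storage.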

Editing images

Single-image edit

curl -sS "$BASE_URL/images/edits" \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=gpt-image-2" \
  -F "image[]=@input.png" \
  -F "prompt=Replace the sofa with a minimalist white lounge chair" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  -F "output_format=png" \
  > edit.json
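When calling from code without an SDK, the multipart body that curl -F builds can be assembled with the standard library. A minimal sketch, assuming PNG inputs (each file part's Content-Type is hard-coded to image/png) and the field names used in the curl examples (`image[]`, `mask`, `model`, `prompt`, `size`, `quality`); `build_edit_multipart` is a hypothetical helper:

```python
import uuid
from pathlib import Path
from typing import Optional


def build_edit_multipart(
    image_path: str,
    prompt: str,
    mask_path: Optional[str] = None,
    size: str = "1024x1024",
    quality: str = "medium",
) -> tuple[bytes, str]:
    """Return (body, content_type) for POST /images/edits."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []

    def add_field(name: str, value: str) -> None:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )

    def add_file(name: str, path: str) -> None:
        p = Path(path)
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; filename="{p.name}"\r\n'
            "Content-Type: image/png\r\n\r\n"  # assumption: inputs are PNG
        )
        parts.append(header.encode() + p.read_bytes() + b"\r\n")

    add_field("model", "gpt-image-2")
    add_field("prompt", prompt)
    add_field("size", size)
    add_field("quality", quality)
    add_file("image[]", image_path)
    if mask_path is not None:
        add_file("mask", mask_path)
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned pair can be passed to `urllib.request.Request(f"{BASE_URL}/images/edits", data=body, headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": content_type})`. The helper returns both values because the boundary in the header must match the one in the body.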

Masked (partial) edit

curl -sS "$BASE_URL/images/edits" \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=gpt-image-2" \
  -F "image[]=@input.png" \
  -F "mask=@mask.png" \
  -F "prompt=Change only the transparent masked region into a glass button" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  > edit-mask.json

Mask requirements:

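image and mask must have the same format and the same dimensions
Each file must be smaller than 50MB
mask must contain an alpha channel
Do not pass input_fidelity for gpt-image-2; it automatically processes input images at high fidelity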
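Dimension and alpha-channel problems can be caught before upload by reading the PNG IHDR chunk. A sketch, assuming both files are PNG (so the same-format rule holds by construction) and reading 50MB as 50 MiB; it treats only PNG color types 4 and 6 as having an alpha channel and ignores palette transparency via a `tRNS` chunk. `png_header` and `check_mask_pair` are hypothetical names.

```python
import os
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_BYTES = 50 * 1024 * 1024  # assumption: "50MB" read as 50 MiB


def png_header(path: str) -> tuple[int, int, int]:
    """Return (width, height, color_type) from a PNG file's IHDR chunk."""
    with open(path, "rb") as f:
        head = f.read(26)  # signature + chunk length + "IHDR" + first 10 data bytes
    if len(head) < 26 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    width, height = struct.unpack(">II", head[16:24])
    color_type = head[25]  # byte 24 is bit depth
    return width, height, color_type


def check_mask_pair(image_path: str, mask_path: str) -> list[str]:
    """Return a list of problems; an empty list means the pair looks usable."""
    problems = []
    for p in (image_path, mask_path):
        if os.path.getsize(p) >= MAX_BYTES:
            problems.append(f"{p} is not smaller than 50MB")
    img_w, img_h, _ = png_header(image_path)
    mask_w, mask_h, mask_color = png_header(mask_path)
    if (img_w, img_h) != (mask_w, mask_h):
        problems.append(f"size mismatch: image {img_w}x{img_h}, mask {mask_w}x{mask_h}")
    if mask_color not in (4, 6):  # 4 = grayscale + alpha, 6 = RGBA
        problems.append("mask has no alpha channel (PNG color type must be 4 or 6)")
    return problems
```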

Calling the image_generation tool via the Responses API

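Use this for multi-turn agents, where the model first interprets the request and then calls the image tool. The main model is a text/agent model such as gpt-5.5.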

curl -sS "$BASE_URL/responses" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "input": "Generate a clean product poster for an AI proxy service.",
    "tools": [
      {
        "type": "image_generation",
        "quality": "medium",
        "size": "1536x1024",
        "output_format": "png"
      }
    ]
  }' > response-image.json

Notes:

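model is the main reasoning model, not gpt-image-2
the image_generation tool performs the actual image generation
sub2api injects an image_generation tool hint into requests from the official Codex client, but application calls should still pass the tool explicitly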
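In OpenAI's Responses API, each image produced by the tool appears in the `output` array as an item of type `image_generation_call` whose `result` field holds base64 image data. A sketch for pulling images out of response-image.json, assuming sub2api passes that structure through unchanged; `extract_tool_images` is a hypothetical name:

```python
import base64
from typing import Any


def extract_tool_images(response: dict[str, Any]) -> list[bytes]:
    """Collect decoded images from image_generation_call items in a Responses API result."""
    images = []
    for item in response.get("output", []):
        # Text replies arrive as "message" items; only tool calls carry image data.
        if item.get("type") == "image_generation_call" and item.get("result"):
            images.append(base64.b64decode(item["result"]))
    return images
```

For example, `extract_tool_images(json.load(open("response-image.json")))`. An empty list means the model answered in text without calling the tool.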

Streaming images

The Image API supports streaming generation:

curl -N "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A futuristic city skyline at sunrise",
    "stream": true,
    "partial_images": 2,
    "size": "1536x1024",
    "quality": "medium"
  }'

Event types:

image_generation.partial_image
image_generation.completed

partial_images can be set from 0 to 3. Each partial image adds output token cost.
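A standard-library sketch for consuming this stream. It assumes the usual SSE framing (an optional `event:` line, a `data:` line carrying a JSON object, and a blank line between events) and the OpenAI payload fields `type` and `b64_json`; skipping a `[DONE]` sentinel is a defensive assumption about OpenAI-compatible gateways. The function names are hypothetical.

```python
import base64
import json
from typing import Iterable, Iterator, Optional


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_type, payload) for each complete SSE event."""
    event_name: Optional[str] = None
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "":
            # A blank line ends the event; an unterminated trailing event is dropped.
            data = "\n".join(data_lines)
            if data and data != "[DONE]":
                payload = json.loads(data)
                # Fall back to the payload's own "type" when no event: line was sent.
                yield event_name or payload.get("type", ""), payload
            event_name, data_lines = None, []


def collect_images(lines: Iterable[str]) -> list[tuple[str, bytes]]:
    """Decode partial and final images, in stream order."""
    wanted = {"image_generation.partial_image", "image_generation.completed"}
    return [
        (event_type, base64.b64decode(payload["b64_json"]))
        for event_type, payload in iter_sse_events(lines)
        if event_type in wanted and payload.get("b64_json")
    ]
```

With `urllib`, the response object iterates over raw byte lines, so `collect_images(line.decode("utf-8") for line in resp)` works; the last tuple is the final image and earlier ones are partial previews.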

SDK examples

Node.js

import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.API_KEY,
  baseURL: process.env.BASE_URL ?? "https://claude.omniclaw.store/v1",
});

const result = await client.images.generate({
  model: "gpt-image-2",
  prompt: "A premium product poster for an AI service",
  size: "1536x1024",
  quality: "medium",
  output_format: "png",
  n: 1,
});

const b64 = result.data?.[0]?.b64_json;
if (!b64) throw new Error("No image returned");
fs.writeFileSync("image.png", Buffer.from(b64, "base64"));

Python

import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["API_KEY"],
    base_url=os.environ.get("BASE_URL", "https://claude.omniclaw.store/v1"),
)

result = client.images.generate(
    model="gpt-image-2",
    prompt="A premium product poster for an AI service",
    size="1536x1024",
    quality="medium",
    output_format="png",
    n=1,
)

b64 = result.data[0].b64_json if result.data else None
if not b64:
    raise RuntimeError("No image returned")
with open("image.png", "wb") as f:
    f.write(base64.b64decode(b64))

Production scheduling recommendations

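Routing: prefer plus/team/pro OpenAI OAuth accounts for image generation; free accounts may lack the capability or hit rate limits.
Timeouts: 120 seconds for standard images, 300 seconds for 4K.
Retries: retry a limited number of times, and only on network errors and 502/503/504; never retry content-policy rejections indefinitely.
Concurrency: 4K requests produce many output tokens, so keep per-account concurrency low; standard 1024 images can run at higher concurrency.
Cost: record usage and bill by input_tokens + output_tokens; 4K output can use far more tokens than 1024.
Latency: prefer jpeg when latency matters, and use quality: low for drafts.
Fallback: if 4K/high fails, drop to 4K/medium; if that still fails, produce a 1536x1024/medium image first and then run an upscaling step.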
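The retry and fallback rules above can be combined into one loop. A sketch under stated assumptions: `call` is a hypothetical function that performs one request with the given size, quality and timeout, and raises `UpstreamError` (also hypothetical) with `status=None` for network errors; the backoff values are illustrative. The upscaling step after the last rung is not shown.

```python
import time
from typing import Callable, Optional


class UpstreamError(Exception):
    """Hypothetical error type; status is None for network failures."""

    def __init__(self, status: Optional[int], message: str = ""):
        super().__init__(message or f"upstream status {status}")
        self.status = status


RETRYABLE_STATUSES = {502, 503, 504}
# Fallback ladder from the recommendations: (size, quality, timeout in seconds).
FALLBACK_LADDER = [
    ("3840x2160", "high", 300),
    ("3840x2160", "medium", 300),
    ("1536x1024", "medium", 120),
]


def generate_with_fallback(
    call: Callable[[str, str, int], bytes],
    max_retries: int = 2,
    backoff_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, str, bytes]:
    """Try each rung; retry only network errors and 502/503/504 with exponential backoff."""
    last_error: Optional[UpstreamError] = None
    for size, quality, timeout_s in FALLBACK_LADDER:
        for attempt in range(max_retries + 1):
            try:
                return size, quality, call(size, quality, timeout_s)
            except UpstreamError as err:
                if err.status is not None and err.status not in RETRYABLE_STATUSES:
                    raise  # e.g. 400/401/429: retrying or downgrading will not help
                last_error = err
                if attempt < max_retries:
                    sleep(backoff_s * 2 ** attempt)
        # Retries on this rung are exhausted: fall through to the next, cheaper rung.
    raise RuntimeError(f"all fallbacks failed: {last_error}")
```

Other 4xx errors are re-raised immediately, since retrying or downgrading will not fix an invalid key, a bad parameter, or an exhausted usage window.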

Common errors

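Symptom	Likely cause	Fix
`401 INVALID_API_KEY`	The key is not a sub2api key, or it has been deleted or disabled	Generate a new key from `/keys`
`400 invalid_request_error`	Incompatible parameters, such as a transparent background or an invalid size	Check `size`, `background`, and `quality`
`429 usage_limit_reached`	The OpenAI account's usage window has been exhausted	Switch to a plus/team/pro account or wait for the window to reset
`502 Upstream request failed`	Upstream returned no image, the connection dropped, or a content refusal came back as text	Check the server logs; if needed, revise the prompt, lower the quality, or change the size
Request takes longer than 2 minutes	High pixel count or a complex prompt	Set a longer timeout; use streaming or validate at a lower resolution first
`/v1/models` does not list `gpt-image-2`	The Codex main-model list is not the same as the set of image endpoint capabilities	Call `/v1/images/generations` directly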

Safety boundaries

The application should filter clearly violating content before sending requests, in particular:

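Sexualized content involving minors or youthful-looking characters
Non-consensual, coercive, or sexually violent scenarios
Explicit nudity or explicit sexual acts
Illegal, hateful, or extremely violent content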

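Prompts should explicitly state constraints such as "adult, non-explicit, no nudity, fully clothed" to reduce the chance that upstream refuses the request or returns text instead of an image.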
