1. 28 Apr, 2026 4 commits
    • fix(gateway): emit Anthropic-standard SSE error events and failover body · 4c474616
      alfadb authored
      
      
      Two follow-ups to PR #2066's failover-wrap fix:
      
      1. Failover ResponseBody (`UpstreamFailoverError.ResponseBody`) was encoded
         as `{"error": "<msg>"}` (string field). `ExtractUpstreamErrorMessage`
         probes for `error.message`, `detail`, or top-level `message` only — so
         `handleFailoverExhausted` and downstream passthrough rules saw an empty
         message, losing the EOF root cause in ops logs. Re-encode as the
         Anthropic standard shape `{"type":"error","error":{"type":"upstream_disconnected","message":"..."}}`.
         (Addresses the inline review comment from copilot-pull-request-reviewer
         on Wei-Shaw/sub2api#2066.)
      
      2. The streaming `event: error` SSE frame for `response_too_large`,
         `stream_read_error`, and `stream_timeout` was non-standard
         (`{"error":"<reason>"}`). Anthropic SDKs (and Claude Code) expect
         `{"type":"error","error":{"type":"...","message":"..."}}` and parse
         `error.type`/`error.message` accordingly. Refactor `sendErrorEvent` to
         take both reason and message, and emit the standard frame so client
         SDKs surface a real diagnostic message instead of a generic stream error.
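      Both fixes converge on the same wire shape. A minimal Go sketch of that shape and the SSE framing follows; `anthropicErrorBody`, `encodeAnthropicError`, and `sendErrorEventSketch` are illustrative names (the actual refactor lives in `sendErrorEvent`), while the JSON field layout is the one quoted above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// anthropicErrorBody mirrors the Anthropic-standard error shape
// described in the commit message.
type anthropicErrorBody struct {
	Type  string `json:"type"` // always "error"
	Error struct {
		Type    string `json:"type"`    // e.g. "upstream_disconnected", "stream_read_error"
		Message string `json:"message"` // human-readable root cause
	} `json:"error"`
}

// encodeAnthropicError builds the JSON body; a hypothetical helper,
// not the actual sub2api function.
func encodeAnthropicError(reason, message string) []byte {
	var b anthropicErrorBody
	b.Type = "error"
	b.Error.Type = reason
	b.Error.Message = message
	out, _ := json.Marshal(b)
	return out
}

// sendErrorEventSketch shows the SSE framing: an "event: error" line
// followed by the standard JSON body as the data payload.
func sendErrorEventSketch(reason, message string) string {
	return fmt.Sprintf("event: error\ndata: %s\n\n", encodeAnthropicError(reason, message))
}

func main() {
	fmt.Print(sendErrorEventSketch("stream_read_error", "unexpected EOF"))
}
```

      With this shape, SDKs that probe `error.type` / `error.message` surface the real diagnostic instead of an empty string.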
      
      This does not by itself prevent task interruption on long-stream EOF
      (SSE has no resume; client-side retry remains the only complete fix), but
      it gives both server-side ops logs and client-side error UIs a meaningful
      upstream message so users know the next step is to retry.
      
      Tests updated to assert the new body shape on both branches plus a new
      assertion that `ExtractUpstreamErrorMessage` returns a non-empty string.
      Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    • fix(gateway): wrap Anthropic stream EOF as failover error before client output · 63275735
      alfadb authored
      
      
      Anthropic streaming path (gateway_service.go) returned a plain error on
      upstream SSE read failure, so the handler-level UpstreamFailoverError check
      never fired and the client received a bare `stream_read_error` event,
      breaking long-running tasks even when no bytes had been written yet.
      
      The most common trigger is HTTP/2 GOAWAY from api.anthropic.com edge
      backends doing graceful rotation: Go's http.Transport surfaces this as
      `unexpected EOF` and never auto-retries.
      
      Mirror what the OpenAI and antigravity gateways already do: when the read
      error happens before any byte has reached the client (`!c.Writer.Written()`),
      return `*UpstreamFailoverError{StatusCode: 502, RetryableOnSameAccount: true}`
      so the handler can retry on the same or another account. After client
      output has begun, SSE has no resume protocol — keep the existing passthrough
      behavior.
      
      Tests cover both branches via streamReadCloser-based fixtures.
      Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    • Merge pull request #2051 from DaydreamCoding/openai-fast-flex-policy · b0a2252e
      Wesley Liddick authored
      feat(openai): complete OpenAI Fast/Flex Policy implementation (HTTP + WebSocket + Admin)
    • feat(openai): complete OpenAI Fast/Flex Policy implementation (HTTP + WebSocket + Admin) · 30f55a1f
      DaydreamCoding authored
      
      
      Mirroring the Claude BetaPolicy fast-mode filtering implementation, add a
      three-state pass / filter / block policy for the OpenAI upstream
      service_tier field (priority / flex, with client-side "fast" → "priority"
      normalization), covering every OpenAI entry point plus the admin
      configuration entry.
      
      Backend core
      - Add the SettingKeyOpenAIFastPolicySettings, OpenAIFastPolicyRule, and
        OpenAIFastPolicySettings config models, with rules spanning the
        service_tier × action × scope × model whitelist × fallback action
        dimensions.
      - SettingService.Get/SetOpenAIFastPolicySettings; when the setting is
        missing, return a built-in default policy (priority is filtered for
        all models, whitelist is empty, fallback=pass). Rationale:
        service_tier=fast is a user-level switch orthogonal to the model
        field, so pinning the default to specific model slugs would leave a
        bypass path of "use gpt-4 + fast to pass priority through to the
        upstream". JSON parse failures no longer fall back silently; slog.Warn
        logs the dirty data so ops can track it down.
      - service_tier normalization (trim + ToLower + fast→priority + a
        priority/flex whitelist) and policy evaluation
        (evaluateOpenAIFastPolicy) are the single source of truth, shared by
        HTTP / WS. The pure function evaluateOpenAIFastPolicyWithSettings is
        extracted and paired with a ctx-bound settings snapshot
        (withOpenAIFastPolicyContext / openAIFastPolicySettingsFromContext):
        long-lived WS sessions prefetch once at entry and reuse the snapshot
        for every frame, avoiding a settingService hit per frame.
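      The normalization steps above (trim + ToLower + fast→priority + priority/flex whitelist) can be sketched as a small pure function; the function name is illustrative, not the actual sub2api helper.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeServiceTier sketches the normalization described above:
// trim and lowercase, map the client alias "fast" to "priority", and
// accept only the priority/flex whitelist.
func normalizeServiceTier(raw string) (string, bool) {
	t := strings.ToLower(strings.TrimSpace(raw))
	if t == "fast" {
		t = "priority" // client-side alias
	}
	switch t {
	case "priority", "flex":
		return t, true
	}
	return "", false // unknown tiers are ignored by the policy
}

func main() {
	for _, in := range []string{" Fast ", "flex", "standard"} {
		tier, ok := normalizeServiceTier(in)
		fmt.Println(in, "->", tier, ok)
	}
}
```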
      
      HTTP entry points (4)
      - Chat Completions, Anthropic-compatible (Messages, including the
        BetaFastMode→priority secondary hit), native Responses, and
        Passthrough Responses all go through applyOpenAIFastPolicyToBody;
        filter performs an sjson top-level delete of service_tier, block
        returns a 403 forbidden_error JSON body.
      - All 4 entries use the upstream-view model (the slug after
        GetMappedModel + normalizeOpenAIModelForUpstream + Codex OAuth
        normalization), so chat/messages/native /responses/passthrough cannot
        diverge on whitelist hits due to differing model dimensions.
      - On the pass path the client "fast" alias is also normalized to
        "priority" and written back to the body; otherwise the native
        /responses and passthrough entries would forward "fast" verbatim to
        the upstream, causing 400s/rejections (the chat-completions entry's
        normalizeResponsesBodyServiceTier already behaved this way).
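      The three-state outcome per entry point can be sketched as follows, using a stdlib map edit in place of the sjson top-level delete; the helper name and the block body text are illustrative, not the real sub2api code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyFastPolicyToBody sketches the pass / filter / block outcomes
// described above, returning the body to forward and an HTTP status.
func applyFastPolicyToBody(body []byte, action string) ([]byte, int, error) {
	switch action {
	case "filter":
		var m map[string]any
		if err := json.Unmarshal(body, &m); err != nil {
			return nil, 400, err
		}
		delete(m, "service_tier") // top-level delete only, like the sjson call
		out, err := json.Marshal(m)
		return out, 200, err
	case "block":
		// 403 with a forbidden_error JSON body (illustrative message).
		return []byte(`{"type":"error","error":{"type":"forbidden_error","message":"service_tier not allowed"}}`), 403, nil
	}
	return body, 200, nil // pass: body forwarded unchanged
}

func main() {
	body := []byte(`{"model":"gpt-4o","service_tier":"priority"}`)
	filtered, code, _ := applyFastPolicyToBody(body, "filter")
	fmt.Println(code, string(filtered))
}
```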
      
      WebSocket entry
      - Add applyOpenAIFastPolicyToWSResponseCreate: strictly matches
        type="response.create" and handles only the top-level service_tier;
        filter deletes the field via sjson, block returns a typed
        *OpenAIFastBlockedError.
      - The ingress path invokes it inside parseClientPayload; on a block hit
        it first writes a Realtime-style error event, then returns
        OpenAIWSClientCloseError (StatusPolicyViolation=1008), relying on the
        underlying WebSocket Conn.Write's synchronous flush to guarantee the
        error arrives before the close.
      - The passthrough path applies the policy to firstClientMessage before
        RunEntry and wraps ReadFrame in openAIWSPolicyEnforcingFrameConn to
        enforce the policy on every client→upstream frame; frames without a
        model field fall back to capturedSessionModel. The filter closure also
        watches session.update / session.created frames for a session.model
        field and refreshes capturedSessionModel, closing the mid-session
        bypass of "first frame model=gpt-4o (pass) → session.update switches
        to gpt-5.5 → a model-less response.create falls back to gpt-4o".
      - Passthrough billing: requestServiceTier is extracted from
        firstClientMessage only after the policy filter, so on a filter hit
        OpenAIForwardResult.ServiceTier reports nil (default tier), matching
        the semantics of the HTTP entries (reqBody built from the post-filter
        map) and WS ingress (payload built from the post-filter bytes).
      - Error-event schema: {event_id: "evt_<32hex>", type: "error",
        error: {type: "forbidden_error", code: "policy_violation", message}},
        compatible with the OpenAI codex client's error-event parsing.
      
      Admin / Frontend
      - dto.SystemSettings / UpdateSettingsRequest gain an
        openai_fast_policy_settings field (omitempty), wired into bulk
        GET/PUT.
      - The Settings page's Gateway tab gains a Fast/Flex Policy form card
        covering every field: service_tier × action × scope × model whitelist
        × fallback action.
      - Frontend guard rails: the openaiFastPolicyLoaded flag only permits
        writeback once a GET has actually returned the field, so a rollout or
        error cannot overwrite the default rules with an empty set; the
        saveSettings writeback loop skips the field, which dedicated refresh
        logic handles instead; error_message is sent only when action=block,
        matching the backend's omitempty behavior.
      
      Tests
      - HTTP path: openai_fast_policy_test.go covers the default config
        (whitelist=[], priority filtered for all models) / block with a
        custom error / scope distinctions / filter removing the field / block
        leaving the body untouched / block short-circuiting the upstream /
        Anthropic BetaFastMode triggering the OpenAI fast policy.
      - WebSocket path: openai_fast_policy_ws_test.go covers
          helper units (filter / fast→priority normalization / flex
          passthrough / block typed error / bytes unchanged when service_tier
          is absent / non-response.create frames untouched / empty-type
          frames untouched / event_id+code field assertions / non-string
          service_tier tolerance) +
          a pass-path fast-alias normalization regression +
          ingress end-to-end (no service_tier reaches the upstream after
          filter / on block the client receives the error event before the
          1008 close with zero bytes written upstream) +
          passthrough capturedSessionModel fallback cases (established by the
          first frame under a whitelist policy, fallback hit when model is
          missing, the leak documented when no fallback exists) +
          a passthrough session.update / session.created
          capturedSessionModel-rotation regression for the mid-session
          bypass +
          passthrough billing post-filter ServiceTier and idempotent-filter
          regressions.
      Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  2. 27 Apr, 2026 2 commits
  3. 26 Apr, 2026 7 commits
  4. 25 Apr, 2026 22 commits
  5. 24 Apr, 2026 5 commits
    • fix(apicompat): recognize web_search_20250305 / google_search in Responses to Anthropic tool conversion · 5f630fbb
      Wuxie233 authored
    • fix(gateway): skip client header passthrough on OAuth mimicry path · bdbd2916
      keh4l authored
      Root cause of persistent third-party detection: sub2api's
      buildUpstreamRequest transparently forwards client headers via
      allowedHeaders whitelist (addHeaderRaw) before applying mimicry
      overrides. When third-party clients (opencode, etc.) send their own
      anthropic-beta / user-agent / x-stainless-* / x-claude-code-session-id
      values, these get appended to the request alongside our injected
      headers, creating an inconsistent header set that Anthropic detects.
      
      Parrot's build_upstream_headers constructs exactly 9 headers from
      scratch and never forwards anything from the client. This is why
      'same opencode version, some users work, some don't': different
      opencode configs/versions send different header combinations.
      
      Fix: when tokenType=oauth and mimicClaudeCode=true, skip the
      client header passthrough loop entirely. The subsequent
      applyClaudeCodeMimicHeaders + ApplyFingerprint + beta merge
      pipeline constructs all necessary headers from our controlled values.
      
      Also: remove systemIncludesClaudeCodePrompt gate — OAuth accounts
      now unconditionally rewrite system (even if client already sent a
      Claude Code-style prompt), ensuring billing attribution block is
      always present.
    • fix(gateway): always apply full mimicry for OAuth accounts regardless of client identity · 6dc89765
      keh4l authored
      Before: isClaudeCodeRequest() checked whether the client looks like a
      real Claude Code CLI (UA, system prompt, X-App header, metadata format).
      If it looked like Claude Code, all mimicry was skipped — the assumption
      being that a real CLI needs no help.
      
      Problem: third-party tools like opencode partially impersonate Claude
      Code (sending claude-cli UA + claude-code beta + CC system prompt) but
      miss critical details (billing attribution block, tool-name obfuscation,
      cache breakpoints, full beta set). Some users' opencode instances pass
      the isClaudeCodeRequest check, causing sub2api to skip mimicry entirely,
      while Anthropic still detects the request as third-party.
      
      This explains why 'same opencode version, some users work, some don't'
      — it depends on which opencode features/config trigger the validator.
      
      Fix: OAuth accounts now unconditionally run the full mimicry pipeline,
      matching Parrot's behavior (Parrot never checks client identity).
      This is safe because our mimicry is strictly more complete than any
      third-party client's partial impersonation.
      
      Changed:
        - /v1/messages path: remove isClaudeCode gate
        - /v1/messages/count_tokens path: same
    • fix(gateway): apply D/E/F mimicry to native /v1/messages and count_tokens paths · f3233db0
      keh4l authored
      The previous commit only wired stripMessageCacheControl,
      addMessageCacheBreakpoints, and tool-name obfuscation into
      applyClaudeCodeOAuthMimicryToBody (used by /chat/completions and
      /responses). The native /v1/messages path and count_tokens path
      have their own independent mimicry code blocks and were missed.
      
      Now all three entry points share the same D/E/F pipeline:
        - /v1/messages (gateway_service.go forwardAnthropic)
        - /v1/messages/count_tokens (gateway_service.go countTokens)
        - OpenAI compat (applyClaudeCodeOAuthMimicryToBody)
    • feat(gateway): port Parrot tool-name obfuscation + message cache breakpoints · 6e12578b
      keh4l authored
      Implements the remaining three parity items with Parrot cc_mimicry:
      
        D) Tool-name obfuscation
           - Dynamic mapping when tools.length > 5 (matches Parrot threshold).
             Fake names follow {prefix}{name[:3]}{i:02d} (e.g. 'manage_bas00').
             Go port of random.Random(hash(tuple(names))) uses fnv64a seed +
             math/rand; byte-exact reproduction is impossible (Python hash vs
             Go hash), but the two invariants that matter are preserved:
               * same input tool_names yield identical mapping (cache hit)
               * prefix pool is shuffled (names look distributed)
           - Static prefix map (sessions_ -> cc_sess_, session_ -> cc_ses_)
             applied as fallback, matching Parrot TOOL_NAME_REWRITES verbatim.
           - Server tools (web_search_20250305, computer_*, etc.) are NOT
             renamed; only type=='function' and type=='custom' tools are.
           - tool_choice.name is rewritten in sync (only when type=='tool').
           - Response side: bytes-level replace on every SSE chunk / JSON
             body at 6 injection points (standard stream/non-stream,
             passthrough stream/non-stream, chat_completions stream +
             non-stream, responses stream + non-stream). Reverse mapping
             applied longest-fake-name-first to prevent substring conflicts
             (parity with Parrot _restore_tool_names_in_chunk).
           - tool_choice is no longer unconditionally deleted in
             normalizeClaudeOAuthRequestBody — Parrot passes it through.
      
        E) tools[-1] cache_control breakpoint
           - Injected as {type:ephemeral, ttl:<DefaultCacheControlTTL>} when
             the last tool has no cache_control. Client-provided ttl is
             passed through unchanged (repo-wide policy).
      
        F) messages cache_control strategy
           - stripMessageCacheControl removes every client-provided
             messages[*].content[*].cache_control (multi-turn stability).
           - addMessageCacheBreakpoints then injects two stable breakpoints:
             (1) last message, and (2) second-to-last user turn when
             messages.length >= 4.
           - Combined with the system block breakpoint and tools[-1]
             breakpoint, this gives exactly the 4 breakpoints Anthropic
             allows per request.
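      The deterministic-mapping invariant in (D) can be sketched as below: fnv64a over the tool-name tuple seeds math/rand, so the same input names always produce the same mapping while the prefix pool is shuffled. The prefix pool here is a placeholder, not Parrot's actual pool, and the function name is illustrative.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// buildDynamicToolMapSketch reproduces the two invariants named above:
// deterministic mapping per tool-name tuple and a shuffled prefix pool.
func buildDynamicToolMapSketch(names []string) map[string]string {
	h := fnv.New64a()
	for _, n := range names {
		h.Write([]byte(n))
		h.Write([]byte{0}) // separator so ["ab","c"] != ["a","bc"]
	}
	rng := rand.New(rand.NewSource(int64(h.Sum64())))
	prefixes := []string{"manage_", "handle_", "run_", "do_", "exec_", "proc_"}
	rng.Shuffle(len(prefixes), func(i, j int) { prefixes[i], prefixes[j] = prefixes[j], prefixes[i] })
	m := make(map[string]string, len(names))
	for i, n := range names {
		head := n
		if len(head) > 3 {
			head = head[:3]
		}
		// Fake name shape: {prefix}{name[:3]}{i:02d}, e.g. "manage_bas00".
		m[n] = fmt.Sprintf("%s%s%02d", prefixes[i%len(prefixes)], head, i)
	}
	return m
}

func main() {
	names := []string{"bash", "read", "write", "edit", "grep", "glob"}
	a := buildDynamicToolMapSketch(names)
	b := buildDynamicToolMapSketch(names)
	fmt.Println(a["bash"] == b["bash"], len(a)) // same input, same mapping
}
```

      As noted above, byte-exact parity with Python's `random.Random(hash(...))` is not attainable; only the cache-hit and distribution invariants carry over.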
      
      Non-trivial implementation details to be aware of when rebasing:
      
        * Two new files, no upstream collision:
            gateway_tool_rewrite.go       (D + E algorithms)
            gateway_messages_cache.go     (F strip + breakpoints)
        * Two new feature calls bolted onto the tail of
          applyClaudeCodeOAuthMimicryToBody in gateway_service.go — rebase
          conflicts will be ~10 lines maximum.
        * Response-side injection points all wrap their existing write with
          reverseToolNamesIfPresent(c, ...), preserving original behavior
          when no mapping is stored (static prefix rollback still runs).
        * Non-stream chat/responses switched from c.JSON to
          json.Marshal + c.Data so bytes-level replace is possible.
        * Retry bodies (FilterThinkingBlocksForRetry,
          FilterSignatureSensitiveBlocksForRetry, RectifyThinkingBudget)
          only prune blocks — they preserve the already-obfuscated tool
          names, so no extra mapping re-application is needed.
      
      Manual QA: end-to-end scenario verified with 6 tools (above threshold)
      and tool_choice.type=='tool'. The obfuscation + restore roundtrip was
      shown in test logs; the temp test file was then removed.
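      Strategy (F)'s breakpoint placement can be sketched with a minimal message type; the `Breakpoint` bool stands in for `content[-1].cache_control`, and the function name is illustrative.

```go
package main

import "fmt"

// msg is a minimal stand-in for an Anthropic message.
type msg struct {
	Role       string
	Breakpoint bool // stands in for content[-1].cache_control
}

// addMessageCacheBreakpointsSketch strips every client-provided
// breakpoint, then marks (1) the last message and (2) the
// second-to-last user turn when there are at least 4 messages.
func addMessageCacheBreakpointsSketch(msgs []msg) []msg {
	for i := range msgs {
		msgs[i].Breakpoint = false // stripMessageCacheControl
	}
	if len(msgs) == 0 {
		return msgs
	}
	msgs[len(msgs)-1].Breakpoint = true // (1) last message
	if len(msgs) >= 4 {
		seen := 0
		for i := len(msgs) - 1; i >= 0; i-- {
			if msgs[i].Role == "user" {
				seen++
				if seen == 2 { // (2) second-to-last user turn
					msgs[i].Breakpoint = true
					break
				}
			}
		}
	}
	return msgs
}

func main() {
	out := addMessageCacheBreakpointsSketch([]msg{
		{Role: "user"}, {Role: "assistant"}, {Role: "user"},
		{Role: "assistant"}, {Role: "user"},
	})
	for i, m := range out {
		if m.Breakpoint {
			fmt.Println("breakpoint at", i, m.Role)
		}
	}
}
```

      Together with the system-block and tools[-1] breakpoints, these two message breakpoints account for the four allowed per request.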
      
      Tests (16 new):
        - buildDynamicToolMap stability + below-threshold guard
        - sanitizeToolName precedence (dynamic > static)
        - restoreToolNamesInBytes longest-first + static rollback
        - applyToolNameRewriteToBody skips server tools + syncs tool_choice
        - applyToolsLastCacheBreakpoint defaults to 5m + passes client ttl
        - stripMessageCacheControl + addMessageCacheBreakpoints in the
          1/4/string-content cases + second-to-last user turn selection
        - buildToolNameRewriteFromBody ReverseOrdered is desc-by-fake-length
        - fake name shape follows Parrot {prefix}{head3}{i:02d}