1. 30 Apr, 2026 2 commits
  2. 29 Apr, 2026 1 commit
    • fix(gateway): sanitize stream errors to avoid leaking infrastructure topology · d78478e8
      alfadb authored
      (*net.OpError).Error() concatenates Source/Addr fields, so the previous
      disconnectMsg surfaced internal source IP/port and upstream server address
      to clients via SSE error frames and UpstreamFailoverError.ResponseBody
      (reported by @Wei-Shaw on PR #2066).
      
      - Add sanitizeStreamError that maps known errors (io.ErrUnexpectedEOF,
        context.Canceled, syscall.ECONNRESET/EPIPE/ETIMEDOUT/...) to fixed
        descriptions and falls back to a generic placeholder, with an explicit
        *net.OpError branch that drops Source/Addr fields entirely.
      - Use sanitized message in client-facing disconnectMsg; full ev.err is
        still preserved in the existing operator log line for diagnosis.
      - Tests cover net.OpError redaction, the failover ResponseBody path, and
        every known sanitized error mapping.
      d78478e8
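A minimal sketch of the sanitizing idea described in this commit, not the project's actual code. The function name comes from the commit message; the returned message strings and the `exampleOpError` helper are assumptions made for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net"
	"syscall"
)

// sanitizeStreamError maps an upstream read error to a fixed, client-safe
// description. The wording of each message is illustrative only.
func sanitizeStreamError(err error) string {
	switch {
	case err == nil:
		return ""
	case errors.Is(err, io.ErrUnexpectedEOF):
		return "upstream closed the stream unexpectedly"
	case errors.Is(err, context.Canceled):
		return "request canceled"
	case errors.Is(err, syscall.ECONNRESET):
		return "connection reset by upstream"
	case errors.Is(err, syscall.EPIPE):
		return "broken pipe"
	case errors.Is(err, syscall.ETIMEDOUT):
		return "upstream connection timed out"
	}
	// Explicit *net.OpError branch: keep only the operation name and drop
	// Source/Addr, so no internal IP, port, or upstream address is exposed.
	var opErr *net.OpError
	if errors.As(err, &opErr) {
		return "network error during " + opErr.Op
	}
	// Generic placeholder for anything unrecognized.
	return "upstream stream error"
}

// exampleOpError builds a hypothetical error carrying both addresses.
func exampleOpError() error {
	return &net.OpError{
		Op:     "read",
		Net:    "tcp",
		Source: &net.TCPAddr{IP: net.ParseIP("10.0.0.5"), Port: 41234},
		Addr:   &net.TCPAddr{IP: net.ParseIP("203.0.113.7"), Port: 443},
		Err:    errors.New("use of closed network connection"),
	}
}

func main() {
	leaky := exampleOpError()
	fmt.Println(leaky.Error())              // raw text includes both addresses
	fmt.Println(sanitizeStreamError(leaky)) // addresses are gone
	fmt.Println(sanitizeStreamError(fmt.Errorf("read body: %w", io.ErrUnexpectedEOF)))
}
```

The raw error should still go to the operator log, as the commit notes; only the client-facing string is replaced.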
  3. 28 Apr, 2026 2 commits
    • fix(gateway): emit Anthropic-standard SSE error events and failover body · 4c474616
      alfadb authored
      
      
      Two follow-ups to PR #2066's failover-wrap fix:
      
      1. Failover ResponseBody (`UpstreamFailoverError.ResponseBody`) was encoded
         as `{"error": "<msg>"}` (string field). `ExtractUpstreamErrorMessage`
         probes for `error.message`, `detail`, or top-level `message` only — so
         `handleFailoverExhausted` and downstream passthrough rules saw an empty
         message, losing the EOF root cause in ops logs. Re-encode as the
         Anthropic standard shape `{"type":"error","error":{"type":"upstream_disconnected","message":"..."}}`.
         (Addresses the inline review comment from copilot-pull-request-reviewer
         on Wei-Shaw/sub2api#2066.)
      
      2. The streaming `event: error` SSE frame for `response_too_large`,
         `stream_read_error`, and `stream_timeout` was non-standard
         (`{"error":"<reason>"}`). Anthropic SDKs (and Claude Code) expect
         `{"type":"error","error":{"type":"...","message":"..."}}` and parse
         `error.type`/`error.message` accordingly. Refactor `sendErrorEvent` to
         take both reason and message, and emit the standard frame so client
         SDKs surface a real diagnostic message instead of a generic stream error.
      
      This does not by itself prevent task interruption on long-stream EOF
      (SSE has no resume; client-side retry remains the only complete fix), but
      it gives both server-side ops logs and client-side error UIs a meaningful
      upstream message so users know the next step is to retry.
      
      Tests updated to assert the new body shape on both branches plus a new
      assertion that `ExtractUpstreamErrorMessage` returns a non-empty string.
      Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
      4c474616
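The body shape from this commit can be sketched as follows. This is an illustration under assumptions: the type and function names (`anthropicErrorBody`, `buildErrorBody`, `formatSSEErrorFrame`, `extractMessage`) are hypothetical, and `extractMessage` only imitates the `error.message` probe that the commit attributes to `ExtractUpstreamErrorMessage`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type anthropicErrorDetail struct {
	Type    string `json:"type"`
	Message string `json:"message"`
}

type anthropicErrorBody struct {
	Type  string               `json:"type"`
	Error anthropicErrorDetail `json:"error"`
}

// buildErrorBody encodes {"type":"error","error":{"type":...,"message":...}}.
func buildErrorBody(reason, message string) ([]byte, error) {
	return json.Marshal(anthropicErrorBody{
		Type:  "error",
		Error: anthropicErrorDetail{Type: reason, Message: message},
	})
}

// formatSSEErrorFrame wraps the body in an "event: error" SSE frame.
func formatSSEErrorFrame(reason, message string) (string, error) {
	body, err := buildErrorBody(reason, message)
	if err != nil {
		return "", err
	}
	return "event: error\ndata: " + string(body) + "\n\n", nil
}

// extractMessage reads error.message only; it returns "" when "error" is a
// plain string, which is the failure mode of the old {"error":"<msg>"} shape.
func extractMessage(body []byte) string {
	var parsed struct {
		Error struct {
			Message string `json:"message"`
		} `json:"error"`
	}
	if err := json.Unmarshal(body, &parsed); err != nil {
		return ""
	}
	return parsed.Error.Message
}

func main() {
	frame, err := formatSSEErrorFrame("stream_read_error", "unexpected EOF")
	if err != nil {
		panic(err)
	}
	fmt.Print(frame)
	fmt.Printf("old shape: %q\n", extractMessage([]byte(`{"error":"unexpected EOF"}`)))
	body, _ := buildErrorBody("upstream_disconnected", "unexpected EOF")
	fmt.Printf("new shape: %q\n", extractMessage(body))
}
```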
    • fix(gateway): wrap Anthropic stream EOF as failover error before client output · 63275735
      alfadb authored
      
      
      Anthropic streaming path (gateway_service.go) returned a plain error on
      upstream SSE read failure, so the handler-level UpstreamFailoverError check
      never fired and the client received a bare `stream_read_error` event,
      breaking long-running tasks even when no bytes had been written yet.
      
      The most common trigger is HTTP/2 GOAWAY from api.anthropic.com edge
      backends doing graceful rotation: Go's http.Transport surfaces this as
      `unexpected EOF` and never auto-retries.
      
      Mirror what the OpenAI and antigravity gateways already do: when the read
      error happens before any byte has reached the client (`!c.Writer.Written()`),
      return `*UpstreamFailoverError{StatusCode: 502, RetryableOnSameAccount: true}`
      so the handler can retry on the same or another account. After client
      output has begun, SSE has no resume protocol — keep the existing passthrough
      behavior.
      
      Tests cover both branches via streamReadCloser-based fixtures.
      Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
      63275735
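The two branches described in this commit can be sketched roughly as below. The struct carries only the two fields named in the message (the real type likely has more), and `wrapStreamReadError`, `isFailover`, and `errSample` are hypothetical names; the `clientWritten` parameter stands in for `c.Writer.Written()`.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// UpstreamFailoverError here has only the fields named in the commit message.
type UpstreamFailoverError struct {
	StatusCode             int
	RetryableOnSameAccount bool
}

func (e *UpstreamFailoverError) Error() string {
	return fmt.Sprintf("upstream failover: status %d", e.StatusCode)
}

// errSample is a stand-in for the read error seen on HTTP/2 GOAWAY.
var errSample = io.ErrUnexpectedEOF

// wrapStreamReadError converts a read error into a failover error only when
// nothing has reached the client yet; afterwards it passes the error through.
func wrapStreamReadError(err error, clientWritten bool) error {
	if err == nil {
		return nil
	}
	if !clientWritten {
		return &UpstreamFailoverError{StatusCode: 502, RetryableOnSameAccount: true}
	}
	return err
}

// isFailover reports whether a handler-level check would trigger a retry.
func isFailover(err error) bool {
	var fo *UpstreamFailoverError
	return errors.As(err, &fo)
}

func main() {
	fmt.Println(isFailover(wrapStreamReadError(errSample, false))) // true: retry is possible
	fmt.Println(isFailover(wrapStreamReadError(errSample, true)))  // false: passthrough
}
```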
  4. 27 Apr, 2026 14 commits
    • Fix bug where the request_capture_logs table was not recorded · 065e4782
      陈曦 authored
      065e4782
    • fix(gateway): skip body mimicry for real Claude Code clients to restore prompt caching · 266179a3
      shaw authored and 陈曦 committed
      PR #1914 unconditionally applied the full mimicry pipeline to all OAuth
      accounts, including real Claude Code CLI clients. This replaced the
      client's long system prompt (~10K+ tokens with stable cache_control
      breakpoints) with a short ~45 token [billing, CC prompt] pair, which
      falls below Anthropic's 1024-token minimum cacheable prefix threshold.
      The result: every request created a new cache but never hit an existing
      one.
      
      Fix: restore the Claude Code client detection gate so that real CC
      clients bypass body-level mimicry (system rewrite, message cache
      management, tool name obfuscation). Non-CC third-party clients
      (opencode, etc.) continue to receive full mimicry.
      
      Also harden the detection logic:
      - Make UA regex case-insensitive (align with claude_code_validator.go)
      - Validate metadata.user_id format via ParseMetadataUserID() instead of
        just checking non-empty, preventing third-party tools from spoofing
        a claude-cli/* UA with an arbitrary user_id string to bypass mimicry
      266179a3
    • fix(openai): tighten responses stream account tests · 95f06794
      hungryboy1025 authored and 陈曦 committed
      95f06794
    • chore(gateway): fix lint issues from cc-mimicry-parity merge · 2fbd2767
      shaw authored and 陈曦 committed
      - staticcheck QF1001: apply De Morgan's law to the OAuth-mimic header
        passthrough guard (`!(a && b)` → `a != ... || !b`).
      - unused: drop `isClaudeCodeRequest`, which became dead after PR #1914
        switched both `/v1/messages` and `/count_tokens` paths to unconditional
        `account.IsOAuth()` mimicry. The lowercase helper `isClaudeCodeClient`
        is kept (still referenced by `TestIsClaudeCodeClient`).
      2fbd2767
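For readers unfamiliar with QF1001, the rewrite is a plain De Morgan transformation. The sketch below assumes the elided operand is a `tokenType != "oauth"` comparison, based on the guard described in the header-passthrough commit; both function names are hypothetical.

```go
package main

import "fmt"

// passthroughOriginal is the pre-lint form: !(a && b).
func passthroughOriginal(tokenType string, mimicClaudeCode bool) bool {
	return !(tokenType == "oauth" && mimicClaudeCode)
}

// passthroughRewritten is the De Morgan form: !a || !b, with !a folded
// into a != comparison.
func passthroughRewritten(tokenType string, mimicClaudeCode bool) bool {
	return tokenType != "oauth" || !mimicClaudeCode
}

func main() {
	// Exhaustively compare both forms over the input space that matters.
	for _, tokenType := range []string{"oauth", "apikey"} {
		for _, mimic := range []bool{true, false} {
			same := passthroughOriginal(tokenType, mimic) == passthroughRewritten(tokenType, mimic)
			fmt.Println(tokenType, mimic, same)
		}
	}
}
```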
    • fix(gateway): skip client header passthrough on OAuth mimicry path · 2130c54e
      keh4l authored and 陈曦 committed
      Root cause of persistent third-party detection: sub2api's
      buildUpstreamRequest transparently forwards client headers via
      allowedHeaders whitelist (addHeaderRaw) before applying mimicry
      overrides. When third-party clients (opencode, etc.) send their own
      anthropic-beta / user-agent / x-stainless-* / x-claude-code-session-id
      values, these get appended to the request alongside our injected
      headers, creating an inconsistent header set that Anthropic detects.
      
      Parrot's build_upstream_headers constructs exactly 9 headers from
      scratch and never forwards anything from the client. This is why
      'same opencode version, some users work some don't' — different
      opencode configs/versions send different header combinations.
      
      Fix: when tokenType=oauth and mimicClaudeCode=true, skip the
      client header passthrough loop entirely. The subsequent
      applyClaudeCodeMimicHeaders + ApplyFingerprint + beta merge
      pipeline constructs all necessary headers from our controlled values.
      
      Also: remove systemIncludesClaudeCodePrompt gate — OAuth accounts
      now unconditionally rewrite system (even if client already sent a
      Claude Code-style prompt), ensuring billing attribution block is
      always present.
      2130c54e
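A simplified sketch of the passthrough guard this commit describes. The allowlist contents, the `controlledUserAgent` value, and the helper names are placeholders invented for the example; the real pipeline builds far more headers through `applyClaudeCodeMimicHeaders` and `ApplyFingerprint`.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// allowedHeaders is a hypothetical allowlist, keyed by lowercase name.
var allowedHeaders = map[string]bool{
	"accept":         true,
	"anthropic-beta": true,
	"user-agent":     true,
}

// controlledUserAgent is a placeholder, not the value the project sends.
const controlledUserAgent = "example-controlled-ua/0.0"

// buildUpstreamHeaders forwards allowlisted client headers on normal paths,
// and forwards nothing from the client on the OAuth mimicry path.
func buildUpstreamHeaders(client http.Header, tokenType string, mimicClaudeCode bool) http.Header {
	out := http.Header{}
	if tokenType != "oauth" || !mimicClaudeCode {
		for key, values := range client {
			if !allowedHeaders[strings.ToLower(key)] {
				continue
			}
			for _, v := range values {
				out.Add(key, v)
			}
		}
		return out
	}
	// Mimicry path: every header comes from values the proxy controls.
	out.Set("User-Agent", controlledUserAgent)
	return out
}

// sampleClientHeaders imitates a third-party client's request headers.
func sampleClientHeaders() http.Header {
	h := http.Header{}
	h.Set("User-Agent", "opencode/1.0")
	h.Set("Anthropic-Beta", "client-chosen-beta")
	h.Set("X-Internal-Debug", "1")
	return h
}

func main() {
	fmt.Println(buildUpstreamHeaders(sampleClientHeaders(), "oauth", true))
	fmt.Println(buildUpstreamHeaders(sampleClientHeaders(), "apikey", false))
}
```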
    • fix(gateway): always apply full mimicry for OAuth accounts regardless of client identity · abd6f5dc
      keh4l authored and 陈曦 committed
      Before: isClaudeCodeRequest() checked whether the client looks like a
      real Claude Code CLI (UA, system prompt, X-App header, metadata format).
      If it looked like Claude Code, all mimicry was skipped — the assumption
      being that a real CLI needs no help.
      
      Problem: third-party tools like opencode partially impersonate Claude
      Code (sending claude-cli UA + claude-code beta + CC system prompt) but
      miss critical details (billing attribution block, tool-name obfuscation,
      cache breakpoints, full beta set). Some users' opencode instances pass
      the isClaudeCodeRequest check, causing sub2api to skip mimicry entirely,
      while Anthropic still detects the request as third-party.
      
      This explains why 'same opencode version, some users work, some don't'
      — it depends on which opencode features/config trigger the validator.
      
      Fix: OAuth accounts now unconditionally run the full mimicry pipeline,
      matching Parrot's behavior (Parrot never checks client identity).
      This is safe because our mimicry is strictly more complete than any
      third-party client's partial impersonation.
      
      Changed:
        - /v1/messages path: remove isClaudeCode gate
        - /v1/messages/count_tokens path: same
      abd6f5dc
    • fix(gateway): apply D/E/F mimicry to native /v1/messages and count_tokens paths · 7ae378de
      keh4l authored and 陈曦 committed
      The previous commit only wired stripMessageCacheControl,
      addMessageCacheBreakpoints, and tool-name obfuscation into
      applyClaudeCodeOAuthMimicryToBody (used by /chat/completions and
      /responses). The native /v1/messages path and count_tokens path
      have their own independent mimicry code blocks and were missed.
      
      Now all three entry points share the same D/E/F pipeline:
        - /v1/messages (gateway_service.go forwardAnthropic)
        - /v1/messages/count_tokens (gateway_service.go countTokens)
        - OpenAI compat (applyClaudeCodeOAuthMimicryToBody)
      7ae378de
    • feat(gateway): port Parrot tool-name obfuscation + message cache breakpoints · f507bae5
      keh4l authored and 陈曦 committed
      Implements the remaining three parity items with Parrot cc_mimicry:
      
        D) Tool-name obfuscation
           - Dynamic mapping when tools.length > 5 (matches Parrot threshold).
             Fake names follow {prefix}{name[:3]}{i:02d} (e.g. 'manage_bas00').
             Go port of random.Random(hash(tuple(names))) uses fnv64a seed +
             math/rand; byte-exact reproduction is impossible (Python hash vs
             Go hash), but the two invariants that matter are preserved:
               * same input tool_names yield identical mapping (cache hit)
               * prefix pool is shuffled (names look distributed)
           - Static prefix map (sessions_ -> cc_sess_, session_ -> cc_ses_)
             applied as fallback, matching Parrot TOOL_NAME_REWRITES verbatim.
           - Server tools (web_search_20250305, computer_*, etc.) are NOT
             renamed; only type=='function' and type=='custom' tools are.
           - tool_choice.name is rewritten in sync (only when type=='tool').
           - Response side: bytes-level replace on every SSE chunk / JSON
             body at 6 injection points (standard stream/non-stream,
             passthrough stream/non-stream, chat_completions stream +
             non-stream, responses stream + non-stream). Reverse mapping
             applied longest-fake-name-first to prevent substring conflicts
             (parity with Parrot _restore_tool_names_in_chunk).
           - tool_choice is no longer unconditionally deleted in
             normalizeClaudeOAuthRequestBody — Parrot passes it through.
      
        E) tools[-1] cache_control breakpoint
           - Injected as {type:ephemeral, ttl:<DefaultCacheControlTTL>} when
             the last tool has no cache_control. Client-provided ttl is
             passed through unchanged (repo-wide policy).
      
        F) messages cache_control strategy
           - stripMessageCacheControl removes every client-provided
             messages[*].content[*].cache_control (multi-turn stability).
           - addMessageCacheBreakpoints then injects two stable breakpoints:
             (1) last message, and (2) second-to-last user turn when
             messages.length >= 4.
           - Combined with the system block breakpoint and tools[-1]
             breakpoint, this gives exactly the 4 breakpoints Anthropic
             allows per request.
      
      Non-trivial implementation details to be aware of when rebasing:
      
        * Two new files, no upstream collision:
            gateway_tool_rewrite.go       (D + E algorithms)
            gateway_messages_cache.go     (F strip + breakpoints)
        * Two new feature calls bolted onto the tail of
          applyClaudeCodeOAuthMimicryToBody in gateway_service.go — rebase
          conflicts will be ~10 lines maximum.
        * Response-side injection points all wrap their existing write with
          reverseToolNamesIfPresent(c, ...), preserving original behavior
          when no mapping is stored (static prefix rollback still runs).
        * Non-stream chat/responses switched from c.JSON to
          json.Marshal + c.Data so bytes-level replace is possible.
        * Retry bodies (FilterThinkingBlocksForRetry,
          FilterSignatureSensitiveBlocksForRetry, RectifyThinkingBudget)
          only prune blocks — they preserve the already-obfuscated tool
          names, so no extra mapping re-application is needed.
      
      Manual QA: end-to-end scenario verified with 6 tools (above threshold)
      and tool_choice.type=='tool'. Obfuscation + restore roundtrip shown
      in test logs; then removed the temp test file.
      
      Tests (16 new):
        - buildDynamicToolMap stability + below-threshold guard
        - sanitizeToolName precedence (dynamic > static)
        - restoreToolNamesInBytes longest-first + static rollback
        - applyToolNameRewriteToBody skips server tools + syncs tool_choice
        - applyToolsLastCacheBreakpoint defaults to 5m + passes client ttl
        - stripMessageCacheControl + addMessageCacheBreakpoints in the
          1/4/string-content cases + second-to-last user turn selection
        - buildToolNameRewriteFromBody ReverseOrdered is desc-by-fake-length
        - fake name shape follows Parrot {prefix}{head3}{i:02d}
      f507bae5
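Item D can be sketched as below: a deterministic mapping seeded from the tool names, plus a longest-first byte-level restore. This is an approximation under assumptions, not the ported algorithm. The prefix pool, the sample tool names, and the way names are fed into the FNV-64a hash are invented here; only the threshold, the `{prefix}{name[:3]}{i:02d}` shape, and the two stated invariants come from the commit message.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/fnv"
	"math/rand"
	"sort"
)

// prefixPool is a hypothetical pool; the real one is not shown in the log.
var prefixPool = []string{"manage_", "handle_", "process_", "run_"}

// sampleTools holds six names, one above the threshold of five.
var sampleTools = []string{"bash", "read_file", "write_file", "grep", "glob", "web_fetch"}

// buildDynamicToolMap returns nil at or below the threshold; otherwise it
// derives a seed from the names so the same input yields the same mapping.
func buildDynamicToolMap(names []string) map[string]string {
	if len(names) <= 5 {
		return nil
	}
	h := fnv.New64a()
	for _, n := range names {
		h.Write([]byte(n))
		h.Write([]byte{0}) // separator so ("ab","c") differs from ("a","bc")
	}
	rng := rand.New(rand.NewSource(int64(h.Sum64())))
	prefixes := append([]string(nil), prefixPool...)
	rng.Shuffle(len(prefixes), func(i, j int) { prefixes[i], prefixes[j] = prefixes[j], prefixes[i] })

	mapping := make(map[string]string, len(names))
	for i, n := range names {
		head := n
		if len(head) > 3 {
			head = head[:3]
		}
		mapping[n] = fmt.Sprintf("%s%s%02d", prefixes[i%len(prefixes)], head, i)
	}
	return mapping
}

// restoreToolNames replaces fake names with originals, longest fake first,
// so a short fake name cannot clobber part of a longer one.
func restoreToolNames(data []byte, mapping map[string]string) []byte {
	fakes := make([]string, 0, len(mapping))
	reverse := make(map[string]string, len(mapping))
	for orig, fake := range mapping {
		fakes = append(fakes, fake)
		reverse[fake] = orig
	}
	sort.Slice(fakes, func(i, j int) bool {
		if len(fakes[i]) != len(fakes[j]) {
			return len(fakes[i]) > len(fakes[j])
		}
		return fakes[i] < fakes[j]
	})
	for _, fake := range fakes {
		data = bytes.ReplaceAll(data, []byte(fake), []byte(reverse[fake]))
	}
	return data
}

func main() {
	mapping := buildDynamicToolMap(sampleTools)
	chunk := []byte(`{"type":"tool_use","name":"` + mapping["bash"] + `"}`)
	fmt.Println(string(chunk))
	fmt.Println(string(restoreToolNames(chunk, mapping)))
}
```

A byte-level replace is simple but can also rewrite a fake name that happens to appear inside ordinary text, which is a trade-off worth keeping in mind when choosing prefixes.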
    • feat(gateway): align body shape with real Claude Code CLI defaults · 6e12785d
      keh4l authored and 陈曦 committed
      Three field-level alignments in normalizeClaudeOAuthRequestBody to
      match real Claude Code CLI traffic byte-for-byte:
      
        1. temperature: previously deleted unconditionally; now passes
           through client value, defaults to 1 when absent (real CLI
           always sends temperature, default 1).
      
        2. max_tokens: defaults to 128000 when absent (real CLI default).
      
        3. context_management: when thinking.type is enabled/adaptive
           and the client did not provide context_management, inject
           {"edits":[{"type":"clear_thinking_20251015","keep":"all"}]}
           to mirror real CLI behavior.
      
      tool_choice removal is unchanged (Claude Code OAuth credentials
      do not allow client-supplied tool_choice).
      
      Tests updated:
        - gateway_body_order_test.go: temperature/max_tokens are now
          expected in output; tool_choice still removed.
        - gateway_prompt_test.go: system array is now 2 blocks
          (billing + cc prompt), assertions adjusted.
        - gateway_anthropic_apikey_passthrough_test.go: same 2-block
          assertion.
      6e12785d
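The three defaults can be illustrated with a small map-based sketch. `applyCLIDefaults` and `normalizeJSON` are hypothetical names, and a `map[string]any` does not preserve field order, so this shows the defaulting rules only, not the byte-for-byte ordering the commit aims for.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyCLIDefaults fills in fields only when the client left them out.
func applyCLIDefaults(body map[string]any) {
	if _, ok := body["temperature"]; !ok {
		body["temperature"] = 1
	}
	if _, ok := body["max_tokens"]; !ok {
		body["max_tokens"] = 128000
	}
	// Reading from a nil map is safe, so a missing "thinking" yields "".
	thinking, _ := body["thinking"].(map[string]any)
	kind, _ := thinking["type"].(string)
	_, hasContextManagement := body["context_management"]
	if (kind == "enabled" || kind == "adaptive") && !hasContextManagement {
		body["context_management"] = map[string]any{
			"edits": []any{
				map[string]any{"type": "clear_thinking_20251015", "keep": "all"},
			},
		}
	}
}

// normalizeJSON decodes a request body, applies the defaults, and re-encodes
// it. encoding/json sorts map keys, so output order is alphabetical.
func normalizeJSON(in string) (string, error) {
	var body map[string]any
	if err := json.Unmarshal([]byte(in), &body); err != nil {
		return "", err
	}
	applyCLIDefaults(body)
	out, err := json.Marshal(body)
	return string(out), err
}

func main() {
	out, err := normalizeJSON(`{"thinking":{"type":"enabled"}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```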
    • feat(gateway): add billing attribution block with cc_version fingerprint · bebded28
      keh4l authored and 陈曦 committed
      Real Claude Code CLI always sends a 2-block system array:
      
        [0] {"type":"text", "text":"x-anthropic-billing-header: cc_version=X.Y.Z.{fp}; cc_entrypoint=cli; cch=00000;"}
        [1] {"type":"text", "text":"You are Claude Code...", "cache_control":{...}}
      
      Before this commit, sub2api's mimicry path only produced block [1].
      The missing billing block is one of the primary third-party detection
      signals Anthropic uses for Claude-Code-scoped OAuth tokens.
      
      New file gateway_billing_block.go ports the fingerprint algorithm
      (byte-for-byte from Parrot cc_mimicry.py:compute_fingerprint):
      pick chars at positions [4,7,20] of the first user text, then
      `sha256(SALT + chars + cc_version)[:3]`.
      
        - claude/constants.go: CLICurrentVersion = "2.1.92" (must match UA)
        - gateway_billing_block.go: computeClaudeCodeFingerprint +
          buildBillingAttributionBlockJSON + extractFirstUserText
        - gateway_service.go: rewriteSystemForNonClaudeCode now emits both
          blocks in order; cch=00000 is filled in later by
          signBillingHeaderCCH in buildUpstreamRequest.
      
      Downstream compat note: syncBillingHeaderVersion's regex
      `cc_version=\d+\.\d+\.\d+` only matches the semver triple,
      leaving the `.{fp}` suffix intact when rewriting in buildUpstreamRequest.
      bebded28
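A rough sketch of the fingerprint steps and the version regex from this commit. Several details are assumptions and are marked as such in the code: the salt is a placeholder (the real value is not in the log), positions are counted in runes, a short text is padded with '0', and `[:3]` is taken from the hex digest. `syncVersion` is a hypothetical stand-in for `syncBillingHeaderVersion`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
)

// exampleSalt is a placeholder; the real SALT is not given in the commit.
const exampleSalt = "example-salt"

// computeFingerprint picks the characters at positions 4, 7, and 20 of the
// first user text and returns the first three hex characters of
// sha256(salt + chars + version).
func computeFingerprint(firstUserText, ccVersion string) string {
	runes := []rune(firstUserText) // assumption: positions count runes
	picked := make([]rune, 0, 3)
	for _, pos := range []int{4, 7, 20} {
		if pos < len(runes) {
			picked = append(picked, runes[pos])
		} else {
			picked = append(picked, '0') // assumption: pad when text is short
		}
	}
	sum := sha256.Sum256([]byte(exampleSalt + string(picked) + ccVersion))
	return hex.EncodeToString(sum[:])[:3] // assumption: slice of the hex digest
}

// ccVersionRe matches only the semver triple, as quoted in the commit.
var ccVersionRe = regexp.MustCompile(`cc_version=\d+\.\d+\.\d+`)

// syncVersion rewrites the triple and leaves the ".{fp}" suffix in place.
func syncVersion(header, newVersion string) string {
	return ccVersionRe.ReplaceAllString(header, "cc_version="+newVersion)
}

func main() {
	fp := computeFingerprint("hello world, this is the first user message", "2.1.92")
	header := "x-anthropic-billing-header: cc_version=2.1.90." + fp + "; cc_entrypoint=cli; cch=00000;"
	fmt.Println(syncVersion(header, "2.1.92"))
}
```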
    • feat(claude): add ttl to cache_control with default 5m · 48433683
      keh4l authored and 陈曦 committed
      Real Claude CLI traffic sends cache_control as
      `{"type":"ephemeral","ttl":"1h"}`. Our previous payload only
      sent `{"type":"ephemeral"}`, which is a bytewise mismatch with
      the official CLI and one more third-party detection signal.
      
      Policy: client-provided ttl is always passed through unchanged.
      Proxy-generated cache_control blocks default to 5m (vs Parrot's 1h)
      to avoid burning the 1h cache budget on automatic breakpoints while
      still aligning with the `ttl` field being present.
      
        - claude/constants.go: DefaultCacheControlTTL = "5m"
        - apicompat/types.go: new AnthropicCacheControl type with TTL field;
          AnthropicTool gains optional CacheControl pointer so the mimicry
          path can attach a cache breakpoint to tools[-1] later.
        - service/gateway_service.go: anthropicCacheControlPayload gains TTL;
          marshalAnthropicSystemTextBlock and rewriteSystemForNonClaudeCode
          emit ttl=5m by default.
      48433683
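The ttl policy might look like the following in isolation. The type and constant names come from the commit message; the field layout, `resolveCacheControl`, and `marshalCacheControl` are assumptions made for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DefaultCacheControlTTL applies to proxy-generated blocks only.
const DefaultCacheControlTTL = "5m"

// AnthropicCacheControl is a guess at the new type's shape.
type AnthropicCacheControl struct {
	Type string `json:"type"`
	TTL  string `json:"ttl,omitempty"`
}

// resolveCacheControl passes a client-provided block through unchanged and
// generates an ephemeral block with the default ttl when there is none.
func resolveCacheControl(client *AnthropicCacheControl) AnthropicCacheControl {
	if client != nil {
		return *client
	}
	return AnthropicCacheControl{Type: "ephemeral", TTL: DefaultCacheControlTTL}
}

// marshalCacheControl returns the JSON form, or "" on an encoding error.
func marshalCacheControl(cc AnthropicCacheControl) string {
	out, err := json.Marshal(cc)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	fmt.Println(marshalCacheControl(resolveCacheControl(nil)))
	fmt.Println(marshalCacheControl(resolveCacheControl(&AnthropicCacheControl{Type: "ephemeral", TTL: "1h"})))
}
```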
    • fix(gateway): use full beta list in buildUpstreamRequest mimicry path · 9961ddb8
      keh4l authored and 陈曦 committed
      The previous commit added FullClaudeCodeMimicryBetas() but the two
      call sites in buildUpstreamRequest still hardcoded the old 3-token
      subset. Anthropic now checks the complete set of beta tokens to
      decide if a request qualifies as Claude Code. Wire them up:
      
        - /v1/messages mimic path: requiredBetas = FullClaudeCodeMimicryBetas()
        - /v1/messages/count_tokens mimic path: same + BetaTokenCounting
      
      Haiku models keep the 2-token exemption (BetaOAuth + InterleaveThinking).
      9961ddb8
    • fix(gateway): apply full Claude Code mimicry on /chat/completions and /responses · c5a8cadc
      keh4l authored and 陈曦 committed
      Before: the OpenAI-compat forwarders only called injectClaudeCodePrompt,
      which prepends the Claude Code banner but leaves the rest of the body
      in its original non-Claude-Code shape. The codebase already admits this
      is insufficient (see the comment on rewriteSystemForNonClaudeCode in
      gateway_service.go: "仅前置追加 Claude Code 提示词无法通过检测").
      
      Effect: OAuth accounts served through /v1/chat/completions or /v1/responses
      were detected as third-party apps and bled plan quota with:
      
          Third-party apps now draw from your extra usage, not your plan limits.
      
      Fix:
        - apicompat.AnthropicRequest: add Metadata json.RawMessage so metadata
          survives the OpenAI->Anthropic->Marshal round trip; without it the
          downstream rewrite has no user_id to work with.
        - service: extract applyClaudeCodeOAuthMimicryToBody, a ParsedRequest-free
          variant of the /v1/messages mimicry pipeline
          (rewriteSystemForNonClaudeCode + normalizeClaudeOAuthRequestBody +
          metadata.user_id injection) so the OpenAI-compat forwarders can reuse it.
        - service: add buildOAuthMetadataUserIDFromBody + hashBodyForSessionSeed
          for the same reason (no ParsedRequest at the call site).
        - ForwardAsChatCompletions / ForwardAsResponses: replace the 3-line
          prompt-prepend with the full mimicry pipeline.
        - applyClaudeCodeMimicHeaders: set x-client-request-id per-request
          (real Claude CLI always does); missing/duplicated values are one more
          third-party fingerprint signal.
      
      No change to the native /v1/messages path: it already called the full
      pipeline, we only lift those helpers into a reusable function.
      
      Tests:
        - go build ./... passes
        - go test ./internal/service/... ./internal/pkg/apicompat/... passes
        - lsp_diagnostics clean on all touched files
        - pre-existing failures in internal/config are unrelated (env-sensitive
          tests that also fail on upstream main)
      c5a8cadc
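The `Metadata json.RawMessage` point is easy to show in isolation: a field that the struct does not declare is silently dropped on re-marshal. The two structs below are cut down to two fields for illustration and are not the real `apicompat.AnthropicRequest`; `roundTripWithout` and `roundTripWith` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requestWithoutMetadata imitates the struct before the fix.
type requestWithoutMetadata struct {
	Model string `json:"model"`
}

// requestWithMetadata keeps metadata as raw bytes, so its inner shape does
// not need to be modeled to survive the round trip.
type requestWithMetadata struct {
	Model    string          `json:"model"`
	Metadata json.RawMessage `json:"metadata,omitempty"`
}

// roundTripWithout decodes and re-encodes through the old struct.
func roundTripWithout(in string) (string, error) {
	var req requestWithoutMetadata
	if err := json.Unmarshal([]byte(in), &req); err != nil {
		return "", err
	}
	out, err := json.Marshal(req)
	return string(out), err
}

// roundTripWith decodes and re-encodes through the new struct.
func roundTripWith(in string) (string, error) {
	var req requestWithMetadata
	if err := json.Unmarshal([]byte(in), &req); err != nil {
		return "", err
	}
	out, err := json.Marshal(req)
	return string(out), err
}

func main() {
	in := `{"model":"example-model","metadata":{"user_id":"u-123"}}`
	before, _ := roundTripWithout(in)
	after, _ := roundTripWith(in)
	fmt.Println(before) // metadata is lost
	fmt.Println(after)  // metadata survives
}
```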
  5. 26 Apr, 2026 1 commit
  6. 25 Apr, 2026 3 commits
    • fix(gateway): skip body mimicry for real Claude Code clients to restore prompt caching · 496469ac
      shaw authored
      PR #1914 unconditionally applied the full mimicry pipeline to all OAuth
      accounts, including real Claude Code CLI clients. This replaced the
      client's long system prompt (~10K+ tokens with stable cache_control
      breakpoints) with a short ~45 token [billing, CC prompt] pair, which
      falls below Anthropic's 1024-token minimum cacheable prefix threshold.
      The result: every request created a new cache but never hit an existing
      one.
      
      Fix: restore the Claude Code client detection gate so that real CC
      clients bypass body-level mimicry (system rewrite, message cache
      management, tool name obfuscation). Non-CC third-party clients
      (opencode, etc.) continue to receive full mimicry.
      
      Also harden the detection logic:
      - Make UA regex case-insensitive (align with claude_code_validator.go)
      - Validate metadata.user_id format via ParseMetadataUserID() instead of
        just checking non-empty, preventing third-party tools from spoofing
        a claude-cli/* UA with an arbitrary user_id string to bypass mimicry
      496469ac
    • hungryboy1025's avatar
      8987e0ba
    • chore(gateway): fix lint issues from cc-mimicry-parity merge · 732d6495
      shaw authored
      - staticcheck QF1001: apply De Morgan's law to the OAuth-mimic header
        passthrough guard (`!(a && b)` → `a != ... || !b`).
      - unused: drop `isClaudeCodeRequest`, which became dead after PR #1914
        switched both `/v1/messages` and `/count_tokens` paths to unconditional
        `account.IsOAuth()` mimicry. The lowercase helper `isClaudeCodeClient`
        is kept (still referenced by `TestIsClaudeCodeClient`).
      732d6495
  7. 24 Apr, 2026 10 commits
    • fix(gateway): skip client header passthrough on OAuth mimicry path · bdbd2916
      keh4l authored
      Root cause of persistent third-party detection: sub2api's
      buildUpstreamRequest transparently forwards client headers via
      allowedHeaders whitelist (addHeaderRaw) before applying mimicry
      overrides. When third-party clients (opencode, etc.) send their own
      anthropic-beta / user-agent / x-stainless-* / x-claude-code-session-id
      values, these get appended to the request alongside our injected
      headers, creating an inconsistent header set that Anthropic detects.
      
      Parrot's build_upstream_headers constructs exactly 9 headers from
      scratch and never forwards anything from the client. This is why
      'same opencode version, some users work some don't' — different
      opencode configs/versions send different header combinations.
      
      Fix: when tokenType=oauth and mimicClaudeCode=true, skip the
      client header passthrough loop entirely. The subsequent
      applyClaudeCodeMimicHeaders + ApplyFingerprint + beta merge
      pipeline constructs all necessary headers from our controlled values.
      
      Also: remove systemIncludesClaudeCodePrompt gate — OAuth accounts
      now unconditionally rewrite system (even if client already sent a
      Claude Code-style prompt), ensuring billing attribution block is
      always present.
      bdbd2916
    • fix(gateway): always apply full mimicry for OAuth accounts regardless of client identity · 6dc89765
      keh4l authored
      Before: isClaudeCodeRequest() checked whether the client looks like a
      real Claude Code CLI (UA, system prompt, X-App header, metadata format).
      If it looked like Claude Code, all mimicry was skipped — the assumption
      being that a real CLI needs no help.
      
      Problem: third-party tools like opencode partially impersonate Claude
      Code (sending claude-cli UA + claude-code beta + CC system prompt) but
      miss critical details (billing attribution block, tool-name obfuscation,
      cache breakpoints, full beta set). Some users' opencode instances pass
      the isClaudeCodeRequest check, causing sub2api to skip mimicry entirely,
      while Anthropic still detects the request as third-party.
      
      This explains why 'same opencode version, some users work, some don't'
      — it depends on which opencode features/config trigger the validator.
      
      Fix: OAuth accounts now unconditionally run the full mimicry pipeline,
      matching Parrot's behavior (Parrot never checks client identity).
      This is safe because our mimicry is strictly more complete than any
      third-party client's partial impersonation.
      
      Changed:
        - /v1/messages path: remove isClaudeCode gate
        - /v1/messages/count_tokens path: same
      6dc89765
    • fix(gateway): apply D/E/F mimicry to native /v1/messages and count_tokens paths · f3233db0
      keh4l authored
      The previous commit only wired stripMessageCacheControl,
      addMessageCacheBreakpoints, and tool-name obfuscation into
      applyClaudeCodeOAuthMimicryToBody (used by /chat/completions and
      /responses). The native /v1/messages path and count_tokens path
      have their own independent mimicry code blocks and were missed.
      
      Now all three entry points share the same D/E/F pipeline:
        - /v1/messages (gateway_service.go forwardAnthropic)
        - /v1/messages/count_tokens (gateway_service.go countTokens)
        - OpenAI compat (applyClaudeCodeOAuthMimicryToBody)
      f3233db0
    • feat(gateway): port Parrot tool-name obfuscation + message cache breakpoints · 6e12578b
      keh4l authored
      Implements the remaining three parity items with Parrot cc_mimicry:
      
        D) Tool-name obfuscation
           - Dynamic mapping when tools.length > 5 (matches Parrot threshold).
             Fake names follow {prefix}{name[:3]}{i:02d} (e.g. 'manage_bas00').
             Go port of random.Random(hash(tuple(names))) uses fnv64a seed +
             math/rand; byte-exact reproduction is impossible (Python hash vs
             Go hash), but the two invariants that matter are preserved:
               * same input tool_names yield identical mapping (cache hit)
               * prefix pool is shuffled (names look distributed)
           - Static prefix map (sessions_ -> cc_sess_, session_ -> cc_ses_)
             applied as fallback, matching Parrot TOOL_NAME_REWRITES verbatim.
           - Server tools (web_search_20250305, computer_*, etc.) are NOT
             renamed; only type=='function' and type=='custom' tools are.
           - tool_choice.name is rewritten in sync (only when type=='tool').
           - Response side: bytes-level replace on every SSE chunk / JSON
             body at 6 injection points (standard stream/non-stream,
             passthrough stream/non-stream, chat_completions stream +
             non-stream, responses stream + non-stream). Reverse mapping
             applied longest-fake-name-first to prevent substring conflicts
             (parity with Parrot _restore_tool_names_in_chunk).
           - tool_choice is no longer unconditionally deleted in
             normalizeClaudeOAuthRequestBody — Parrot passes it through.
      
        E) tools[-1] cache_control breakpoint
           - Injected as {type:ephemeral, ttl:<DefaultCacheControlTTL>} when
             the last tool has no cache_control. Client-provided ttl is
             passed through unchanged (repo-wide policy).
      
        F) messages cache_control strategy
           - stripMessageCacheControl removes every client-provided
             messages[*].content[*].cache_control (multi-turn stability).
           - addMessageCacheBreakpoints then injects two stable breakpoints:
             (1) last message, and (2) second-to-last user turn when
             messages.length >= 4.
           - Combined with the system block breakpoint and tools[-1]
             breakpoint, this gives exactly the 4 breakpoints Anthropic
             allows per request.
      
      Non-trivial implementation details to be aware of when rebasing:
      
        * Two new files, no upstream collision:
            gateway_tool_rewrite.go       (D + E algorithms)
            gateway_messages_cache.go     (F strip + breakpoints)
        * Two new feature calls bolted onto the tail of
          applyClaudeCodeOAuthMimicryToBody in gateway_service.go — rebase
          conflicts will be ~10 lines maximum.
        * Response-side injection points all wrap their existing write with
          reverseToolNamesIfPresent(c, ...), preserving original behavior
          when no mapping is stored (static prefix rollback still runs).
        * Non-stream chat/responses switched from c.JSON to
          json.Marshal + c.Data so bytes-level replace is possible.
        * Retry bodies (FilterThinkingBlocksForRetry,
          FilterSignatureSensitiveBlocksForRetry, RectifyThinkingBudget)
          only prune blocks — they preserve the already-obfuscated tool
          names, so no extra mapping re-application is needed.
      
      Manual QA: end-to-end scenario verified with 6 tools (above threshold)
      and tool_choice.type=='tool'. Obfuscation + restore roundtrip shown
      in test logs; then removed the temp test file.
      
      Tests (16 new):
        - buildDynamicToolMap stability + below-threshold guard
        - sanitizeToolName precedence (dynamic > static)
        - restoreToolNamesInBytes longest-first + static rollback
        - applyToolNameRewriteToBody skips server tools + syncs tool_choice
        - applyToolsLastCacheBreakpoint defaults to 5m + passes client ttl
        - stripMessageCacheControl + addMessageCacheBreakpoints in the
          1/4/string-content cases + second-to-last user turn selection
        - buildToolNameRewriteFromBody ReverseOrdered is desc-by-fake-length
        - fake name shape follows Parrot {prefix}{head3}{i:02d}
      6e12578b
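Item F (strip, then add two stable breakpoints) can be sketched as follows. The function names `stripMessageCacheControl` and `addMessageCacheBreakpoints` come from the commit; the `Message` and `Block` types, the helper functions, and the choice to mark the last content block of a message are assumptions for this example. String-content messages, which the real tests cover, are left out.

```go
package main

import "fmt"

// Block is one content block; Message is a simplified chat message.
type Block map[string]any

type Message struct {
	Role    string
	Content []Block
}

// stripMessageCacheControl removes every client-provided cache_control.
func stripMessageCacheControl(msgs []Message) {
	for _, m := range msgs {
		for _, b := range m.Content {
			delete(b, "cache_control")
		}
	}
}

// markLastBlock attaches a breakpoint to the message's last content block.
func markLastBlock(m Message, ttl string) {
	if len(m.Content) == 0 {
		return
	}
	m.Content[len(m.Content)-1]["cache_control"] = map[string]any{"type": "ephemeral", "ttl": ttl}
}

// addMessageCacheBreakpoints marks the last message and, when there are at
// least four messages, the second-to-last user turn.
func addMessageCacheBreakpoints(msgs []Message, ttl string) {
	if len(msgs) == 0 {
		return
	}
	markLastBlock(msgs[len(msgs)-1], ttl)
	if len(msgs) < 4 {
		return
	}
	userTurns := 0
	for i := len(msgs) - 1; i >= 0; i-- {
		if msgs[i].Role != "user" {
			continue
		}
		userTurns++
		if userTurns == 2 {
			markLastBlock(msgs[i], ttl)
			return
		}
	}
}

// hasBreakpoint reports whether any block of the message has cache_control.
func hasBreakpoint(m Message) bool {
	for _, b := range m.Content {
		if _, ok := b["cache_control"]; ok {
			return true
		}
	}
	return false
}

// sampleConversation has five turns and a client breakpoint on the first.
func sampleConversation() []Message {
	text := func(s string) []Block { return []Block{{"type": "text", "text": s}} }
	msgs := []Message{
		{Role: "user", Content: text("first question")},
		{Role: "assistant", Content: text("first answer")},
		{Role: "user", Content: text("second question")},
		{Role: "assistant", Content: text("second answer")},
		{Role: "user", Content: text("third question")},
	}
	msgs[0].Content[0]["cache_control"] = map[string]any{"type": "ephemeral"}
	return msgs
}

func main() {
	msgs := sampleConversation()
	stripMessageCacheControl(msgs)
	addMessageCacheBreakpoints(msgs, "5m")
	for i, m := range msgs {
		fmt.Println(i, m.Role, hasBreakpoint(m))
	}
}
```

Together with the system block and `tools[-1]` breakpoints, the two message breakpoints reach the total of four that the commit message says Anthropic allows per request.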
    • feat(gateway): align body shape with real Claude Code CLI defaults · a25faeca
      keh4l authored
      Three field-level alignments in normalizeClaudeOAuthRequestBody to
      match real Claude Code CLI traffic byte-for-byte:
      
        1. temperature: previously deleted unconditionally; now passes
           through client value, defaults to 1 when absent (real CLI
           always sends temperature, default 1).
      
        2. max_tokens: defaults to 128000 when absent (real CLI default).
      
        3. context_management: when thinking.type is enabled/adaptive
           and the client did not provide context_management, inject
           {"edits":[{"type":"clear_thinking_20251015","keep":"all"}]}
           to mirror real CLI behavior.
      
      tool_choice removal is unchanged (Claude Code OAuth credentials
      do not allow client-supplied tool_choice).
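
The three alignments plus the tool_choice removal can be sketched as follows. This is a simplified illustration on a generic map, assuming a hypothetical helper name; the real normalizeClaudeOAuthRequestBody also has to preserve field order, which a Go map does not.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyCLIDefaults is a hypothetical stand-in for the field-level
// alignments described in this commit message.
func applyCLIDefaults(body map[string]any) {
	// 1. temperature: pass the client value through, default to 1.
	if _, ok := body["temperature"]; !ok {
		body["temperature"] = 1
	}
	// 2. max_tokens: default to 128000 when absent.
	if _, ok := body["max_tokens"]; !ok {
		body["max_tokens"] = 128000
	}
	// 3. context_management: inject only when thinking is
	//    enabled/adaptive and the client did not send one.
	if thinking, ok := body["thinking"].(map[string]any); ok {
		kind, _ := thinking["type"].(string)
		_, provided := body["context_management"]
		if (kind == "enabled" || kind == "adaptive") && !provided {
			body["context_management"] = map[string]any{
				"edits": []any{
					map[string]any{"type": "clear_thinking_20251015", "keep": "all"},
				},
			}
		}
	}
	// tool_choice is still removed.
	delete(body, "tool_choice")
}

func main() {
	var body map[string]any
	raw := `{"model":"example-model","thinking":{"type":"enabled"},"tool_choice":{"type":"auto"}}`
	if err := json.Unmarshal([]byte(raw), &body); err != nil {
		panic(err)
	}
	applyCLIDefaults(body)
	out, _ := json.Marshal(body)
	fmt.Println(string(out))
}
```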
      
      Tests updated:
        - gateway_body_order_test.go: temperature/max_tokens are now
          expected in output; tool_choice still removed.
        - gateway_prompt_test.go: system array is now 2 blocks
          (billing + cc prompt), assertions adjusted.
        - gateway_anthropic_apikey_passthrough_test.go: same 2-block
          assertion.
      a25faeca
    • keh4l's avatar
      feat(gateway): add billing attribution block with cc_version fingerprint · 5862e2d8
      keh4l authored
      Real Claude Code CLI always sends a 2-block system array:
      
        [0] {"type":"text", "text":"x-anthropic-billing-header: cc_version=X.Y.Z.{fp}; cc_entrypoint=cli; cch=00000;"}
        [1] {"type":"text", "text":"You are Claude Code...", "cache_control":{...}}
      
      Before this commit, sub2api's mimicry path only produced block [1].
      The missing billing block is one of the primary third-party detection
      signals Anthropic uses for Claude-Code-scoped OAuth tokens.
      
      New file gateway_billing_block.go ports the fingerprint algorithm
      (byte-for-byte from Parrot cc_mimicry.py:compute_fingerprint):
      pick chars at positions [4,7,20] of the first user text, then
      `sha256(SALT + chars + cc_version)[:3]`.
      
        - claude/constants.go: CLICurrentVersion = "2.1.92" (must match UA)
        - gateway_billing_block.go: computeClaudeCodeFingerprint +
          buildBillingAttributionBlockJSON + extractFirstUserText
        - gateway_service.go: rewriteSystemForNonClaudeCode now emits both
          blocks in order; cch=00000 is filled in later by
          signBillingHeaderCCH in buildUpstreamRequest.
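
A rough sketch of the fingerprint step described above. Several details are assumptions made only for illustration and are not stated in this commit message: the salt value, indexing by rune rather than by byte, padding with '0' when the text is too short, and taking the first three characters of the hex digest.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// exampleSalt is a placeholder; the real SALT is not given here.
const exampleSalt = "example-salt"

// computeFingerprintSketch picks the characters at positions 4, 7 and 20
// of the first user text and hashes SALT + chars + version.
func computeFingerprintSketch(firstUserText, ccVersion string) string {
	runes := []rune(firstUserText) // assumption: positions count runes
	picked := make([]rune, 0, 3)
	for _, pos := range []int{4, 7, 20} {
		if pos < len(runes) {
			picked = append(picked, runes[pos])
		} else {
			picked = append(picked, '0') // assumption: pad short inputs
		}
	}
	sum := sha256.Sum256([]byte(exampleSalt + string(picked) + ccVersion))
	return hex.EncodeToString(sum[:])[:3] // assumption: hex digest prefix
}

// buildBillingTextSketch formats the text of system block [0].
func buildBillingTextSketch(ccVersion, fp string) string {
	return fmt.Sprintf(
		"x-anthropic-billing-header: cc_version=%s.%s; cc_entrypoint=cli; cch=00000;",
		ccVersion, fp)
}

func main() {
	fp := computeFingerprintSketch("Please refactor the gateway service", "2.1.92")
	fmt.Println(buildBillingTextSketch("2.1.92", fp))
}
```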
      
      Downstream compat note: syncBillingHeaderVersion's regex
      `cc_version=\d+\.\d+\.\d+` only matches the semver triple,
      leaving the `.{fp}` suffix intact when rewriting in buildUpstreamRequest.
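
The compat note can be checked with a small example. The regex is the one quoted above; the helper name and the sample fingerprint "a1b" are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// ccVersionRe matches only the semver triple after "cc_version=".
var ccVersionRe = regexp.MustCompile(`cc_version=\d+\.\d+\.\d+`)

// syncVersionSketch rewrites the version while leaving any ".{fp}"
// suffix untouched (hypothetical stand-in for syncBillingHeaderVersion).
func syncVersionSketch(text, newVersion string) string {
	return ccVersionRe.ReplaceAllLiteralString(text, "cc_version="+newVersion)
}

func main() {
	in := "x-anthropic-billing-header: cc_version=2.1.92.a1b; cc_entrypoint=cli; cch=00000;"
	fmt.Println(syncVersionSketch(in, "2.1.93"))
	// x-anthropic-billing-header: cc_version=2.1.93.a1b; cc_entrypoint=cli; cch=00000;
}
```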
      5862e2d8
    • keh4l's avatar
      feat(claude): add ttl to cache_control with default 5m · 66d64545
      keh4l authored
      Real Claude CLI traffic sends cache_control as
      `{"type":"ephemeral","ttl":"1h"}`. Our previous payload only
      sent `{"type":"ephemeral"}`, which is a bytewise mismatch with
      the official CLI and one more third-party detection signal.
      
      Policy: client-provided ttl is always passed through unchanged.
      Proxy-generated cache_control blocks default to 5m (vs Parrot's 1h)
      to avoid burning the 1h cache budget on automatic breakpoints while
      still aligning with the `ttl` field being present.
      
        - claude/constants.go: DefaultCacheControlTTL = "5m"
        - apicompat/types.go: new AnthropicCacheControl type with TTL field;
          AnthropicTool gains optional CacheControl pointer so the mimicry
          path can attach a cache breakpoint to tools[-1] later.
        - service/gateway_service.go: anthropicCacheControlPayload gains TTL;
          marshalAnthropicSystemTextBlock and rewriteSystemForNonClaudeCode
          emit ttl=5m by default.
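
A minimal sketch of the TTL policy, assuming a simplified struct and a hypothetical resolver function; the actual AnthropicCacheControl type and its call sites may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DefaultCacheControlTTL mirrors the constant named in this commit.
const DefaultCacheControlTTL = "5m"

// CacheControl is a simplified stand-in for the cache_control payload.
type CacheControl struct {
	Type string `json:"type"`
	TTL  string `json:"ttl,omitempty"`
}

// resolveCacheControl passes a client-provided block through unchanged
// and falls back to the proxy default (ephemeral, 5m) otherwise.
func resolveCacheControl(client *CacheControl) CacheControl {
	if client != nil {
		return *client
	}
	return CacheControl{Type: "ephemeral", TTL: DefaultCacheControlTTL}
}

func main() {
	proxy, _ := json.Marshal(resolveCacheControl(nil))
	fmt.Println(string(proxy)) // {"type":"ephemeral","ttl":"5m"}

	client, _ := json.Marshal(resolveCacheControl(&CacheControl{Type: "ephemeral", TTL: "1h"}))
	fmt.Println(string(client)) // {"type":"ephemeral","ttl":"1h"}
}
```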
      66d64545
    • keh4l's avatar
      fix(gateway): use full beta list in buildUpstreamRequest mimicry path · 165553cf
      keh4l authored
      The previous commit added FullClaudeCodeMimicryBetas() but the two
      call sites in buildUpstreamRequest still hardcoded the old 3-token
      subset. Anthropic now checks the complete set of beta tokens to
      decide if a request qualifies as Claude Code. Wire them up:
      
        - /v1/messages mimic path: requiredBetas = FullClaudeCodeMimicryBetas()
        - /v1/messages/count_tokens mimic path: same + BetaTokenCounting
      
      Haiku models keep the 2-token exemption (BetaOAuth + InterleaveThinking).
      165553cf
    • keh4l's avatar
      fix(gateway): apply full Claude Code mimicry on /chat/completions and /responses · b5467d61
      keh4l authored
      Before: the OpenAI-compat forwarders only called injectClaudeCodePrompt,
      which prepends the Claude Code banner but leaves the rest of the body
      in its original non-Claude-Code shape. The codebase already admits this
      is insufficient (see the comment on rewriteSystemForNonClaudeCode in
      gateway_service.go: "merely prepending the Claude Code prompt cannot
      pass detection").
      
      Effect: OAuth accounts served through /v1/chat/completions or /v1/responses
      were detected as third-party apps and billed against extra usage instead
      of plan limits, with:
      
          Third-party apps now draw from your extra usage, not your plan limits.
      
      Fix:
        - apicompat.AnthropicRequest: add Metadata json.RawMessage so metadata
          survives the OpenAI->Anthropic->Marshal round trip; without it the
          downstream rewrite has no user_id to work with.
        - service: extract applyClaudeCodeOAuthMimicryToBody, a ParsedRequest-free
          variant of the /v1/messages mimicry pipeline
          (rewriteSystemForNonClaudeCode + normalizeClaudeOAuthRequestBody +
          metadata.user_id injection) so the OpenAI-compat forwarders can reuse it.
        - service: add buildOAuthMetadataUserIDFromBody + hashBodyForSessionSeed
          for the same reason (no ParsedRequest at the call site).
        - ForwardAsChatCompletions / ForwardAsResponses: replace the 3-line
          prompt-prepend with the full mimicry pipeline.
        - applyClaudeCodeMimicHeaders: set x-client-request-id per-request
          (real Claude CLI always does); missing/duplicated values are one more
          third-party fingerprint signal.
      
      No change to the native /v1/messages path: it already called the full
      pipeline, we only lift those helpers into a reusable function.
      
      Tests:
        - go build ./... passes
        - go test ./internal/service/... ./internal/pkg/apicompat/... passes
        - lsp_diagnostics clean on all touched files
        - pre-existing failures in internal/config are unrelated (env-sensitive
          tests that also fail on upstream main)
      b5467d61
    • 陈曦's avatar
      Collect req- and resp-related changes · 994da655
      陈曦 authored
      994da655
  8. 19 Apr, 2026 1 commit
  9. 17 Apr, 2026 2 commits
    • erio's avatar
      fix(usage): subscription billing honours group rate multiplier · 44cdef79
      erio authored
      Subscription-mode billing was consuming quota at TotalCost (raw) instead of
      ActualCost (TotalCost * RateMultiplier), so per-group rate multipliers —
      including free subscriptions (multiplier = 0) — were silently ignored.
      Switch the three subscription cost writes in buildUsageBillingCommand,
      finalizePostUsageBilling, and the legacy postUsageBilling fallback to
      ActualCost, and add a table-driven test covering 2x / 0.5x / free multipliers
      plus a balance-mode regression check.
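
The relationship ActualCost = TotalCost * RateMultiplier can be illustrated with a small table in the spirit of the test described above. The struct and function names are hypothetical.

```go
package main

import "fmt"

// usageCost is a simplified stand-in for the billing cost fields.
type usageCost struct {
	TotalCost      float64 // raw cost
	RateMultiplier float64 // per-group multiplier (0 = free)
}

// actualCost is what subscription quota should be charged with.
func actualCost(c usageCost) float64 {
	return c.TotalCost * c.RateMultiplier
}

func main() {
	cases := []struct {
		name string
		cost usageCost
	}{
		{"2x", usageCost{TotalCost: 1.0, RateMultiplier: 2}},
		{"0.5x", usageCost{TotalCost: 1.0, RateMultiplier: 0.5}},
		{"free", usageCost{TotalCost: 1.0, RateMultiplier: 0}},
	}
	for _, tc := range cases {
		fmt.Printf("%s: raw=%.2f charged=%.2f\n", tc.name, tc.cost.TotalCost, actualCost(tc.cost))
	}
}
```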
      44cdef79
    • erio's avatar
      refactor: extract ReadUpstreamResponseBody to deduplicate upstream response... · c0b2cacb
      erio authored and 陈曦 committed
      refactor: extract ReadUpstreamResponseBody to deduplicate upstream response read + too-large error handling
      
      Consolidates 9 call sites of resolveUpstreamResponseReadLimit + readUpstreamResponseBodyLimited + ErrUpstreamResponseBodyTooLarge error handling into a single ReadUpstreamResponseBody function with TooLargeWriter callback for API-format-specific error responses (Anthropic, OpenAI, countTokens).
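
A minimal sketch of the consolidated pattern: read at most limit bytes and hand the "too large" case to a format-specific callback. The names, signatures, and error value here are assumptions for illustration; the actual ReadUpstreamResponseBody and TooLargeWriter may look different.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errBodyTooLarge is a hypothetical sentinel error.
var errBodyTooLarge = errors.New("upstream response body too large")

// tooLargeWriter lets each API format write its own error response.
type tooLargeWriter func(limit int64)

// readBodyLimited reads up to limit bytes; one extra byte is requested
// so that an oversized body can be detected.
func readBodyLimited(r io.Reader, limit int64, onTooLarge tooLargeWriter) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		if onTooLarge != nil {
			onTooLarge(limit)
		}
		return nil, errBodyTooLarge
	}
	return data, nil
}

// readString is a small convenience wrapper for the demo below.
func readString(s string, limit int64, onTooLarge tooLargeWriter) ([]byte, error) {
	return readBodyLimited(strings.NewReader(s), limit, onTooLarge)
}

func main() {
	body, err := readString("hello", 5, nil)
	fmt.Println(string(body), err) // hello <nil>

	_, err = readString("hello", 4, func(limit int64) {
		fmt.Println("would write a format-specific error; limit =", limit)
	})
	fmt.Println(err) // upstream response body too large
}
```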
      c0b2cacb
  10. 15 Apr, 2026 1 commit
    • erio's avatar
      refactor: extract ReadUpstreamResponseBody to deduplicate upstream response... · 10699eeb
      erio authored
      refactor: extract ReadUpstreamResponseBody to deduplicate upstream response read + too-large error handling
      
      Consolidates 9 call sites of resolveUpstreamResponseReadLimit + readUpstreamResponseBodyLimited + ErrUpstreamResponseBodyTooLarge error handling into a single ReadUpstreamResponseBody function with TooLargeWriter callback for API-format-specific error responses (Anthropic, OpenAI, countTokens).
      10699eeb
  11. 14 Apr, 2026 3 commits
    • erio's avatar
      fix: round-2 audit fixes — security, code quality, and UI improvements · a9880ee7
      erio authored
      Security (HIGH):
      - Normalize all Redis cache keys to lowercase (verifyCode, passwordReset)
      - Fix verify code TTL renewal on failed attempts: use remaining TTL via
        ExpiresAt field instead of resetting to full 15-minute window
      - Add 3 missing fields to diffSettings audit log (promo_code, invitation_code,
        custom_endpoints)
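
The TTL fix above can be sketched as follows: on a failed attempt the entry is rewritten with the time remaining until ExpiresAt instead of a fresh 15-minute window. The struct, function names, and example timestamps are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// exampleWindow is the full verification window named in the commit.
const exampleWindow = 15 * time.Minute

// exampleIssuedAt is an arbitrary fixed timestamp for the demo.
var exampleIssuedAt = time.Date(2026, 4, 14, 12, 0, 0, 0, time.UTC)

// verifyCodeEntry is a simplified stand-in for the cached record.
type verifyCodeEntry struct {
	Attempts  int
	ExpiresAt time.Time
}

// remainingTTL returns the time left before the code expires, never negative.
func remainingTTL(e verifyCodeEntry, now time.Time) time.Duration {
	if d := e.ExpiresAt.Sub(now); d > 0 {
		return d
	}
	return 0
}

// recordFailedAttempt bumps the counter and returns the TTL to use when
// writing the entry back, so the expiry is not extended.
func recordFailedAttempt(e verifyCodeEntry, now time.Time) (verifyCodeEntry, time.Duration) {
	e.Attempts++
	return e, remainingTTL(e, now)
}

func main() {
	entry := verifyCodeEntry{ExpiresAt: exampleIssuedAt.Add(exampleWindow)}
	entry, ttl := recordFailedAttempt(entry, exampleIssuedAt.Add(5*time.Minute))
	fmt.Println(entry.Attempts, ttl) // 1 10m0s
}
```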
      
      Code quality (MEDIUM):
      - Extract filterVerifiedEmails shared helper (balance_notify_service.go)
      - Add Pricing array non-empty validation for channel pricing rules
      - Add platform token semantics comment in gateway_service.go
      - Complete validatePlanPatch test coverage (+10 test cases)
      - Replace string types with QuotaThresholdType/QuotaResetMode across frontend
      - Remove duplicate getPlatformTextColor/getRateBadgeClass in ChannelsView
      - Return EMAIL_NOT_FOUND error on RemoveNotifyEmail miss
      
      UI improvements:
      - Reorder cost tooltip: user billing above separator, account billing below
      - Add NaN guard to accountBilled function
      - Move timezone selector inline into reset-mode row (no longer standalone)
      a9880ee7
    • erio's avatar
      fix: correct account stats pricing priority order · 98c9d517
      erio authored
      Priority was wrong:
      - Before: custom rules → LiteLLM (when ApplyPricingToAccountStats) → nil
      - After:  custom rules → totalCost (when ApplyPricingToAccountStats) → LiteLLM → nil
      
      When ApplyPricingToAccountStats is enabled, use the request's actual
      client billing cost (before multiplier) as account_stats_cost, instead
      of recalculating from LiteLLM per-token prices which produced incorrect
      values for per-request billing mode.
      
      LiteLLM model pricing is now the final fallback (priority 3), used only
      when neither custom rules nor ApplyPricingToAccountStats apply.
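
The corrected priority order can be written as a small resolver. The signature is a simplification assumed for illustration (nil pointers mean "no value available"); it is not the actual resolveAccountStatsCost.

```go
package main

import "fmt"

// resolveStatsCostSketch applies the priority:
// custom rules -> totalCost (when applyPricing) -> LiteLLM -> none.
func resolveStatsCostSketch(customRule *float64, applyPricing bool, totalCost float64, liteLLM *float64) (float64, bool) {
	if customRule != nil {
		return *customRule, true // priority 1: custom rules
	}
	if applyPricing {
		return totalCost, true // priority 2: client billing cost before multiplier
	}
	if liteLLM != nil {
		return *liteLLM, true // priority 3: LiteLLM model pricing
	}
	return 0, false // nothing applies
}

func main() {
	lite := 0.8
	cost, ok := resolveStatsCostSketch(nil, true, 1.2, &lite)
	fmt.Println(cost, ok) // 1.2 true

	cost, ok = resolveStatsCostSketch(nil, false, 1.2, &lite)
	fmt.Println(cost, ok) // 0.8 true
}
```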
      98c9d517
    • erio's avatar
      feat: WebSearch tri-state, account stats pricing fix, quota cache fix, usage tooltip · 1262654d
      erio authored
      WebSearch tri-state switch:
      - Account-level web_search_emulation changed from bool to tri-state
        string: "default" (follow channel) / "enabled" / "disabled"
      - shouldEmulateWebSearch checks channel config when account is "default"
      - SQL migration converts old bool values
      - Frontend select replaces toggle in Edit/CreateAccountModal
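
A minimal sketch of the tri-state decision. The function name and the choice to treat unknown values like "default" are assumptions; the actual shouldEmulateWebSearch takes richer inputs.

```go
package main

import "fmt"

// shouldEmulateSketch resolves the account-level tri-state against the
// channel configuration.
func shouldEmulateSketch(accountMode string, channelEnabled bool) bool {
	switch accountMode {
	case "enabled":
		return true
	case "disabled":
		return false
	default: // "default" (and, by assumption, unknown values) follow the channel
		return channelEnabled
	}
}

func main() {
	fmt.Println(shouldEmulateSketch("default", true))   // true
	fmt.Println(shouldEmulateSketch("disabled", true))  // false
	fmt.Println(shouldEmulateSketch("enabled", false))  // true
}
```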
      
      Account stats pricing:
      - resolveAccountStatsCost uses upstream model (post-mapping) for matching
      - Priority: custom rules → model pricing file (when toggle on) → default
      - Custom rules always configurable, independent of toggle
      - Account ID field changed to searchable selector filtered by platform
      - Description updated to reflect new behavior
      
      Quota notification cache fix:
      - CheckAccountQuotaAfterIncrement fetches real-time account from DB
      - Reconstructs pre-increment usage for accurate threshold crossing detection
      - New AccountQuotaReader interface (minimal: GetByID only)
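
Threshold-crossing detection from a post-increment reading can be sketched like this. The function name and the exact comparison (strictly below before, at or above after) are assumptions for illustration.

```go
package main

import "fmt"

// crossedThresholdSketch reconstructs the pre-increment usage from the
// current (post-increment) value and reports whether this increment
// moved usage across the threshold.
func crossedThresholdSketch(currentUsage, increment, threshold float64) bool {
	before := currentUsage - increment
	return before < threshold && currentUsage >= threshold
}

func main() {
	fmt.Println(crossedThresholdSketch(85, 10, 80)) // true: 75 -> 85 crosses 80
	fmt.Println(crossedThresholdSketch(95, 10, 80)) // false: already above before
	fmt.Println(crossedThresholdSketch(70, 10, 80)) // false: still below
}
```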
      
      Usage tooltip:
      - Per-request/image billing shows per-request price instead of $0 token price
      - Token billing continues to show input/output price per million tokens
      1262654d