    fix(gateway): emit Anthropic-standard SSE error events and failover body · 4c474616
    alfadb authored
    
    
    Two follow-ups to PR #2066's failover-wrap fix:
    
    1. Failover ResponseBody (`UpstreamFailoverError.ResponseBody`) was encoded
       as `{"error": "<msg>"}` (string field). `ExtractUpstreamErrorMessage`
       probes for `error.message`, `detail`, or top-level `message` only — so
       `handleFailoverExhausted` and downstream passthrough rules saw an empty
       message, losing the EOF root cause in ops logs. Re-encode as the
       Anthropic standard shape `{"type":"error","error":{"type":"upstream_disconnected","message":"..."}}`.
       (Addresses the inline review comment from copilot-pull-request-reviewer
       on Wei-Shaw/sub2api#2066.)
    
    2. The streaming `event: error` SSE frame for `response_too_large`,
       `stream_read_error`, and `stream_timeout` was non-standard
       (`{"error":"<reason>"}`). Anthropic SDKs (and Claude Code) expect
       `{"type":"error","error":{"type":"...","message":"..."}}` and parse
       `error.type`/`error.message` accordingly. Refactor `sendErrorEvent` to
       take both reason and message, and emit the standard frame so client
       SDKs surface a real diagnostic message instead of a generic stream error.
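
The two changes above can be sketched as follows. This is a minimal illustration, not the code in `gateway_service.go`: `buildErrorBody`, `formatSSEError`, `extractMessage`, and the struct names are hypothetical stand-ins, and `extractMessage` only approximates the probing order described for `ExtractUpstreamErrorMessage` (`error.message`, then `detail`, then top-level `message`).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// errorDetail and errorEnvelope mirror the error shape quoted in the
// commit message. The Go type names are hypothetical.
type errorDetail struct {
	Type    string `json:"type"`
	Message string `json:"message"`
}

type errorEnvelope struct {
	Type  string      `json:"type"`
	Error errorDetail `json:"error"`
}

// buildErrorBody encodes {"type":"error","error":{"type":...,"message":...}}.
func buildErrorBody(reason, message string) []byte {
	b, err := json.Marshal(errorEnvelope{
		Type:  "error",
		Error: errorDetail{Type: reason, Message: message},
	})
	if err != nil {
		// Marshalling a struct of plain strings is not expected to fail;
		// fall back to a fixed body so callers always get valid JSON.
		return []byte(`{"type":"error","error":{"type":"internal","message":"encode failed"}}`)
	}
	return b
}

// formatSSEError wraps the body in an `event: error` SSE frame.
func formatSSEError(reason, message string) string {
	return fmt.Sprintf("event: error\ndata: %s\n\n", buildErrorBody(reason, message))
}

// extractMessage is an assumed approximation of the probing order:
// error.message first, then detail, then top-level message.
func extractMessage(body []byte) string {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(body, &top); err != nil {
		return ""
	}
	if raw, ok := top["error"]; ok {
		var d errorDetail
		// A string-valued "error" field fails to decode into the struct,
		// which is why the old shape yields no message here.
		if json.Unmarshal(raw, &d) == nil && d.Message != "" {
			return d.Message
		}
	}
	for _, key := range []string{"detail", "message"} {
		var s string
		if raw, ok := top[key]; ok && json.Unmarshal(raw, &s) == nil && s != "" {
			return s
		}
	}
	return ""
}

func main() {
	oldBody := []byte(`{"error":"unexpected EOF"}`)
	newBody := buildErrorBody("upstream_disconnected", "unexpected EOF")
	fmt.Printf("old shape message: %q\n", extractMessage(oldBody))
	fmt.Printf("new shape message: %q\n", extractMessage(newBody))
	fmt.Print(formatSSEError("stream_timeout", "upstream stream timed out"))
}
```

Under these assumptions the old string-valued `error` field produces an empty message, while the nested shape returns the EOF text, and the same encoder serves both the failover body and the SSE frame.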
    
    This does not by itself prevent task interruption on long-stream EOF
    (SSE has no resume; client-side retry remains the only complete fix), but
    it gives both server-side ops logs and client-side error UIs a meaningful
    upstream message so users know the next step is to retry.
    
    Tests updated to assert the new body shape on both branches plus a new
    assertion that `ExtractUpstreamErrorMessage` returns a non-empty string.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gateway_service.go 316 KB