1. 08 Feb, 2026 21 commits
  2. 07 Feb, 2026 19 commits
    • erio's avatar
      fix(gateway): restore upstream account forwarding with dedicated methods · 77b66653
      erio authored
      v0.1.74 merged upstream accounts into the OAuth path, causing requests
      to hit the wrong protocol and endpoint. Add three upstream-specific
      methods (testUpstreamConnection, ForwardUpstream, ForwardUpstreamGemini)
      that use base_url + apiKey auth and pass through the original body, while
      reusing the existing response handling and error/retry logic.
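The upstream path described above can be illustrated with a minimal sketch. It assumes the API key is sent as a Bearer token and that the request body is forwarded byte for byte; the helper name `buildUpstreamRequest`, the header choice, and the example URL are hypothetical and are not taken from the commit.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"strings"
)

// buildUpstreamRequest is a hypothetical helper: it joins base_url and the
// request path, attaches the API key, and passes the original body through
// unchanged. The Bearer scheme is an assumption for illustration only.
func buildUpstreamRequest(baseURL, apiKey, path string, body []byte) (*http.Request, error) {
	url := strings.TrimRight(baseURL, "/") + "/" + strings.TrimLeft(path, "/")
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("build upstream request: %w", err)
	}
	req.Header.Set("Authorization", "Bearer "+apiKey) // assumed auth scheme
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := buildUpstreamRequest("https://upstream.example.com/", "sk-test", "/v1/messages", []byte(`{"model":"m"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Keeping request construction separate from response handling is what would let such a path reuse the existing error and retry logic, as the commit message describes.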
    • erio's avatar
      feat: smart retry max 1 attempt + clear sticky session on failure · 3077fd27
      erio authored
      - Change antigravitySmartRetryMaxAttempts from 3 to 1 to prevent
        repeated rate limiting and long waits
      - Clear sticky session binding (DeleteSessionAccountID) after smart
        retry exhaustion, so subsequent requests don't hit the same
        rate-limited account
      - Add flow diagrams to Forward/ForwardGemini doc comments
      - Add comprehensive unit tests covering:
        - Sticky session cleared on retry failure (429, 503, network error)
        - Sticky session NOT cleared on retry success
        - Sticky session NOT cleared for non-sticky requests (empty hash)
        - Sticky session NOT cleared on long delay path (handled by handler)
        - Nil cache safety (no panic)
        - MaxAttempts constant verification
        - End-to-end retryLoop → switchError propagation with session clear
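The behavior listed above (one retry attempt, then clearing the sticky binding on failure) might look roughly like the sketch below. Only the name `DeleteSessionAccountID` comes from the commit; its signature here, the `sessionCache` interface, `memCache`, and `retryWithSessionClear` are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// sessionCache is a hypothetical stand-in for the gateway's sticky session cache.
type sessionCache interface {
	DeleteSessionAccountID(sessionHash string)
}

// memCache is an in-memory implementation used only for this sketch.
type memCache struct{ bindings map[string]int64 }

func (m *memCache) DeleteSessionAccountID(h string) { delete(m.bindings, h) }

// smartRetryMaxAttempts mirrors the new limit of 1 described in the commit.
const smartRetryMaxAttempts = 1

var errRateLimited = errors.New("rate limited")

// retryWithSessionClear runs attempt up to smartRetryMaxAttempts times.
// If every attempt fails, it clears the sticky binding so that later
// requests are not routed to the same rate-limited account.
func retryWithSessionClear(cache sessionCache, sessionHash string, attempt func() error) error {
	var err error
	for i := 0; i < smartRetryMaxAttempts; i++ {
		if err = attempt(); err == nil {
			return nil // success: keep the sticky binding
		}
	}
	// Skip the clear for non-sticky requests (empty hash) and nil caches.
	if cache != nil && sessionHash != "" {
		cache.DeleteSessionAccountID(sessionHash)
	}
	return err
}

func main() {
	c := &memCache{bindings: map[string]int64{"hash-1": 7}}
	err := retryWithSessionClear(c, "hash-1", func() error { return errRateLimited })
	_, stillBound := c.bindings["hash-1"]
	fmt.Println(err, stillBound)
}
```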
    • shaw's avatar
      6aaa4aee
    • erio's avatar
      e3748da8
    • erio's avatar
      refactor: remove Anthropic digest chain from Messages handler · 86b503f8
      erio authored
      The digest chain fallback is only needed for Gemini endpoints, not
      for the Anthropic Messages API path. Remove the handler integration
      while keeping the reusable service/repository layer for future use.
    • erio's avatar
      feat: add Anthropic sticky session digest chain matching via Trie · 50a783ff
      erio authored
      The previous fallback (step 3) in GenerateSessionHash hashed system +
      all messages together, producing a different hash each round as the
      conversation grew ([a] -> [a,b] -> [a,b,c]). This made fallback sticky
      sessions ineffective for multi-turn conversations.
      
      Implement per-message Trie digest chain matching (reusing Gemini's Trie
      infrastructure) so that the previous round's chain is always a prefix
      of the current round's chain, enabling reliable session affinity.
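To illustrate why per-message digests give the prefix property, here is a small in-memory sketch. The commit refers to a Redis-backed Trie shared with the Gemini path; this example swaps in a plain Go map-based trie, and every name in it (`trieNode`, `digestChain`, `bind`, `longestPrefix`) is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// trieNode is one step in a digest chain; a node may carry an account binding.
type trieNode struct {
	children  map[string]*trieNode
	accountID int64
	bound     bool
}

func newNode() *trieNode { return &trieNode{children: map[string]*trieNode{}} }

// digestChain hashes each message on its own, so the chain for [a, b] is
// always a prefix of the chain for [a, b, c].
func digestChain(messages []string) []string {
	chain := make([]string, len(messages))
	for i, m := range messages {
		sum := sha256.Sum256([]byte(m))
		chain[i] = hex.EncodeToString(sum[:4]) // short digest, for illustration
	}
	return chain
}

// bind records that the conversation identified by chain uses accountID.
func (n *trieNode) bind(chain []string, accountID int64) {
	cur := n
	for _, d := range chain {
		next, ok := cur.children[d]
		if !ok {
			next = newNode()
			cur.children[d] = next
		}
		cur = next
	}
	cur.accountID, cur.bound = accountID, true
}

// longestPrefix walks the chain and returns the deepest binding found.
func (n *trieNode) longestPrefix(chain []string) (int64, bool) {
	cur := n
	var id int64
	found := false
	for _, d := range chain {
		next, ok := cur.children[d]
		if !ok {
			break
		}
		cur = next
		if cur.bound {
			id, found = cur.accountID, true
		}
	}
	return id, found
}

func main() {
	root := newNode()
	root.bind(digestChain([]string{"system", "a"}), 42)
	id, ok := root.longestPrefix(digestChain([]string{"system", "a", "b"}))
	fmt.Println(id, ok)
}
```

The lookup cost is proportional to the chain length, which matches the O(L) matching mentioned elsewhere in this log.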
    • shaw's avatar
      fix(gateway): harden digest logging and align antigravity ops · 1439eb39
      shaw authored
      - avoid panic by using safe UUID prefix truncation in Gemini digest fallback logs
      - remove unconditional Antigravity 429 full-body debug logs and honor log truncation config
      - align Antigravity quick preset mappings to opus 4.6-thinking targets only
      - restore scope rate-limit aggregation/output in ops availability stats
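The first item, safe prefix truncation, can be sketched as follows. A direct slice such as `id[:8]` panics when the string is shorter than eight bytes; a bounds-checked helper avoids that. The name `safePrefix` is hypothetical and is not the helper used in the commit.

```go
package main

import "fmt"

// safePrefix returns at most n leading bytes of s without panicking on
// short or empty input. It is byte-based, which is sufficient for ASCII
// identifiers such as UUIDs.
func safePrefix(s string, n int) string {
	if n <= 0 {
		return ""
	}
	if len(s) <= n {
		return s
	}
	return s[:n]
}

func main() {
	fmt.Println(safePrefix("123e4567-e89b-12d3-a456-426614174000", 8))
	fmt.Println(safePrefix("abc", 8))
}
```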
    • erio's avatar
      refactor: simplify sticky session rate limit handling — switch immediately on any rate limit · e1a68497
      erio authored
      Remove threshold-based waiting in both sticky session and antigravity
      pre-check paths. When a model is rate-limited, immediately clear the
      sticky session and switch accounts instead of waiting for short durations.
    • erio's avatar
      fix(antigravity): fetch default mapping from API and sync Redis on rate limit · 2656320d
      erio authored
      1. Frontend: replace hardcoded antigravityDefaultMappings with async
         fetch from GET /admin/accounts/antigravity/default-model-mapping,
         eliminating the duplicate data source that caused frontend/backend
         mapping inconsistency.
      
      2. Backend: convert handleSmartRetry and antigravityRetryLoop from
         standalone functions to AntigravityGatewayService methods, enabling
         Redis cache sync (updateAccountModelRateLimitInCache) after both
         rate-limit write paths — long-delay branch and retry-exhausted branch.
    • erio's avatar
      style: fix gofmt formatting in gateway_service.go · b4f6c4f9
      erio authored
      Remove extra blank line that caused golangci-lint gofmt check to fail.
    • erio's avatar
      test(antigravity): add missing unit tests for upstream and custom model_mapping · 386126b1
      erio authored
      - Add GetAccessToken upstream branch tests (success/failure/empty/nil)
      - Add mapAntigravityModel wildcard-target-equals-request edge case tests
      - Add upstream account smart retry test case
      - Add GeminiMessagesCompatService custom model_mapping and empty model tests
    • erio's avatar
      fix(antigravity): support upstream accounts and custom model_mapping in scheduling · de092728
      erio authored
      - GetAccessToken: add upstream branch to read api_key from credentials
      - shouldTriggerAntigravitySmartRetry: relax check from IsOAuth to Platform-based
      - isModelSupportedByAccount/WithContext: replace IsAntigravityModelSupported
        whitelist with mapAntigravityModel for unified scheduling/forwarding logic
      - mapAntigravityModel: fix edge case where wildcard target equals request model
      - Update tests for new behavior and add custom model_mapping test cases
    • erio's avatar
      edb09370
    • erio's avatar
      43a4840d
    • erio's avatar
      feat(antigravity): comprehensive enhancements - model mapping, rate limiting, scheduling & ops · 5e98445b
      erio authored
      Key changes:
      - Upgrade model mapping: Opus 4.5 → Opus 4.6-thinking with precise matching
      - Unified rate limiting: scope-level → model-level with Redis snapshot sync
      - Load-balanced scheduling by call count with smart retry mechanism
      - Force cache billing support
      - Model identity injection in prompts with leak prevention
      - Thinking mode auto-handling (max_tokens/budget_tokens fix)
      - Frontend: whitelist mode toggle, model mapping validation, status indicators
      - Gemini session fallback with Redis Trie O(L) matching
      - Ops: enhanced concurrency monitoring, account availability, retry logic
      - Migration scripts: 049-051 for model mapping unification
    • erio's avatar
      fix(antigravity): reduce 429 fallback cooldown from 5min to 30s · 8917afab
      erio authored
      The default fallback cooldown when rate limit reset time cannot be
      parsed was 5 minutes, which is too aggressive and causes accounts
      to be unnecessarily locked out. Reduce to 30 seconds for faster
      recovery. Config override still works (unit remains minutes).
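The fallback order described above could be sketched like this. The sketch assumes the reset hint arrives as a Go duration string and that the config override is an integer number of minutes; both the input format and the name `fallbackCooldown` are assumptions, not details from the commit.

```go
package main

import (
	"fmt"
	"time"
)

// defaultFallbackCooldown reflects the new 30-second default.
const defaultFallbackCooldown = 30 * time.Second

// fallbackCooldown prefers a parsed reset time, then a configured override
// (in minutes), and finally the 30-second default.
func fallbackCooldown(resetAfter string, overrideMinutes int) time.Duration {
	if d, err := time.ParseDuration(resetAfter); err == nil && d > 0 {
		return d // the reset hint was usable
	}
	if overrideMinutes > 0 {
		return time.Duration(overrideMinutes) * time.Minute
	}
	return defaultFallbackCooldown
}

func main() {
	fmt.Println(fallbackCooldown("12s", 0))
	fmt.Println(fallbackCooldown("unparseable", 0))
	fmt.Println(fallbackCooldown("", 2))
}
```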
    • erio's avatar
      fix(antigravity): auto-fix max_tokens <= budget_tokens causing 400 error · 49233ec2
      erio authored
      When extended thinking is enabled, Claude API requires max_tokens >
      thinking.budget_tokens. If misconfigured, this auto-adjusts max_tokens
      to budget_tokens + 1000 instead of returning a 400 error.
      
      - Add ensureMaxTokensGreaterThanBudget helper function
      - Extract Gemini25FlashThinkingBudgetLimit constant (24576)
      - Log adjustment for debugging
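A minimal sketch of the adjustment rule follows. The function name and the `budget_tokens + 1000` rule come from the commit message; the signature, the returned flag, and the constant name `maxTokensHeadroom` are assumptions for illustration.

```go
package main

import "fmt"

// maxTokensHeadroom is the margin added above the thinking budget.
const maxTokensHeadroom = 1000

// ensureMaxTokensGreaterThanBudget returns a max_tokens value that is
// strictly greater than budgetTokens, plus a flag reporting whether an
// adjustment was made (useful for logging).
func ensureMaxTokensGreaterThanBudget(maxTokens, budgetTokens int) (int, bool) {
	if budgetTokens > 0 && maxTokens <= budgetTokens {
		return budgetTokens + maxTokensHeadroom, true
	}
	return maxTokens, false
}

func main() {
	adjusted, changed := ensureMaxTokensGreaterThanBudget(4096, 8192)
	fmt.Println(adjusted, changed)
}
```

Returning the flag rather than logging inside the helper keeps the function pure and easy to unit test.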