陈曦 / sub2api
Commit 27214f86 (unverified)
Authored Jan 15, 2026 by Wesley Liddick; committed by GitHub, Jan 15, 2026
Merge pull request #285 from IanShaw027/fix/ops-bug
feat(ops): enhance error log management, alert silencing, and frontend UI
Parents: 28de614d, 5354ba36
Changes: 42
PR_DESCRIPTION.md (0 → 100644)
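## Overview
This PR strengthens error log management and alert silencing in the operations monitoring system (Ops), and improves the code quality and user experience of the frontend UI components. It refactors the core service layer and the data access layer to make the system easier to maintain and operate.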
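## Main Changes
### 1. Error Log Query Optimization
**Features:**
- Adds the GetErrorLogByID endpoint for looking up a single error's details by ID
- Improves error log filtering with multi-dimensional filters (platform, phase, source, owner, and so on)
- Reworks query parameter handling and simplifies the code structure
- Strengthens error classification and normalization
- Tracks error resolution status (the `resolved` field)

**Implementation:**
- `ops_handler.go` - adds the single error log query endpoint
- `ops_repo.go` - improves data queries and filter condition building
- `ops_models.go` - extends the error log data model
- Frontend API client updated to match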
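
The handlers in this commit read the `resolved` query parameter as a tri-state value: unset, true (`1`, `true`, `yes`), or false (`0`, `false`, `no`), and reject anything else. A minimal standalone sketch of that parsing follows. The helper name `parseResolved` is hypothetical; the commit inlines this logic in each handler.

```go
package main

import (
	"fmt"
	"strings"
)

// parseResolved converts the raw `resolved` query value into a tri-state:
// nil (no filter), true, or false. Unknown values are rejected.
// Hypothetical helper; the commit repeats this switch inside each handler.
func parseResolved(raw string) (*bool, error) {
	switch strings.ToLower(strings.TrimSpace(raw)) {
	case "":
		return nil, nil
	case "1", "true", "yes":
		b := true
		return &b, nil
	case "0", "false", "no":
		b := false
		return &b, nil
	default:
		return nil, fmt.Errorf("invalid resolved value %q", raw)
	}
}

func main() {
	for _, raw := range []string{"", "YES", "0", "maybe"} {
		v, err := parseResolved(raw)
		switch {
		case err != nil:
			fmt.Printf("%q -> error: %v\n", raw, err)
		case v == nil:
			fmt.Printf("%q -> no filter\n", raw)
		default:
			fmt.Printf("%q -> resolved=%t\n", raw, *v)
		}
	}
}
```

Returning a `*bool` keeps "not specified" distinct from "false", which matches the `filter.Resolved = &b` assignments in the diff.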
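### 2. Alert Silencing
**Features:**
- Silences alerts by rule, platform, group, region, and similar dimensions
- Lets the operator set a silence duration and a reason
- Keeps silence records traceable by storing the creator and creation time
- Expires silences automatically so that nothing stays silenced forever

**Implementation:**
- `037_ops_alert_silences.sql` - adds the alert silence table
- `ops_alerts.go` - alert silencing logic
- `ops_alerts_handler.go` - alert silencing API endpoints
- `OpsAlertEventsCard.vue` - frontend UI for silencing alerts

**Database schema:**
| Field | Type | Description |
|------|------|------|
| rule_id | BIGINT | Alert rule ID |
| platform | VARCHAR(64) | Platform identifier |
| group_id | BIGINT | Group ID (optional) |
| region | VARCHAR(64) | Region (optional) |
| until | TIMESTAMPTZ | Time at which the silence ends |
| reason | TEXT | Reason for the silence |
| created_by | BIGINT | Creator's user ID |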
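### 3. Error Classification Standardization
**Features:**
- Unifies error phase values (request|auth|routing|upstream|network|internal)
- Normalizes error owner values (client|provider|platform)
- Standardizes error source values (client_request|upstream_http|gateway)
- Migrates historical data to the new classification automatically

**Implementation:**
- `038_ops_errors_resolution_retry_results_and_standardize_classification.sql` - classification standardization migration
- Maps legacy classifications to the new standard automatically
- Automatically resolves upstream errors that have recovered (client status code < 400)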
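
The classification values and the auto-resolve rule above can be expressed as a small validation sketch. The migration itself is SQL. The Go below only illustrates the rule as described, and the function names, plus the assumption that an "upstream error" is one whose phase is `upstream`, are hypothetical.

```go
package main

import "fmt"

// Allowed values, taken from the lists in this section.
var (
	validPhases  = map[string]bool{"request": true, "auth": true, "routing": true, "upstream": true, "network": true, "internal": true}
	validOwners  = map[string]bool{"client": true, "provider": true, "platform": true}
	validSources = map[string]bool{"client_request": true, "upstream_http": true, "gateway": true}
)

// validClassification reports whether all three values belong to the
// standardized sets.
func validClassification(phase, owner, source string) bool {
	return validPhases[phase] && validOwners[owner] && validSources[source]
}

// autoResolved sketches the migration rule: an upstream error whose
// client-visible status code is below 400 is treated as already recovered.
func autoResolved(phase string, clientStatus int) bool {
	return phase == "upstream" && clientStatus < 400
}

func main() {
	fmt.Println(validClassification("upstream", "provider", "upstream_http"))
	fmt.Println(validClassification("billing", "provider", "gateway"))
	fmt.Println(autoResolved("upstream", 200))
	fmt.Println(autoResolved("upstream", 502))
}
```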
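### 4. Gateway Service Integration
**Features:**
- Completes the Ops integration of each Gateway service
- Unifies the error logging interface
- Improves upstream error tracing

**Services involved:**
- `antigravity_gateway_service.go` - Antigravity gateway integration
- `gateway_service.go` - generic gateway integration
- `gemini_messages_compat_service.go` - Gemini compatibility layer integration
- `openai_gateway_service.go` - OpenAI gateway integration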
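### 5. Frontend UI Improvements
**Refactoring:**
- Greatly simplifies the error detail modal (from 828 lines down to 450)
- Reworks the error log table component for readability
- Removes unused i18n translations
- Unifies component code style and formatting
- Adjusts the skeleton screen component to better match the actual dashboard layout

**Layout:**
- Fixes content overflow and scrolling problems in modals
- Switches the table layout to flex so it renders correctly
- Improves the dashboard header layout and interactions
- Improves the responsive experience
- Adapts the skeleton screen to full-screen mode

**Interaction:**
- Improves the behavior and presentation of the alert events card
- Improves how error details are displayed
- Enhances the request detail modal
- Completes the runtime settings card
- Improves loading animations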
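### 6. Internationalization
**Copy additions:**
- Adds the missing English translations for error logs
- Adds Chinese and English copy for alert silencing
- Completes hint texts and error messages
- Unifies terminology across translations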
## File Changes
**Backend (26 files):**
- `backend/internal/handler/admin/ops_alerts_handler.go` - alert endpoint enhancements
- `backend/internal/handler/admin/ops_handler.go` - error log endpoint improvements
- `backend/internal/handler/ops_error_logger.go` - error logger enhancements
- `backend/internal/repository/ops_repo.go` - data access layer refactoring
- `backend/internal/repository/ops_repo_alerts.go` - alert data access enhancements
- `backend/internal/service/ops_*.go` - core service layer refactoring (10 files)
- `backend/internal/service/*_gateway_service.go` - Gateway integration (4 files)
- `backend/internal/server/routes/admin.go` - route configuration updates
- `backend/migrations/*.sql` - database migrations (2 files)
- Test file updates (5 files)

**Frontend (13 files):**
- `frontend/src/views/admin/ops/OpsDashboard.vue` - dashboard main page improvements
- `frontend/src/views/admin/ops/components/*.vue` - component refactoring (10 files)
- `frontend/src/api/admin/ops.ts` - API client extensions
- `frontend/src/i18n/locales/*.ts` - i18n texts (2 files)
## Code Statistics
- 44 files modified
- 3733 lines added
- 995 lines deleted
- Net increase of 2738 lines
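## Key Improvements
**Maintainability:**
- Refactors the core service layer so responsibilities are clearer
- Simplifies frontend component code and reduces complexity
- Unifies code style and naming conventions
- Removes redundant code and unused translations
- Standardizes the error classification scheme

**Functionality:**
- Alert silencing, which reduces alert noise
- Better error log queries, which makes operations work more efficient
- More complete Gateway service integration, which unifies monitoring
- Error resolution status tracking, which simplifies issue management

**User experience:**
- Fixes several UI layout problems
- Streamlines interaction flows
- Completes internationalization support
- Improves the responsive experience
- Improves how loading states are shown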
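## Test Verification
- ✅ Error log querying and filtering
- ✅ Alert silence creation and automatic expiry
- ✅ Error classification standardization migration
- ✅ Gateway service error logging
- ✅ Frontend component layout and interaction
- ✅ Skeleton screen full-screen mode
- ✅ Completeness of i18n texts
- ✅ API endpoint correctness
- ✅ Database migrations run successfully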
backend/internal/handler/admin/ops_alerts_handler.go
@@ -7,8 +7,10 @@ import (
 	"net/http"
 	"strconv"
 	"strings"
+	"time"

 	"github.com/Wei-Shaw/sub2api/internal/pkg/response"
+	"github.com/Wei-Shaw/sub2api/internal/server/middleware"
 	"github.com/Wei-Shaw/sub2api/internal/service"
 	"github.com/gin-gonic/gin"
 	"github.com/gin-gonic/gin/binding"
@@ -18,8 +20,6 @@ var validOpsAlertMetricTypes = []string{
 	"success_rate",
 	"error_rate",
 	"upstream_error_rate",
-	"p95_latency_ms",
-	"p99_latency_ms",
 	"cpu_usage_percent",
 	"memory_usage_percent",
 	"concurrency_queue_depth",
@@ -372,8 +372,135 @@ func (h *OpsHandler) DeleteAlertRule(c *gin.Context) {
 	response.Success(c, gin.H{"deleted": true})
 }

+// GetAlertEvent returns a single ops alert event.
+// GET /api/v1/admin/ops/alert-events/:id
+func (h *OpsHandler) GetAlertEvent(c *gin.Context) {
+	if h.opsService == nil {
+		response.Error(c, http.StatusServiceUnavailable, "Ops service not available")
+		return
+	}
+	if err := h.opsService.RequireMonitoringEnabled(c.Request.Context()); err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	id, err := strconv.ParseInt(c.Param("id"), 10, 64)
+	if err != nil || id <= 0 {
+		response.BadRequest(c, "Invalid event ID")
+		return
+	}
+	ev, err := h.opsService.GetAlertEventByID(c.Request.Context(), id)
+	if err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	response.Success(c, ev)
+}
+
+// UpdateAlertEventStatus updates an ops alert event status.
+// PUT /api/v1/admin/ops/alert-events/:id/status
+func (h *OpsHandler) UpdateAlertEventStatus(c *gin.Context) {
+	if h.opsService == nil {
+		response.Error(c, http.StatusServiceUnavailable, "Ops service not available")
+		return
+	}
+	if err := h.opsService.RequireMonitoringEnabled(c.Request.Context()); err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	id, err := strconv.ParseInt(c.Param("id"), 10, 64)
+	if err != nil || id <= 0 {
+		response.BadRequest(c, "Invalid event ID")
+		return
+	}
+	var payload struct {
+		Status string `json:"status"`
+	}
+	if err := c.ShouldBindJSON(&payload); err != nil {
+		response.BadRequest(c, "Invalid request body")
+		return
+	}
+	payload.Status = strings.TrimSpace(payload.Status)
+	if payload.Status == "" {
+		response.BadRequest(c, "Invalid status")
+		return
+	}
+	if payload.Status != service.OpsAlertStatusResolved && payload.Status != service.OpsAlertStatusManualResolved {
+		response.BadRequest(c, "Invalid status")
+		return
+	}
+	var resolvedAt *time.Time
+	if payload.Status == service.OpsAlertStatusResolved || payload.Status == service.OpsAlertStatusManualResolved {
+		now := time.Now().UTC()
+		resolvedAt = &now
+	}
+	if err := h.opsService.UpdateAlertEventStatus(c.Request.Context(), id, payload.Status, resolvedAt); err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	response.Success(c, gin.H{"updated": true})
+}
+
 // ListAlertEvents lists recent ops alert events.
 // GET /api/v1/admin/ops/alert-events
+// CreateAlertSilence creates a scoped silence for ops alerts.
+// POST /api/v1/admin/ops/alert-silences
+func (h *OpsHandler) CreateAlertSilence(c *gin.Context) {
+	if h.opsService == nil {
+		response.Error(c, http.StatusServiceUnavailable, "Ops service not available")
+		return
+	}
+	if err := h.opsService.RequireMonitoringEnabled(c.Request.Context()); err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	var payload struct {
+		RuleID   int64   `json:"rule_id"`
+		Platform string  `json:"platform"`
+		GroupID  *int64  `json:"group_id"`
+		Region   *string `json:"region"`
+		Until    string  `json:"until"`
+		Reason   string  `json:"reason"`
+	}
+	if err := c.ShouldBindJSON(&payload); err != nil {
+		response.BadRequest(c, "Invalid request body")
+		return
+	}
+	until, err := time.Parse(time.RFC3339, strings.TrimSpace(payload.Until))
+	if err != nil {
+		response.BadRequest(c, "Invalid until")
+		return
+	}
+	createdBy := (*int64)(nil)
+	if subject, ok := middleware.GetAuthSubjectFromContext(c); ok {
+		uid := subject.UserID
+		createdBy = &uid
+	}
+	silence := &service.OpsAlertSilence{
+		RuleID:    payload.RuleID,
+		Platform:  strings.TrimSpace(payload.Platform),
+		GroupID:   payload.GroupID,
+		Region:    payload.Region,
+		Until:     until,
+		Reason:    strings.TrimSpace(payload.Reason),
+		CreatedBy: createdBy,
+	}
+	created, err := h.opsService.CreateAlertSilence(c.Request.Context(), silence)
+	if err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	response.Success(c, created)
+}
+
 func (h *OpsHandler) ListAlertEvents(c *gin.Context) {
 	if h.opsService == nil {
 		response.Error(c, http.StatusServiceUnavailable, "Ops service not available")
@@ -384,7 +511,7 @@ func (h *OpsHandler) ListAlertEvents(c *gin.Context) {
 		return
 	}

-	limit := 100
+	limit := 20
 	if raw := strings.TrimSpace(c.Query("limit")); raw != "" {
 		n, err := strconv.Atoi(raw)
 		if err != nil || n <= 0 {
@@ -400,6 +527,49 @@ func (h *OpsHandler) ListAlertEvents(c *gin.Context) {
 		Severity: strings.TrimSpace(c.Query("severity")),
 	}

+	if v := strings.TrimSpace(c.Query("email_sent")); v != "" {
+		vv := strings.ToLower(v)
+		switch vv {
+		case "true", "1":
+			b := true
+			filter.EmailSent = &b
+		case "false", "0":
+			b := false
+			filter.EmailSent = &b
+		default:
+			response.BadRequest(c, "Invalid email_sent")
+			return
+		}
+	}
+
+	// Cursor pagination: both params must be provided together.
+	rawTS := strings.TrimSpace(c.Query("before_fired_at"))
+	rawID := strings.TrimSpace(c.Query("before_id"))
+	if (rawTS == "") != (rawID == "") {
+		response.BadRequest(c, "before_fired_at and before_id must be provided together")
+		return
+	}
+	if rawTS != "" {
+		ts, err := time.Parse(time.RFC3339Nano, rawTS)
+		if err != nil {
+			if t2, err2 := time.Parse(time.RFC3339, rawTS); err2 == nil {
+				ts = t2
+			} else {
+				response.BadRequest(c, "Invalid before_fired_at")
+				return
+			}
+		}
+		filter.BeforeFiredAt = &ts
+	}
+	if rawID != "" {
+		id, err := strconv.ParseInt(rawID, 10, 64)
+		if err != nil || id <= 0 {
+			response.BadRequest(c, "Invalid before_id")
+			return
+		}
+		filter.BeforeID = &id
+	}
+
 	// Optional global filter support (platform/group/time range).
 	if platform := strings.TrimSpace(c.Query("platform")); platform != "" {
 		filter.Platform = platform
backend/internal/handler/admin/ops_handler.go
@@ -19,6 +19,57 @@ type OpsHandler struct {
 	opsService *service.OpsService
 }

+// GetErrorLogByID returns ops error log detail.
+// GET /api/v1/admin/ops/errors/:id
+func (h *OpsHandler) GetErrorLogByID(c *gin.Context) {
+	if h.opsService == nil {
+		response.Error(c, http.StatusServiceUnavailable, "Ops service not available")
+		return
+	}
+	if err := h.opsService.RequireMonitoringEnabled(c.Request.Context()); err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	idStr := strings.TrimSpace(c.Param("id"))
+	id, err := strconv.ParseInt(idStr, 10, 64)
+	if err != nil || id <= 0 {
+		response.BadRequest(c, "Invalid error id")
+		return
+	}
+	detail, err := h.opsService.GetErrorLogByID(c.Request.Context(), id)
+	if err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	response.Success(c, detail)
+}
+
+const (
+	opsListViewErrors   = "errors"
+	opsListViewExcluded = "excluded"
+	opsListViewAll      = "all"
+)
+
+func parseOpsViewParam(c *gin.Context) string {
+	if c == nil {
+		return ""
+	}
+	v := strings.ToLower(strings.TrimSpace(c.Query("view")))
+	switch v {
+	case "", opsListViewErrors:
+		return opsListViewErrors
+	case opsListViewExcluded:
+		return opsListViewExcluded
+	case opsListViewAll:
+		return opsListViewAll
+	default:
+		return opsListViewErrors
+	}
+}
+
 func NewOpsHandler(opsService *service.OpsService) *OpsHandler {
 	return &OpsHandler{opsService: opsService}
 }
@@ -47,16 +98,26 @@ func (h *OpsHandler) GetErrorLogs(c *gin.Context) {
 		return
 	}

-	filter := &service.OpsErrorLogFilter{
-		Page:     page,
-		PageSize: pageSize,
-	}
+	filter := &service.OpsErrorLogFilter{Page: page, PageSize: pageSize}
 	if !startTime.IsZero() {
 		filter.StartTime = &startTime
 	}
 	if !endTime.IsZero() {
 		filter.EndTime = &endTime
 	}
+	filter.View = parseOpsViewParam(c)
+	filter.Phase = strings.TrimSpace(c.Query("phase"))
+	filter.Owner = strings.TrimSpace(c.Query("error_owner"))
+	filter.Source = strings.TrimSpace(c.Query("error_source"))
+	filter.Query = strings.TrimSpace(c.Query("q"))
+	filter.UserQuery = strings.TrimSpace(c.Query("user_query"))
+
+	// Force request errors: client-visible status >= 400.
+	// buildOpsErrorLogsWhere already applies this for non-upstream phase.
+	if strings.EqualFold(strings.TrimSpace(filter.Phase), "upstream") {
+		filter.Phase = ""
+	}

 	if platform := strings.TrimSpace(c.Query("platform")); platform != "" {
 		filter.Platform = platform
@@ -77,11 +138,19 @@ func (h *OpsHandler) GetErrorLogs(c *gin.Context) {
 		}
 		filter.AccountID = &id
 	}
-	if phase := strings.TrimSpace(c.Query("phase")); phase != "" {
-		filter.Phase = phase
-	}
-	if q := strings.TrimSpace(c.Query("q")); q != "" {
-		filter.Query = q
+
+	if v := strings.TrimSpace(c.Query("resolved")); v != "" {
+		switch strings.ToLower(v) {
+		case "1", "true", "yes":
+			b := true
+			filter.Resolved = &b
+		case "0", "false", "no":
+			b := false
+			filter.Resolved = &b
+		default:
+			response.BadRequest(c, "Invalid resolved")
+			return
+		}
 	}
 	if statusCodesStr := strings.TrimSpace(c.Query("status_codes")); statusCodesStr != "" {
 		parts := strings.Split(statusCodesStr, ",")
@@ -106,13 +175,120 @@ func (h *OpsHandler) GetErrorLogs(c *gin.Context) {
...
@@ -106,13 +175,120 @@ func (h *OpsHandler) GetErrorLogs(c *gin.Context) {
response
.
ErrorFrom
(
c
,
err
)
response
.
ErrorFrom
(
c
,
err
)
return
return
}
}
response
.
Paginated
(
c
,
result
.
Errors
,
int64
(
result
.
Total
),
result
.
Page
,
result
.
PageSize
)
}
// ListRequestErrors lists client-visible request errors.
// GET /api/v1/admin/ops/request-errors
func
(
h
*
OpsHandler
)
ListRequestErrors
(
c
*
gin
.
Context
)
{
if
h
.
opsService
==
nil
{
response
.
Error
(
c
,
http
.
StatusServiceUnavailable
,
"Ops service not available"
)
return
}
if
err
:=
h
.
opsService
.
RequireMonitoringEnabled
(
c
.
Request
.
Context
());
err
!=
nil
{
response
.
ErrorFrom
(
c
,
err
)
return
}
page
,
pageSize
:=
response
.
ParsePagination
(
c
)
if
pageSize
>
500
{
pageSize
=
500
}
startTime
,
endTime
,
err
:=
parseOpsTimeRange
(
c
,
"1h"
)
if
err
!=
nil
{
response
.
BadRequest
(
c
,
err
.
Error
())
return
}
filter
:=
&
service
.
OpsErrorLogFilter
{
Page
:
page
,
PageSize
:
pageSize
}
if
!
startTime
.
IsZero
()
{
filter
.
StartTime
=
&
startTime
}
if
!
endTime
.
IsZero
()
{
filter
.
EndTime
=
&
endTime
}
filter
.
View
=
parseOpsViewParam
(
c
)
filter
.
Phase
=
strings
.
TrimSpace
(
c
.
Query
(
"phase"
))
filter
.
Owner
=
strings
.
TrimSpace
(
c
.
Query
(
"error_owner"
))
filter
.
Source
=
strings
.
TrimSpace
(
c
.
Query
(
"error_source"
))
filter
.
Query
=
strings
.
TrimSpace
(
c
.
Query
(
"q"
))
filter
.
UserQuery
=
strings
.
TrimSpace
(
c
.
Query
(
"user_query"
))
// Force request errors: client-visible status >= 400.
// buildOpsErrorLogsWhere already applies this for non-upstream phase.
if
strings
.
EqualFold
(
strings
.
TrimSpace
(
filter
.
Phase
),
"upstream"
)
{
filter
.
Phase
=
""
}
if
platform
:=
strings
.
TrimSpace
(
c
.
Query
(
"platform"
));
platform
!=
""
{
filter
.
Platform
=
platform
}
+	if v := strings.TrimSpace(c.Query("group_id")); v != "" {
+		id, err := strconv.ParseInt(v, 10, 64)
+		if err != nil || id <= 0 {
+			response.BadRequest(c, "Invalid group_id")
+			return
+		}
+		filter.GroupID = &id
+	}
+	if v := strings.TrimSpace(c.Query("account_id")); v != "" {
+		id, err := strconv.ParseInt(v, 10, 64)
+		if err != nil || id <= 0 {
+			response.BadRequest(c, "Invalid account_id")
+			return
+		}
+		filter.AccountID = &id
+	}
+	if v := strings.TrimSpace(c.Query("resolved")); v != "" {
+		switch strings.ToLower(v) {
+		case "1", "true", "yes":
+			b := true
+			filter.Resolved = &b
+		case "0", "false", "no":
+			b := false
+			filter.Resolved = &b
+		default:
+			response.BadRequest(c, "Invalid resolved")
+			return
+		}
+	}
+	if statusCodesStr := strings.TrimSpace(c.Query("status_codes")); statusCodesStr != "" {
+		parts := strings.Split(statusCodesStr, ",")
+		out := make([]int, 0, len(parts))
+		for _, part := range parts {
+			p := strings.TrimSpace(part)
+			if p == "" {
+				continue
+			}
+			n, err := strconv.Atoi(p)
+			if err != nil || n < 0 {
+				response.BadRequest(c, "Invalid status_codes")
+				return
+			}
+			out = append(out, n)
+		}
+		filter.StatusCodes = out
+	}
+	result, err := h.opsService.GetErrorLogs(c.Request.Context(), filter)
+	if err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}

 	response.Paginated(c, result.Errors, int64(result.Total), result.Page, result.PageSize)
 }

-// GetErrorLogByID returns a single error log detail.
-// GET /api/v1/admin/ops/errors/:id
-func (h *OpsHandler) GetErrorLogByID(c *gin.Context) {
+// GetRequestError returns request error detail.
+// GET /api/v1/admin/ops/request-errors/:id
+func (h *OpsHandler) GetRequestError(c *gin.Context) {
+	// same storage; just proxy to existing detail
+	h.GetErrorLogByID(c)
+}
+
+// ListRequestErrorUpstreamErrors lists upstream error logs correlated to a request error.
+// GET /api/v1/admin/ops/request-errors/:id/upstream-errors
+func (h *OpsHandler) ListRequestErrorUpstreamErrors(c *gin.Context) {
 	if h.opsService == nil {
 		response.Error(c, http.StatusServiceUnavailable, "Ops service not available")
 		return
@@ -129,15 +305,306 @@ func (h *OpsHandler) GetErrorLogByID(c *gin.Context) {
 		return
 	}

+	// Load request error to get correlation keys.
 	detail, err := h.opsService.GetErrorLogByID(c.Request.Context(), id)
 	if err != nil {
 		response.ErrorFrom(c, err)
 		return
 	}
-	response.Success(c, detail)
+	// Correlate by request_id/client_request_id.
+	requestID := strings.TrimSpace(detail.RequestID)
+	clientRequestID := strings.TrimSpace(detail.ClientRequestID)
+	if requestID == "" && clientRequestID == "" {
+		response.Paginated(c, []*service.OpsErrorLog{}, 0, 1, 10)
+		return
+	}
+	page, pageSize := response.ParsePagination(c)
+	if pageSize > 500 {
+		pageSize = 500
+	}
+	// Keep correlation window wide enough so linked upstream errors
+	// are discoverable even when UI defaults to 1h elsewhere.
+	startTime, endTime, err := parseOpsTimeRange(c, "30d")
+	if err != nil {
+		response.BadRequest(c, err.Error())
+		return
+	}
+	filter := &service.OpsErrorLogFilter{Page: page, PageSize: pageSize}
+	if !startTime.IsZero() {
+		filter.StartTime = &startTime
+	}
+	if !endTime.IsZero() {
+		filter.EndTime = &endTime
+	}
+	filter.View = "all"
+	filter.Phase = "upstream"
+	filter.Owner = "provider"
+	filter.Source = strings.TrimSpace(c.Query("error_source"))
+	filter.Query = strings.TrimSpace(c.Query("q"))
+	if platform := strings.TrimSpace(c.Query("platform")); platform != "" {
+		filter.Platform = platform
+	}
+	// Prefer exact match on request_id; if missing, fall back to client_request_id.
+	if requestID != "" {
+		filter.RequestID = requestID
+	} else {
+		filter.ClientRequestID = clientRequestID
+	}
+	result, err := h.opsService.GetErrorLogs(c.Request.Context(), filter)
+	if err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	// If client asks for details, expand each upstream error log to include upstream response fields.
+	includeDetail := strings.TrimSpace(c.Query("include_detail"))
+	if includeDetail == "1" || strings.EqualFold(includeDetail, "true") || strings.EqualFold(includeDetail, "yes") {
+		details := make([]*service.OpsErrorLogDetail, 0, len(result.Errors))
+		for _, item := range result.Errors {
+			if item == nil {
+				continue
+			}
+			d, err := h.opsService.GetErrorLogByID(c.Request.Context(), item.ID)
+			if err != nil || d == nil {
+				continue
+			}
+			details = append(details, d)
+		}
+		response.Paginated(c, details, int64(result.Total), result.Page, result.PageSize)
+		return
+	}
+	response.Paginated(c, result.Errors, int64(result.Total), result.Page, result.PageSize)
+}
+
+// RetryRequestErrorClient retries the client request based on stored request body.
+// POST /api/v1/admin/ops/request-errors/:id/retry-client
+func (h *OpsHandler) RetryRequestErrorClient(c *gin.Context) {
+	if h.opsService == nil {
+		response.Error(c, http.StatusServiceUnavailable, "Ops service not available")
+		return
+	}
+	if err := h.opsService.RequireMonitoringEnabled(c.Request.Context()); err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	subject, ok := middleware.GetAuthSubjectFromContext(c)
+	if !ok || subject.UserID <= 0 {
+		response.Error(c, http.StatusUnauthorized, "Unauthorized")
+		return
+	}
+	idStr := strings.TrimSpace(c.Param("id"))
+	id, err := strconv.ParseInt(idStr, 10, 64)
+	if err != nil || id <= 0 {
+		response.BadRequest(c, "Invalid error id")
+		return
+	}
+	result, err := h.opsService.RetryError(c.Request.Context(), subject.UserID, id, service.OpsRetryModeClient, nil)
+	if err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	response.Success(c, result)
+}
+
+// RetryRequestErrorUpstreamEvent retries a specific upstream attempt using captured upstream_request_body.
+// POST /api/v1/admin/ops/request-errors/:id/upstream-errors/:idx/retry
+func (h *OpsHandler) RetryRequestErrorUpstreamEvent(c *gin.Context) {
+	if h.opsService == nil {
+		response.Error(c, http.StatusServiceUnavailable, "Ops service not available")
+		return
+	}
+	if err := h.opsService.RequireMonitoringEnabled(c.Request.Context()); err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	subject, ok := middleware.GetAuthSubjectFromContext(c)
+	if !ok || subject.UserID <= 0 {
+		response.Error(c, http.StatusUnauthorized, "Unauthorized")
+		return
+	}
+	idStr := strings.TrimSpace(c.Param("id"))
+	id, err := strconv.ParseInt(idStr, 10, 64)
+	if err != nil || id <= 0 {
+		response.BadRequest(c, "Invalid error id")
+		return
+	}
+	idxStr := strings.TrimSpace(c.Param("idx"))
+	idx, err := strconv.Atoi(idxStr)
+	if err != nil || idx < 0 {
+		response.BadRequest(c, "Invalid upstream idx")
+		return
+	}
+	result, err := h.opsService.RetryUpstreamEvent(c.Request.Context(), subject.UserID, id, idx)
+	if err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	response.Success(c, result)
 }

+// ResolveRequestError toggles resolved status.
+// PUT /api/v1/admin/ops/request-errors/:id/resolve
+func (h *OpsHandler) ResolveRequestError(c *gin.Context) {
+	h.UpdateErrorResolution(c)
+}
+
+// ListUpstreamErrors lists independent upstream errors.
+// GET /api/v1/admin/ops/upstream-errors
+func (h *OpsHandler) ListUpstreamErrors(c *gin.Context) {
+	if h.opsService == nil {
+		response.Error(c, http.StatusServiceUnavailable, "Ops service not available")
+		return
+	}
+	if err := h.opsService.RequireMonitoringEnabled(c.Request.Context()); err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	page, pageSize := response.ParsePagination(c)
+	if pageSize > 500 {
+		pageSize = 500
+	}
+	startTime, endTime, err := parseOpsTimeRange(c, "1h")
+	if err != nil {
+		response.BadRequest(c, err.Error())
+		return
+	}
+	filter := &service.OpsErrorLogFilter{Page: page, PageSize: pageSize}
+	if !startTime.IsZero() {
+		filter.StartTime = &startTime
+	}
+	if !endTime.IsZero() {
+		filter.EndTime = &endTime
+	}
+	filter.View = parseOpsViewParam(c)
+	filter.Phase = "upstream"
+	filter.Owner = "provider"
+	filter.Source = strings.TrimSpace(c.Query("error_source"))
+	filter.Query = strings.TrimSpace(c.Query("q"))
+	if platform := strings.TrimSpace(c.Query("platform")); platform != "" {
+		filter.Platform = platform
+	}
+	if v := strings.TrimSpace(c.Query("group_id")); v != "" {
+		id, err := strconv.ParseInt(v, 10, 64)
+		if err != nil || id <= 0 {
+			response.BadRequest(c, "Invalid group_id")
+			return
+		}
+		filter.GroupID = &id
+	}
+	if v := strings.TrimSpace(c.Query("account_id")); v != "" {
+		id, err := strconv.ParseInt(v, 10, 64)
+		if err != nil || id <= 0 {
+			response.BadRequest(c, "Invalid account_id")
+			return
+		}
+		filter.AccountID = &id
+	}
+	if v := strings.TrimSpace(c.Query("resolved")); v != "" {
+		switch strings.ToLower(v) {
+		case "1", "true", "yes":
+			b := true
+			filter.Resolved = &b
+		case "0", "false", "no":
+			b := false
+			filter.Resolved = &b
+		default:
+			response.BadRequest(c, "Invalid resolved")
+			return
+		}
+	}
+	if statusCodesStr := strings.TrimSpace(c.Query("status_codes")); statusCodesStr != "" {
+		parts := strings.Split(statusCodesStr, ",")
+		out := make([]int, 0, len(parts))
+		for _, part := range parts {
+			p := strings.TrimSpace(part)
+			if p == "" {
+				continue
+			}
+			n, err := strconv.Atoi(p)
+			if err != nil || n < 0 {
+				response.BadRequest(c, "Invalid status_codes")
+				return
+			}
+			out = append(out, n)
+		}
+		filter.StatusCodes = out
+	}
+	result, err := h.opsService.GetErrorLogs(c.Request.Context(), filter)
+	if err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	response.Paginated(c, result.Errors, int64(result.Total), result.Page, result.PageSize)
+}
+
+// GetUpstreamError returns upstream error detail.
+// GET /api/v1/admin/ops/upstream-errors/:id
+func (h *OpsHandler) GetUpstreamError(c *gin.Context) {
+	h.GetErrorLogByID(c)
+}
+
+// RetryUpstreamError retries upstream error using the original account_id.
+// POST /api/v1/admin/ops/upstream-errors/:id/retry
+func (h *OpsHandler) RetryUpstreamError(c *gin.Context) {
+	if h.opsService == nil {
+		response.Error(c, http.StatusServiceUnavailable, "Ops service not available")
+		return
+	}
+	if err := h.opsService.RequireMonitoringEnabled(c.Request.Context()); err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	subject, ok := middleware.GetAuthSubjectFromContext(c)
+	if !ok || subject.UserID <= 0 {
+		response.Error(c, http.StatusUnauthorized, "Unauthorized")
+		return
+	}
+	idStr := strings.TrimSpace(c.Param("id"))
+	id, err := strconv.ParseInt(idStr, 10, 64)
+	if err != nil || id <= 0 {
+		response.BadRequest(c, "Invalid error id")
+		return
+	}
+	result, err := h.opsService.RetryError(c.Request.Context(), subject.UserID, id, service.OpsRetryModeUpstream, nil)
+	if err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	response.Success(c, result)
+}
+
+// ResolveUpstreamError toggles resolved status.
+// PUT /api/v1/admin/ops/upstream-errors/:id/resolve
+func (h *OpsHandler) ResolveUpstreamError(c *gin.Context) {
+	h.UpdateErrorResolution(c)
+}
+
+// ==================== Existing endpoints ====================
+
 // ListRequestDetails returns a request-level list (success + error) for drill-down.
 // GET /api/v1/admin/ops/requests
 func (h *OpsHandler) ListRequestDetails(c *gin.Context) {
@@ -242,6 +709,11 @@ func (h *OpsHandler) ListRequestDetails(c *gin.Context) {
 type opsRetryRequest struct {
 	Mode            string `json:"mode"`
 	PinnedAccountID *int64 `json:"pinned_account_id"`
+	Force           bool   `json:"force"`
+}
+
+type opsResolveRequest struct {
+	Resolved bool `json:"resolved"`
 }

 // RetryErrorRequest retries a failed request using stored request_body.
@@ -278,6 +750,16 @@ func (h *OpsHandler) RetryErrorRequest(c *gin.Context) {
 		req.Mode = service.OpsRetryModeClient
 	}

+	// Force flag is currently a UI-level acknowledgement. Server may still enforce safety constraints.
+	_ = req.Force
+
+	// Legacy endpoint safety: only allow retrying the client request here.
+	// Upstream retries must go through the split endpoints.
+	if strings.EqualFold(strings.TrimSpace(req.Mode), service.OpsRetryModeUpstream) {
+		response.BadRequest(c, "upstream retry is not supported on this endpoint")
+		return
+	}
+
 	result, err := h.opsService.RetryError(c.Request.Context(), subject.UserID, id, req.Mode, req.PinnedAccountID)
 	if err != nil {
 		response.ErrorFrom(c, err)
...
@@ -287,6 +769,81 @@ func (h *OpsHandler) RetryErrorRequest(c *gin.Context) {
 	response.Success(c, result)
 }
 
+// ListRetryAttempts lists retry attempts for an error log.
+// GET /api/v1/admin/ops/errors/:id/retries
+func (h *OpsHandler) ListRetryAttempts(c *gin.Context) {
+	if h.opsService == nil {
+		response.Error(c, http.StatusServiceUnavailable, "Ops service not available")
+		return
+	}
+	if err := h.opsService.RequireMonitoringEnabled(c.Request.Context()); err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+
+	idStr := strings.TrimSpace(c.Param("id"))
+	id, err := strconv.ParseInt(idStr, 10, 64)
+	if err != nil || id <= 0 {
+		response.BadRequest(c, "Invalid error id")
+		return
+	}
+
+	limit := 50
+	if v := strings.TrimSpace(c.Query("limit")); v != "" {
+		n, err := strconv.Atoi(v)
+		if err != nil || n <= 0 {
+			response.BadRequest(c, "Invalid limit")
+			return
+		}
+		limit = n
+	}
+
+	items, err := h.opsService.ListRetryAttemptsByErrorID(c.Request.Context(), id, limit)
+	if err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	response.Success(c, items)
+}
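The `limit` handling in `ListRetryAttempts` can be read in isolation as a small parsing rule: empty means the default of 50, anything non-numeric or non-positive is rejected, and the repository layer later in this diff caps the value at 200. The sketch below folds both steps into one hypothetical helper, `parseRetryListLimit`, which does not exist in the commit; the real code splits the validation (handler) and the cap (repository).

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

const (
	defaultRetryListLimit = 50  // default used by the handler and the repository
	maxRetryListLimit     = 200 // cap applied in the repository layer
)

// parseRetryListLimit is a hypothetical helper (not part of the commit) that
// combines the handler's validation with the repository's upper bound.
func parseRetryListLimit(raw string) (int, error) {
	v := strings.TrimSpace(raw)
	if v == "" {
		return defaultRetryListLimit, nil
	}
	n, err := strconv.Atoi(v)
	if err != nil || n <= 0 {
		// The handler answers 400 "Invalid limit" in this case.
		return 0, errors.New("invalid limit")
	}
	if n > maxRetryListLimit {
		n = maxRetryListLimit
	}
	return n, nil
}

func main() {
	for _, raw := range []string{"", "25", "1000", "abc", "-3"} {
		n, err := parseRetryListLimit(raw)
		fmt.Printf("%q -> %d, %v\n", raw, n, err)
	}
}
```

Rejecting bad input at the HTTP boundary while still clamping in the repository means a caller that bypasses the handler cannot request an unbounded result set.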
+
+// UpdateErrorResolution allows manual resolve/unresolve.
+// PUT /api/v1/admin/ops/errors/:id/resolve
+func (h *OpsHandler) UpdateErrorResolution(c *gin.Context) {
+	if h.opsService == nil {
+		response.Error(c, http.StatusServiceUnavailable, "Ops service not available")
+		return
+	}
+	if err := h.opsService.RequireMonitoringEnabled(c.Request.Context()); err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+
+	subject, ok := middleware.GetAuthSubjectFromContext(c)
+	if !ok || subject.UserID <= 0 {
+		response.Error(c, http.StatusUnauthorized, "Unauthorized")
+		return
+	}
+
+	idStr := strings.TrimSpace(c.Param("id"))
+	id, err := strconv.ParseInt(idStr, 10, 64)
+	if err != nil || id <= 0 {
+		response.BadRequest(c, "Invalid error id")
+		return
+	}
+
+	var req opsResolveRequest
+	if err := c.ShouldBindJSON(&req); err != nil {
+		response.BadRequest(c, "Invalid request: "+err.Error())
+		return
+	}
+	uid := subject.UserID
+	if err := h.opsService.UpdateErrorResolution(c.Request.Context(), id, req.Resolved, &uid, nil); err != nil {
+		response.ErrorFrom(c, err)
+		return
+	}
+	response.Success(c, gin.H{"ok": true})
+}
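For readers wiring up a client, the request accepted by `UpdateErrorResolution` is a `PUT` with a one-field JSON body. The sketch below only builds such a request and prints it. `newResolveRequest`, the base URL and the error ID are illustrative assumptions, and authentication (which the handler requires via its auth subject) is left out because its mechanism is not shown in this diff.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// resolveRequest mirrors the JSON shape of opsResolveRequest in the diff.
type resolveRequest struct {
	Resolved bool `json:"resolved"`
}

// newResolveRequest is a hypothetical client-side helper. It builds the PUT
// request for the resolve endpoint without sending it.
func newResolveRequest(baseURL string, errorID int64, resolved bool) (*http.Request, error) {
	body, err := json.Marshal(resolveRequest{Resolved: resolved})
	if err != nil {
		return nil, err
	}
	url := fmt.Sprintf("%s/api/v1/admin/ops/errors/%d/resolve", baseURL, errorID)
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// The host below is a placeholder, not a real deployment.
	req, err := newResolveRequest("http://localhost:8080", 42, true)
	if err != nil {
		panic(err)
	}
	payload, err := io.ReadAll(req.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println(string(payload))
}
```

Sending `{"resolved": false}` to the same path marks the error as unresolved again, since the handler passes `req.Resolved` straight through to the service.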
 
 func parseOpsTimeRange(c *gin.Context, defaultRange string) (time.Time, time.Time, error) {
 	startStr := strings.TrimSpace(c.Query("start_time"))
 	endStr := strings.TrimSpace(c.Query("end_time"))
...
@@ -358,6 +915,10 @@ func parseOpsDuration(v string) (time.Duration, bool) {
 		return 6 * time.Hour, true
 	case "24h":
 		return 24 * time.Hour, true
+	case "7d":
+		return 7 * 24 * time.Hour, true
+	case "30d":
+		return 30 * 24 * time.Hour, true
 	default:
 		return 0, false
 	}
...
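The new `7d` and `30d` cases have to be spelled out because Go's `time.ParseDuration` has no day unit (hours are the largest). A minimal sketch of the lookup and of turning a preset into a `[start, end)` window follows. `parseRange` and `window` are hypothetical names, and the `"6h"` label is an assumption inferred from the `6 * time.Hour` return visible in the hunk; the remaining cases of the real `parseOpsDuration` are not shown in this diff.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parseRange is a hypothetical stand-in for parseOpsDuration that covers only
// the presets visible in the hunk above.
func parseRange(v string) (time.Duration, bool) {
	switch strings.TrimSpace(v) {
	case "6h": // label assumed; only the return value is visible in the diff
		return 6 * time.Hour, true
	case "24h":
		return 24 * time.Hour, true
	case "7d":
		return 7 * 24 * time.Hour, true
	case "30d":
		return 30 * 24 * time.Hour, true
	default:
		return 0, false
	}
}

// window converts a preset into a half-open [start, end) interval ending at end.
func window(end time.Time, preset string) (time.Time, time.Time, bool) {
	d, ok := parseRange(preset)
	if !ok {
		return time.Time{}, time.Time{}, false
	}
	return end.Add(-d), end, true
}

func main() {
	end := time.Date(2026, 1, 15, 0, 0, 0, 0, time.UTC)
	for _, preset := range []string{"24h", "7d", "30d", "2w"} {
		start, stop, ok := window(end, preset)
		fmt.Println(preset, ok, start.Format(time.RFC3339), stop.Format(time.RFC3339))
	}
}
```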
backend/internal/handler/ops_error_logger.go
...
@@ -544,6 +544,11 @@ func OpsErrorLoggerMiddleware(ops *service.OpsService) gin.HandlerFunc {
 		body := w.buf.Bytes()
 		parsed := parseOpsErrorResponse(body)
 
+		// Skip logging if the error should be filtered based on settings
+		if shouldSkipOpsErrorLog(c.Request.Context(), ops, parsed.Message, string(body), c.Request.URL.Path) {
+			return
+		}
+
 		apiKey, _ := middleware2.GetAPIKeyFromContext(c)
 
 		clientRequestID, _ := c.Request.Context().Value(ctxkey.ClientRequestID).(string)
...
@@ -832,28 +837,30 @@ func normalizeOpsErrorType(errType string, code string) string {
 func classifyOpsPhase(errType, message, code string) string {
 	msg := strings.ToLower(message)
+	// Standardized phases: request|auth|routing|upstream|network|internal
+	// Map billing/concurrency/response => request; scheduling => routing.
 	switch strings.TrimSpace(code) {
 	case "INSUFFICIENT_BALANCE", "USAGE_LIMIT_EXCEEDED", "SUBSCRIPTION_NOT_FOUND", "SUBSCRIPTION_INVALID":
-		return "billing"
+		return "request"
 	}
 
 	switch errType {
 	case "authentication_error":
 		return "auth"
 	case "billing_error", "subscription_error":
-		return "billing"
+		return "request"
 	case "rate_limit_error":
 		if strings.Contains(msg, "concurrency") || strings.Contains(msg, "pending") || strings.Contains(msg, "queue") {
-			return "concurrency"
+			return "request"
 		}
 		return "upstream"
 	case "invalid_request_error":
-		return "response"
+		return "request"
 	case "upstream_error", "overloaded_error":
 		return "upstream"
 	case "api_error":
 		if strings.Contains(msg, "no available accounts") {
-			return "scheduling"
+			return "routing"
 		}
 		return "internal"
 	default:
...
@@ -914,34 +921,38 @@ func classifyOpsIsBusinessLimited(errType, phase, code string, status int, messa
 }
 
 func classifyOpsErrorOwner(phase string, message string) string {
+	// Standardized owners: client|provider|platform
 	switch phase {
 	case "upstream", "network":
 		return "provider"
-	case "billing", "concurrency", "auth", "response":
+	case "request", "auth":
 		return "client"
+	case "routing", "internal":
+		return "platform"
 	default:
 		if strings.Contains(strings.ToLower(message), "upstream") {
 			return "provider"
 		}
-		return "sub2api"
+		return "platform"
 	}
 }
 
 func classifyOpsErrorSource(phase string, message string) string {
+	// Standardized sources: client_request|upstream_http|gateway
 	switch phase {
 	case "upstream":
 		return "upstream_http"
 	case "network":
-		return "upstream_network"
-	case "billing":
-		return "billing"
-	case "concurrency":
-		return "concurrency"
+		return "gateway"
+	case "request", "auth":
+		return "client_request"
+	case "routing", "internal":
+		return "gateway"
 	default:
 		if strings.Contains(strings.ToLower(message), "upstream") {
 			return "upstream_http"
 		}
-		return "internal"
+		return "gateway"
 	}
 }
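Taken together, the two classifiers above define a fixed phase-to-(owner, source) table for the six standardized phases. The sketch below writes that table as one function so the mapping can be read at a glance. `classification` and `classify` are illustrative names, not part of the commit, and the message-based fallback that the real functions apply to unrecognized phases is deliberately omitted here.

```go
package main

import "fmt"

// classification pairs the standardized owner and source for a phase.
type classification struct {
	Owner  string // client | provider | platform
	Source string // client_request | upstream_http | gateway
}

// classify is a hypothetical condensed view of classifyOpsErrorOwner and
// classifyOpsErrorSource. Unknown phases fall back to platform/gateway; the
// real code additionally inspects the message for "upstream" in that case.
func classify(phase string) classification {
	switch phase {
	case "upstream":
		return classification{Owner: "provider", Source: "upstream_http"}
	case "network":
		return classification{Owner: "provider", Source: "gateway"}
	case "request", "auth":
		return classification{Owner: "client", Source: "client_request"}
	default: // "routing", "internal", and anything unrecognized
		return classification{Owner: "platform", Source: "gateway"}
	}
}

func main() {
	for _, phase := range []string{"request", "auth", "routing", "upstream", "network", "internal"} {
		c := classify(phase)
		fmt.Printf("%-8s owner=%-8s source=%s\n", phase, c.Owner, c.Source)
	}
}
```

One detail worth noticing is that `network` errors are owned by the provider but sourced from the gateway, so owner and source are not interchangeable dimensions.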
...
@@ -963,3 +974,42 @@ func truncateString(s string, max int) string {
 func strconvItoa(v int) string {
 	return strconv.Itoa(v)
 }
+
+// shouldSkipOpsErrorLog determines if an error should be skipped from logging based on settings.
+// Returns true for errors that should be filtered according to OpsAdvancedSettings.
+func shouldSkipOpsErrorLog(ctx context.Context, ops *service.OpsService, message, body, requestPath string) bool {
+	if ops == nil {
+		return false
+	}
+
+	// Get advanced settings to check filter configuration
+	settings, err := ops.GetOpsAdvancedSettings(ctx)
+	if err != nil || settings == nil {
+		// If we can't get settings, don't skip (fail open)
+		return false
+	}
+
+	msgLower := strings.ToLower(message)
+	bodyLower := strings.ToLower(body)
+
+	// Check if count_tokens errors should be ignored
+	if settings.IgnoreCountTokensErrors && strings.Contains(requestPath, "/count_tokens") {
+		return true
+	}
+
+	// Check if context canceled errors should be ignored (client disconnects)
+	if settings.IgnoreContextCanceled {
+		if strings.Contains(msgLower, "context canceled") || strings.Contains(bodyLower, "context canceled") {
+			return true
+		}
+	}
+
+	// Check if "no available accounts" errors should be ignored
+	if settings.IgnoreNoAvailableAccounts {
+		if strings.Contains(msgLower, "no available accounts") || strings.Contains(bodyLower, "no available accounts") {
+			return true
+		}
+	}
+
+	return false
+}
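The filter above is easiest to exercise without the service dependency. The following sketch reproduces its decision logic against a plain struct. `filterSettings` is a hypothetical stand-in that carries only the three flags the function reads; the real `OpsAdvancedSettings` type is not shown in this diff and may contain more.

```go
package main

import (
	"fmt"
	"strings"
)

// filterSettings is a hypothetical stand-in for the three OpsAdvancedSettings
// flags consulted by shouldSkipOpsErrorLog.
type filterSettings struct {
	IgnoreCountTokensErrors   bool
	IgnoreContextCanceled     bool
	IgnoreNoAvailableAccounts bool
}

// shouldSkip mirrors the decision order of the function above. A nil settings
// pointer fails open, meaning the error is still logged.
func shouldSkip(s *filterSettings, message, body, requestPath string) bool {
	if s == nil {
		return false
	}
	msgLower := strings.ToLower(message)
	bodyLower := strings.ToLower(body)
	inEither := func(needle string) bool {
		return strings.Contains(msgLower, needle) || strings.Contains(bodyLower, needle)
	}
	switch {
	case s.IgnoreCountTokensErrors && strings.Contains(requestPath, "/count_tokens"):
		return true
	case s.IgnoreContextCanceled && inEither("context canceled"):
		return true
	case s.IgnoreNoAvailableAccounts && inEither("no available accounts"):
		return true
	}
	return false
}

func main() {
	s := &filterSettings{IgnoreContextCanceled: true}
	fmt.Println(shouldSkip(s, "Context canceled", "", "/v1/messages"))
	fmt.Println(shouldSkip(s, "No available accounts", "", "/v1/messages"))
	fmt.Println(shouldSkip(nil, "context canceled", "", "/v1/messages"))
}
```

Failing open is the conservative choice for an ops log: a settings outage leads to extra log rows rather than silently dropped errors.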
backend/internal/repository/ops_repo.go
...
@@ -55,7 +55,6 @@ INSERT INTO ops_error_logs (
 	upstream_error_message,
 	upstream_error_detail,
 	upstream_errors,
-	duration_ms,
 	time_to_first_token_ms,
 	request_body,
 	request_body_truncated,
...
@@ -65,7 +64,7 @@ INSERT INTO ops_error_logs (
 	retry_count,
 	created_at
 ) VALUES (
-	$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31,$32,$33,$34,$35
+	$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31,$32,$33,$34
 ) RETURNING id`
 
 	var id int64
...
@@ -98,7 +97,6 @@ INSERT INTO ops_error_logs (
 		opsNullString(input.UpstreamErrorMessage),
 		opsNullString(input.UpstreamErrorDetail),
 		opsNullString(input.UpstreamErrorsJSON),
-		opsNullInt(input.DurationMs),
 		opsNullInt64(input.TimeToFirstTokenMs),
 		opsNullString(input.RequestBodyJSON),
 		input.RequestBodyTruncated,
...
@@ -135,7 +133,7 @@ func (r *opsRepository) ListErrorLogs(ctx context.Context, filter *service.OpsEr
 	}
 
 	where, args := buildOpsErrorLogsWhere(filter)
-	countSQL := "SELECT COUNT(*) FROM ops_error_logs " + where
+	countSQL := "SELECT COUNT(*) FROM ops_error_logs e " + where
 
 	var total int
 	if err := r.db.QueryRowContext(ctx, countSQL, args...).Scan(&total); err != nil {
...
@@ -146,28 +144,43 @@ func (r *opsRepository) ListErrorLogs(ctx context.Context, filter *service.OpsEr
 	argsWithLimit := append(args, pageSize, offset)
 	selectSQL := `
 SELECT
-	id,
-	created_at,
-	error_phase,
-	error_type,
-	severity,
-	COALESCE(upstream_status_code, status_code, 0),
-	COALESCE(platform, ''),
-	COALESCE(model, ''),
-	duration_ms,
-	COALESCE(client_request_id, ''),
-	COALESCE(request_id, ''),
-	COALESCE(error_message, ''),
-	user_id,
-	api_key_id,
-	account_id,
-	group_id,
-	CASE WHEN client_ip IS NULL THEN NULL ELSE client_ip::text END,
-	COALESCE(request_path, ''),
-	stream
-FROM ops_error_logs
+	e.id,
+	e.created_at,
+	e.error_phase,
+	e.error_type,
+	COALESCE(e.error_owner, ''),
+	COALESCE(e.error_source, ''),
+	e.severity,
+	COALESCE(e.upstream_status_code, e.status_code, 0),
+	COALESCE(e.platform, ''),
+	COALESCE(e.model, ''),
+	COALESCE(e.is_retryable, false),
+	COALESCE(e.retry_count, 0),
+	COALESCE(e.resolved, false),
+	e.resolved_at,
+	e.resolved_by_user_id,
+	COALESCE(u2.email, ''),
+	e.resolved_retry_id,
+	COALESCE(e.client_request_id, ''),
+	COALESCE(e.request_id, ''),
+	COALESCE(e.error_message, ''),
+	e.user_id,
+	COALESCE(u.email, ''),
+	e.api_key_id,
+	e.account_id,
+	COALESCE(a.name, ''),
+	e.group_id,
+	COALESCE(g.name, ''),
+	CASE WHEN e.client_ip IS NULL THEN NULL ELSE e.client_ip::text END,
+	COALESCE(e.request_path, ''),
+	e.stream
+FROM ops_error_logs e
+LEFT JOIN accounts a ON e.account_id = a.id
+LEFT JOIN groups g ON e.group_id = g.id
+LEFT JOIN users u ON e.user_id = u.id
+LEFT JOIN users u2 ON e.resolved_by_user_id = u2.id
 ` + where + `
-ORDER BY created_at DESC
+ORDER BY e.created_at DESC
 LIMIT $` + itoa(len(args)+1) + ` OFFSET $` + itoa(len(args)+2)
 
 	rows, err := r.db.QueryContext(ctx, selectSQL, argsWithLimit...)
...
@@ -179,39 +192,65 @@ LIMIT $` + itoa(len(args)+1) + ` OFFSET $` + itoa(len(args)+2)
 	out := make([]*service.OpsErrorLog, 0, pageSize)
 	for rows.Next() {
 		var item service.OpsErrorLog
-		var latency sql.NullInt64
 		var statusCode sql.NullInt64
 		var clientIP sql.NullString
 		var userID sql.NullInt64
 		var apiKeyID sql.NullInt64
 		var accountID sql.NullInt64
+		var accountName string
 		var groupID sql.NullInt64
+		var groupName string
+		var userEmail string
+		var resolvedAt sql.NullTime
+		var resolvedBy sql.NullInt64
+		var resolvedByName string
+		var resolvedRetryID sql.NullInt64
 		if err := rows.Scan(
 			&item.ID,
 			&item.CreatedAt,
 			&item.Phase,
 			&item.Type,
+			&item.Owner,
+			&item.Source,
 			&item.Severity,
 			&statusCode,
 			&item.Platform,
 			&item.Model,
-			&latency,
+			&item.IsRetryable,
+			&item.RetryCount,
+			&item.Resolved,
+			&resolvedAt,
+			&resolvedBy,
+			&resolvedByName,
+			&resolvedRetryID,
 			&item.ClientRequestID,
 			&item.RequestID,
 			&item.Message,
 			&userID,
+			&userEmail,
 			&apiKeyID,
 			&accountID,
+			&accountName,
 			&groupID,
+			&groupName,
 			&clientIP,
 			&item.RequestPath,
 			&item.Stream,
 		); err != nil {
 			return nil, err
 		}
-		if latency.Valid {
-			v := int(latency.Int64)
-			item.LatencyMs = &v
+		if resolvedAt.Valid {
+			t := resolvedAt.Time
+			item.ResolvedAt = &t
+		}
+		if resolvedBy.Valid {
+			v := resolvedBy.Int64
+			item.ResolvedByUserID = &v
+		}
+		item.ResolvedByUserName = resolvedByName
+		if resolvedRetryID.Valid {
+			v := resolvedRetryID.Int64
+			item.ResolvedRetryID = &v
 		}
 		item.StatusCode = int(statusCode.Int64)
 		if clientIP.Valid {
...
@@ -222,6 +261,7 @@ LIMIT $` + itoa(len(args)+1) + ` OFFSET $` + itoa(len(args)+2)
 			v := userID.Int64
 			item.UserID = &v
 		}
+		item.UserEmail = userEmail
 		if apiKeyID.Valid {
 			v := apiKeyID.Int64
 			item.APIKeyID = &v
...
@@ -230,10 +270,12 @@ LIMIT $` + itoa(len(args)+1) + ` OFFSET $` + itoa(len(args)+2)
 			v := accountID.Int64
 			item.AccountID = &v
 		}
+		item.AccountName = accountName
 		if groupID.Valid {
 			v := groupID.Int64
 			item.GroupID = &v
 		}
+		item.GroupName = groupName
 		out = append(out, &item)
 	}
 	if err := rows.Err(); err != nil {
...
@@ -258,49 +300,64 @@ func (r *opsRepository) GetErrorLogByID(ctx context.Context, id int64) (*service
 	q := `
 SELECT
-	id,
-	created_at,
-	error_phase,
-	error_type,
-	severity,
-	COALESCE(upstream_status_code, status_code, 0),
-	COALESCE(platform, ''),
-	COALESCE(model, ''),
-	duration_ms,
-	COALESCE(client_request_id, ''),
-	COALESCE(request_id, ''),
-	COALESCE(error_message, ''),
-	COALESCE(error_body, ''),
-	upstream_status_code,
-	COALESCE(upstream_error_message, ''),
-	COALESCE(upstream_error_detail, ''),
-	COALESCE(upstream_errors::text, ''),
-	is_business_limited,
-	user_id,
-	api_key_id,
-	account_id,
-	group_id,
-	CASE WHEN client_ip IS NULL THEN NULL ELSE client_ip::text END,
-	COALESCE(request_path, ''),
-	stream,
-	COALESCE(user_agent, ''),
-	auth_latency_ms,
-	routing_latency_ms,
-	upstream_latency_ms,
-	response_latency_ms,
-	time_to_first_token_ms,
-	COALESCE(request_body::text, ''),
-	request_body_truncated,
-	request_body_bytes,
-	COALESCE(request_headers::text, '')
-FROM ops_error_logs
-WHERE id = $1
+	e.id,
+	e.created_at,
+	e.error_phase,
+	e.error_type,
+	COALESCE(e.error_owner, ''),
+	COALESCE(e.error_source, ''),
+	e.severity,
+	COALESCE(e.upstream_status_code, e.status_code, 0),
+	COALESCE(e.platform, ''),
+	COALESCE(e.model, ''),
+	COALESCE(e.is_retryable, false),
+	COALESCE(e.retry_count, 0),
+	COALESCE(e.resolved, false),
+	e.resolved_at,
+	e.resolved_by_user_id,
+	e.resolved_retry_id,
+	COALESCE(e.client_request_id, ''),
+	COALESCE(e.request_id, ''),
+	COALESCE(e.error_message, ''),
+	COALESCE(e.error_body, ''),
+	e.upstream_status_code,
+	COALESCE(e.upstream_error_message, ''),
+	COALESCE(e.upstream_error_detail, ''),
+	COALESCE(e.upstream_errors::text, ''),
+	e.is_business_limited,
+	e.user_id,
+	COALESCE(u.email, ''),
+	e.api_key_id,
+	e.account_id,
+	COALESCE(a.name, ''),
+	e.group_id,
+	COALESCE(g.name, ''),
+	CASE WHEN e.client_ip IS NULL THEN NULL ELSE e.client_ip::text END,
+	COALESCE(e.request_path, ''),
+	e.stream,
+	COALESCE(e.user_agent, ''),
+	e.auth_latency_ms,
+	e.routing_latency_ms,
+	e.upstream_latency_ms,
+	e.response_latency_ms,
+	e.time_to_first_token_ms,
+	COALESCE(e.request_body::text, ''),
+	e.request_body_truncated,
+	e.request_body_bytes,
+	COALESCE(e.request_headers::text, '')
+FROM ops_error_logs e
+LEFT JOIN users u ON e.user_id = u.id
+LEFT JOIN accounts a ON e.account_id = a.id
+LEFT JOIN groups g ON e.group_id = g.id
+WHERE e.id = $1
 LIMIT 1`
 
 	var out service.OpsErrorLogDetail
-	var latency sql.NullInt64
 	var statusCode sql.NullInt64
 	var upstreamStatusCode sql.NullInt64
+	var resolvedAt sql.NullTime
+	var resolvedBy sql.NullInt64
+	var resolvedRetryID sql.NullInt64
 	var clientIP sql.NullString
 	var userID sql.NullInt64
 	var apiKeyID sql.NullInt64
...
@@ -318,11 +375,18 @@ LIMIT 1`
 		&out.CreatedAt,
 		&out.Phase,
 		&out.Type,
+		&out.Owner,
+		&out.Source,
 		&out.Severity,
 		&statusCode,
 		&out.Platform,
 		&out.Model,
-		&latency,
+		&out.IsRetryable,
+		&out.RetryCount,
+		&out.Resolved,
+		&resolvedAt,
+		&resolvedBy,
+		&resolvedRetryID,
 		&out.ClientRequestID,
 		&out.RequestID,
 		&out.Message,
...
@@ -333,9 +397,12 @@ LIMIT 1`
 		&out.UpstreamErrors,
 		&out.IsBusinessLimited,
 		&userID,
+		&out.UserEmail,
 		&apiKeyID,
 		&accountID,
+		&out.AccountName,
 		&groupID,
+		&out.GroupName,
 		&clientIP,
 		&out.RequestPath,
 		&out.Stream,
...
@@ -355,9 +422,17 @@ LIMIT 1`
 	}
 
 	out.StatusCode = int(statusCode.Int64)
-	if latency.Valid {
-		v := int(latency.Int64)
-		out.LatencyMs = &v
+	if resolvedAt.Valid {
+		t := resolvedAt.Time
+		out.ResolvedAt = &t
+	}
+	if resolvedBy.Valid {
+		v := resolvedBy.Int64
+		out.ResolvedByUserID = &v
+	}
+	if resolvedRetryID.Valid {
+		v := resolvedRetryID.Int64
+		out.ResolvedRetryID = &v
 	}
 	if clientIP.Valid {
 		s := clientIP.String
...
@@ -487,9 +562,15 @@ SET
 	status = $2,
 	finished_at = $3,
 	duration_ms = $4,
-	result_request_id = $5,
-	result_error_id = $6,
-	error_message = $7
+	success = $5,
+	http_status_code = $6,
+	upstream_request_id = $7,
+	used_account_id = $8,
+	response_preview = $9,
+	response_truncated = $10,
+	result_request_id = $11,
+	result_error_id = $12,
+	error_message = $13
 WHERE id = $1`
 
 	_, err := r.db.ExecContext(
...
@@ -499,8 +580,14 @@ WHERE id = $1`
 		strings.TrimSpace(input.Status),
 		nullTime(input.FinishedAt),
 		input.DurationMs,
+		nullBool(input.Success),
+		nullInt(input.HTTPStatusCode),
+		opsNullString(input.UpstreamRequestID),
+		nullInt64(input.UsedAccountID),
+		opsNullString(input.ResponsePreview),
+		nullBool(input.ResponseTruncated),
 		opsNullString(input.ResultRequestID),
-		opsNullInt64(input.ResultErrorID),
+		nullInt64(input.ResultErrorID),
 		opsNullString(input.ErrorMessage),
 	)
 	return err
...
@@ -526,6 +613,12 @@ SELECT
 	started_at,
 	finished_at,
 	duration_ms,
+	success,
+	http_status_code,
+	upstream_request_id,
+	used_account_id,
+	response_preview,
+	response_truncated,
 	result_request_id,
 	result_error_id,
 	error_message
...
@@ -540,6 +633,12 @@ LIMIT 1`
 	var startedAt sql.NullTime
 	var finishedAt sql.NullTime
 	var durationMs sql.NullInt64
+	var success sql.NullBool
+	var httpStatusCode sql.NullInt64
+	var upstreamRequestID sql.NullString
+	var usedAccountID sql.NullInt64
+	var responsePreview sql.NullString
+	var responseTruncated sql.NullBool
 	var resultRequestID sql.NullString
 	var resultErrorID sql.NullInt64
 	var errorMessage sql.NullString
...
@@ -555,6 +654,12 @@ LIMIT 1`
 		&startedAt,
 		&finishedAt,
 		&durationMs,
+		&success,
+		&httpStatusCode,
+		&upstreamRequestID,
+		&usedAccountID,
+		&responsePreview,
+		&responseTruncated,
 		&resultRequestID,
 		&resultErrorID,
 		&errorMessage,
...
@@ -579,6 +684,30 @@ LIMIT 1`
 		v := durationMs.Int64
 		out.DurationMs = &v
 	}
+	if success.Valid {
+		v := success.Bool
+		out.Success = &v
+	}
+	if httpStatusCode.Valid {
+		v := int(httpStatusCode.Int64)
+		out.HTTPStatusCode = &v
+	}
+	if upstreamRequestID.Valid {
+		s := upstreamRequestID.String
+		out.UpstreamRequestID = &s
+	}
+	if usedAccountID.Valid {
+		v := usedAccountID.Int64
+		out.UsedAccountID = &v
+	}
+	if responsePreview.Valid {
+		s := responsePreview.String
+		out.ResponsePreview = &s
+	}
+	if responseTruncated.Valid {
+		v := responseTruncated.Bool
+		out.ResponseTruncated = &v
+	}
 	if resultRequestID.Valid {
 		s := resultRequestID.String
 		out.ResultRequestID = &s
...
@@ -602,30 +731,234 @@ func nullTime(t time.Time) sql.NullTime {
 	return sql.NullTime{Time: t, Valid: true}
 }
 
+func nullBool(v *bool) sql.NullBool {
+	if v == nil {
+		return sql.NullBool{}
+	}
+	return sql.NullBool{Bool: *v, Valid: true}
+}
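`nullBool` follows the usual pattern for writing optional values through `database/sql`: a nil pointer becomes the zero `sql.NullBool` (stored as SQL `NULL`), and a non-nil pointer becomes a valid value. The sketch below shows that helper next to an `int64` counterpart. The body of `nullInt64` is an assumption written in the same style; the diff calls a function of that name but does not show its implementation.

```go
package main

import (
	"database/sql"
	"fmt"
)

// nullBool matches the helper added in the diff.
func nullBool(v *bool) sql.NullBool {
	if v == nil {
		return sql.NullBool{}
	}
	return sql.NullBool{Bool: *v, Valid: true}
}

// nullInt64 is assumed to follow the same pattern; its real body is not part
// of the visible diff.
func nullInt64(v *int64) sql.NullInt64 {
	if v == nil {
		return sql.NullInt64{}
	}
	return sql.NullInt64{Int64: *v, Valid: true}
}

func main() {
	yes := true
	id := int64(7)
	fmt.Println(nullBool(nil).Valid, nullBool(&yes).Valid)
	fmt.Println(nullInt64(nil).Valid, nullInt64(&id).Int64)
}
```

Both `sql.NullBool` and `sql.NullInt64` implement `driver.Valuer`, so they can be passed directly as query arguments, which is how the `UPDATE` statements in this file use them.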
+
+func (r *opsRepository) ListRetryAttemptsByErrorID(ctx context.Context, sourceErrorID int64, limit int) ([]*service.OpsRetryAttempt, error) {
+	if r == nil || r.db == nil {
+		return nil, fmt.Errorf("nil ops repository")
+	}
+	if sourceErrorID <= 0 {
+		return nil, fmt.Errorf("invalid source_error_id")
+	}
+	if limit <= 0 {
+		limit = 50
+	}
+	if limit > 200 {
+		limit = 200
+	}
+
+	q := `
+SELECT
+	r.id,
+	r.created_at,
+	COALESCE(r.requested_by_user_id, 0),
+	r.source_error_id,
+	COALESCE(r.mode, ''),
+	r.pinned_account_id,
+	COALESCE(pa.name, ''),
+	COALESCE(r.status, ''),
+	r.started_at,
+	r.finished_at,
+	r.duration_ms,
+	r.success,
+	r.http_status_code,
+	r.upstream_request_id,
+	r.used_account_id,
+	COALESCE(ua.name, ''),
+	r.response_preview,
+	r.response_truncated,
+	r.result_request_id,
+	r.result_error_id,
+	r.error_message
+FROM ops_retry_attempts r
+LEFT JOIN accounts pa ON r.pinned_account_id = pa.id
+LEFT JOIN accounts ua ON r.used_account_id = ua.id
+WHERE r.source_error_id = $1
+ORDER BY r.created_at DESC
+LIMIT $2`
+
+	rows, err := r.db.QueryContext(ctx, q, sourceErrorID, limit)
+	if err != nil {
+		return nil, err
+	}
+	defer func() { _ = rows.Close() }()
+
+	out := make([]*service.OpsRetryAttempt, 0, 16)
+	for rows.Next() {
+		var item service.OpsRetryAttempt
+		var pinnedAccountID sql.NullInt64
+		var pinnedAccountName string
+		var requestedBy sql.NullInt64
+		var startedAt sql.NullTime
+		var finishedAt sql.NullTime
+		var durationMs sql.NullInt64
+		var success sql.NullBool
+		var httpStatusCode sql.NullInt64
+		var upstreamRequestID sql.NullString
+		var usedAccountID sql.NullInt64
+		var usedAccountName string
+		var responsePreview sql.NullString
+		var responseTruncated sql.NullBool
+		var resultRequestID sql.NullString
+		var resultErrorID sql.NullInt64
+		var errorMessage sql.NullString
+
+		if err := rows.Scan(
+			&item.ID,
+			&item.CreatedAt,
+			&requestedBy,
+			&item.SourceErrorID,
+			&item.Mode,
+			&pinnedAccountID,
+			&pinnedAccountName,
+			&item.Status,
+			&startedAt,
+			&finishedAt,
+			&durationMs,
+			&success,
+			&httpStatusCode,
+			&upstreamRequestID,
+			&usedAccountID,
+			&usedAccountName,
+			&responsePreview,
+			&responseTruncated,
+			&resultRequestID,
+			&resultErrorID,
+			&errorMessage,
+		); err != nil {
+			return nil, err
+		}
+
+		item.RequestedByUserID = requestedBy.Int64
+		if pinnedAccountID.Valid {
+			v := pinnedAccountID.Int64
+			item.PinnedAccountID = &v
+		}
+		item.PinnedAccountName = pinnedAccountName
+		if startedAt.Valid {
+			t := startedAt.Time
+			item.StartedAt = &t
+		}
+		if finishedAt.Valid {
+			t := finishedAt.Time
+			item.FinishedAt = &t
+		}
+		if durationMs.Valid {
+			v := durationMs.Int64
+			item.DurationMs = &v
+		}
+		if success.Valid {
+			v := success.Bool
+			item.Success = &v
+		}
+		if httpStatusCode.Valid {
+			v := int(httpStatusCode.Int64)
+			item.HTTPStatusCode = &v
+		}
+		if upstreamRequestID.Valid {
+			item.UpstreamRequestID = &upstreamRequestID.String
+		}
+		if usedAccountID.Valid {
+			v := usedAccountID.Int64
+			item.UsedAccountID = &v
+		}
+		item.UsedAccountName = usedAccountName
+		if responsePreview.Valid {
+			item.ResponsePreview = &responsePreview.String
+		}
+		if responseTruncated.Valid {
+			v := responseTruncated.Bool
+			item.ResponseTruncated = &v
+		}
+		if resultRequestID.Valid {
+			item.ResultRequestID = &resultRequestID.String
+		}
+		if resultErrorID.Valid {
+			v := resultErrorID.Int64
+			item.ResultErrorID = &v
+		}
+		if errorMessage.Valid {
+			item.ErrorMessage = &errorMessage.String
+		}
+
+		out = append(out, &item)
+	}
+	if err := rows.Err(); err != nil {
+		return nil, err
+	}
+	return out, nil
+}
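Most of the scan loop above repeats one step: turn a `sql.Null*` value into a pointer that is nil when the column was `NULL`. As a sketch of how that step could be factored out, the helpers below do the conversion in one place. `int64Ptr`, `stringPtr` and `timePtr` are hypothetical names and are not used by the commit, which writes the `if x.Valid { ... }` blocks inline.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// Sample values standing in for scanned columns (illustrative only).
var (
	nullAccount = sql.NullInt64{}
	someAccount = sql.NullInt64{Int64: 9, Valid: true}
)

// int64Ptr returns nil for SQL NULL, otherwise a pointer to a copy of the value.
func int64Ptr(n sql.NullInt64) *int64 {
	if !n.Valid {
		return nil
	}
	v := n.Int64
	return &v
}

// stringPtr is the string counterpart of int64Ptr.
func stringPtr(n sql.NullString) *string {
	if !n.Valid {
		return nil
	}
	s := n.String
	return &s
}

// timePtr is the time.Time counterpart of int64Ptr.
func timePtr(n sql.NullTime) *time.Time {
	if !n.Valid {
		return nil
	}
	t := n.Time
	return &t
}

func main() {
	fmt.Println(int64Ptr(nullAccount) == nil)
	fmt.Println(*int64Ptr(someAccount))
	fmt.Println(stringPtr(sql.NullString{}) == nil)
	fmt.Println(timePtr(sql.NullTime{Time: time.Unix(0, 0), Valid: true}) != nil)
}
```

Because each helper takes its argument by value and copies the field into a local, the returned pointer never aliases a variable that a later loop iteration could overwrite.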
+
+func (r *opsRepository) UpdateErrorResolution(ctx context.Context, errorID int64, resolved bool, resolvedByUserID *int64, resolvedRetryID *int64, resolvedAt *time.Time) error {
+	if r == nil || r.db == nil {
+		return fmt.Errorf("nil ops repository")
+	}
+	if errorID <= 0 {
+		return fmt.Errorf("invalid error id")
+	}
+
+	q := `
+UPDATE ops_error_logs
+SET
+	resolved = $2,
+	resolved_at = $3,
+	resolved_by_user_id = $4,
+	resolved_retry_id = $5
+WHERE id = $1`
+
+	at := sql.NullTime{}
+	if resolvedAt != nil && !resolvedAt.IsZero() {
+		at = sql.NullTime{Time: resolvedAt.UTC(), Valid: true}
+	} else if resolved {
+		now := time.Now().UTC()
+		at = sql.NullTime{Time: now, Valid: true}
+	}
+
+	_, err := r.db.ExecContext(
+		ctx,
+		q,
+		errorID,
+		resolved,
+		at,
+		nullInt64(resolvedByUserID),
+		nullInt64(resolvedRetryID),
+	)
+	return err
+}
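The only branching in `UpdateErrorResolution` is the choice of `resolved_at`: an explicit non-zero timestamp wins, otherwise the current time is used when resolving, otherwise the column is set to `NULL`. The sketch below isolates that rule. `resolvedAtValue` is a hypothetical function, and passing `now` as a parameter is an assumption made so the rule can be checked deterministically; the real code calls `time.Now()` inline.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// demoNow is a fixed clock value used only by this example.
var demoNow = time.Date(2026, 1, 15, 12, 0, 0, 0, time.UTC)

// resolvedAtValue mirrors the timestamp selection in UpdateErrorResolution.
func resolvedAtValue(resolved bool, explicit *time.Time, now time.Time) sql.NullTime {
	if explicit != nil && !explicit.IsZero() {
		// An explicit timestamp is stored regardless of the resolved flag.
		return sql.NullTime{Time: explicit.UTC(), Valid: true}
	}
	if resolved {
		return sql.NullTime{Time: now.UTC(), Valid: true}
	}
	// Unresolving without a timestamp clears the column.
	return sql.NullTime{}
}

func main() {
	earlier := demoNow.Add(-time.Hour)
	fmt.Println(resolvedAtValue(true, nil, demoNow))
	fmt.Println(resolvedAtValue(false, nil, demoNow))
	fmt.Println(resolvedAtValue(true, &earlier, demoNow))
}
```

Note that, as written in the diff, a caller that passes `resolved=false` together with an explicit timestamp still stores that timestamp, so callers that want to clear `resolved_at` should pass nil.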
 
 func buildOpsErrorLogsWhere(filter *service.OpsErrorLogFilter) (string, []any) {
-	clauses := make([]string, 0, 8)
-	args := make([]any, 0, 8)
+	clauses := make([]string, 0, 12)
+	args := make([]any, 0, 12)
 	clauses = append(clauses, "1=1")
 
 	phaseFilter := ""
 	if filter != nil {
 		phaseFilter = strings.TrimSpace(strings.ToLower(filter.Phase))
 	}
-	// ops_error_logs primarily stores client-visible error requests (status>=400),
+	// ops_error_logs stores client-visible error requests (status>=400),
 	// but we also persist "recovered" upstream errors (status<400) for upstream health visibility.
-	// By default, keep list endpoints scoped to client errors unless explicitly filtering upstream phase.
+	// If Resolved is not specified, do not filter by resolved state (backward-compatible).
+	resolvedFilter := (*bool)(nil)
+	if filter != nil {
+		resolvedFilter = filter.Resolved
+	}
+	// Keep list endpoints scoped to client errors unless explicitly filtering upstream phase.
 	if phaseFilter != "upstream" {
 		clauses = append(clauses, "COALESCE(status_code, 0) >= 400")
 	}
 
 	if filter.StartTime != nil && !filter.StartTime.IsZero() {
 		args = append(args, filter.StartTime.UTC())
-		clauses = append(clauses, "created_at >= $"+itoa(len(args)))
+		clauses = append(clauses, "e.created_at >= $"+itoa(len(args)))
 	}
 	if filter.EndTime != nil && !filter.EndTime.IsZero() {
 		args = append(args, filter.EndTime.UTC())
 		// Keep time-window semantics consistent with other ops queries: [start, end)
-		clauses = append(clauses, "created_at < $"+itoa(len(args)))
+		clauses = append(clauses, "e.created_at < $"+itoa(len(args)))
 	}
 	if p := strings.TrimSpace(filter.Platform); p != "" {
 		args = append(args, p)
@@ -643,10 +976,59 @@ func buildOpsErrorLogsWhere(filter *service.OpsErrorLogFilter) (string, []any) {
 		args = append(args, phase)
 		clauses = append(clauses, "error_phase = $"+itoa(len(args)))
 	}
+	if filter != nil {
+		if owner := strings.TrimSpace(strings.ToLower(filter.Owner)); owner != "" {
+			args = append(args, owner)
+			clauses = append(clauses, "LOWER(COALESCE(error_owner,'')) = $"+itoa(len(args)))
+		}
+		if source := strings.TrimSpace(strings.ToLower(filter.Source)); source != "" {
+			args = append(args, source)
+			clauses = append(clauses, "LOWER(COALESCE(error_source,'')) = $"+itoa(len(args)))
+		}
+	}
+	if resolvedFilter != nil {
+		args = append(args, *resolvedFilter)
+		clauses = append(clauses, "COALESCE(resolved,false) = $"+itoa(len(args)))
+	}
+	// View filter: errors vs excluded vs all.
+	// Excluded = upstream 429/529 and business-limited (quota/concurrency/billing) errors.
+	view := ""
+	if filter != nil {
+		view = strings.ToLower(strings.TrimSpace(filter.View))
+	}
+	switch view {
+	case "", "errors":
+		clauses = append(clauses, "COALESCE(is_business_limited,false) = false")
+		clauses = append(clauses, "COALESCE(upstream_status_code, status_code, 0) NOT IN (429, 529)")
+	case "excluded":
+		clauses = append(clauses, "(COALESCE(is_business_limited,false) = true OR COALESCE(upstream_status_code, status_code, 0) IN (429, 529))")
+	case "all":
+		// no-op
+	default:
+		// treat unknown as default 'errors'
+		clauses = append(clauses, "COALESCE(is_business_limited,false) = false")
+		clauses = append(clauses, "COALESCE(upstream_status_code, status_code, 0) NOT IN (429, 529)")
+	}
 	if len(filter.StatusCodes) > 0 {
 		args = append(args, pq.Array(filter.StatusCodes))
 		clauses = append(clauses, "COALESCE(upstream_status_code, status_code, 0) = ANY($"+itoa(len(args))+")")
+	} else if filter.StatusCodesOther {
+		// "Other" means: status codes not in the common list.
+		known := []int{400, 401, 403, 404, 409, 422, 429, 500, 502, 503, 504, 529}
+		args = append(args, pq.Array(known))
+		clauses = append(clauses, "NOT (COALESCE(upstream_status_code, status_code, 0) = ANY($"+itoa(len(args))+"))")
 	}
+	// Exact correlation keys (preferred for request↔upstream linkage).
+	if rid := strings.TrimSpace(filter.RequestID); rid != "" {
+		args = append(args, rid)
+		clauses = append(clauses, "COALESCE(request_id,'') = $"+itoa(len(args)))
+	}
+	if crid := strings.TrimSpace(filter.ClientRequestID); crid != "" {
+		args = append(args, crid)
+		clauses = append(clauses, "COALESCE(client_request_id,'') = $"+itoa(len(args)))
+	}
 	if q := strings.TrimSpace(filter.Query); q != "" {
 		like := "%" + q + "%"
 		args = append(args, like)
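The view switch in the hunk above repeats the same two clauses for the `""`/`"errors"` case and for the `default` case. As a minimal sketch (not the code in this commit), the mapping from a view name to its SQL clauses could be isolated in one pure function, which also makes the fallback behaviour easy to test. The function name `viewClauses` is hypothetical; the clause strings are copied from the diff.

```go
package main

import (
	"fmt"
	"strings"
)

// viewClauses returns the WHERE fragments for a view name.
// Hypothetical helper: the name and shape are assumptions, the SQL text is from the diff.
func viewClauses(view string) []string {
	const notLimited = "COALESCE(is_business_limited,false) = false"
	const notThrottled = "COALESCE(upstream_status_code, status_code, 0) NOT IN (429, 529)"
	switch strings.ToLower(strings.TrimSpace(view)) {
	case "excluded":
		return []string{"(COALESCE(is_business_limited,false) = true OR COALESCE(upstream_status_code, status_code, 0) IN (429, 529))"}
	case "all":
		return nil // no extra filtering
	default:
		// "", "errors", and any unknown value fall back to the errors view.
		return []string{notLimited, notThrottled}
	}
}

func main() {
	for _, v := range []string{"", "excluded", "ALL", "typo"} {
		fmt.Printf("%q -> %d clause(s)\n", v, len(viewClauses(v)))
	}
}
```

None of these clauses takes a bind argument, so the helper does not need to touch `args`.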
@@ -654,6 +1036,13 @@ func buildOpsErrorLogsWhere(filter *service.OpsErrorLogFilter) (string, []any) {
 		clauses = append(clauses, "(request_id ILIKE $"+n+" OR client_request_id ILIKE $"+n+" OR error_message ILIKE $"+n+")")
 	}
+	if userQuery := strings.TrimSpace(filter.UserQuery); userQuery != "" {
+		like := "%" + userQuery + "%"
+		args = append(args, like)
+		n := itoa(len(args))
+		clauses = append(clauses, "u.email ILIKE $"+n)
+	}
 	return "WHERE " + strings.Join(clauses, " AND "), args
 }
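`buildOpsErrorLogsWhere` relies on one pattern throughout: append the argument first, then derive the `$n` placeholder from `len(args)`. The following is a minimal, self-contained sketch of that pattern, not the repository code. The `errorLogFilter` type, the `1=1` seed clause, and the `platform = $n` clause are assumptions made for illustration, and `strconv.Itoa` stands in for the project's `itoa` helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// errorLogFilter is a hypothetical, trimmed-down stand-in for service.OpsErrorLogFilter.
type errorLogFilter struct {
	Platform string
	Resolved *bool
	Query    string
}

// buildWhere appends each argument before computing its placeholder number,
// so the clause text and the argument order cannot drift apart.
func buildWhere(f errorLogFilter) (string, []any) {
	clauses := []string{"1=1"} // assumed seed so the join never yields an empty WHERE
	args := []any{}
	if p := strings.TrimSpace(f.Platform); p != "" {
		args = append(args, p)
		clauses = append(clauses, "platform = $"+strconv.Itoa(len(args)))
	}
	if f.Resolved != nil {
		args = append(args, *f.Resolved)
		clauses = append(clauses, "COALESCE(resolved,false) = $"+strconv.Itoa(len(args)))
	}
	if q := strings.TrimSpace(f.Query); q != "" {
		args = append(args, "%"+q+"%")
		n := strconv.Itoa(len(args)) // one bind value reused by several ILIKE terms
		clauses = append(clauses, "(request_id ILIKE $"+n+" OR error_message ILIKE $"+n+")")
	}
	return "WHERE " + strings.Join(clauses, " AND "), args
}

func main() {
	resolved := false
	where, args := buildWhere(errorLogFilter{Platform: " gemini ", Resolved: &resolved, Query: "timeout"})
	fmt.Println(where)
	fmt.Println(args...)
}
```

Leaving `Resolved` as a nil pointer skips the clause entirely, which is how the diff keeps the endpoint backward-compatible for callers that do not send the field.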
...
...
backend/internal/repository/ops_repo_alerts.go
@@ -354,7 +354,7 @@ SELECT
 	created_at
 FROM ops_alert_events
 ` + where + `
-ORDER BY fired_at DESC
+ORDER BY fired_at DESC, id DESC
 LIMIT ` + limitArg
 	rows, err := r.db.QueryContext(ctx, q, args...)
@@ -413,6 +413,43 @@ LIMIT ` + limitArg
 	return out, nil
 }

+func (r *opsRepository) GetAlertEventByID(ctx context.Context, eventID int64) (*service.OpsAlertEvent, error) {
+	if r == nil || r.db == nil {
+		return nil, fmt.Errorf("nil ops repository")
+	}
+	if eventID <= 0 {
+		return nil, fmt.Errorf("invalid event id")
+	}
+	q := `
+SELECT
+	id,
+	COALESCE(rule_id, 0),
+	COALESCE(severity, ''),
+	COALESCE(status, ''),
+	COALESCE(title, ''),
+	COALESCE(description, ''),
+	metric_value,
+	threshold_value,
+	dimensions,
+	fired_at,
+	resolved_at,
+	email_sent,
+	created_at
+FROM ops_alert_events
+WHERE id = $1`
+	row := r.db.QueryRowContext(ctx, q, eventID)
+	ev, err := scanOpsAlertEvent(row)
+	if err != nil {
+		if err == sql.ErrNoRows {
+			return nil, nil
+		}
+		return nil, err
+	}
+	return ev, nil
+}
+
 func (r *opsRepository) GetActiveAlertEvent(ctx context.Context, ruleID int64) (*service.OpsAlertEvent, error) {
 	if r == nil || r.db == nil {
 		return nil, fmt.Errorf("nil ops repository")
@@ -591,6 +628,121 @@ type opsAlertEventRow interface {
 	Scan(dest ...any) error
 }

+func (r *opsRepository) CreateAlertSilence(ctx context.Context, input *service.OpsAlertSilence) (*service.OpsAlertSilence, error) {
+	if r == nil || r.db == nil {
+		return nil, fmt.Errorf("nil ops repository")
+	}
+	if input == nil {
+		return nil, fmt.Errorf("nil input")
+	}
+	if input.RuleID <= 0 {
+		return nil, fmt.Errorf("invalid rule_id")
+	}
+	platform := strings.TrimSpace(input.Platform)
+	if platform == "" {
+		return nil, fmt.Errorf("invalid platform")
+	}
+	if input.Until.IsZero() {
+		return nil, fmt.Errorf("invalid until")
+	}
+	q := `
+INSERT INTO ops_alert_silences (
+	rule_id,
+	platform,
+	group_id,
+	region,
+	until,
+	reason,
+	created_by,
+	created_at
+) VALUES (
+	$1,$2,$3,$4,$5,$6,$7,NOW()
+)
+RETURNING id, rule_id, platform, group_id, region, until, COALESCE(reason,''), created_by, created_at`
+	row := r.db.QueryRowContext(
+		ctx,
+		q,
+		input.RuleID,
+		platform,
+		opsNullInt64(input.GroupID),
+		opsNullString(input.Region),
+		input.Until,
+		opsNullString(input.Reason),
+		opsNullInt64(input.CreatedBy),
+	)
+	var out service.OpsAlertSilence
+	var groupID sql.NullInt64
+	var region sql.NullString
+	var createdBy sql.NullInt64
+	if err := row.Scan(
+		&out.ID,
+		&out.RuleID,
+		&out.Platform,
+		&groupID,
+		&region,
+		&out.Until,
+		&out.Reason,
+		&createdBy,
+		&out.CreatedAt,
+	); err != nil {
+		return nil, err
+	}
+	if groupID.Valid {
+		v := groupID.Int64
+		out.GroupID = &v
+	}
+	if region.Valid {
+		v := strings.TrimSpace(region.String)
+		if v != "" {
+			out.Region = &v
+		}
+	}
+	if createdBy.Valid {
+		v := createdBy.Int64
+		out.CreatedBy = &v
+	}
+	return &out, nil
+}
+
+func (r *opsRepository) IsAlertSilenced(ctx context.Context, ruleID int64, platform string, groupID *int64, region *string, now time.Time) (bool, error) {
+	if r == nil || r.db == nil {
+		return false, fmt.Errorf("nil ops repository")
+	}
+	if ruleID <= 0 {
+		return false, fmt.Errorf("invalid rule id")
+	}
+	platform = strings.TrimSpace(platform)
+	if platform == "" {
+		return false, nil
+	}
+	if now.IsZero() {
+		now = time.Now().UTC()
+	}
+	q := `
+SELECT 1
+FROM ops_alert_silences
+WHERE rule_id = $1
+  AND platform = $2
+  AND (group_id IS NOT DISTINCT FROM $3)
+  AND (region IS NOT DISTINCT FROM $4)
+  AND until > $5
+LIMIT 1`
+	var dummy int
+	err := r.db.QueryRowContext(ctx, q, ruleID, platform, opsNullInt64(groupID), opsNullString(region), now).Scan(&dummy)
+	if err != nil {
+		if err == sql.ErrNoRows {
+			return false, nil
+		}
+		return false, err
+	}
+	return true, nil
+}
+
 func scanOpsAlertEvent(row opsAlertEventRow) (*service.OpsAlertEvent, error) {
 	var ev service.OpsAlertEvent
 	var metricValue sql.NullFloat64
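The `IsAlertSilenced` query in the hunk above matches the optional `group_id` and `region` columns with `IS NOT DISTINCT FROM`, which treats NULL as equal to NULL, and it requires `until` to be strictly later than `now`. The sketch below models those semantics in memory so the matching rules can be checked without a database. It is an illustration under assumptions, not the repository code: the `silence` type, `sameOptional`, `isSilenced`, and `hourMark` are hypothetical names, and how `opsNullString` treats empty strings is not modelled.

```go
package main

import (
	"fmt"
	"time"
)

// silence is a hypothetical in-memory mirror of one ops_alert_silences row.
type silence struct {
	RuleID   int64
	Platform string
	GroupID  *int64
	Region   *string
	Until    time.Time
}

// sameOptional behaves like SQL's `a IS NOT DISTINCT FROM b`:
// nil matches nil, and nil never matches a concrete value.
func sameOptional[T comparable](a, b *T) bool {
	if a == nil || b == nil {
		return a == nil && b == nil
	}
	return *a == *b
}

// isSilenced reports whether any silence matches all dimensions exactly
// and has not yet expired (until > now).
func isSilenced(silences []silence, ruleID int64, platform string, groupID *int64, region *string, now time.Time) bool {
	for _, s := range silences {
		if s.RuleID == ruleID && s.Platform == platform &&
			sameOptional(s.GroupID, groupID) && sameOptional(s.Region, region) &&
			s.Until.After(now) {
			return true
		}
	}
	return false
}

// hourMark builds a fixed timestamp for the demo; the date itself is arbitrary.
func hourMark(h int) time.Time {
	return time.Date(2026, 1, 15, h, 0, 0, 0, time.UTC)
}

func main() {
	group := int64(7)
	silences := []silence{{RuleID: 1, Platform: "gemini", GroupID: &group, Until: hourMark(12)}}
	fmt.Println(isSilenced(silences, 1, "gemini", &group, nil, hourMark(11)))
	fmt.Println(isSilenced(silences, 1, "gemini", nil, nil, hourMark(11)))
}
```

One consequence of exact matching is that a silence created for a specific group does not cover a lookup with no group, and a silence with no group does not act as a wildcard over all groups.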
@@ -652,6 +804,10 @@ func buildOpsAlertEventsWhere(filter *service.OpsAlertEventFilter) (string, []an
 		args = append(args, severity)
 		clauses = append(clauses, "severity = $"+itoa(len(args)))
 	}
+	if filter.EmailSent != nil {
+		args = append(args, *filter.EmailSent)
+		clauses = append(clauses, "email_sent = $"+itoa(len(args)))
+	}
 	if filter.StartTime != nil && !filter.StartTime.IsZero() {
 		args = append(args, *filter.StartTime)
 		clauses = append(clauses, "fired_at >= $"+itoa(len(args)))
@@ -661,6 +817,14 @@ func buildOpsAlertEventsWhere(filter *service.OpsAlertEventFilter) (string, []an
 		clauses = append(clauses, "fired_at < $"+itoa(len(args)))
 	}
+	// Cursor pagination (descending by fired_at, then id)
+	if filter.BeforeFiredAt != nil && !filter.BeforeFiredAt.IsZero() && filter.BeforeID != nil && *filter.BeforeID > 0 {
+		args = append(args, *filter.BeforeFiredAt)
+		tsArg := "$" + itoa(len(args))
+		args = append(args, *filter.BeforeID)
+		idArg := "$" + itoa(len(args))
+		clauses = append(clauses, fmt.Sprintf("(fired_at < %s OR (fired_at = %s AND id < %s))", tsArg, tsArg, idArg))
+	}
 	// Dimensions are stored in JSONB. We filter best-effort without requiring GIN indexes.
 	if platform := strings.TrimSpace(filter.Platform); platform != "" {
 		args = append(args, platform)
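The cursor clause added in this hunk, `(fired_at < ts OR (fired_at = ts AND id < id))`, only pages correctly when rows are ordered by `fired_at DESC, id DESC`, which is why the list query in this file gained the `id DESC` tie-breaker. The sketch below reproduces the predicate in memory to show that two events sharing a timestamp are neither skipped nor repeated across pages. All names (`event`, `afterCursor`, `nextPage`, `ts`) are hypothetical and chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// event is a hypothetical stand-in for one ops_alert_events row.
type event struct {
	ID      int64
	FiredAt time.Time
}

// afterCursor mirrors the SQL predicate
// (fired_at < $ts OR (fired_at = $ts AND id < $id)).
func afterCursor(e event, cursorTS time.Time, cursorID int64) bool {
	return e.FiredAt.Before(cursorTS) || (e.FiredAt.Equal(cursorTS) && e.ID < cursorID)
}

// nextPage sorts by (fired_at DESC, id DESC) and returns up to limit rows
// that come strictly after the cursor; a nil cursor means the first page.
func nextPage(all []event, cursor *event, limit int) []event {
	sorted := append([]event(nil), all...)
	sort.Slice(sorted, func(i, j int) bool {
		if !sorted[i].FiredAt.Equal(sorted[j].FiredAt) {
			return sorted[i].FiredAt.After(sorted[j].FiredAt)
		}
		return sorted[i].ID > sorted[j].ID
	})
	out := []event{}
	for _, e := range sorted {
		if cursor != nil && !afterCursor(e, cursor.FiredAt, cursor.ID) {
			continue
		}
		out = append(out, e)
		if len(out) == limit {
			break
		}
	}
	return out
}

// ts builds a timestamp from Unix seconds for the demo.
func ts(sec int64) time.Time { return time.Unix(sec, 0).UTC() }

func main() {
	events := []event{
		{ID: 1, FiredAt: ts(100)},
		{ID: 2, FiredAt: ts(100)},
		{ID: 3, FiredAt: ts(200)},
	}
	first := nextPage(events, nil, 2)
	second := nextPage(events, &first[len(first)-1], 2)
	fmt.Println(len(first), len(second))
}
```

The caller passes the last row of the previous page as the cursor, which corresponds to the `BeforeFiredAt` and `BeforeID` filter fields in the diff.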
...
...
backend/internal/repository/scheduler_snapshot_outbox_integration_test.go
@@ -27,7 +27,7 @@ func TestSchedulerSnapshotOutboxReplay(t *testing.T) {
 		RunMode: config.RunModeStandard,
 		Gateway: config.GatewayConfig{
 			Scheduling: config.GatewaySchedulingConfig{
 				OutboxPollIntervalSeconds:  1,
 				FullRebuildIntervalSeconds: 0,
 				DbFallbackEnabled:          true,
 			},
...
...
backend/internal/server/routes/admin.go
@@ -81,6 +81,9 @@ func registerOpsRoutes(admin *gin.RouterGroup, h *handler.Handlers) {
 		ops.PUT("/alert-rules/:id", h.Admin.Ops.UpdateAlertRule)
 		ops.DELETE("/alert-rules/:id", h.Admin.Ops.DeleteAlertRule)
 		ops.GET("/alert-events", h.Admin.Ops.ListAlertEvents)
+		ops.GET("/alert-events/:id", h.Admin.Ops.GetAlertEvent)
+		ops.PUT("/alert-events/:id/status", h.Admin.Ops.UpdateAlertEventStatus)
+		ops.POST("/alert-silences", h.Admin.Ops.CreateAlertSilence)

 		// Email notification config (DB-backed)
 		ops.GET("/email-notification/config", h.Admin.Ops.GetEmailNotificationConfig)
@@ -110,10 +113,26 @@ func registerOpsRoutes(admin *gin.RouterGroup, h *handler.Handlers) {
 			ws.GET("/qps", h.Admin.Ops.QPSWSHandler)
 		}

-		// Error logs (MVP-1)
+		// Error logs (legacy)
 		ops.GET("/errors", h.Admin.Ops.GetErrorLogs)
 		ops.GET("/errors/:id", h.Admin.Ops.GetErrorLogByID)
+		ops.GET("/errors/:id/retries", h.Admin.Ops.ListRetryAttempts)
 		ops.POST("/errors/:id/retry", h.Admin.Ops.RetryErrorRequest)
+		ops.PUT("/errors/:id/resolve", h.Admin.Ops.UpdateErrorResolution)
+
+		// Request errors (client-visible failures)
+		ops.GET("/request-errors", h.Admin.Ops.ListRequestErrors)
+		ops.GET("/request-errors/:id", h.Admin.Ops.GetRequestError)
+		ops.GET("/request-errors/:id/upstream-errors", h.Admin.Ops.ListRequestErrorUpstreamErrors)
+		ops.POST("/request-errors/:id/retry-client", h.Admin.Ops.RetryRequestErrorClient)
+		ops.POST("/request-errors/:id/upstream-errors/:idx/retry", h.Admin.Ops.RetryRequestErrorUpstreamEvent)
+		ops.PUT("/request-errors/:id/resolve", h.Admin.Ops.ResolveRequestError)
+
+		// Upstream errors (independent upstream failures)
+		ops.GET("/upstream-errors", h.Admin.Ops.ListUpstreamErrors)
+		ops.GET("/upstream-errors/:id", h.Admin.Ops.GetUpstreamError)
+		ops.POST("/upstream-errors/:id/retry", h.Admin.Ops.RetryUpstreamError)
+		ops.PUT("/upstream-errors/:id/resolve", h.Admin.Ops.ResolveUpstreamError)

 		// Request drilldown (success + error)
 		ops.GET("/requests", h.Admin.Ops.ListRequestDetails)
...
...
backend/internal/service/admin_service_bulk_update_test.go
@@ -12,9 +12,9 @@ import (
 type accountRepoStubForBulkUpdate struct {
 	accountRepoStub
 	bulkUpdateErr    error
 	bulkUpdateIDs    []int64
 	bindGroupErrByID map[int64]error
 }

 func (s *accountRepoStubForBulkUpdate) BulkUpdate(_ context.Context, ids []int64, _ AccountBulkUpdate) (int64, error) {
...
...
backend/internal/service/antigravity_gateway_service.go
@@ -564,6 +564,10 @@ urlFallbackLoop:
 			}
 			upstreamReq, err := antigravity.NewAPIRequestWithURL(ctx, baseURL, action, accessToken, geminiBody)
+			// Capture upstream request body for ops retry of this attempt.
+			if c != nil {
+				c.Set(OpsUpstreamRequestBodyKey, string(geminiBody))
+			}
 			if err != nil {
 				return nil, err
 			}
@@ -574,6 +578,7 @@ urlFallbackLoop:
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: 0,
 					Kind:               "request_error",
 					Message:            safeErr,
@@ -615,6 +620,7 @@ urlFallbackLoop:
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: resp.StatusCode,
 					UpstreamRequestID:  resp.Header.Get("x-request-id"),
 					Kind:               "retry",
@@ -645,6 +651,7 @@ urlFallbackLoop:
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: resp.StatusCode,
 					UpstreamRequestID:  resp.Header.Get("x-request-id"),
 					Kind:               "retry",
@@ -697,6 +704,7 @@ urlFallbackLoop:
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: resp.StatusCode,
 					UpstreamRequestID:  resp.Header.Get("x-request-id"),
 					Kind:               "signature_error",
@@ -740,6 +748,7 @@ urlFallbackLoop:
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: 0,
 					Kind:               "signature_retry_request_error",
 					Message:            sanitizeUpstreamErrorMessage(retryErr.Error()),
@@ -770,6 +779,7 @@ urlFallbackLoop:
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: retryResp.StatusCode,
 					UpstreamRequestID:  retryResp.Header.Get("x-request-id"),
 					Kind:               kind,
@@ -817,6 +827,7 @@ urlFallbackLoop:
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: resp.StatusCode,
 					UpstreamRequestID:  resp.Header.Get("x-request-id"),
 					Kind:               "failover",
@@ -1371,6 +1382,7 @@ urlFallbackLoop:
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: 0,
 					Kind:               "request_error",
 					Message:            safeErr,
@@ -1412,6 +1424,7 @@ urlFallbackLoop:
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: resp.StatusCode,
 					UpstreamRequestID:  resp.Header.Get("x-request-id"),
 					Kind:               "retry",
@@ -1442,6 +1455,7 @@ urlFallbackLoop:
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: resp.StatusCode,
 					UpstreamRequestID:  resp.Header.Get("x-request-id"),
 					Kind:               "retry",
@@ -1543,6 +1557,7 @@ urlFallbackLoop:
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: resp.StatusCode,
 					UpstreamRequestID:  requestID,
 					Kind:               "failover",
@@ -1559,6 +1574,7 @@ urlFallbackLoop:
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: resp.StatusCode,
 					UpstreamRequestID:  requestID,
 					Kind:               "http_error",
@@ -2039,6 +2055,7 @@ func (s *AntigravityGatewayService) writeMappedClaudeError(c *gin.Context, accou
 	appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 		Platform:           account.Platform,
 		AccountID:          account.ID,
+		AccountName:        account.Name,
 		UpstreamStatusCode: upstreamStatus,
 		UpstreamRequestID:  upstreamRequestID,
 		Kind:               "http_error",
...
...
backend/internal/service/gateway_service.go
@@ -1466,6 +1466,9 @@ func (s *GatewayService) Forward(ctx context.Context, c *gin.Context, account *A
 		for attempt := 1; attempt <= maxRetryAttempts; attempt++ {
 			// Build the upstream request (it must be rebuilt on every retry because the request body has to be re-read)
 			upstreamReq, err := s.buildUpstreamRequest(ctx, c, account, body, token, tokenType, reqModel)
+			// Capture upstream request body for ops retry of this attempt.
+			c.Set(OpsUpstreamRequestBodyKey, string(body))
 			if err != nil {
 				return nil, err
 			}
@@ -1482,6 +1485,7 @@ func (s *GatewayService) Forward(ctx context.Context, c *gin.Context, account *A
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: 0,
 					Kind:               "request_error",
 					Message:            safeErr,
@@ -1506,6 +1510,7 @@ func (s *GatewayService) Forward(ctx context.Context, c *gin.Context, account *A
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: resp.StatusCode,
 					UpstreamRequestID:  resp.Header.Get("x-request-id"),
 					Kind:               "signature_error",
@@ -1557,6 +1562,7 @@ func (s *GatewayService) Forward(ctx context.Context, c *gin.Context, account *A
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: retryResp.StatusCode,
 					UpstreamRequestID:  retryResp.Header.Get("x-request-id"),
 					Kind:               "signature_retry_thinking",
@@ -1585,6 +1591,7 @@ func (s *GatewayService) Forward(ctx context.Context, c *gin.Context, account *A
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: 0,
 					Kind:               "signature_retry_tools_request_error",
 					Message:            sanitizeUpstreamErrorMessage(retryErr2.Error()),
@@ -1643,6 +1650,7 @@ func (s *GatewayService) Forward(ctx context.Context, c *gin.Context, account *A
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: resp.StatusCode,
 					UpstreamRequestID:  resp.Header.Get("x-request-id"),
 					Kind:               "retry",
@@ -1691,6 +1699,7 @@ func (s *GatewayService) Forward(ctx context.Context, c *gin.Context, account *A
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: resp.StatusCode,
 					UpstreamRequestID:  resp.Header.Get("x-request-id"),
 					Kind:               "retry_exhausted_failover",
@@ -1757,6 +1766,7 @@ func (s *GatewayService) Forward(ctx context.Context, c *gin.Context, account *A
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: resp.StatusCode,
 					UpstreamRequestID:  resp.Header.Get("x-request-id"),
 					Kind:               "failover_on_400",
...
...
backend/internal/service/gemini_messages_compat_service.go
@@ -545,12 +545,19 @@ func (s *GeminiMessagesCompatService) Forward(ctx context.Context, c *gin.Contex
 			}
 			requestIDHeader = idHeader

+			// Capture upstream request body for ops retry of this attempt.
+			if c != nil {
+				// In this code path `body` is already the JSON sent to upstream.
+				c.Set(OpsUpstreamRequestBodyKey, string(body))
+			}
 			resp, err = s.httpUpstream.Do(upstreamReq, proxyURL, account.ID, account.Concurrency)
 			if err != nil {
 				safeErr := sanitizeUpstreamErrorMessage(err.Error())
 				appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 					Platform:           account.Platform,
 					AccountID:          account.ID,
+					AccountName:        account.Name,
 					UpstreamStatusCode: 0,
 					Kind:               "request_error",
 					Message:            safeErr,
...
@@ -588,6 +595,7 @@ func (s *GeminiMessagesCompatService) Forward(ctx context.Context, c *gin.Contex
...
@@ -588,6 +595,7 @@ func (s *GeminiMessagesCompatService) Forward(ctx context.Context, c *gin.Contex
appendOpsUpstreamError
(
c
,
OpsUpstreamErrorEvent
{
appendOpsUpstreamError
(
c
,
OpsUpstreamErrorEvent
{
Platform
:
account
.
Platform
,
Platform
:
account
.
Platform
,
AccountID
:
account
.
ID
,
AccountID
:
account
.
ID
,
AccountName
:
account
.
Name
,
UpstreamStatusCode
:
resp
.
StatusCode
,
UpstreamStatusCode
:
resp
.
StatusCode
,
UpstreamRequestID
:
upstreamReqID
,
UpstreamRequestID
:
upstreamReqID
,
Kind
:
"signature_error"
,
Kind
:
"signature_error"
,
...
@@ -662,6 +670,7 @@ func (s *GeminiMessagesCompatService) Forward(ctx context.Context, c *gin.Contex
 	appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 		Platform:           account.Platform,
 		AccountID:          account.ID,
+		AccountName:        account.Name,
 		UpstreamStatusCode: resp.StatusCode,
 		UpstreamRequestID:  upstreamReqID,
 		Kind:               "retry",
...
@@ -711,6 +720,7 @@ func (s *GeminiMessagesCompatService) Forward(ctx context.Context, c *gin.Contex
 	appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 		Platform:           account.Platform,
 		AccountID:          account.ID,
+		AccountName:        account.Name,
 		UpstreamStatusCode: resp.StatusCode,
 		UpstreamRequestID:  upstreamReqID,
 		Kind:               "failover",
...
@@ -737,6 +747,7 @@ func (s *GeminiMessagesCompatService) Forward(ctx context.Context, c *gin.Contex
 	appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 		Platform:           account.Platform,
 		AccountID:          account.ID,
+		AccountName:        account.Name,
 		UpstreamStatusCode: resp.StatusCode,
 		UpstreamRequestID:  upstreamReqID,
 		Kind:               "failover",
...
@@ -972,12 +983,19 @@ func (s *GeminiMessagesCompatService) ForwardNative(ctx context.Context, c *gin.
 	}
 	requestIDHeader = idHeader
+	// Capture upstream request body for ops retry of this attempt.
+	if c != nil {
+		// In this code path `body` is already the JSON sent to upstream.
+		c.Set(OpsUpstreamRequestBodyKey, string(body))
+	}
 	resp, err = s.httpUpstream.Do(upstreamReq, proxyURL, account.ID, account.Concurrency)
 	if err != nil {
 		safeErr := sanitizeUpstreamErrorMessage(err.Error())
 		appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 			Platform:           account.Platform,
 			AccountID:          account.ID,
+			AccountName:        account.Name,
 			UpstreamStatusCode: 0,
 			Kind:               "request_error",
 			Message:            safeErr,
...
@@ -1036,6 +1054,7 @@ func (s *GeminiMessagesCompatService) ForwardNative(ctx context.Context, c *gin.
 	appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 		Platform:           account.Platform,
 		AccountID:          account.ID,
+		AccountName:        account.Name,
 		UpstreamStatusCode: resp.StatusCode,
 		UpstreamRequestID:  upstreamReqID,
 		Kind:               "retry",
...
@@ -1120,6 +1139,7 @@ func (s *GeminiMessagesCompatService) ForwardNative(ctx context.Context, c *gin.
 	appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 		Platform:           account.Platform,
 		AccountID:          account.ID,
+		AccountName:        account.Name,
 		UpstreamStatusCode: resp.StatusCode,
 		UpstreamRequestID:  requestID,
 		Kind:               "failover",
...
@@ -1143,6 +1163,7 @@ func (s *GeminiMessagesCompatService) ForwardNative(ctx context.Context, c *gin.
 	appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 		Platform:           account.Platform,
 		AccountID:          account.ID,
+		AccountName:        account.Name,
 		UpstreamStatusCode: resp.StatusCode,
 		UpstreamRequestID:  requestID,
 		Kind:               "failover",
...
@@ -1168,6 +1189,7 @@ func (s *GeminiMessagesCompatService) ForwardNative(ctx context.Context, c *gin.
 	appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 		Platform:           account.Platform,
 		AccountID:          account.ID,
+		AccountName:        account.Name,
 		UpstreamStatusCode: resp.StatusCode,
 		UpstreamRequestID:  requestID,
 		Kind:               "http_error",
...
@@ -1300,6 +1322,7 @@ func (s *GeminiMessagesCompatService) writeGeminiMappedError(c *gin.Context, acc
 	appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 		Platform:           account.Platform,
 		AccountID:          account.ID,
+		AccountName:        account.Name,
 		UpstreamStatusCode: upstreamStatus,
 		UpstreamRequestID:  upstreamRequestID,
 		Kind:               "http_error",
...
...
backend/internal/service/openai_gateway_service.go
...
@@ -664,6 +664,11 @@ func (s *OpenAIGatewayService) Forward(ctx context.Context, c *gin.Context, acco
 		proxyURL = account.Proxy.URL()
 	}
+	// Capture upstream request body for ops retry of this attempt.
+	if c != nil {
+		c.Set(OpsUpstreamRequestBodyKey, string(body))
+	}
 	// Send request
 	resp, err := s.httpUpstream.Do(upstreamReq, proxyURL, account.ID, account.Concurrency)
 	if err != nil {
...
@@ -673,6 +678,7 @@ func (s *OpenAIGatewayService) Forward(ctx context.Context, c *gin.Context, acco
 	appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 		Platform:           account.Platform,
 		AccountID:          account.ID,
+		AccountName:        account.Name,
 		UpstreamStatusCode: 0,
 		Kind:               "request_error",
 		Message:            safeErr,
...
@@ -707,6 +713,7 @@ func (s *OpenAIGatewayService) Forward(ctx context.Context, c *gin.Context, acco
 	appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 		Platform:           account.Platform,
 		AccountID:          account.ID,
+		AccountName:        account.Name,
 		UpstreamStatusCode: resp.StatusCode,
 		UpstreamRequestID:  resp.Header.Get("x-request-id"),
 		Kind:               "failover",
...
@@ -864,6 +871,7 @@ func (s *OpenAIGatewayService) handleErrorResponse(ctx context.Context, resp *ht
 	appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 		Platform:           account.Platform,
 		AccountID:          account.ID,
+		AccountName:        account.Name,
 		UpstreamStatusCode: resp.StatusCode,
 		UpstreamRequestID:  resp.Header.Get("x-request-id"),
 		Kind:               "http_error",
...
@@ -894,6 +902,7 @@ func (s *OpenAIGatewayService) handleErrorResponse(ctx context.Context, resp *ht
 	appendOpsUpstreamError(c, OpsUpstreamErrorEvent{
 		Platform:           account.Platform,
 		AccountID:          account.ID,
+		AccountName:        account.Name,
 		UpstreamStatusCode: resp.StatusCode,
 		UpstreamRequestID:  resp.Header.Get("x-request-id"),
 		Kind:               kind,
...
...
backend/internal/service/ops_alert_evaluator_service.go
...
@@ -206,7 +206,7 @@ func (s *OpsAlertEvaluatorService) evaluateOnce(interval time.Duration) {
 			continue
 		}
-		scopePlatform, scopeGroupID := parseOpsAlertRuleScope(rule.Filters)
+		scopePlatform, scopeGroupID, scopeRegion := parseOpsAlertRuleScope(rule.Filters)
 		windowMinutes := rule.WindowMinutes
 		if windowMinutes <= 0 {
...
@@ -236,6 +236,17 @@ func (s *OpsAlertEvaluatorService) evaluateOnce(interval time.Duration) {
 			continue
 		}
+		// Scoped silencing: if a matching silence exists, skip creating a firing event.
+		if s.opsService != nil {
+			platform := strings.TrimSpace(scopePlatform)
+			region := scopeRegion
+			if platform != "" {
+				if ok, err := s.opsService.IsAlertSilenced(ctx, rule.ID, platform, scopeGroupID, region, now); err == nil && ok {
+					continue
+				}
+			}
+		}
 		latestEvent, err := s.opsRepo.GetLatestAlertEvent(ctx, rule.ID)
 		if err != nil {
 			log.Printf("[OpsAlertEvaluator] get latest event failed (rule=%d): %v", rule.ID, err)
...
@@ -359,9 +370,9 @@ func requiredSustainedBreaches(sustainedMinutes int, interval time.Duration) int
 	return required
 }
-func parseOpsAlertRuleScope(filters map[string]any) (platform string, groupID *int64) {
+func parseOpsAlertRuleScope(filters map[string]any) (platform string, groupID *int64, region *string) {
 	if filters == nil {
-		return "", nil
+		return "", nil, nil
 	}
 	if v, ok := filters["platform"]; ok {
 		if s, ok := v.(string); ok {
...
@@ -392,7 +403,15 @@ func parseOpsAlertRuleScope(filters map[string]any) (platform string, groupID *i
 			}
 		}
 	}
-	return platform, groupID
+	if v, ok := filters["region"]; ok {
+		if s, ok := v.(string); ok {
+			vv := strings.TrimSpace(s)
+			if vv != "" {
+				region = &vv
+			}
+		}
+	}
+	return platform, groupID, region
 }
 func (s *OpsAlertEvaluatorService) computeRuleMetric(
...
@@ -504,16 +523,6 @@ func (s *OpsAlertEvaluatorService) computeRuleMetric(
 			return 0, false
 		}
 		return overview.UpstreamErrorRate * 100, true
-	case "p95_latency_ms":
-		if overview.Duration.P95 == nil {
-			return 0, false
-		}
-		return float64(*overview.Duration.P95), true
-	case "p99_latency_ms":
-		if overview.Duration.P99 == nil {
-			return 0, false
-		}
-		return float64(*overview.Duration.P99), true
 	default:
 		return 0, false
 	}
...
...
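The `region` handling added to `parseOpsAlertRuleScope` can be tried in isolation. The following is a minimal sketch, not the project's code: `parseRegion` is a hypothetical helper that mirrors only the new `region` branch, assuming the same rule that a missing, non-string, or blank value yields `nil`.

```go
package main

import "strings"

// parseRegion is a hypothetical stand-alone version of the "region" branch
// in parseOpsAlertRuleScope. It returns nil when the key is absent, is not
// a string, or is blank after trimming.
func parseRegion(filters map[string]any) *string {
	// Indexing a nil map is safe in Go and simply reports ok == false.
	v, ok := filters["region"]
	if !ok {
		return nil
	}
	s, ok := v.(string)
	if !ok {
		return nil
	}
	trimmed := strings.TrimSpace(s)
	if trimmed == "" {
		return nil
	}
	return &trimmed
}
```

Returning a pointer keeps "no region scope" distinguishable from an empty string, which matches the `*string` type used for `Region` elsewhere in this change.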
backend/internal/service/ops_alert_models.go
...
@@ -8,8 +8,9 @@ import "time"
 // with the existing ops dashboard frontend (backup style).
 const (
 	OpsAlertStatusFiring         = "firing"
 	OpsAlertStatusResolved       = "resolved"
+	OpsAlertStatusManualResolved = "manual_resolved"
 )
 type OpsAlertRule struct {
...
@@ -58,12 +59,32 @@ type OpsAlertEvent struct {
 	CreatedAt time.Time `json:"created_at"`
 }
+type OpsAlertSilence struct {
+	ID        int64     `json:"id"`
+	RuleID    int64     `json:"rule_id"`
+	Platform  string    `json:"platform"`
+	GroupID   *int64    `json:"group_id,omitempty"`
+	Region    *string   `json:"region,omitempty"`
+	Until     time.Time `json:"until"`
+	Reason    string    `json:"reason"`
+	CreatedBy *int64    `json:"created_by,omitempty"`
+	CreatedAt time.Time `json:"created_at"`
+}
 type OpsAlertEventFilter struct {
 	Limit int
+	// Cursor pagination (descending by fired_at, then id).
+	BeforeFiredAt *time.Time
+	BeforeID      *int64
 	// Optional filters.
 	Status    string
 	Severity  string
+	EmailSent *bool
 	StartTime *time.Time
 	EndTime   *time.Time
...
...
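The `BeforeFiredAt` / `BeforeID` pair in `OpsAlertEventFilter` describes a keyset cursor over events ordered by `fired_at` descending, then `id` descending. The repository query is not shown in this part of the diff, so the sketch below only illustrates the comparison such a cursor implies, over an in-memory slice. The names `alertEvent`, `newEvent`, `afterCursor`, and `nextPage` are hypothetical.

```go
package main

import "time"

// alertEvent is a hypothetical, trimmed-down stand-in for OpsAlertEvent.
type alertEvent struct {
	ID      int64
	FiredAt time.Time
}

// newEvent builds an event from a Unix timestamp in seconds.
func newEvent(id, unixSec int64) alertEvent {
	return alertEvent{ID: id, FiredAt: time.Unix(unixSec, 0)}
}

// afterCursor reports whether ev comes strictly after the cursor row when
// rows are ordered by (fired_at DESC, id DESC).
func afterCursor(ev alertEvent, firedAt time.Time, id int64) bool {
	if ev.FiredAt.Before(firedAt) {
		return true
	}
	return ev.FiredAt.Equal(firedAt) && ev.ID < id
}

// nextPage assumes events is already sorted by (FiredAt DESC, ID DESC) and
// returns up to limit events that follow the cursor. A nil cursor field
// means "start from the first row".
func nextPage(events []alertEvent, beforeFiredAt *time.Time, beforeID *int64, limit int) []alertEvent {
	if limit <= 0 {
		return nil
	}
	page := make([]alertEvent, 0, limit)
	for _, ev := range events {
		if beforeFiredAt != nil && beforeID != nil && !afterCursor(ev, *beforeFiredAt, *beforeID) {
			continue
		}
		page = append(page, ev)
		if len(page) == limit {
			break
		}
	}
	return page
}
```

Using the `id` as a tiebreaker keeps pages stable when several events share the same `fired_at`, which a timestamp-only cursor would not guarantee.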
backend/internal/service/ops_alerts.go
...
@@ -88,6 +88,29 @@ func (s *OpsService) ListAlertEvents(ctx context.Context, filter *OpsAlertEventF
 	return s.opsRepo.ListAlertEvents(ctx, filter)
 }
+func (s *OpsService) GetAlertEventByID(ctx context.Context, eventID int64) (*OpsAlertEvent, error) {
+	if err := s.RequireMonitoringEnabled(ctx); err != nil {
+		return nil, err
+	}
+	if s.opsRepo == nil {
+		return nil, infraerrors.ServiceUnavailable("OPS_REPO_UNAVAILABLE", "Ops repository not available")
+	}
+	if eventID <= 0 {
+		return nil, infraerrors.BadRequest("INVALID_EVENT_ID", "invalid event id")
+	}
+	ev, err := s.opsRepo.GetAlertEventByID(ctx, eventID)
+	if err != nil {
+		if errors.Is(err, sql.ErrNoRows) {
+			return nil, infraerrors.NotFound("OPS_ALERT_EVENT_NOT_FOUND", "alert event not found")
+		}
+		return nil, err
+	}
+	if ev == nil {
+		return nil, infraerrors.NotFound("OPS_ALERT_EVENT_NOT_FOUND", "alert event not found")
+	}
+	return ev, nil
+}
 func (s *OpsService) GetActiveAlertEvent(ctx context.Context, ruleID int64) (*OpsAlertEvent, error) {
 	if err := s.RequireMonitoringEnabled(ctx); err != nil {
 		return nil, err
@@ -101,6 +124,49 @@ func (s *OpsService) GetActiveAlertEvent(ctx context.Context, ruleID int64) (*Op
...
@@ -101,6 +124,49 @@ func (s *OpsService) GetActiveAlertEvent(ctx context.Context, ruleID int64) (*Op
return
s
.
opsRepo
.
GetActiveAlertEvent
(
ctx
,
ruleID
)
return
s
.
opsRepo
.
GetActiveAlertEvent
(
ctx
,
ruleID
)
}
}
func
(
s
*
OpsService
)
CreateAlertSilence
(
ctx
context
.
Context
,
input
*
OpsAlertSilence
)
(
*
OpsAlertSilence
,
error
)
{
if
err
:=
s
.
RequireMonitoringEnabled
(
ctx
);
err
!=
nil
{
return
nil
,
err
}
if
s
.
opsRepo
==
nil
{
return
nil
,
infraerrors
.
ServiceUnavailable
(
"OPS_REPO_UNAVAILABLE"
,
"Ops repository not available"
)
}
if
input
==
nil
{
return
nil
,
infraerrors
.
BadRequest
(
"INVALID_SILENCE"
,
"invalid silence"
)
}
if
input
.
RuleID
<=
0
{
return
nil
,
infraerrors
.
BadRequest
(
"INVALID_RULE_ID"
,
"invalid rule id"
)
}
if
strings
.
TrimSpace
(
input
.
Platform
)
==
""
{
return
nil
,
infraerrors
.
BadRequest
(
"INVALID_PLATFORM"
,
"invalid platform"
)
}
if
input
.
Until
.
IsZero
()
{
return
nil
,
infraerrors
.
BadRequest
(
"INVALID_UNTIL"
,
"invalid until"
)
}
created
,
err
:=
s
.
opsRepo
.
CreateAlertSilence
(
ctx
,
input
)
if
err
!=
nil
{
return
nil
,
err
}
return
created
,
nil
}
func
(
s
*
OpsService
)
IsAlertSilenced
(
ctx
context
.
Context
,
ruleID
int64
,
platform
string
,
groupID
*
int64
,
region
*
string
,
now
time
.
Time
)
(
bool
,
error
)
{
if
err
:=
s
.
RequireMonitoringEnabled
(
ctx
);
err
!=
nil
{
return
false
,
err
}
if
s
.
opsRepo
==
nil
{
return
false
,
infraerrors
.
ServiceUnavailable
(
"OPS_REPO_UNAVAILABLE"
,
"Ops repository not available"
)
}
if
ruleID
<=
0
{
return
false
,
infraerrors
.
BadRequest
(
"INVALID_RULE_ID"
,
"invalid rule id"
)
}
if
strings
.
TrimSpace
(
platform
)
==
""
{
return
false
,
nil
}
return
s
.
opsRepo
.
IsAlertSilenced
(
ctx
,
ruleID
,
platform
,
groupID
,
region
,
now
)
}
func
(
s
*
OpsService
)
GetLatestAlertEvent
(
ctx
context
.
Context
,
ruleID
int64
)
(
*
OpsAlertEvent
,
error
)
{
func
(
s
*
OpsService
)
GetLatestAlertEvent
(
ctx
context
.
Context
,
ruleID
int64
)
(
*
OpsAlertEvent
,
error
)
{
if
err
:=
s
.
RequireMonitoringEnabled
(
ctx
);
err
!=
nil
{
if
err
:=
s
.
RequireMonitoringEnabled
(
ctx
);
err
!=
nil
{
return
nil
,
err
return
nil
,
err
...
@@ -142,7 +208,11 @@ func (s *OpsService) UpdateAlertEventStatus(ctx context.Context, eventID int64,
...
@@ -142,7 +208,11 @@ func (s *OpsService) UpdateAlertEventStatus(ctx context.Context, eventID int64,
if
eventID
<=
0
{
if
eventID
<=
0
{
return
infraerrors
.
BadRequest
(
"INVALID_EVENT_ID"
,
"invalid event id"
)
return
infraerrors
.
BadRequest
(
"INVALID_EVENT_ID"
,
"invalid event id"
)
}
}
if
strings
.
TrimSpace
(
status
)
==
""
{
status
=
strings
.
TrimSpace
(
status
)
if
status
==
""
{
return
infraerrors
.
BadRequest
(
"INVALID_STATUS"
,
"invalid status"
)
}
if
status
!=
OpsAlertStatusResolved
&&
status
!=
OpsAlertStatusManualResolved
{
return
infraerrors
.
BadRequest
(
"INVALID_STATUS"
,
"invalid status"
)
return
infraerrors
.
BadRequest
(
"INVALID_STATUS"
,
"invalid status"
)
}
}
return
s
.
opsRepo
.
UpdateAlertEventStatus
(
ctx
,
eventID
,
status
,
resolvedAt
)
return
s
.
opsRepo
.
UpdateAlertEventStatus
(
ctx
,
eventID
,
status
,
resolvedAt
)
...
...
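`IsAlertSilenced` delegates the actual matching to the repository, whose query is not part of this file. As a rough illustration of what scoped silencing could mean, here is an in-memory sketch. The matching rule is an assumption, not something the diff confirms: a silence with a nil `GroupID` or `Region` is treated as a wildcard for that dimension, and a silence stops applying once `now` reaches `Until`. All names (`silence`, `at`, `matches`, `isSilenced`) are hypothetical.

```go
package main

import "time"

// silence is a hypothetical, trimmed-down stand-in for OpsAlertSilence.
type silence struct {
	RuleID   int64
	Platform string
	GroupID  *int64  // nil is treated as "any group" (assumption)
	Region   *string // nil is treated as "any region" (assumption)
	Until    time.Time
}

// at builds a time from a Unix timestamp in seconds.
func at(unixSec int64) time.Time { return time.Unix(unixSec, 0) }

// matches reports whether s covers the given alert scope at time now.
func matches(s silence, ruleID int64, platform string, groupID *int64, region *string, now time.Time) bool {
	if s.RuleID != ruleID || s.Platform != platform {
		return false
	}
	if !s.Until.After(now) {
		return false // expired: silences end automatically at Until
	}
	if s.GroupID != nil && (groupID == nil || *groupID != *s.GroupID) {
		return false
	}
	if s.Region != nil && (region == nil || *region != *s.Region) {
		return false
	}
	return true
}

// isSilenced reports whether any silence in the list covers the scope.
func isSilenced(silences []silence, ruleID int64, platform string, groupID *int64, region *string, now time.Time) bool {
	for _, s := range silences {
		if matches(s, ruleID, platform, groupID, region, now) {
			return true
		}
	}
	return false
}
```

Passing `now` in explicitly, as the service method also does, keeps the expiry check deterministic and easy to test.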
backend/internal/service/ops_health_score.go
...
@@ -32,49 +32,38 @@ func computeDashboardHealthScore(now time.Time, overview *OpsDashboardOverview)
 }
 // computeBusinessHealth calculates business health score (0-100)
-// Components: SLA (50%) + Error Rate (30%) + Latency (20%)
+// Components: Error Rate (50%) + TTFT (50%)
 func computeBusinessHealth(overview *OpsDashboardOverview) float64 {
-	// SLA score: 99.5% → 100, 95% → 0 (linear)
-	slaScore := 100.0
-	slaPct := clampFloat64(overview.SLA*100, 0, 100)
-	if slaPct < 99.5 {
-		if slaPct >= 95 {
-			slaScore = (slaPct - 95) / 4.5 * 100
-		} else {
-			slaScore = 0
-		}
-	}
-	// Error rate score: 0.5% → 100, 5% → 0 (linear)
+	// Error rate score: 1% → 100, 10% → 0 (linear)
 	// Combines request errors and upstream errors
 	errorScore := 100.0
 	errorPct := clampFloat64(overview.ErrorRate*100, 0, 100)
 	upstreamPct := clampFloat64(overview.UpstreamErrorRate*100, 0, 100)
 	combinedErrorPct := math.Max(errorPct, upstreamPct) // Use worst case
-	if combinedErrorPct > 0.5 {
-		if combinedErrorPct <= 5 {
-			errorScore = (5 - combinedErrorPct) / 4.5 * 100
+	if combinedErrorPct > 1.0 {
+		if combinedErrorPct <= 10.0 {
+			errorScore = (10.0 - combinedErrorPct) / 9.0 * 100
 		} else {
 			errorScore = 0
 		}
 	}
-	// Latency score: 1s → 100, 10s → 0 (linear)
-	// Uses P99 of duration (TTFT is less critical for overall health)
-	latencyScore := 100.0
-	if overview.Duration.P99 != nil {
-		p99 := float64(*overview.Duration.P99)
+	// TTFT score: 1s → 100, 3s → 0 (linear)
+	// Time to first token is critical for user experience
+	ttftScore := 100.0
+	if overview.TTFT.P99 != nil {
+		p99 := float64(*overview.TTFT.P99)
 		if p99 > 1000 {
-			if p99 <= 10000 {
-				latencyScore = (10000 - p99) / 9000 * 100
+			if p99 <= 3000 {
+				ttftScore = (3000 - p99) / 2000 * 100
 			} else {
-				latencyScore = 0
+				ttftScore = 0
 			}
 		}
 	}
-	// Weighted combination
-	return slaScore*0.5 + errorScore*0.3 + latencyScore*0.2
+	// Weighted combination: 50% error rate + 50% TTFT
+	return errorScore*0.5 + ttftScore*0.5
 }
 // computeInfraHealth calculates infrastructure health score (0-100)
...
...
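The revised `computeBusinessHealth` weighs two linear ramps equally: error rate (1% scores 100, 10% scores 0) and TTFT P99 (1s scores 100, 3s scores 0). The sketch below restates that arithmetic in a self-contained form so the numbers can be checked by hand. The types `percentiles` and `overview` and the helpers `linearScore`, `clamp`, and `businessHealth` are hypothetical stand-ins, not the project's definitions.

```go
package main

import "math"

// percentiles and overview are hypothetical, trimmed-down stand-ins for
// OpsPercentiles and OpsDashboardOverview; only the fields used here exist.
type percentiles struct {
	P99 *int // milliseconds
}

type overview struct {
	ErrorRate         float64 // fraction in 0..1
	UpstreamErrorRate float64 // fraction in 0..1
	TTFT              percentiles
}

// linearScore returns 100 at or below full, 0 above zero, and a straight
// line between the two thresholds.
func linearScore(value, full, zero float64) float64 {
	switch {
	case value <= full:
		return 100
	case value > zero:
		return 0
	default:
		return (zero - value) / (zero - full) * 100
	}
}

func clamp(v, lo, hi float64) float64 {
	return math.Min(math.Max(v, lo), hi)
}

// businessHealth combines the error-rate and TTFT scores with equal weight.
func businessHealth(o overview) float64 {
	errorPct := clamp(o.ErrorRate*100, 0, 100)
	upstreamPct := clamp(o.UpstreamErrorRate*100, 0, 100)
	// Use the worse of the two error rates.
	errorScore := linearScore(math.Max(errorPct, upstreamPct), 1, 10)

	// A missing TTFT measurement leaves the TTFT score at 100.
	ttftScore := 100.0
	if o.TTFT.P99 != nil {
		ttftScore = linearScore(float64(*o.TTFT.P99), 1000, 3000)
	}
	return errorScore*0.5 + ttftScore*0.5
}
```

For example, a 5% error rate with no TTFT data gives an error score of (10 − 5) / 9 × 100 ≈ 55.6 and a total of about 77.8, which falls inside the 77 to 78 range the updated "error rate 5%" test case expects.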
backend/internal/service/ops_health_score_test.go
...
@@ -127,8 +127,8 @@ func TestComputeDashboardHealthScore_Comprehensive(t *testing.T) {
 					MemoryUsagePercent: float64Ptr(75),
 				},
 			},
-			wantMin: 60,
-			wantMax: 85,
+			wantMin: 96,
+			wantMax: 97,
 		},
 		{
 			name: "DB failure",
...
@@ -203,8 +203,8 @@ func TestComputeDashboardHealthScore_Comprehensive(t *testing.T) {
 					MemoryUsagePercent: float64Ptr(30),
 				},
 			},
-			wantMin: 25,
-			wantMax: 50,
+			wantMin: 84,
+			wantMax: 85,
 		},
 		{
 			name: "combined failures - business healthy + infra degraded",
...
@@ -277,30 +277,41 @@ func TestComputeBusinessHealth(t *testing.T) {
 				UpstreamErrorRate: 0,
 				Duration:          OpsPercentiles{P99: intPtr(500)},
 			},
-			wantMin: 50,
-			wantMax: 60,
+			wantMin: 100,
+			wantMax: 100,
 		},
 		{
-			name: "error rate boundary 0.5%",
+			name: "error rate boundary 1%",
 			overview: &OpsDashboardOverview{
-				SLA:               0.995,
-				ErrorRate:         0.005,
+				SLA:               0.99,
+				ErrorRate:         0.01,
 				UpstreamErrorRate: 0,
 				Duration:          OpsPercentiles{P99: intPtr(500)},
 			},
-			wantMin: 95,
+			wantMin: 100,
 			wantMax: 100,
 		},
 		{
-			name: "latency boundary 1000ms",
+			name: "error rate 5%",
 			overview: &OpsDashboardOverview{
-				SLA:               0.995,
+				SLA:               0.95,
+				ErrorRate:         0.05,
+				UpstreamErrorRate: 0,
+				Duration:          OpsPercentiles{P99: intPtr(500)},
+			},
+			wantMin: 77,
+			wantMax: 78,
+		},
+		{
+			name: "TTFT boundary 2s",
+			overview: &OpsDashboardOverview{
+				SLA:               0.99,
 				ErrorRate:         0,
 				UpstreamErrorRate: 0,
-				Duration:          OpsPercentiles{P99: intPtr(1000)},
+				TTFT:              OpsPercentiles{P99: intPtr(2000)},
 			},
-			wantMin: 95,
-			wantMax: 100,
+			wantMin: 75,
+			wantMax: 75,
 		},
 		{
 			name: "upstream error dominates",
...
@@ -310,7 +321,7 @@ func TestComputeBusinessHealth(t *testing.T) {
 				UpstreamErrorRate: 0.03,
 				Duration:          OpsPercentiles{P99: intPtr(500)},
 			},
-			wantMin: 75,
+			wantMin: 88,
 			wantMax: 90,
 		},
 	}
...
...
backend/internal/service/ops_models.go
...
@@ -6,24 +6,43 @@ type OpsErrorLog struct {
 	ID        int64     `json:"id"`
 	CreatedAt time.Time `json:"created_at"`
-	Phase string `json:"phase"`
-	Type  string `json:"type"`
+	// Standardized classification
+	// - phase: request|auth|routing|upstream|network|internal
+	// - owner: client|provider|platform
+	// - source: client_request|upstream_http|gateway
+	Phase  string `json:"phase"`
+	Type   string `json:"type"`
+	Owner  string `json:"error_owner"`
+	Source string `json:"error_source"`
 	Severity string `json:"severity"`
 	StatusCode int    `json:"status_code"`
 	Platform   string `json:"platform"`
 	Model      string `json:"model"`
+	LatencyMs          *int       `json:"latency_ms"`
+	IsRetryable        bool       `json:"is_retryable"`
+	RetryCount         int        `json:"retry_count"`
+	Resolved           bool       `json:"resolved"`
+	ResolvedAt         *time.Time `json:"resolved_at"`
+	ResolvedByUserID   *int64     `json:"resolved_by_user_id"`
+	ResolvedByUserName string     `json:"resolved_by_user_name"`
+	ResolvedRetryID    *int64     `json:"resolved_retry_id"`
+	ResolvedStatusRaw  string     `json:"-"`
 	ClientRequestID string `json:"client_request_id"`
 	RequestID       string `json:"request_id"`
 	Message         string `json:"message"`
 	UserID *int64 `json:"user_id"`
-	APIKeyID  *int64 `json:"api_key_id"`
-	AccountID *int64 `json:"account_id"`
-	GroupID   *int64 `json:"group_id"`
+	UserEmail   string `json:"user_email"`
+	APIKeyID    *int64 `json:"api_key_id"`
+	AccountID   *int64 `json:"account_id"`
+	AccountName string `json:"account_name"`
+	GroupID     *int64 `json:"group_id"`
+	GroupName   string `json:"group_name"`
 	ClientIP    *string `json:"client_ip"`
 	RequestPath string  `json:"request_path"`
...
@@ -67,9 +86,24 @@ type OpsErrorLogFilter struct {
 	GroupID   *int64
 	AccountID *int64
 	StatusCodes []int
-	Phase string
-	Query string
+	StatusCodesOther bool
+	Phase            string
+	Owner            string
+	Source           string
+	Resolved         *bool
+	Query            string
+	UserQuery        string // Search by user email
+	// Optional correlation keys for exact matching.
+	RequestID       string
+	ClientRequestID string
+	// View controls error categorization for list endpoints.
+	// - errors: show actionable errors (exclude business-limited / 429 / 529)
+	// - excluded: only show excluded errors
+	// - all: show everything
+	View string
 	Page     int
 	PageSize int
...
@@ -90,12 +124,23 @@ type OpsRetryAttempt struct {
 	SourceErrorID   int64  `json:"source_error_id"`
 	Mode            string `json:"mode"`
 	PinnedAccountID *int64 `json:"pinned_account_id"`
+	PinnedAccountName string `json:"pinned_account_name"`
 	Status     string     `json:"status"`
 	StartedAt  *time.Time `json:"started_at"`
 	FinishedAt *time.Time `json:"finished_at"`
 	DurationMs *int64     `json:"duration_ms"`
+	// Persisted execution results (best-effort)
+	Success           *bool   `json:"success"`
+	HTTPStatusCode    *int    `json:"http_status_code"`
+	UpstreamRequestID *string `json:"upstream_request_id"`
+	UsedAccountID     *int64  `json:"used_account_id"`
+	UsedAccountName   string  `json:"used_account_name"`
+	ResponsePreview   *string `json:"response_preview"`
+	ResponseTruncated *bool   `json:"response_truncated"`
+	// Optional correlation
 	ResultRequestID *string `json:"result_request_id"`
 	ResultErrorID   *int64  `json:"result_error_id"`
...
...
backend/internal/service/ops_port.go
...
@@ -14,6 +14,8 @@ type OpsRepository interface {
 	InsertRetryAttempt(ctx context.Context, input *OpsInsertRetryAttemptInput) (int64, error)
 	UpdateRetryAttempt(ctx context.Context, input *OpsUpdateRetryAttemptInput) error
 	GetLatestRetryAttemptForError(ctx context.Context, sourceErrorID int64) (*OpsRetryAttempt, error)
+	ListRetryAttemptsByErrorID(ctx context.Context, sourceErrorID int64, limit int) ([]*OpsRetryAttempt, error)
+	UpdateErrorResolution(ctx context.Context, errorID int64, resolved bool, resolvedByUserID *int64, resolvedRetryID *int64, resolvedAt *time.Time) error
 	// Lightweight window stats (for realtime WS / quick sampling).
 	GetWindowStats(ctx context.Context, filter *OpsDashboardFilter) (*OpsWindowStats, error)
...
@@ -39,12 +41,17 @@ type OpsRepository interface {
...
@@ -39,12 +41,17 @@ type OpsRepository interface {
DeleteAlertRule
(
ctx
context
.
Context
,
id
int64
)
error
DeleteAlertRule
(
ctx
context
.
Context
,
id
int64
)
error
ListAlertEvents
(
ctx
context
.
Context
,
filter
*
OpsAlertEventFilter
)
([]
*
OpsAlertEvent
,
error
)
ListAlertEvents
(
ctx
context
.
Context
,
filter
*
OpsAlertEventFilter
)
([]
*
OpsAlertEvent
,
error
)
GetAlertEventByID
(
ctx
context
.
Context
,
eventID
int64
)
(
*
OpsAlertEvent
,
error
)
GetActiveAlertEvent
(
ctx
context
.
Context
,
ruleID
int64
)
(
*
OpsAlertEvent
,
error
)
GetActiveAlertEvent
(
ctx
context
.
Context
,
ruleID
int64
)
(
*
OpsAlertEvent
,
error
)
GetLatestAlertEvent
(
ctx
context
.
Context
,
ruleID
int64
)
(
*
OpsAlertEvent
,
error
)
GetLatestAlertEvent
(
ctx
context
.
Context
,
ruleID
int64
)
(
*
OpsAlertEvent
,
error
)
CreateAlertEvent
(
ctx
context
.
Context
,
event
*
OpsAlertEvent
)
(
*
OpsAlertEvent
,
error
)
CreateAlertEvent
(
ctx
context
.
Context
,
event
*
OpsAlertEvent
)
(
*
OpsAlertEvent
,
error
)
UpdateAlertEventStatus
(
ctx
context
.
Context
,
eventID
int64
,
status
string
,
resolvedAt
*
time
.
Time
)
error
UpdateAlertEventStatus
(
ctx
context
.
Context
,
eventID
int64
,
status
string
,
resolvedAt
*
time
.
Time
)
error
UpdateAlertEventEmailSent
(
ctx
context
.
Context
,
eventID
int64
,
emailSent
bool
)
error
UpdateAlertEventEmailSent
(
ctx
context
.
Context
,
eventID
int64
,
emailSent
bool
)
error
+	// Alert silences
+	CreateAlertSilence(ctx context.Context, input *OpsAlertSilence) (*OpsAlertSilence, error)
+	IsAlertSilenced(ctx context.Context, ruleID int64, platform string, groupID *int64, region *string, now time.Time) (bool, error)
 	// Pre-aggregation (hourly/daily) used for long-window dashboard performance.
 	UpsertHourlyMetrics(ctx context.Context, startTime, endTime time.Time) error
 	UpsertDailyMetrics(ctx context.Context, startTime, endTime time.Time) error
@@ -91,7 +98,6 @@ type OpsInsertErrorLogInput struct {
 	// It is set by OpsService.RecordError before persisting.
 	UpstreamErrorsJSON *string
-	DurationMs *int
 	TimeToFirstTokenMs *int64
 	RequestBodyJSON *string // sanitized json string (not raw bytes)
@@ -124,7 +130,15 @@ type OpsUpdateRetryAttemptInput struct {
 	FinishedAt time.Time
 	DurationMs int64
-	// Optional correlation
+	// Persisted execution results (best-effort)
+	Success           *bool
+	HTTPStatusCode    *int
+	UpstreamRequestID *string
+	UsedAccountID     *int64
+	ResponsePreview   *string
+	ResponseTruncated *bool
+	// Optional correlation (legacy fields kept)
 	ResultRequestID *string
 	ResultErrorID *int64
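Review note: the new `ResponsePreview` and `ResponseTruncated` fields imply that a retry stores only a bounded slice of the upstream response plus a flag saying whether it was cut. The sketch below shows one way such a pair could be produced. The `buildPreview` helper, the byte limit, and the reduced `retryResult` struct are hypothetical and are not taken from this commit.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// retryResult is a hypothetical subset of OpsUpdateRetryAttemptInput.
type retryResult struct {
	ResponsePreview   *string
	ResponseTruncated *bool
}

// buildPreview returns at most maxBytes of body, cut on a UTF-8 rune
// boundary, and reports whether anything was dropped.
func buildPreview(body string, maxBytes int) (string, bool) {
	if maxBytes < 0 {
		maxBytes = 0
	}
	if len(body) <= maxBytes {
		return body, false
	}
	cut := maxBytes
	// Step back until the byte at the cut position starts a rune, so the
	// preview never ends in the middle of a multi-byte character.
	for cut > 0 && !utf8.RuneStart(body[cut]) {
		cut--
	}
	return body[:cut], true
}

func main() {
	preview, truncated := buildPreview("héllo", 2)
	res := retryResult{ResponsePreview: &preview, ResponseTruncated: &truncated}
	fmt.Printf("%q %v\n", *res.ResponsePreview, *res.ResponseTruncated)
}
```

Pointer fields, as used in the struct in this hunk, let a caller distinguish "not recorded" (nil) from an empty preview or an explicit `false`.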