1. 29 Apr, 2026 1 commit
    • feat(ops): allow retention days = 0 to wipe table on each scheduled cleanup · 4b6954f9
      erio authored
      Background

      The ops cleanup task currently rejects retention days < 1 in both validate
      and normalize, so operators who want minimal-history setups (e.g.
      high-churn deployments that prefer near-realtime cleanup) cannot express
      that intent through the UI. The minimum is 1 day, which keeps at least
      24h of history no matter how often the cron runs.

      Purpose

      Let admins set retention days to 0, meaning "every scheduled cleanup
      run wipes the corresponding table(s) entirely". Combined with a more
      frequent cron (e.g. `0 * * * *`, hourly on the hour) this yields
      effectively rolling cleanup.

      Changes
      
      Backend
      
      - service/ops_settings.go: validate accepts [0, 365]; normalize only
        refills the default of 30 when the value is < 0 (negative values are
        treated as legacy bad data, 0 is honoured)
      - service/ops_cleanup_service.go: introduce `opsCleanupPlan(now, days)`
        returning `(cutoff, truncate, ok)`. days==0 returns truncate=true and
        short-circuits to a new `truncateOpsTable` helper that uses
        `TRUNCATE TABLE` (cost independent of row count, minimal WAL, no
        dead tuples left for VACUUM). days>0 keeps the existing batched
        DELETE path unchanged. Empty tables skip TRUNCATE to avoid the
        ACCESS EXCLUSIVE lock entirely
      - Extract `isMissingRelationError` helper to dedupe the "table not
        yet created" tolerance shared by both delete and truncate paths
      - Add unit tests for `opsCleanupPlan` (three branches) and
        `isMissingRelationError`
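
As a rough illustration of the backend changes listed above, the planning helper and the missing-table check might look like the sketch below. Only the names `opsCleanupPlan` and `isMissingRelationError` and the `(cutoff, truncate, ok)` return shape come from the commit message. The handling of negative days as the third branch, the `sqlStateError` interface, and the `fakeDBError` type are assumptions made for this example, not the repository's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// exampleNow is a fixed timestamp used only for the demonstration below.
var exampleNow = time.Date(2026, 4, 29, 12, 0, 0, 0, time.UTC)

// opsCleanupPlan decides what one cleanup run should do for a table.
// Hypothetical reconstruction of the three branches:
//   days < 0  -> ok=false, nothing to do (assumed)
//   days == 0 -> truncate=true, wipe the whole table
//   days > 0  -> delete rows older than cutoff = now - days*24h
func opsCleanupPlan(now time.Time, days int) (cutoff time.Time, truncate bool, ok bool) {
	switch {
	case days < 0:
		return time.Time{}, false, false
	case days == 0:
		return time.Time{}, true, true
	default:
		return now.Add(-time.Duration(days) * 24 * time.Hour), false, true
	}
}

// sqlStateError is an assumed local interface. Some PostgreSQL drivers
// expose the SQLSTATE code through a method of this shape.
type sqlStateError interface {
	error
	SQLState() string
}

// isMissingRelationError reports whether err carries SQLSTATE 42P01
// (undefined_table in PostgreSQL), i.e. the table does not exist yet.
func isMissingRelationError(err error) bool {
	var se sqlStateError
	return errors.As(err, &se) && se.SQLState() == "42P01"
}

// fakeDBError stands in for a driver error in this demo.
type fakeDBError struct{ code string }

func (e fakeDBError) Error() string    { return "db error " + e.code }
func (e fakeDBError) SQLState() string { return e.code }

func main() {
	for _, days := range []int{-1, 0, 7} {
		cutoff, truncate, ok := opsCleanupPlan(exampleNow, days)
		fmt.Println(days, cutoff.Format(time.RFC3339), truncate, ok)
	}
	wrapped := fmt.Errorf("cleanup: %w", fakeDBError{code: "42P01"})
	fmt.Println(isMissingRelationError(wrapped))
}
```

Keeping the decision in a pure function of `now` and `days` is what makes the three branches easy to unit test without a database.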
      
      Frontend
      
      - OpsSettingsDialog.vue: validation accepts [0, 365]; input min=0
      - i18n (zh/en): hint mentions "0 = wipe all on every cleanup",
        validation message updated to 0-365 range
      
      Trade-offs

      - TRUNCATE briefly takes an ACCESS EXCLUSIVE lock, but ops tables only
        have the cleanup task as a writer, so the lock is invisible to other
        workloads
      - Empty-table guard avoids the lock when there is nothing to clean
      - Negative values are still treated as legacy bad data and replaced
        with default 30 to preserve compatibility
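
The validate and normalize rules described in this commit can be sketched as two small functions. The range [0, 365], the default of 30, and the "negative is legacy bad data" rule are taken from the commit message. The function and constant names are hypothetical and chosen only for this example.

```go
package main

import "fmt"

const (
	defaultRetentionDays = 30  // default named in the commit message
	maxRetentionDays     = 365 // upper bound named in the commit message
)

// validateRetentionDays rejects values outside [0, 365].
// Hypothetical name; 0 is valid and means "wipe on every cleanup run".
func validateRetentionDays(days int) error {
	if days < 0 || days > maxRetentionDays {
		return fmt.Errorf("retention days must be in [0, %d], got %d", maxRetentionDays, days)
	}
	return nil
}

// normalizeRetentionDays repairs stored values: only negative numbers are
// replaced by the default, so an explicit 0 is preserved.
func normalizeRetentionDays(days int) int {
	if days < 0 {
		return defaultRetentionDays
	}
	return days
}

func main() {
	for _, d := range []int{-5, 0, 1, 365, 366} {
		fmt.Println(d, normalizeRetentionDays(d), validateRetentionDays(d))
	}
}
```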
  2. 15 Mar, 2026 1 commit
    • feat(ops): add ignore insufficient balance errors toggle and extract error constants · cfe72159
      erio authored
      - Add a fifth error filter switch, IgnoreInsufficientBalanceErrors, to
        suppress upstream insufficient balance / insufficient_quota errors
        from the ops log
      - Extract hardcoded error strings into package-level constants for
        shouldSkipOpsErrorLog, normalizeOpsErrorType, classifyOpsPhase, and
        classifyOpsIsBusinessLimited
      - Define ErrNoAvailableAccounts sentinel error and replace all
        errors.New("no available accounts") call sites
      - Update tests to use require.ErrorIs with the sentinel error
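
The sentinel-error pattern mentioned above can be illustrated with a minimal sketch. `ErrNoAvailableAccounts` and its message come from the commit message. `selectAccount` and `isNoAvailableAccounts` are hypothetical helpers invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoAvailableAccounts is the package-level sentinel that replaces ad hoc
// errors.New("no available accounts") calls.
var ErrNoAvailableAccounts = errors.New("no available accounts")

// selectAccount is a hypothetical call site. It wraps the sentinel with %w
// so callers can still match it after context has been added.
func selectAccount(candidates []string) (string, error) {
	if len(candidates) == 0 {
		return "", fmt.Errorf("select account: %w", ErrNoAvailableAccounts)
	}
	return candidates[0], nil
}

// isNoAvailableAccounts matches by identity through any wrapping layers,
// which is what require.ErrorIs checks in tests.
func isNoAvailableAccounts(err error) bool {
	return errors.Is(err, ErrNoAvailableAccounts)
}

func main() {
	_, err := selectAccount(nil)
	fmt.Println(err, isNoAvailableAccounts(err))
}
```

Matching on the sentinel rather than on the message text means the wording can change later without breaking callers or tests.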
  3. 13 Mar, 2026 1 commit
  4. 12 Mar, 2026 1 commit
  5. 03 Mar, 2026 2 commits
  6. 15 Jan, 2026 1 commit
  7. 14 Jan, 2026 2 commits
    • refactor(ops): refactor the ops core service layer code · 967e2587
      IanShaw027 authored
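    • refactor(ops): remove duration-related alert metrics and simplify monitoring config · 18268381
      IanShaw027 authored
      Main changes:
      - Remove the p95_latency_ms and p99_latency_ms alert metric types
      - Remove the latency_p99_ms_max threshold setting from the config
      - Simplify the health score calculation (drop the latency weight and renormalize the SLA and error-rate weights)
      - Remove duration-related diagnostic rules and threshold checks
      - Unify terminology: "latency" → "request duration"
      - Keep displaying duration data, but no longer use it for alerting decisions
      - Focus on TTFT as the primary response-speed alert metric

      Scope of impact:
      - Backend: handler, service, models, tests
      - Frontend: API types, i18n, components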
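
To make the health score renormalization concrete, here is a minimal sketch of dropping one weighted component and rescaling the rest so they still sum to 1. The weights 0.5 / 0.3 / 0.2, the component names, and both function names are assumptions for illustration. The commit message does not state the real values.

```go
package main

import "fmt"

// renormalizeWeights removes one component and rescales the remaining
// weights so that they sum to 1 again.
func renormalizeWeights(weights map[string]float64, drop string) map[string]float64 {
	out := make(map[string]float64, len(weights))
	var total float64
	for name, w := range weights {
		if name == drop {
			continue
		}
		out[name] = w
		total += w
	}
	if total == 0 {
		return out // nothing left to rescale
	}
	for name := range out {
		out[name] /= total
	}
	return out
}

// healthScore is a weighted sum of per-component scores in [0, 100].
func healthScore(scores, weights map[string]float64) float64 {
	var sum float64
	for name, w := range weights {
		sum += w * scores[name]
	}
	return sum
}

func main() {
	// Hypothetical weights before the latency component is removed.
	before := map[string]float64{"sla": 0.5, "error_rate": 0.3, "latency": 0.2}
	after := renormalizeWeights(before, "latency")
	fmt.Printf("sla=%.3f error_rate=%.3f\n", after["sla"], after["error_rate"])
	scores := map[string]float64{"sla": 99, "error_rate": 90}
	fmt.Printf("score=%.2f\n", healthScore(scores, after))
}
```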
  8. 12 Jan, 2026 4 commits
    • 2d45e61a
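    • feat(ops): add count_tokens error filtering · 345a965f
      IanShaw027 authored
      Features:
      - Automatically detect and flag errors from count_tokens requests
      - Support configuring whether count_tokens errors are ignored in statistics
      - Error data is retained in full and is only filtered dynamically when statistics are computed

      Implementation:
      - ops_error_logger.go: automatically flag count_tokens requests
      - ops_repo.go: add the is_count_tokens field to the INSERT statement
      - ops_repo_dashboard.go: buildErrorWhere, the core filter function
      - ops_repo_preagg.go: add the filter to the pre-aggregation statistics
      - ops_repo_trends.go: add the filter to the trend statistics queries (2 places)
      - ops_settings_models.go: add the ignore_count_tokens_errors setting
      - ops_settings.go: setting validation and defaults
      - ops_port.go: add the IsCountTokens field to the error log model

      Business value:
      - count_tokens is a probing request, so its errors do not affect the real business SLA
      - Users can decide whether these errors count toward statistics
      - Improves the accuracy of ops metrics such as error rate and alerts

      Scope of impact:
      - Dashboard overview statistics
      - Error trend charts
      - Alert rule evaluation
      - Pre-aggregated metrics (hourly/daily)
      - Health score calculation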
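
The "keep every row, filter only at query time" idea could be sketched as a WHERE-clause builder like the one below. The function name `buildErrorWhere` and the `is_count_tokens` column come from the commit message. The signature, the `created_at` column, the `$n` placeholders, and the COALESCE guard for rows that predate the column are assumptions for this example.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Example window used only by the demo in main.
var (
	exampleStart = time.Date(2026, 1, 12, 0, 0, 0, 0, time.UTC)
	exampleEnd   = exampleStart.Add(time.Hour)
)

// buildErrorWhere returns a WHERE clause and its arguments for error
// statistics. Hypothetical signature: when ignoreCountTokens is set,
// flagged rows are excluded from the query but stay in the table.
func buildErrorWhere(start, end time.Time, ignoreCountTokens bool) (string, []interface{}) {
	clauses := []string{"created_at >= $1", "created_at < $2"}
	args := []interface{}{start, end}
	if ignoreCountTokens {
		// COALESCE treats NULL (rows written before the column existed,
		// an assumption) as "not a count_tokens request".
		clauses = append(clauses, "COALESCE(is_count_tokens, FALSE) = FALSE")
	}
	return strings.Join(clauses, " AND "), args
}

func main() {
	for _, ignore := range []bool{false, true} {
		where, args := buildErrorWhere(exampleStart, exampleEnd, ignore)
		fmt.Println(where, len(args))
	}
}
```

Routing every statistics query through one builder keeps the dashboard, trend, and pre-aggregation paths consistent when the setting is toggled.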
    • fix(ops): fix Go code formatting issues · e0cccf6e
      IanShaw027 authored
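    • feat(ops): add metric threshold management API to the backend · 7536dbfe
      IanShaw027 authored
      - Add the GetMetricThresholds and UpdateMetricThresholds endpoints
      - Support configuring thresholds for SLA, latency P99, TTFT P99, request error rate, and upstream error rate
      - Add parameter validation logic
      - Provide a default threshold configuration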
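
A minimal sketch of what parameter validation for these five thresholds might look like follows. The struct, its field names, and the chosen bounds (percentages in [0, 100], millisecond values non-negative, nil meaning "not configured") are all assumptions for illustration. The commit message does not give the real model or limits.

```go
package main

import "fmt"

// MetricThresholds is a hypothetical model. A nil field means the
// threshold is not configured.
type MetricThresholds struct {
	SLAPercentMin        *float64
	LatencyP99MsMax      *float64
	TTFTP99MsMax         *float64
	RequestErrorRateMax  *float64
	UpstreamErrorRateMax *float64
}

// ptr is a small helper for building optional values.
func ptr(v float64) *float64 { return &v }

type namedValue struct {
	name  string
	value *float64
}

// Validate checks each configured threshold against its assumed bounds.
func (t MetricThresholds) Validate() error {
	percents := []namedValue{
		{"sla_percent_min", t.SLAPercentMin},
		{"request_error_rate_max", t.RequestErrorRateMax},
		{"upstream_error_rate_max", t.UpstreamErrorRateMax},
	}
	for _, p := range percents {
		if p.value != nil && (*p.value < 0 || *p.value > 100) {
			return fmt.Errorf("%s must be in [0, 100], got %v", p.name, *p.value)
		}
	}
	durations := []namedValue{
		{"latency_p99_ms_max", t.LatencyP99MsMax},
		{"ttft_p99_ms_max", t.TTFTP99MsMax},
	}
	for _, d := range durations {
		if d.value != nil && *d.value < 0 {
			return fmt.Errorf("%s must be >= 0, got %v", d.name, *d.value)
		}
	}
	return nil
}

func main() {
	ok := MetricThresholds{SLAPercentMin: ptr(99.5), TTFTP99MsMax: ptr(800)}
	bad := MetricThresholds{RequestErrorRateMax: ptr(120)}
	fmt.Println(ok.Validate())
	fmt.Println(bad.Validate())
}
```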
  9. 11 Jan, 2026 2 commits
    • fix(ci): fix the last batch of CI errors · c48795a9
      IanShaw027 authored
      - Fix the remaining 3 unchecked Rows.Close errors in ops_repo_trends.go
      - Fix formatting issues in ops_settings.go, ops_settings_models.go, ops_trends.go
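    • feat(ops): add advanced settings API support · 988b4d02
      IanShaw027 authored
      - Add the OpsAdvancedSettings data model
      - Support data retention policy configuration (error logs, minute-level metrics, hourly metrics)
      - Support a data aggregation toggle
      - Add the GET/PUT /admin/ops/advanced-settings endpoints
      - Add config validation and default-value handling

      Related files: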
      - backend/internal/service/ops_settings_models.go
      - backend/internal/service/ops_settings.go
      - backend/internal/handler/admin/ops_settings_handler.go
      - backend/internal/server/routes/admin.go
      - backend/internal/service/domain_constants.go
  10. 09 Jan, 2026 1 commit
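    • feat(service): implement the ops monitoring business logic layer · 5baa8b56
      IanShaw027 authored
      - Add the main ops service (ops_service.go) and port definitions (ops_port.go)
      - Implement the account availability check service (ops_account_availability.go)
      - Implement the data aggregation service (ops_aggregation_service.go)
      - Implement the alert evaluation service (ops_alert_evaluator_service.go)
      - Implement the alert management service (ops_alerts.go)
      - Implement the data cleanup service (ops_cleanup_service.go)
      - Implement the concurrency control service (ops_concurrency.go)
      - Implement the dashboard service (ops_dashboard.go)
      - Implement the error handling service (ops_errors.go)
      - Implement the histogram service (ops_histograms.go)
      - Implement the metrics collection service (ops_metrics_collector.go)
      - Implement the query mode service (ops_query_mode.go)
      - Implement the realtime monitoring service (ops_realtime.go)
      - Implement the request details service (ops_request_details.go)
      - Implement the retry mechanism service (ops_retry.go)
      - Implement the settings management service (ops_settings.go)
      - Implement the trend analysis service (ops_trends.go)
      - Implement the window statistics service (ops_window_stats.go)
      - Add ops-related domain constants
      - Register service dependency injection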