From 62c87a234ce1ead97dd7782224df74b54823f498 Mon Sep 17 00:00:00 2001
From: Your Name
Date: Mon, 2 Feb 2026 12:11:18 +0800
Subject: [PATCH] Add multimodal timestamp alignment flow diagram (suggested by Gemini)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add section 2.5:
- Mermaid sequence diagram showing parallel ASR/OCR/CV processing and the alignment flow
- Describe the key points of the alignment algorithm: timeline normalization, fuzzy matching window, event merging

Co-Authored-By: Claude Opus 4.5
---
 DevelopmentPlan.md | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/DevelopmentPlan.md b/DevelopmentPlan.md
index 841bedf..cd09cdc 100644
--- a/DevelopmentPlan.md
+++ b/DevelopmentPlan.md
@@ -25,6 +25,7 @@
 | V1.0 | 2026-02-03 | Gemini | Initial draft: technical architecture, technology selection, schedule |
 | V1.1 | 2026-02-03 | Claude | Review revisions: added F-05-A/F-45 technical design, acceptance criteria, data model, test strategy |
 | V1.2 | 2026-02-03 | Claude | Reviewer fixes: Logo detection switched to vector retrieval, Brief parsing gains VLM, elastic GPU, H5 screen-lock prevention, schedule adjustment |
+| V1.2.1 | 2026-02-03 | Claude | Added multimodal timestamp alignment flow diagram (suggested by Gemini) |
 
 ---
 
@@ -190,6 +191,42 @@ graph TD
 - Duration measurement error ≤ 0.5 s
 - Frequency count accuracy ≥ 95%
 
+### 2.5 Multimodal Timestamp Alignment Flow ⭐ V1.2 addendum
+
+> This is the main reason Phase 2 was extended by 1 week: the ASR/OCR/CV timelines must be precisely synchronized.
+
+```mermaid
+sequenceDiagram
+    participant Video as Source video
+    participant ASR as ASR engine
+    participant OCR as OCR engine
+    participant CV as CV detection
+    participant Alignment as Alignment algorithm
+    participant Rule as Rule engine
+
+    par Parallel processing
+        Video->>ASR: Extract audio
+        ASR-->>Alignment: Output: [{text: "Brand", start: 5.2s, end: 5.8s}, ...]
+
+        Video->>OCR: Extract keyframes
+        OCR-->>Alignment: Output: [{text: "Brand", timestamp: 5.5s}, ...]
+
+        Video->>CV: Scan frame by frame
+        CV-->>Alignment: Output: [{object: "Product", timestamp: 5.0s}, ...]
+    end
+
+    Alignment->>Alignment: Timeline normalization & fuzzy matching
+    Alignment-->>Rule: Output structured timeline data
+
+    Rule->>Rule: Apply logic: if (Logo_Duration > 5s) && (Mention_Count >= 3)
+    Rule-->>Video: Output final review verdict
+```
+
+**Key points of the alignment algorithm:**
+1. **Timeline normalization:** Convert ASR (millisecond-level), OCR (frame-level), and CV (frame-level) timestamps to a single timeline expressed in seconds
+2. **Fuzzy matching window:** Allow a time tolerance of ±0.5 s to absorb small timestamp offsets between modalities
+3. **Event merging:** Merge events from different modalities that fall within the same time window into a single "composite event"
+
 ---
 
 ## 3. MVP (P0) Development Scope Definition
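
The three alignment points added in section 2.5 of this patch (timeline normalization, fuzzy matching window, event merging) could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the project's implementation: the `Event` and `CompositeEvent` types, the helper names, the `fps` parameter, and the greedy grouping strategy are all hypothetical. It also assumes an ASR segment is represented by its start time.

```python
from dataclasses import dataclass, field

TOLERANCE_S = 0.5  # the ±0.5 s fuzzy matching window from section 2.5


@dataclass
class Event:
    modality: str  # "asr", "ocr" or "cv" (hypothetical labels)
    label: str     # recognized text or detected object
    time_s: float  # timestamp already normalized to seconds


@dataclass
class CompositeEvent:
    anchor_s: float                       # time of the first event in the group
    members: list = field(default_factory=list)


def ms_to_s(ms: float) -> float:
    """Normalize a millisecond timestamp (e.g. from ASR) to seconds."""
    return ms / 1000.0


def frame_to_s(frame_index: int, fps: float) -> float:
    """Normalize a frame index (e.g. from OCR/CV) to seconds."""
    return frame_index / fps


def align(events, tolerance_s=TOLERANCE_S):
    """Greedily merge events into composite events.

    Events are sorted by time; an event joins the current group when it
    lies within tolerance_s of that group's first (anchor) event,
    otherwise it starts a new group.
    """
    groups = []
    for ev in sorted(events, key=lambda e: e.time_s):
        if groups and ev.time_s - groups[-1].anchor_s <= tolerance_s:
            groups[-1].members.append(ev)
        else:
            groups.append(CompositeEvent(anchor_s=ev.time_s, members=[ev]))
    return groups


# Sample values mirror the sequence diagram; 30 fps is an assumed frame rate.
events = [
    Event("asr", "Brand", ms_to_s(5200)),
    Event("ocr", "Brand", frame_to_s(165, fps=30.0)),
    Event("cv", "Product", frame_to_s(150, fps=30.0)),
    Event("cv", "Product", 12.0),
]
groups = align(events)
```

Anchoring each group on its first event keeps the window from drifting as more events join, which a chained "within 0.5 s of the previous event" rule would not guarantee. Whether the real alignment step needs that property, or a symmetric window around a chosen modality, is left open by the plan.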