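Add multimodal timestamp alignment flowchart (suggested by Gemini)

Add section 2.5:
- Mermaid sequence diagram showing parallel ASR/OCR/CV processing and the alignment flow
- Key points of the alignment algorithm: timeline normalization, fuzzy matching window, event merging

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>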
parent e8e4edfd66
commit 62c87a234c
@@ -25,6 +25,7 @@
| V1.0 | 2026-02-03 | Gemini | Initial draft: technical architecture, technology selection, schedule |
| V1.1 | 2026-02-03 | Claude | Review revision: added F-05-A/F-45 technical approach, acceptance criteria, data model, test strategy |
| V1.2 | 2026-02-03 | Claude | Reviewer fixes: logo detection switched to vector retrieval, VLM added to brief parsing, elastic GPU, H5 screen-lock prevention, schedule adjustment |
| V1.2.1 | 2026-02-03 | Claude | Added multimodal timestamp alignment flowchart (suggested by Gemini) |

---

@@ -190,6 +191,42 @@ graph TD
- Duration measurement error ≤ 0.5 s
- Frequency count accuracy ≥ 95%

### 2.5 Multimodal Timestamp Alignment Flow ⭐ Added in V1.2

> This is the main reason Phase 2 was extended by one week: the ASR, OCR, and CV timelines must be precisely synchronized.

```mermaid
sequenceDiagram
    participant Video as Source video
    participant ASR as ASR engine
    participant OCR as OCR engine
    participant CV as CV detection
    participant Alignment as Alignment algorithm
    participant Rule as Rule engine

    par Parallel processing
        Video->>ASR: Extract audio
        ASR-->>Alignment: Output: [{text: "brand", start: 5.2s, end: 5.8s}, ...]

        Video->>OCR: Extract keyframes
        OCR-->>Alignment: Output: [{text: "brand", timestamp: 5.5s}, ...]

        Video->>CV: Frame-by-frame scan
        CV-->>Alignment: Output: [{object: "Product", timestamp: 5.0s}, ...]
    end

    Alignment->>Alignment: Timeline normalization & fuzzy matching
    Alignment-->>Rule: Output structured timeline data

    Rule->>Rule: Apply logic: if (Logo_Duration > 5s) && (Mention_Count >= 3)
    Rule-->>Video: Output final review verdict
```
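
The rule step at the end of the diagram could be sketched roughly as follows. This is only an illustration: the `Interval` type, the function names, and the input shapes are hypothetical and do not come from the design doc. Only the two thresholds (logo duration > 5 s, mention count >= 3) are taken from the diagram. The sketch also assumes that overlapping logo detections should not be double-counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A time span in seconds (hypothetical structure, not from the design doc)."""
    start: float
    end: float


def total_duration(intervals: list[Interval]) -> float:
    """Return the length of the union of the intervals, so overlaps count once."""
    total = 0.0
    current_start = None
    current_end = None
    for iv in sorted(intervals, key=lambda i: i.start):
        if current_end is None or iv.start > current_end:
            # A gap: close the span collected so far and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = iv.start, iv.end
        else:
            # Overlap: extend the current span.
            current_end = max(current_end, iv.end)
    if current_end is not None:
        total += current_end - current_start
    return total


def passes_rule(
    logo_intervals: list[Interval],
    mention_times: list[float],
    min_logo_seconds: float = 5.0,  # threshold from the diagram
    min_mentions: int = 3,          # threshold from the diagram
) -> bool:
    """Evaluate: (Logo_Duration > 5s) && (Mention_Count >= 3)."""
    logo_ok = total_duration(logo_intervals) > min_logo_seconds
    mentions_ok = len(mention_times) >= min_mentions
    return logo_ok and mentions_ok
```

For example, `passes_rule([Interval(0.0, 4.0), Interval(3.0, 6.0)], [5.2, 12.0, 20.1])` treats the two overlapping logo spans as a single 6-second span and counts three mentions.
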
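
**Key points of the alignment algorithm:**
1. **Timeline normalization:** convert ASR (millisecond-level), OCR (frame-level), and CV (frame-level) timestamps to a common timestamp in seconds
2. **Fuzzy matching window:** allow a ±0.5 s tolerance to absorb small timestamp offsets between modalities
3. **Event merging:** merge multimodal events that fall within the same time window into a single "composite event"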
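
The three points above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the `Event` and `CompositeEvent` types, the helper names, and the frame rates used in the example are all hypothetical. Only the ±0.5 s tolerance comes from the text. The sketch anchors each window at the earliest event of a group, which is one possible reading of "same time window"; it avoids chaining many nearby events into one long group.

```python
from dataclasses import dataclass, field

TOLERANCE_S = 0.5  # the ±0.5 s matching window from the text


@dataclass
class Event:
    source: str    # "asr", "ocr", or "cv"
    label: str     # recognized text or detected object
    time_s: float  # normalized timestamp in seconds


@dataclass
class CompositeEvent:
    """Events from one or more modalities that fall in the same window."""
    events: list[Event] = field(default_factory=list)

    @property
    def start_s(self) -> float:
        return self.events[0].time_s

    @property
    def sources(self) -> set[str]:
        return {e.source for e in self.events}


def from_millis(source: str, label: str, ms: float) -> Event:
    """Normalize a millisecond timestamp (e.g. from ASR) to seconds."""
    return Event(source, label, ms / 1000.0)


def from_frame(source: str, label: str, frame_index: int, fps: float) -> Event:
    """Normalize a frame index (e.g. from OCR or CV) to seconds."""
    return Event(source, label, frame_index / fps)


def merge_events(events: list[Event], tolerance_s: float = TOLERANCE_S) -> list[CompositeEvent]:
    """Group time-sorted events whose offset from the group's first event is within tolerance."""
    composites: list[CompositeEvent] = []
    for ev in sorted(events, key=lambda e: e.time_s):
        if composites and ev.time_s - composites[-1].start_s <= tolerance_s:
            composites[-1].events.append(ev)
        else:
            composites.append(CompositeEvent([ev]))
    return composites
```

Using the sample values from the diagram (CV at 5.0 s, ASR at 5.2 s, OCR at 5.5 s), all three events land in one composite event, while a later ASR mention at 9.0 s forms its own:

```python
events = [
    from_millis("asr", "brand", 5200),
    from_frame("ocr", "brand", 165, fps=30.0),   # assumed 30 fps
    from_frame("cv", "Product", 125, fps=25.0),  # assumed 25 fps
    from_millis("asr", "brand", 9000),
]
groups = merge_events(events)
```
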

---

## 3. MVP (P0) Development Scope Definition