douyin-crawler-poc/docs/superpowers/plans/2026-04-17-douyin-zero-arg-target-detection.md

# Douyin Zero-Argument Target Detection Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Make `Douyin.py` work with zero arguments by default, auto-detect the current browser page target, and keep a single manual fallback input for creator URLs, video URLs, or `aweme_id`.

**Architecture:** Add a target-resolution layer ahead of the existing crawl logic. Route resolved targets into either a visible-only creator flow or a single-video flow, keeping browser-attach checks and download primitives reusable.
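
The routing step in this architecture could be sketched roughly as follows. This is only an illustration under assumptions: `ParsedTarget`, `run_creator_flow`, `run_single_video_flow`, `FLOWS`, and `route_target` are hypothetical names, not identifiers confirmed to exist in `Douyin.py`, and the flow bodies are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ParsedTarget:
    """Hypothetical output of the target-resolution layer."""
    kind: str   # assumed to be "creator" or "video"
    value: str  # creator identifier or aweme_id


def run_creator_flow(target: ParsedTarget) -> str:
    # Placeholder: the real flow would process only the already-loaded items.
    return f"creator:{target.value}"


def run_single_video_flow(target: ParsedTarget) -> str:
    # Placeholder: the real flow would download exactly one video.
    return f"video:{target.value}"


# Assumed mapping from target kind to the flow that handles it.
FLOWS: Dict[str, Callable[[ParsedTarget], str]] = {
    "creator": run_creator_flow,
    "video": run_single_video_flow,
}


def route_target(target: ParsedTarget) -> str:
    """Dispatch a resolved target to the matching flow."""
    flow = FLOWS.get(target.kind)
    if flow is None:
        raise ValueError(f"Unsupported target kind: {target.kind!r}")
    return flow(target)
```

Keeping the dispatch in one small function means the existing crawl logic never needs to know where the target came from (current page or manual input).
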

**Tech Stack:** Python 3, `argparse`, `re`, `socket`, `pathlib`, `unittest`

---

### Task 1: Revise the requirements and freeze the contract

**Files:**

- Modify: `externaldocs/2026-04-17-douyin-targeted-crawling-requirements.md`
- Create: `docs/superpowers/specs/2026-04-17-douyin-zero-arg-target-detection-design.md`

- [ ] **Step 1: Align the requirements doc with the approved UX**

Document that `./.venv/bin/python Douyin.py` is the primary command and that manual input is fallback-only.

- [ ] **Step 2: Save the approved design as a spec**

Write the validated design into `docs/superpowers/specs/2026-04-17-douyin-zero-arg-target-detection-design.md`.

- [ ] **Step 3: Review both docs locally**

Read both files and confirm the language matches the agreed zero-argument flow and visible-only scope.

### Task 2: Add failing tests for target parsing and target resolution

**Files:**

- Modify: `test_douyin.py`
- Modify: `Douyin.py`

- [ ] **Step 1: Write the failing tests**

Add tests for:

- `is_creator_url()` accepts supported creator URLs
- `is_video_url()` accepts supported video URLs
- `is_aweme_id()` accepts numeric IDs
- `parse_target_input()` classifies creator URLs, video URLs, and `aweme_id`
- `resolve_target()` uses the active browser page when CLI input is absent
- `resolve_target()` raises a readable error when neither the current page nor the manual input is supported

- [ ] **Step 2: Run the focused tests to verify RED**

Run: `python3 -m unittest test_douyin.py -q`
Expected: FAIL because the new target-resolution helpers do not exist yet.

- [ ] **Step 3: Write the minimal implementation**

Implement the smallest set of pure helper functions and a compact parsed-target structure in `Douyin.py`.
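
One possible shape for these helpers is sketched below. It is a minimal sketch, not the plan's actual implementation: the URL patterns (`/user/<id>` and `/video/<digits>` on `douyin.com`), the minimum ID length, the `ParsedTarget` and `UnsupportedTargetError` names, and the `resolve_target(manual_input, current_page_url)` signature are all assumptions. In particular, how the current page URL is obtained from the attached browser is left out and passed in as a plain string.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Assumed URL shapes; the patterns the real Douyin.py supports may differ.
_CREATOR_PATH = re.compile(r"/user/([A-Za-z0-9_.\-]+)/?")
_VIDEO_PATH = re.compile(r"/video/([0-9]+)/?")
_AWEME_ID = re.compile(r"[0-9]{8,}")  # hypothetical minimum length


@dataclass(frozen=True)
class ParsedTarget:
    kind: str   # "creator" or "video"
    value: str  # creator identifier or aweme_id


class UnsupportedTargetError(ValueError):
    """Raised when neither the page URL nor the manual input is usable."""


def _douyin_path(text: str) -> Optional[str]:
    """Return the URL path if text is an http(s) douyin.com URL, else None."""
    try:
        parsed = urlparse(text.strip())
        host = parsed.hostname or ""
    except ValueError:
        return None
    on_douyin = host == "douyin.com" or host.endswith(".douyin.com")
    return parsed.path if parsed.scheme in ("http", "https") and on_douyin else None


def is_creator_url(text: str) -> bool:
    path = _douyin_path(text)
    return path is not None and _CREATOR_PATH.fullmatch(path) is not None


def is_video_url(text: str) -> bool:
    path = _douyin_path(text)
    return path is not None and _VIDEO_PATH.fullmatch(path) is not None


def is_aweme_id(text: str) -> bool:
    return _AWEME_ID.fullmatch(text.strip()) is not None


def parse_target_input(text: str) -> ParsedTarget:
    """Classify a creator URL, video URL, or bare aweme_id."""
    text = text.strip()
    if is_aweme_id(text):
        return ParsedTarget("video", text)
    path = _douyin_path(text) or ""
    video = _VIDEO_PATH.fullmatch(path)
    if video:
        return ParsedTarget("video", video.group(1))
    creator = _CREATOR_PATH.fullmatch(path)
    if creator:
        return ParsedTarget("creator", creator.group(1))
    raise UnsupportedTargetError(f"Unsupported target: {text!r}")


def resolve_target(manual_input: Optional[str], current_page_url: Optional[str]) -> ParsedTarget:
    """Prefer manual input when given; otherwise fall back to the current page."""
    if manual_input:
        return parse_target_input(manual_input)
    if current_page_url:
        try:
            return parse_target_input(current_page_url)
        except UnsupportedTargetError:
            pass
    raise UnsupportedTargetError(
        "The current browser page is not a supported creator or video page. "
        "Pass a creator URL, a video URL, or an aweme_id as the single argument."
    )
```

Because every helper is a pure function of strings, the tests from Step 1 can exercise them without a browser.
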

- [ ] **Step 4: Run the focused tests to verify GREEN**

Run: `python3 -m unittest test_douyin.py -q`
Expected: PASS

### Task 3: Add failing tests for current-page behavior and visible-only creator flow

**Files:**

- Modify: `test_douyin.py`
- Modify: `Douyin.py`

- [ ] **Step 1: Write the failing tests**

Add tests for:

- current-page creator mode does not auto-scroll by default
- creator flow reports a clear error when no aweme items are available

- [ ] **Step 2: Run the focused tests to verify RED**

Run: `python3 -m unittest test_douyin.py -q`
Expected: FAIL because the current creator flow still scrolls automatically.

- [ ] **Step 3: Write the minimal implementation**

Split creator crawling so the default path only processes the currently loaded response set and does not call scroll helpers automatically.
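
A rough sketch of that split is shown below, assuming the loaded responses are dictionaries with an `aweme_list` key and that scrolling is an injectable callable. `collect_visible_items`, `NoVisibleItemsError`, the `scroll` parameter, and the `auto_scroll` flag are hypothetical names chosen for illustration, not the existing `Douyin.py` API.

```python
from typing import Callable, Iterable, List, Optional


class NoVisibleItemsError(RuntimeError):
    """Raised when the creator page has no aweme items to process."""


def collect_visible_items(
    loaded_responses: Iterable[dict],
    scroll: Optional[Callable[[], Iterable[dict]]] = None,
    auto_scroll: bool = False,
) -> List[dict]:
    """Gather aweme items from already-loaded responses.

    The scroll helper is only invoked when auto_scroll is explicitly enabled,
    so the default path stays visible-only.
    """
    items: List[dict] = []
    for response in loaded_responses:
        # Assumed response shape: {"aweme_list": [...]}; missing or None is skipped.
        items.extend(response.get("aweme_list") or [])

    if auto_scroll and scroll is not None:
        for response in scroll():
            items.extend(response.get("aweme_list") or [])

    if not items:
        raise NoVisibleItemsError(
            "No aweme items are available on the current creator page. "
            "Let the page finish loading, then run the command again."
        )
    return items
```

Passing the scroll helper in as an argument makes the "does not auto-scroll by default" test a matter of asserting that a fake callable was never invoked.
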

- [ ] **Step 4: Run the focused tests to verify GREEN**

Run: `python3 -m unittest test_douyin.py -q`
Expected: PASS

### Task 4: Add failing tests for single-video flow

**Files:**

- Modify: `test_douyin.py`
- Modify: `Douyin.py`

- [ ] **Step 1: Write the failing tests**

Add tests for:

- resolving a video URL leads to a single-video target
- resolving an `aweme_id` leads to a single-video target
- single-video flow downloads exactly one file

- [ ] **Step 2: Run the focused tests to verify RED**

Run: `python3 -m unittest test_douyin.py -q`
Expected: FAIL because the single-video execution path does not exist yet.

- [ ] **Step 3: Write the minimal implementation**

Implement single-video resolution and a narrow download path that saves one mp4 file.
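
The narrow download path might look something like the sketch below. It assumes the existing download primitive can be wrapped as a callable that returns the video bytes for an `aweme_id`; `download_single_video`, `fetch_bytes`, and the `<aweme_id>.mp4` naming scheme are hypothetical and only serve the illustration.

```python
from pathlib import Path
from typing import Callable


def download_single_video(
    aweme_id: str,
    fetch_bytes: Callable[[str], bytes],
    out_dir: Path,
) -> Path:
    """Save exactly one mp4 file for the given aweme_id and return its path.

    fetch_bytes stands in for the project's real download primitive.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    destination = out_dir / f"{aweme_id}.mp4"

    data = fetch_bytes(aweme_id)
    if not data:
        raise ValueError(f"No video data returned for aweme_id {aweme_id!r}")

    # Write to a temporary name first so a failed write never leaves
    # a truncated .mp4 behind.
    partial = destination.with_name(destination.name + ".part")
    partial.write_bytes(data)
    partial.replace(destination)
    return destination
```

Injecting the fetcher keeps the "downloads exactly one file" test offline: a fake callable returns fixed bytes and the test counts the files in a temporary directory.
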

- [ ] **Step 4: Run the focused tests to verify GREEN**

Run: `python3 -m unittest test_douyin.py -q`
Expected: PASS

### Task 5: Update CLI entry behavior and verify end to end

**Files:**

- Modify: `Douyin.py`
- Modify: `test_douyin.py`
- Modify: `README.md`
- Modify: `externaldocs/beginner-guide.md`

- [ ] **Step 1: Write the failing tests**

Add tests for:

- default CLI invocation with no positional target chooses current-page resolution
- unsupported current page produces a fallback hint
- manual positional target overrides the current page

- [ ] **Step 2: Run the focused tests to verify RED**

Run: `python3 -m unittest test_douyin.py -q`
Expected: FAIL because the CLI still assumes a hardcoded default creator URL.

- [ ] **Step 3: Write the minimal implementation**

Update the parser and `main()` flow so zero-argument execution becomes the default, while keeping the manual positional target as fallback.
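
With `argparse`, an optional positional argument can be expressed with `nargs="?"`. The sketch below shows one way the parser could be set up; `build_parser` and `choose_resolution_mode` are hypothetical helper names, and the real `main()` may be organized differently.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="Douyin.py",
        description="Crawl the current Douyin page, or a manually given target.",
    )
    # nargs="?" makes the positional optional; it is None when omitted.
    parser.add_argument(
        "target",
        nargs="?",
        default=None,
        help="optional fallback: creator URL, video URL, or aweme_id",
    )
    return parser


def choose_resolution_mode(argv: Optional[List[str]] = None) -> str:
    """Return "manual" when a target is given, otherwise "current-page"."""
    args = build_parser().parse_args(argv)
    return "manual" if args.target else "current-page"
```

Accepting `argv` as a parameter lets the CLI tests call the function with an explicit list instead of patching `sys.argv`.
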

- [ ] **Step 4: Update user docs**

Revise `README.md` and `externaldocs/beginner-guide.md` to show the new default flow:

```bash
./.venv/bin/python login_douyin.py
./.venv/bin/python Douyin.py
```

- [ ] **Step 5: Run full verification**

Run: `python3 -m unittest -q`
Expected: PASS for the full test suite.