
# Douyin Login Entry Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add a dedicated browser-login launcher and a clearer attach-port check so the Douyin crawler has a stable two-step workflow: log in first, crawl second.

**Architecture:** Keep browser-launch responsibilities in a new `login_douyin.py` script and keep crawl responsibilities in `Douyin.py`. Add a small socket-based port readiness check before attaching to Chrome, and cover the new behavior with unit tests before implementing production code.

**Tech Stack:** Python 3, `argparse`, `pathlib`, `subprocess`, `socket`, `unittest`

---
### Task 1: Write failing tests for the new login launcher

**Files:**
- Create: `login_douyin.py`
- Create: `test_login_douyin.py`

- [ ] **Step 1: Write the failing test**

```python
import importlib
import unittest
from pathlib import Path


class BuildLoginCommandTest(unittest.TestCase):
    def test_build_login_command_uses_expected_chrome_arguments(self) -> None:
        # Imported inside the test so that, until login_douyin.py exists,
        # only this test errors instead of the whole file failing to load.
        module = importlib.import_module("login_douyin")
        command = module.build_login_command(
            chrome_path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
            profile_dir=Path("/tmp/douyin-profile"),
            browser_port=9223,
            user_url="https://www.douyin.com/user/example",
        )
        self.assertEqual(
            command,
            [
                "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
                "--user-data-dir=/tmp/douyin-profile",
                "--remote-debugging-port=9223",
                "https://www.douyin.com/user/example",
            ],
        )
```
- [ ] **Step 2: Run test to verify it fails**

Run: `./.venv/bin/python -m unittest test_login_douyin.py -v`
Expected: FAIL (reported as `FAILED (errors=1)`) with `ModuleNotFoundError`, because `login_douyin.py` does not exist yet.
- [ ] **Step 3: Write minimal implementation**

Create `login_douyin.py` with:
- `DEFAULT_CHROME_PATH`
- `DEFAULT_BROWSER_PORT = 9223`
- `DEFAULT_PROFILE_DIR`
- `build_login_command(...)`
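
A minimal sketch of that first version of `login_douyin.py`, written only to satisfy the Step 1 test. The Chrome path matches the macOS path the test uses; the `DEFAULT_PROFILE_DIR` location is a hypothetical choice, since the plan does not fix one.

```python
from __future__ import annotations

from pathlib import Path

# macOS Chrome binary, matching the path used in the Step 1 test.
DEFAULT_CHROME_PATH = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
DEFAULT_BROWSER_PORT = 9223
# Hypothetical location: the plan only says the profile dir is dedicated to this crawler.
DEFAULT_PROFILE_DIR = Path.home() / ".douyin-crawler" / "chrome-profile"


def build_login_command(
    chrome_path: str,
    profile_dir: Path,
    browser_port: int,
    user_url: str,
) -> list[str]:
    """Build the argv list that opens Chrome with its own profile and a debugging port."""
    return [
        chrome_path,
        # A separate user-data-dir keeps the Douyin login isolated from everyday browsing.
        f"--user-data-dir={profile_dir}",
        # Exposes the DevTools protocol so Douyin.py can attach to this browser later.
        f"--remote-debugging-port={browser_port}",
        user_url,
    ]
```

Returning a list rather than a shell string lets `subprocess` run the command without a shell, so the spaces in the Chrome path need no quoting.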
- [ ] **Step 4: Run test to verify it passes**

Run: `./.venv/bin/python -m unittest test_login_douyin.py -v`
Expected: PASS for the command-building test.
- [ ] **Step 5: Commit**

Not applicable here because the workspace is not a git repository.
### Task 2: Add tests and implementation for launcher validation and user guidance

**Files:**
- Modify: `login_douyin.py`
- Modify: `test_login_douyin.py`

- [ ] **Step 1: Write the failing tests**

Add tests for:
- the parser defaults the browser port to `9223`
- `main()` creates the profile directory
- `main()` prints the follow-up crawl command
- `main()` returns non-zero with a readable message when the Chrome path does not exist
- [ ] **Step 2: Run tests to verify they fail**

Run: `./.venv/bin/python -m unittest test_login_douyin.py -v`
Expected: FAIL because validation and guidance behavior is not implemented yet.
- [ ] **Step 3: Write minimal implementation**

Add to `login_douyin.py`:
- `build_parser()`
- `launch_browser(...)`
- `main(...)`
- readable error messages, in the style of a `SystemExit` or stderr message, delivered through printed output and return codes
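
One way to fill in the Step 3 functions, sketched under assumptions: only `--browser-port` comes from the plan; `--chrome-path`, `--profile-dir`, `--user-url`, the default start URL, and the `launcher` parameter (added so tests can pass a fake instead of opening Chrome) are hypothetical. The Task 1 constants and `build_login_command` are repeated so the block runs on its own.

```python
from __future__ import annotations

import argparse
import subprocess
import sys
from pathlib import Path
from typing import Callable, Sequence

# Task 1 pieces, repeated so this sketch runs standalone.
DEFAULT_CHROME_PATH = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
DEFAULT_BROWSER_PORT = 9223
DEFAULT_PROFILE_DIR = Path.home() / ".douyin-crawler" / "chrome-profile"  # hypothetical location
DEFAULT_USER_URL = "https://www.douyin.com/"  # hypothetical default start page


def build_login_command(chrome_path: str, profile_dir: Path, browser_port: int, user_url: str) -> list[str]:
    return [chrome_path, f"--user-data-dir={profile_dir}",
            f"--remote-debugging-port={browser_port}", user_url]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Open Chrome with a dedicated Douyin login profile.")
    parser.add_argument("--browser-port", type=int, default=DEFAULT_BROWSER_PORT)
    # Hypothetical flags, named for this sketch only.
    parser.add_argument("--chrome-path", default=DEFAULT_CHROME_PATH)
    parser.add_argument("--profile-dir", type=Path, default=DEFAULT_PROFILE_DIR)
    parser.add_argument("--user-url", default=DEFAULT_USER_URL)
    return parser


def launch_browser(command: Sequence[str]) -> subprocess.Popen:
    # Popen does not wait for Chrome to exit, so the script can print the next step.
    return subprocess.Popen(list(command))


def main(
    argv: Sequence[str] | None = None,
    launcher: Callable[[list[str]], object] = launch_browser,
) -> int:
    args = build_parser().parse_args(argv)
    if not Path(args.chrome_path).exists():
        # Readable message plus a non-zero return code instead of a traceback.
        print(f"Chrome not found at {args.chrome_path!r}. "
              "Pass --chrome-path with the path to your Chrome executable.", file=sys.stderr)
        return 1
    args.profile_dir.mkdir(parents=True, exist_ok=True)
    launcher(build_login_command(args.chrome_path, args.profile_dir, args.browser_port, args.user_url))
    print("Log in to Douyin in the Chrome window, then run:")
    print(f"  ./.venv/bin/python Douyin.py --browser-port {args.browser_port}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Because `main()` returns an exit code instead of calling `sys.exit()` itself, the Task 2 tests can call it directly with a fake `launcher` and assert on the return value, the profile directory, and the captured output.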
- [ ] **Step 4: Run tests to verify they pass**

Run: `./.venv/bin/python -m unittest test_login_douyin.py -v`
Expected: PASS
- [ ] **Step 5: Commit**

Not applicable here because the workspace is not a git repository.
### Task 3: Write failing tests for attach-port readiness in the crawler

**Files:**
- Modify: `Douyin.py`
- Modify: `test_douyin.py`

- [ ] **Step 1: Write the failing tests**

Add tests for:
- `ensure_browser_debug_port_ready()` returns successfully when a temporary local server is listening
- `ensure_browser_debug_port_ready()` raises a readable `RuntimeError` when the port is unavailable
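
The "temporary local server" in the first test only has to accept TCP connections, so the standard `socket` module is enough and no HTTP server is needed. A sketch of two fixtures for `test_douyin.py`; the names `listening_port()` and `unused_port()` are hypothetical.

```python
import contextlib
import socket
from typing import Iterator


@contextlib.contextmanager
def listening_port(host: str = "127.0.0.1") -> Iterator[int]:
    """Yield a port on which a throwaway TCP server is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))  # port 0 lets the OS pick a free port
        server.listen()  # the kernel completes handshakes even without accept()
        yield server.getsockname()[1]


def unused_port(host: str = "127.0.0.1") -> int:
    """Return a port with no listener (best effort: another process could claim it)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))  # reserve a free port, then release it on close
        return probe.getsockname()[1]
```

Assuming the helper takes the port number as its first argument (the plan does not pin down its signature), the two tests reduce to calling `Douyin.ensure_browser_debug_port_ready(port)` inside `with listening_port() as port:` and wrapping `Douyin.ensure_browser_debug_port_ready(unused_port())` in `self.assertRaises(RuntimeError)`.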
- [ ] **Step 2: Run tests to verify they fail**

Run: `./.venv/bin/python -m unittest test_douyin.py -v`
Expected: FAIL because the function does not exist yet.
- [ ] **Step 3: Write minimal implementation**

Add to `Douyin.py`:
- `ensure_browser_debug_port_ready()`, a socket-based readiness helper
- a call to it in `collect_videos()` before `create_page(...)` when `browser_port` is provided
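
A minimal sketch of the readiness helper, assuming it takes the port as its first argument and checks `127.0.0.1`, where Chrome binds its remote debugging port by default. The `host` and `timeout` parameters and the exact message wording are assumptions.

```python
import socket


def ensure_browser_debug_port_ready(port: int, host: str = "127.0.0.1", timeout: float = 2.0) -> None:
    """Return if something accepts TCP connections on host:port, else raise RuntimeError."""
    try:
        # A completed TCP handshake means some process is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return
    except OSError as exc:  # covers ConnectionRefusedError and connect timeouts
        raise RuntimeError(
            f"Cannot reach Chrome's remote debugging port at {host}:{port} ({exc}). "
            f"Start the browser first with: ./.venv/bin/python login_douyin.py --browser-port {port}"
        ) from exc
```

The sketch makes one connection attempt and fails fast instead of polling, since the login launcher runs as a separate, earlier step. A successful connect only proves that something is listening; requesting `http://127.0.0.1:9223/json/version`, which Chrome's DevTools endpoint serves, would also confirm it is Chrome. In `collect_videos()`, the call would sit just before `create_page(...)` under `if browser_port is not None:`, assuming `browser_port` is `None` when no attach port is given.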
- [ ] **Step 4: Run tests to verify they pass**

Run: `./.venv/bin/python -m unittest test_douyin.py -v`
Expected: PASS
- [ ] **Step 5: Commit**

Not applicable here because the workspace is not a git repository.
### Task 4: Update usage documentation

**Files:**
- Modify: `抖音爬取视频.md` (the Chinese-language usage guide for crawling Douyin videos)

- [ ] **Step 1: Write the failing doc expectation**

Define the required doc updates:
- an explicit step 1 command for `login_douyin.py`
- an explicit step 2 command for `Douyin.py --browser-port 9223`
- a short note that login state is kept in the dedicated profile directory
- [ ] **Step 2: Verify the current doc is incomplete**

Run: `rg -n "login_douyin.py|--browser-port 9223" 抖音爬取视频.md`
Expected: no matches, or only partial guidance
- [ ] **Step 3: Write minimal documentation update**

Append a short “推荐流程” (“Recommended workflow”) section to `抖音爬取视频.md`.
- [ ] **Step 4: Verify the doc contains the new commands**

Run: `rg -n "login_douyin.py|--browser-port 9223" 抖音爬取视频.md`
Expected: matches for both commands
- [ ] **Step 5: Commit**

Not applicable here because the workspace is not a git repository.
### Task 5: Run full verification

**Files:**
- Modify: `Douyin.py`
- Modify: `login_douyin.py`
- Modify: `test_douyin.py`
- Modify: `test_login_douyin.py`
- Modify: `抖音爬取视频.md`

- [ ] **Step 1: Run the full unit test suite**

Run: `./.venv/bin/python -m unittest test_douyin.py test_login_douyin.py -v`
Expected: all tests pass
- [ ] **Step 2: Run the login launcher manually**

Run: `./.venv/bin/python login_douyin.py --browser-port 9223`
Expected: a visible Chrome window opens and the script prints the follow-up crawl command; log in to Douyin in that window before Step 3.
- [ ] **Step 3: Run the crawler against the logged-in browser**

Run: `./.venv/bin/python Douyin.py --pages 1 --timeout 20 --browser-port 9223`
Expected: videos are downloaded to `video/`
- [ ] **Step 4: Review changed files for scope drift**

Run: `rg --files`
Expected: apart from pre-existing files, the listing contains only the planned files (without git there is no diff, so this comparison is manual)
- [ ] **Step 5: Commit**

Not applicable here because the workspace is not a git repository.