# Douyin Login Entry Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add a dedicated browser-login launcher and a clearer attach-port check so the Douyin crawler has a stable two-step workflow: login first, crawl second.

**Architecture:** Keep browser-launch responsibilities in a new `login_douyin.py` script and keep crawl responsibilities in `Douyin.py`. Add a small socket-based port readiness check before attaching to Chrome, and cover the new behavior with unit tests before implementing production code.

**Tech Stack:** Python 3, `argparse`, `pathlib`, `subprocess`, `socket`, `unittest`

---
### Task 1: Write failing tests for the new login launcher

**Files:**
- Create: `login_douyin.py`
- Create: `test_login_douyin.py`

- [ ] **Step 1: Write the failing test**
```python
import importlib
import unittest
from pathlib import Path


class BuildLoginCommandTest(unittest.TestCase):
    def test_build_login_command_uses_expected_chrome_arguments(self) -> None:
        # Import inside the test so a missing module affects only this test
        # instead of aborting the whole test file at import time.
        module = importlib.import_module("login_douyin")
        command = module.build_login_command(
            chrome_path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
            profile_dir=Path("/tmp/douyin-profile"),
            browser_port=9223,
            user_url="https://www.douyin.com/user/example",
        )
        # The command is an argv list: executable, profile flag, debug port, start URL.
        self.assertEqual(
            command,
            [
                "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
                "--user-data-dir=/tmp/douyin-profile",
                "--remote-debugging-port=9223",
                "https://www.douyin.com/user/example",
            ],
        )
```
- [ ] **Step 2: Run test to verify it fails**

Run: `./.venv/bin/python -m unittest test_login_douyin.py -v`
Expected: FAIL (reported by `unittest` as an ERROR from the failed import) because `login_douyin.py` does not exist yet.
- [ ] **Step 3: Write minimal implementation**

Create `login_douyin.py` with:

- `DEFAULT_CHROME_PATH`
- `DEFAULT_BROWSER_PORT = 9223`
- `DEFAULT_PROFILE_DIR`
- `build_login_command(...)`
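A minimal sketch of what this step could produce, derived from the expected list in the Step 1 test. Only the port value `9223` and the function's keyword arguments come from the plan; the default Chrome path simply reuses the macOS path from the test, and the default profile directory is a hypothetical location chosen for illustration.

```python
from pathlib import Path

# Assumed defaults: only 9223 is fixed by the plan. The Chrome path mirrors
# the macOS path used in the test; the profile directory is hypothetical.
DEFAULT_CHROME_PATH = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
DEFAULT_BROWSER_PORT = 9223
DEFAULT_PROFILE_DIR = Path.home() / ".douyin-login-profile"


def build_login_command(
    chrome_path: str,
    profile_dir: Path,
    browser_port: int,
    user_url: str,
) -> list[str]:
    """Return the argv list that starts Chrome for a manual login."""
    return [
        chrome_path,
        f"--user-data-dir={profile_dir}",          # dedicated profile keeps login state
        f"--remote-debugging-port={browser_port}",  # port the crawler attaches to later
        user_url,
    ]
```

Returning a list rather than a shell string keeps the path with spaces intact when the command is later handed to `subprocess`.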
- [ ] **Step 4: Run test to verify it passes**

Run: `./.venv/bin/python -m unittest test_login_douyin.py -v`
Expected: PASS for the command-building test.

- [ ] **Step 5: Commit**

Not applicable here because the workspace is not a git repository.
### Task 2: Add tests and implementation for launcher validation and user guidance

**Files:**
- Modify: `login_douyin.py`
- Modify: `test_login_douyin.py`

- [ ] **Step 1: Write the failing tests**

Add tests for:

- the parser defaults the browser port to `9223`
- `main()` creates the profile dir
- `main()` prints the follow-up crawl command
- `main()` returns non-zero with a readable message when the Chrome path does not exist
- [ ] **Step 2: Run tests to verify they fail**

Run: `./.venv/bin/python -m unittest test_login_douyin.py -v`
Expected: FAIL because validation and guidance behavior is not implemented yet.
- [ ] **Step 3: Write minimal implementation**

Add to `login_douyin.py`:

- `build_parser()`
- `launch_browser(...)`
- `main(...)`
- readable `SystemExit`/stderr-style error messages, delivered through printed output and return codes
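The pieces listed above could fit together roughly as follows. This is a sketch under several assumptions: only `--browser-port` and its default `9223` appear in the plan, so the other flag names, the default profile directory, the default URL, the choice of stderr for the error message, and the injectable `launcher` parameter (added here so tests can avoid starting a real browser) are all hypothetical.

```python
import argparse
import subprocess
import sys
from pathlib import Path
from typing import Optional, Sequence

DEFAULT_CHROME_PATH = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
DEFAULT_BROWSER_PORT = 9223
DEFAULT_PROFILE_DIR = Path.home() / ".douyin-login-profile"  # hypothetical location
DEFAULT_USER_URL = "https://www.douyin.com/"  # hypothetical default


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Open Chrome for a manual Douyin login.")
    # Flag names other than --browser-port are assumptions.
    parser.add_argument("--chrome-path", default=DEFAULT_CHROME_PATH)
    parser.add_argument("--profile-dir", type=Path, default=DEFAULT_PROFILE_DIR)
    parser.add_argument("--browser-port", type=int, default=DEFAULT_BROWSER_PORT)
    parser.add_argument("--user-url", default=DEFAULT_USER_URL)
    return parser


def launch_browser(command: Sequence[str]) -> subprocess.Popen:
    # Popen returns immediately, so Chrome stays open while the user logs in.
    return subprocess.Popen(list(command))


def main(argv: Optional[Sequence[str]] = None, launcher=launch_browser) -> int:
    args = build_parser().parse_args(argv)
    if not Path(args.chrome_path).exists():
        # Readable message plus a non-zero return code instead of a traceback.
        print(f"Chrome executable not found: {args.chrome_path}", file=sys.stderr)
        return 1
    args.profile_dir.mkdir(parents=True, exist_ok=True)
    # Same argv shape that build_login_command(...) is expected to return.
    launcher([
        args.chrome_path,
        f"--user-data-dir={args.profile_dir}",
        f"--remote-debugging-port={args.browser_port}",
        args.user_url,
    ])
    print("After logging in, run the crawler with:")
    print(f"  python Douyin.py --browser-port {args.browser_port}")
    return 0
```

Passing a fake `launcher` (for example `calls.append`) lets the Step 1 tests check directory creation, the printed follow-up command, and the return code without launching Chrome.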
- [ ] **Step 4: Run tests to verify they pass**

Run: `./.venv/bin/python -m unittest test_login_douyin.py -v`
Expected: PASS

- [ ] **Step 5: Commit**

Not applicable here because the workspace is not a git repository.
### Task 3: Write failing tests for attach-port readiness in the crawler

**Files:**
- Modify: `Douyin.py`
- Modify: `test_douyin.py`

- [ ] **Step 1: Write the failing tests**

Add tests for:

- `ensure_browser_debug_port_ready()` returns successfully when a temporary local server is listening
- `ensure_browser_debug_port_ready()` raises a readable `RuntimeError` when the port is unavailable
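One way to provide the "temporary local server" for these tests is a small context manager that listens on an OS-assigned port. The helper name `temporary_listening_port` is hypothetical and not part of the plan; it is only a sketch of a possible test fixture.

```python
import contextlib
import socket
from typing import Iterator


@contextlib.contextmanager
def temporary_listening_port(host: str = "127.0.0.1") -> Iterator[int]:
    """Listen on a free local port for the duration of the block and yield its number."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind((host, 0))  # port 0 asks the OS to pick a free port
        server.listen(1)
        yield server.getsockname()[1]
    finally:
        server.close()  # after this, connections to the port are refused
```

Inside the `with` block the port accepts connections, which suits the success case; using the yielded port number after the block has exited gives a port that is very likely unavailable, which suits the `RuntimeError` case.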
- [ ] **Step 2: Run tests to verify they fail**

Run: `./.venv/bin/python -m unittest test_douyin.py -v`
Expected: FAIL because the function does not exist yet.
- [ ] **Step 3: Write minimal implementation**

Add to `Douyin.py`:

- a socket-based readiness helper, `ensure_browser_debug_port_ready()`
- call it in `collect_videos()` before `create_page(...)` when `browser_port` is provided
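The readiness helper could be as small as a single TCP connection attempt. The sketch below assumes the signature: the plan fixes only the function name and the `RuntimeError` behavior, so the `host` and `timeout` parameters and the message wording are illustrative.

```python
import socket


def ensure_browser_debug_port_ready(
    browser_port: int,
    host: str = "127.0.0.1",  # assumed: Chrome's debug port is reached locally
    timeout: float = 2.0,     # assumed default
) -> None:
    """Raise RuntimeError unless something accepts TCP connections on the port."""
    try:
        # A successful connect proves a listener exists; close it immediately.
        with socket.create_connection((host, browser_port), timeout=timeout):
            return
    except OSError as exc:  # covers refused connections and timeouts
        raise RuntimeError(
            f"Chrome debug port {browser_port} is not reachable on {host}. "
            "Run login_douyin.py first and keep that browser window open."
        ) from exc
```

A plain TCP check only confirms that something is listening, not that it is Chrome; it is enough to turn a confusing attach failure into an actionable message before `create_page(...)` is called.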
- [ ] **Step 4: Run tests to verify they pass**

Run: `./.venv/bin/python -m unittest test_douyin.py -v`
Expected: PASS

- [ ] **Step 5: Commit**

Not applicable here because the workspace is not a git repository.
### Task 4: Update usage documentation

**Files:**
- Modify: `抖音爬取视频.md`

- [ ] **Step 1: Write the failing doc expectation**

Define the required doc updates:

- explicit step 1 command for `login_douyin.py`
- explicit step 2 command for `Douyin.py --browser-port 9223`
- short note that login state is kept in the dedicated profile dir
- [ ] **Step 2: Verify current doc is incomplete**

Run: `rg -n "login_douyin.py|--browser-port 9223" 抖音爬取视频.md`
Expected: no matches or incomplete guidance

- [ ] **Step 3: Write minimal documentation update**

Append a short “推荐流程” ("Recommended workflow") section to `抖音爬取视频.md`.

- [ ] **Step 4: Verify the doc contains the new commands**

Run: `rg -n "login_douyin.py|--browser-port 9223" 抖音爬取视频.md`
Expected: matches for both commands
- [ ] **Step 5: Commit**

Not applicable here because the workspace is not a git repository.
### Task 5: Run full verification

**Files:**
- Modify: `Douyin.py`
- Modify: `login_douyin.py`
- Modify: `test_douyin.py`
- Modify: `test_login_douyin.py`
- Modify: `抖音爬取视频.md`

- [ ] **Step 1: Run the full unit test suite**

Run: `./.venv/bin/python -m unittest test_douyin.py test_login_douyin.py -v`
Expected: all tests pass

- [ ] **Step 2: Run the login launcher manually**

Run: `./.venv/bin/python login_douyin.py --browser-port 9223`
Expected: a visible Chrome window launches and the script prints the next crawl command

- [ ] **Step 3: Run the crawler against the logged-in browser**

Run: `./.venv/bin/python Douyin.py --pages 1 --timeout 20 --browser-port 9223`
Expected: videos are downloaded to `video/`
- [ ] **Step 4: Review changed files for scope drift**
|
|
|
|
Run: `rg --files`
|
|
Expected: only the planned files changed or were added
|
|
|
|
- [ ] **Step 5: Commit**
|
|
|
|
Not applicable here because the workspace is not a git repository.
|