douyin-crawler-poc/docs/superpowers/plans/2026-04-17-douyin-login-entry.md

Douyin Login Entry Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add a dedicated browser-login launcher and a clearer attach-port check so the Douyin crawler has a stable two-step workflow: login first, crawl second.

Architecture: Keep browser-launch responsibilities in a new login_douyin.py script and keep crawl responsibilities in Douyin.py. Add a small socket-based port readiness check before attaching to Chrome, and cover the new behavior with unit tests before implementing production code.

Tech Stack: Python 3, argparse, pathlib, subprocess, socket, unittest


Task 1: Write failing tests for the new login launcher

Files:

  • Create: login_douyin.py

  • Create: test_login_douyin.py

  • Step 1: Write the failing test

```python
import importlib
import unittest
from pathlib import Path


class BuildLoginCommandTests(unittest.TestCase):
    def test_build_login_command_uses_expected_chrome_arguments(self) -> None:
        # Imported inside the test so a missing module is reported by this test.
        module = importlib.import_module("login_douyin")
        command = module.build_login_command(
            chrome_path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
            profile_dir=Path("/tmp/douyin-profile"),
            browser_port=9223,
            user_url="https://www.douyin.com/user/example",
        )
        self.assertEqual(
            command,
            [
                "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
                "--user-data-dir=/tmp/douyin-profile",
                "--remote-debugging-port=9223",
                "https://www.douyin.com/user/example",
            ],
        )


if __name__ == "__main__":
    unittest.main()
```
  • Step 2: Run test to verify it fails

Run: ./.venv/bin/python -m unittest test_login_douyin.py -v
Expected: FAIL (ModuleNotFoundError) because login_douyin.py does not exist yet.

  • Step 3: Write minimal implementation

Create login_douyin.py with:

  • DEFAULT_CHROME_PATH

  • DEFAULT_BROWSER_PORT = 9223

  • DEFAULT_PROFILE_DIR

  • build_login_command(...)
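Here is a minimal sketch of the Step 3 code, with the test above serving as the contract. This plan fixes only DEFAULT_BROWSER_PORT = 9223 and the argument order. The DEFAULT_CHROME_PATH value (the macOS path reused from the test) and the DEFAULT_PROFILE_DIR location are hypothetical placeholders.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical defaults: the plan names these constants but only fixes the port.
DEFAULT_CHROME_PATH = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
DEFAULT_BROWSER_PORT = 9223
DEFAULT_PROFILE_DIR = Path.home() / ".douyin-chrome-profile"  # hypothetical location


def build_login_command(
    chrome_path: str,
    profile_dir: Path,
    browser_port: int,
    user_url: str,
) -> list[str]:
    """Return the argv list that starts Chrome for a manual Douyin login."""
    return [
        chrome_path,
        # A dedicated profile directory keeps the login cookies between runs.
        f"--user-data-dir={profile_dir}",
        # Exposes the DevTools endpoint that Douyin.py later attaches to.
        f"--remote-debugging-port={browser_port}",
        user_url,
    ]
```

Returning a list rather than a shell string lets subprocess pass the path containing spaces ("Google Chrome.app") without any quoting.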

  • Step 4: Run test to verify it passes

Run: ./.venv/bin/python -m unittest test_login_douyin.py -v
Expected: PASS for the command-building test.

  • Step 5: Commit

Not applicable here because the workspace is not a git repository.

Task 2: Add tests and implementation for launcher validation and user guidance

Files:

  • Modify: login_douyin.py

  • Modify: test_login_douyin.py

  • Step 1: Write the failing tests

Add tests for:

  • parser defaults use 9223

  • main() creates the profile dir

  • main() prints the follow-up crawl command

  • main() returns non-zero with a readable message when the Chrome path does not exist

  • Step 2: Run tests to verify they fail

Run: ./.venv/bin/python -m unittest test_login_douyin.py -v
Expected: FAIL because validation and guidance behavior is not implemented yet.

  • Step 3: Write minimal implementation

Add to login_douyin.py:

  • build_parser()

  • launch_browser(...)

  • main(...)

  • readable error messages printed to stderr and signalled through non-zero return codes, instead of tracebacks or a raised SystemExit
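One possible shape for Step 3 is sketched below. It is not the final script. The flags --chrome-path, --profile-dir and --user-url, the default start URL, the profile location, the printed wording, and the launcher parameter are all hypothetical. The launcher parameter exists only so tests can replace the real Chrome launch. Only --browser-port and 9223 come from this plan. build_login_command is repeated from Task 1 so the block runs on its own.

```python
from __future__ import annotations

import argparse
import subprocess
import sys
from pathlib import Path
from typing import Callable, Optional, Sequence

DEFAULT_CHROME_PATH = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
DEFAULT_BROWSER_PORT = 9223
DEFAULT_PROFILE_DIR = Path.home() / ".douyin-chrome-profile"  # hypothetical location
DEFAULT_USER_URL = "https://www.douyin.com/"  # hypothetical start page


def build_login_command(chrome_path: str, profile_dir: Path, browser_port: int, user_url: str) -> list[str]:
    return [chrome_path, f"--user-data-dir={profile_dir}", f"--remote-debugging-port={browser_port}", user_url]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Open Chrome for a manual Douyin login.")
    parser.add_argument("--chrome-path", default=DEFAULT_CHROME_PATH)
    parser.add_argument("--profile-dir", type=Path, default=DEFAULT_PROFILE_DIR)
    parser.add_argument("--browser-port", type=int, default=DEFAULT_BROWSER_PORT)
    parser.add_argument("--user-url", default=DEFAULT_USER_URL)
    return parser


def launch_browser(command: list[str]) -> subprocess.Popen:
    # Popen (not run) so this script exits while Chrome stays open.
    return subprocess.Popen(command)


def main(
    argv: Optional[Sequence[str]] = None,
    launcher: Callable[[list[str]], object] = launch_browser,  # hypothetical test seam
) -> int:
    args = build_parser().parse_args(argv)
    if not Path(args.chrome_path).exists():
        # Readable message plus a non-zero return code, no traceback.
        print(f"Chrome not found at {args.chrome_path}; pass --chrome-path.", file=sys.stderr)
        return 1
    args.profile_dir.mkdir(parents=True, exist_ok=True)
    launcher(build_login_command(args.chrome_path, args.profile_dir, args.browser_port, args.user_url))
    print("Log in to Douyin in the opened window, then run:")
    print(f"  ./.venv/bin/python Douyin.py --browser-port {args.browser_port}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

In the Step 1 tests, main(["--chrome-path", fake_path, "--profile-dir", tmp_dir], launcher=calls.append) avoids opening a real browser. contextlib.redirect_stdout and redirect_stderr can then capture the guidance and error messages.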

  • Step 4: Run tests to verify they pass

Run: ./.venv/bin/python -m unittest test_login_douyin.py -v
Expected: PASS

  • Step 5: Commit

Not applicable here because the workspace is not a git repository.

Task 3: Write failing tests for attach-port readiness in the crawler

Files:

  • Modify: Douyin.py

  • Modify: test_douyin.py

  • Step 1: Write the failing tests

Add tests for:

  • ensure_browser_debug_port_ready() returns successfully when a temporary local server is listening

  • ensure_browser_debug_port_ready() raises a readable RuntimeError when the port is unavailable
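Both tests need a port in a known state. A common standard-library approach is to bind to port 0 so the operating system picks a free port. For the success case, keep that socket listening. For the failure case, close it immediately so nothing is behind the port. The helper names below are hypothetical.

```python
from __future__ import annotations

import socket


def open_listening_socket() -> tuple[socket.socket, int]:
    """Bind to port 0 so the OS picks a free port, then listen on it.

    The caller must close the returned socket (e.g. via addCleanup).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)  # pending connects complete even without accept()
    return server, server.getsockname()[1]


def find_unused_port() -> int:
    """Return a port that was free a moment ago and is now closed.

    Small race: another process could claim it before the test connects.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind(("127.0.0.1", 0))
        return probe.getsockname()[1]
```

The success test would pass the port from open_listening_socket() to ensure_browser_debug_port_ready(). The failure test would use find_unused_port() inside assertRaises(RuntimeError). Neither test depends on whether 9223 happens to be busy on the developer's machine.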

  • Step 2: Run tests to verify they fail

Run: ./.venv/bin/python -m unittest test_douyin.py -v
Expected: FAIL because the function does not exist yet.

  • Step 3: Write minimal implementation

Add to Douyin.py:

  • socket-based readiness helper

  • call it in collect_videos() before create_page(...) when browser_port is provided
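Here is a minimal sketch of the readiness helper. The host and timeout parameters, their defaults, and the error wording are assumptions. The helper only checks that something accepts TCP connections on the port. That is enough to replace a confusing attach failure with a readable RuntimeError, but it does not prove the listener is Chrome.

```python
import socket


def ensure_browser_debug_port_ready(port: int, host: str = "127.0.0.1", timeout: float = 2.0) -> None:
    """Raise RuntimeError with guidance if nothing accepts connections on host:port."""
    try:
        # A successful TCP connect only shows that *something* is listening;
        # it does not verify that the listener is Chrome's DevTools endpoint.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:  # covers refused connections and socket timeouts
        raise RuntimeError(
            f"No browser is listening on {host}:{port} ({exc}). "
            f"Start Chrome first with: ./.venv/bin/python login_douyin.py --browser-port {port}"
        ) from exc
```

Inside collect_videos(), the call would go just before create_page(...) and run only when browser_port is not None. Runs without --browser-port are then unaffected.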

  • Step 4: Run tests to verify they pass

Run: ./.venv/bin/python -m unittest test_douyin.py -v
Expected: PASS

  • Step 5: Commit

Not applicable here because the workspace is not a git repository.

Task 4: Update usage documentation

Files:

  • Modify: 抖音爬取视频.md

  • Step 1: Write the failing doc expectation

Define the required doc updates:

  • explicit step 1 command for login_douyin.py

  • explicit step 2 command for Douyin.py --browser-port 9223

  • short note that login state is kept in the dedicated profile dir

  • Step 2: Verify current doc is incomplete

Run: rg -n "login_douyin.py|--browser-port 9223" 抖音爬取视频.md
Expected: no matches or incomplete guidance

  • Step 3: Write minimal documentation update

Append a short “推荐流程” (“Recommended workflow”) section to 抖音爬取视频.md.

  • Step 4: Verify the doc contains the new commands

Run: rg -n "login_douyin.py|--browser-port 9223" 抖音爬取视频.md
Expected: matches for both commands

  • Step 5: Commit

Not applicable here because the workspace is not a git repository.

Task 5: Run full verification

Files:

  • Modify: Douyin.py

  • Modify: login_douyin.py

  • Modify: test_douyin.py

  • Modify: test_login_douyin.py

  • Modify: 抖音爬取视频.md

  • Step 1: Run the full unit test suite

Run: ./.venv/bin/python -m unittest test_douyin.py test_login_douyin.py -v
Expected: all tests pass

  • Step 2: Run the login launcher manually

Run: ./.venv/bin/python login_douyin.py --browser-port 9223
Expected: a visible Chrome window opens, and the launcher prints the follow-up crawl command

  • Step 3: Run the crawler against the logged-in browser

Run: ./.venv/bin/python Douyin.py --pages 1 --timeout 20 --browser-port 9223
Expected: videos are downloaded to video/

  • Step 4: Review changed files for scope drift

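Run: rg --files
Expected: no files beyond the planned ones have been added. rg --files only lists files and there is no git history, so review edits to existing files by reading them.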

  • Step 5: Commit

Not applicable here because the workspace is not a git repository.