xhs_video_crawler/docs/superpowers/plans/2026-05-27-xhs-browser-feed-download.md
2026-05-27 13:59:00 +08:00

3.1 KiB

Xiaohongshu Browser Feed Download Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Build a first usable Xiaohongshu downloader that attaches to a manually logged-in Chrome session, listens for feed responses, extracts mp4 URLs, and downloads a limited number of videos.

Architecture: Add login_xhs.py for visible Chrome startup and XHS.py for browser attachment, feed listening, parsing, scrolling, and downloading. Keep browser/runtime imports lazy so unit tests run without Chrome automation dependencies.

Tech Stack: Python 3, unittest, requests, DrissionPage, macOS Chrome remote debugging.


File Structure

  • Create login_xhs.py: Chrome launch CLI, debug-port readiness wait, user-facing next command.
  • Create XHS.py: parsing helpers, browser attachment helpers, downloader CLI, feed collection loop.
  • Create test_login_xhs.py: command construction and CLI behavior tests.
  • Create test_xhs.py: parser, filename, output path, browser address, and port-check tests.
  • Modify README.md: install and usage instructions.

Task 1: Login Browser Entrypoint

Files:

  • Create: test_login_xhs.py

  • Create: login_xhs.py

  • Write failing tests for Chrome command construction, parser defaults, profile creation, and missing Chrome path handling.

  • Run python3 -m unittest test_login_xhs.py -v and verify the tests fail because login_xhs does not exist.

  • Implement login_xhs.py with lazy, testable functions matching the Douyin project pattern.

  • Run python3 -m unittest test_login_xhs.py -v and verify it passes.

Task 2: Pure XHS Parsing Helpers

Files:

  • Create: test_xhs.py

  • Create: XHS.py

  • Write failing tests for filename sanitization, UTF-8 truncation, choosing video URLs, recursive extraction from a nested feed payload, output path generation, browser address construction, and debug-port validation.

  • Run python3 -m unittest test_xhs.py -v and verify the tests fail because XHS does not exist.

  • Implement pure helpers and lazy dependency imports in XHS.py.

  • Run python3 -m unittest test_xhs.py -v and verify it passes.

Task 3: Browser Feed Collection CLI

Files:

  • Modify: XHS.py

  • Modify: test_xhs.py

  • Write failing tests for feed payload extraction from packet response objects and the argument parser defaults.

  • Run python3 -m unittest test_xhs.py -v and verify the new tests fail.

  • Implement DrissionPage attachment, feed listening, gentle scrolling, download loop, and CLI main.

  • Run python3 -m unittest test_xhs.py test_login_xhs.py -v and verify it passes.

Task 4: README and Final Verification

Files:

  • Modify: README.md

  • Update README with setup, login, download, output, and compliance notes.

  • Run python3 -m unittest test_xhs.py test_login_xhs.py -v.

  • Run git status --short --branch.

  • Commit all implementation changes.

  • Push to origin/main.