64 lines
3.1 KiB
Markdown
64 lines
3.1 KiB
Markdown
# Xiaohongshu Browser Feed Download Implementation Plan
|
|
|
|
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
|
|
**Goal:** Build a first usable Xiaohongshu downloader that attaches to a manually logged-in Chrome session, listens for feed responses, extracts mp4 URLs, and downloads a limited number of videos.
|
|
|
|
**Architecture:** Add `login_xhs.py` for visible Chrome startup and `XHS.py` for browser attachment, feed listening, parsing, scrolling, and downloading. Keep browser/runtime imports lazy so unit tests run without Chrome automation dependencies.
|
|
|
|
**Tech Stack:** Python 3, unittest, requests, DrissionPage, macOS Chrome remote debugging.
|
|
|
|
---
|
|
|
|
## File Structure
|
|
|
|
- Create `login_xhs.py`: Chrome launch CLI, debug-port readiness wait, user-facing next command.
|
|
- Create `XHS.py`: parsing helpers, browser attachment helpers, downloader CLI, feed collection loop.
|
|
- Create `test_login_xhs.py`: command construction and CLI behavior tests.
|
|
- Create `test_xhs.py`: parser, filename, output path, browser address, and port-check tests.
|
|
- Modify `README.md`: install and usage instructions.
|
|
|
|
## Task 1: Login Browser Entrypoint
|
|
|
|
**Files:**
|
|
- Create: `test_login_xhs.py`
|
|
- Create: `login_xhs.py`
|
|
|
|
- [ ] Write failing tests for Chrome command construction, parser defaults, profile creation, and missing Chrome path handling.
|
|
- [ ] Run `python3 -m unittest test_login_xhs.py -v` and verify the tests fail because `login_xhs` does not exist.
|
|
- [ ] Implement `login_xhs.py` with lazy, testable functions matching the Douyin project pattern.
|
|
- [ ] Run `python3 -m unittest test_login_xhs.py -v` and verify it passes.
|
|
|
|
## Task 2: Pure XHS Parsing Helpers
|
|
|
|
**Files:**
|
|
- Create: `test_xhs.py`
|
|
- Create: `XHS.py`
|
|
|
|
- [ ] Write failing tests for filename sanitization, UTF-8 truncation, choosing video URLs, recursive extraction from a nested feed payload, output path generation, browser address construction, and debug-port validation.
|
|
- [ ] Run `python3 -m unittest test_xhs.py -v` and verify the tests fail because `XHS` does not exist.
|
|
- [ ] Implement pure helpers and lazy dependency imports in `XHS.py`.
|
|
- [ ] Run `python3 -m unittest test_xhs.py -v` and verify it passes.
|
|
|
|
## Task 3: Browser Feed Collection CLI
|
|
|
|
**Files:**
|
|
- Modify: `XHS.py`
|
|
- Modify: `test_xhs.py`
|
|
|
|
- [ ] Write failing tests for feed payload extraction from packet response objects and the argument parser defaults.
|
|
- [ ] Run `python3 -m unittest test_xhs.py -v` and verify the new tests fail.
|
|
- [ ] Implement DrissionPage attachment, feed listening, gentle scrolling, download loop, and CLI `main`.
|
|
- [ ] Run `python3 -m unittest test_xhs.py test_login_xhs.py -v` and verify it passes.
|
|
|
|
## Task 4: README and Final Verification
|
|
|
|
**Files:**
|
|
- Modify: `README.md`
|
|
|
|
- [ ] Update README with setup, login, download, output, and compliance notes.
|
|
- [ ] Run `python3 -m unittest test_xhs.py test_login_xhs.py -v`.
|
|
- [ ] Run `git status --short --branch`.
|
|
- [ ] Commit all implementation changes.
|
|
- [ ] Push to `origin/main`.
|