Xiaohongshu Browser Feed Download Design
Goal
Build a first usable Xiaohongshu video downloader that attaches to a manually logged-in Chrome session, listens for official site feed responses, extracts video URLs that the page already received, and downloads a limited number of videos.
Scope
The first version supports https://www.xiaohongshu.com/explore and videos surfaced through feed responses while the visible browser is open. It does not automate login, bypass captcha, generate signatures, replay private APIs directly, or attempt to defeat platform protections.
Architecture
The tool mirrors the existing Douyin project pattern:
- login_xhs.py starts a visible Chrome instance with a fixed profile directory and a remote debugging port.
- XHS.py connects to that existing Chrome through DrissionPage, listens for responses whose URL contains "feed", recursively extracts mp4 URLs such as master_url and backup_urls, deduplicates them, and downloads videos through requests.
- Unit tests cover pure parsing, filename, URL choice, and login command construction.
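As an illustration of the login command construction mentioned above, here is a minimal sketch. The function name build_chrome_command, the profile directory name, and the port value are assumptions made for this example; only the Chrome flags --remote-debugging-port, --user-data-dir, and --no-first-run are standard.

```python
from pathlib import Path


def build_chrome_command(chrome_path, profile_dir, port, start_url):
    """Return the argv list for a visible Chrome with remote debugging enabled.

    Hypothetical helper: the real login_xhs.py may build its command differently.
    """
    return [
        str(chrome_path),
        f"--remote-debugging-port={port}",      # lets XHS.py attach later
        f"--user-data-dir={Path(profile_dir)}",  # fixed profile keeps the login
        "--no-first-run",
        start_url,
    ]


# Example values only; the executable path and profile name are assumptions.
command = build_chrome_command(
    "google-chrome",
    "chrome_profile_xhs",
    9334,
    "https://www.xiaohongshu.com/explore",
)
print(" ".join(command))
```

Keeping the command builder pure (it returns a list and launches nothing) is what allows the unit tests to check it without starting Chrome.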
Data Flow
- The user runs python3 login_xhs.py.
- Chrome opens Xiaohongshu Explore with a persistent local profile.
- The user logs in manually and handles any verification.
- The user runs python3 XHS.py --max-videos 10.
- XHS.py attaches to the Chrome debugging port and starts network listening.
- The script opens or refreshes Explore, waits for feed packets, extracts video metadata and downloadable mp4 URLs, and writes files to video/.
- The script scrolls gently between waits to trigger more page-loaded feed responses until it downloads the requested limit or reaches empty-response limits.
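The stopping rule in the last step can be sketched as a loop with two exits: the requested number of downloads, or too many consecutive waits with nothing new. Everything here is hypothetical, including the name collect_videos, the injected callables, and the default of 5 empty rounds. In a real run, next_packet would wrap the listener wait and the gentle scroll.

```python
def collect_videos(next_packet, extract_urls, download, max_videos, max_empty=5):
    """Download until max_videos succeed or max_empty consecutive rounds are empty.

    next_packet:  callable returning a parsed JSON body, or None on timeout.
    extract_urls: callable mapping a body to a list of video URLs.
    download:     callable taking a URL and returning True on success.
    """
    seen = set()
    downloaded = 0
    empty_rounds = 0
    while downloaded < max_videos and empty_rounds < max_empty:
        body = next_packet()
        urls = extract_urls(body) if body is not None else []
        # Keep only URLs not seen in any earlier packet (or earlier in this one).
        fresh = []
        for url in urls:
            if url not in seen:
                seen.add(url)
                fresh.append(url)
        if not fresh:
            empty_rounds += 1
            continue
        empty_rounds = 0  # any new URL resets the empty-response counter
        for url in fresh:
            if downloaded >= max_videos:
                break
            if download(url):
                downloaded += 1
    return downloaded
```

Passing the packet source and the downloader in as callables keeps the loop testable without a browser or network access.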
CLI
python3 login_xhs.py
python3 XHS.py --max-videos 10
python3 XHS.py --browser-port 9334 --max-videos 20 --output-dir video
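The three flags above could be parsed with argparse roughly as follows. The default values are assumptions taken from the example commands, not confirmed defaults, and build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Build the XHS.py argument parser (sketch; defaults are assumed)."""
    parser = argparse.ArgumentParser(
        description="Download videos seen in Xiaohongshu feed responses."
    )
    parser.add_argument("--browser-port", type=int, default=9334,
                        help="Chrome remote debugging port to attach to")
    parser.add_argument("--max-videos", type=int, default=10,
                        help="stop after this many successful downloads")
    parser.add_argument("--output-dir", default="video",
                        help="directory that receives the downloaded files")
    return parser


args = build_parser().parse_args(["--max-videos", "20"])
print(args.browser_port, args.max_videos, args.output_dir)
```

Returning the parser from a function, rather than parsing at import time, lets a unit test check the default values directly.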
Error Handling
- If the browser debugging port is closed, print an actionable message pointing to login_xhs.py.
- If optional dependencies are missing, print install commands.
- If no feed data is observed, explain that the user should confirm login, page loading, and scrolling.
- If one video download fails, continue with later videos.
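Two of these rules lend themselves to a short sketch: probing the debugging port before attaching, and continuing past a failed download. The helper names is_port_open and download_all are hypothetical, and the message text is only an example.

```python
import socket


def is_port_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def download_all(urls, download_one):
    """Try every URL; collect failures instead of aborting the batch."""
    failures = []
    for url in urls:
        try:
            download_one(url)
        except Exception as exc:  # one bad video must not stop the rest
            failures.append((url, str(exc)))
    return failures


if not is_port_open(9334):
    print("Chrome debugging port 9334 is closed. Run: python3 login_xhs.py")
```

Returning the failure list gives the caller something concrete to report at the end of a run.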
Testing
Use Python unittest without requiring browser dependencies at import time. Tests should not launch Chrome or make network requests.
Coverage targets:
- Safe filename generation and byte truncation.
- Recursive extraction of video candidates from nested Xiaohongshu-like JSON.
- URL selection preference for master_url and fallback URLs.
- Output path generation.
- Browser launch command construction and default CLI values.
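As one example of a pure, browser-free test target, here is a sketch of safe filename generation with byte truncation. The function name safe_filename, the 120-byte limit, and the "untitled" fallback are assumptions made for illustration.

```python
import re
import unittest


def safe_filename(title, max_bytes=120, fallback="untitled"):
    """Replace unsafe filename characters and cap the UTF-8 byte length."""
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", title).strip(" .")
    encoded = cleaned.encode("utf-8")[:max_bytes]
    # errors="ignore" drops a multi-byte character that the cut split in half.
    truncated = encoded.decode("utf-8", errors="ignore").strip(" .")
    return truncated or fallback


class SafeFilenameTests(unittest.TestCase):
    def test_replaces_reserved_characters(self):
        self.assertEqual(safe_filename("a/b:c"), "a_b_c")

    def test_truncates_on_character_boundary(self):
        # Each character is 3 bytes, so a 10-byte cap keeps 3 whole characters.
        self.assertEqual(safe_filename("小红书" * 10, max_bytes=10), "小红书")

    def test_falls_back_when_nothing_is_left(self):
        self.assertEqual(safe_filename(" .. "), "untitled")
```

Tests written this way run under python3 -m unittest and import nothing from the browser stack.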
Open Constraints
The exact Xiaohongshu response shape may vary. The parser should be tolerant and recursive instead of hard-coding one complete schema from a screenshot.
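A tolerant recursive parser along these lines might look like the sketch below. It walks any nesting of dicts and lists, so it does not depend on one fixed schema. Only the field names master_url and backup_urls come from this design. The function names and the sample payload shape are hypothetical.

```python
def iter_video_candidates(node):
    """Yield every dict in nested JSON that has a master_url or backup_urls key."""
    if isinstance(node, dict):
        if "master_url" in node or "backup_urls" in node:
            yield node
        for value in node.values():
            yield from iter_video_candidates(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_video_candidates(item)


def choose_url(candidate):
    """Prefer master_url; otherwise use the first usable backup URL."""
    urls = [candidate.get("master_url")]
    backups = candidate.get("backup_urls")
    if isinstance(backups, list):
        urls.extend(backups)
    for url in urls:
        if isinstance(url, str) and url.startswith("http"):
            return url
    return None


def extract_video_urls(payload):
    """Return deduplicated video URLs in first-seen order."""
    seen, result = set(), []
    for candidate in iter_video_candidates(payload):
        url = choose_url(candidate)
        if url and url not in seen:
            seen.add(url)
            result.append(url)
    return result


# Made-up payload; real feed responses may be shaped differently.
sample = {"data": {"items": [
    {"video": {"stream": [{"master_url": "https://a.example/1.mp4"}]}},
    {"video": {"stream": [{"master_url": "", "backup_urls": ["https://b.example/2.mp4"]}]}},
]}}
print(extract_video_urls(sample))
```

The sketch accepts any http(s) URL rather than requiring a .mp4 suffix, because real URLs may carry query strings. A stricter filter can be added once the response shape is better understood.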