XHS Search Source Design
Goal
Allow the resumable queue downloader to use Xiaohongshu search results as a source, so that a query such as 猫咪 ("cat") or 猫咪 搞笑 ("funny cat") collects and downloads the related video notes.
Scope
This feature reuses the existing manually logged-in Chrome, queue persistence, page card collection, detail-page video extraction, validation, and human browsing cadence. It does not automate login, bypass verification, or call hidden APIs directly.
CLI
./.venv/bin/python XHS.py --source search --keyword 猫咪 --target-videos 100 --queue-file data/search_cat_queue.jsonl
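The flags in the command above could be wired up roughly as follows. This is a minimal sketch, not the actual parser in XHS.py: the function names `build_parser` and `parse_args` and all default values are hypothetical, and only the four flags shown in the command are taken from this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the flags shown in this document."""
    parser = argparse.ArgumentParser(prog="XHS.py")
    # Assumption: defaults of None; the real defaults are not stated here.
    parser.add_argument("--source", default=None)
    parser.add_argument("--keyword", default=None)
    parser.add_argument("--target-videos", type=int, default=None)
    parser.add_argument("--queue-file", default=None)
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # The search source is meaningless without a keyword, so fail early.
    if args.source == "search" and not (args.keyword and args.keyword.strip()):
        parser.error("--source search requires --keyword")
    return args
```

Checking the dependency after parsing, rather than marking `--keyword` as required, keeps the flag optional for every other source.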
Behavior
- --source search requires --keyword.
- The source URL is https://www.xiaohongshu.com/search_result?keyword=<encoded keyword>&source=web_search_result_notes&type=51, which opens the video-filtered search results page.
- Search result cards are collected from both /explore/<note_id> and tokenized /search_result/<note_id> links.
- Detail links are polled briefly after navigation because Xiaohongshu search result cards are rendered asynchronously.
- Queue mode handles videos, images, failures, retries, and resume semantics exactly like other sources.
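The URL construction and link handling described in this list can be sketched as below. The names `build_search_url`, `extract_note_id`, and `NOTE_PATH_RE` are hypothetical. Encoding spaces as `%20` and treating a note ID as a run of ASCII letters and digits are assumptions, and how the real code treats the token on tokenized links is not specified here, so this sketch simply ignores the query string.

```python
import re
from typing import Optional
from urllib.parse import quote, urljoin, urlsplit

BASE_URL = "https://www.xiaohongshu.com"

# Query parameters copied from the source URL given in this document.
SEARCH_URL_TEMPLATE = (
    BASE_URL + "/search_result"
    "?keyword={keyword}&source=web_search_result_notes&type=51"
)

# Assumption: a note ID is a run of ASCII letters and digits.
NOTE_PATH_RE = re.compile(r"^/(?:explore|search_result)/([0-9A-Za-z]+)/?$")


def build_search_url(keyword: str) -> str:
    """Percent-encode the keyword as UTF-8 and insert it into the search URL."""
    return SEARCH_URL_TEMPLATE.format(keyword=quote(keyword.strip(), safe=""))


def extract_note_id(href: str) -> Optional[str]:
    """Return the note ID from an /explore/ or /search_result/ link, else None."""
    # urljoin resolves relative hrefs; absolute hrefs pass through unchanged.
    path = urlsplit(urljoin(BASE_URL, href)).path
    match = NOTE_PATH_RE.match(path)
    return match.group(1) if match else None
```

Matching on the path alone means the search results page itself (`/search_result?keyword=...`) is never mistaken for a note link, because it has no ID segment.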
Testing
Unit tests cover search URL encoding, parser defaults, queue-mode CLI plumbing for keyword, /search_result/ note ID extraction, tokenized search link normalization, and async result-link polling.
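One way to make the async result-link polling testable without a browser or real delays is to inject the collector, the clock, and the sleep function. The helper below is a sketch under that assumption: `poll_for_links`, its parameters, and its default timings are hypothetical and not taken from the actual code.

```python
import time
from typing import Callable, List


def poll_for_links(
    collect: Callable[[], List[str]],
    timeout_s: float = 5.0,    # assumption: the real timeout is not documented
    interval_s: float = 0.5,   # assumption: the real interval is not documented
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Call collect() until it returns links or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        links = collect()
        if links:
            return links
        if clock() >= deadline:
            return []  # nothing rendered in time; the caller decides what to do
        sleep(interval_s)
```

A unit test can then pass a fake `collect` that returns an empty list for the first few calls, together with a fake `sleep` that advances a counter read by a fake `clock`, so the test runs instantly.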