xhs_video_crawler/docs/superpowers/specs/2026-05-27-xhs-search-source-design.md
2026-05-27 16:49:36 +08:00

XHS Search Source Design

Goal

Allow the resumable queue downloader to use Xiaohongshu search results as a source, so that queries such as 猫咪 ("cat") or 猫咪 搞笑 ("funny cat") can collect and download related video notes.

Scope

This feature reuses the existing components: the manually logged-in Chrome session, queue persistence, page card collection, detail-page video extraction, validation, and the human-paced browsing cadence. It does not automate login, bypass verification, or call hidden APIs directly.

CLI

./.venv/bin/python XHS.py --source search --keyword 猫咪 --target-videos 100 --queue-file data/search_cat_queue.jsonl
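The flags in the command above could be wired up roughly as follows. This is a minimal sketch, not the actual parser in XHS.py: the function names `build_parser` and `parse_args` are hypothetical, the defaults are left unset because this spec does not state them, and only the four flags shown in the command are modeled.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Only the flags shown in the example command are modeled here.
    # Defaults are deliberately left as None (assumption: the real
    # parser defines its own defaults, which this spec does not list).
    parser = argparse.ArgumentParser(prog="XHS.py")
    parser.add_argument("--source", default=None)
    parser.add_argument("--keyword", default=None)
    parser.add_argument("--target-videos", type=int, default=None)
    parser.add_argument("--queue-file", default=None)
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # --source search is meaningless without a non-blank keyword.
    if args.source == "search" and not (args.keyword or "").strip():
        # parser.error() prints usage to stderr and exits with status 2.
        parser.error("--source search requires --keyword")
    return args
```

Validating after `parse_args` rather than marking `--keyword` as required keeps the flag optional for every other source.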

Behavior

  • --source search requires --keyword.
  • The source URL is https://www.xiaohongshu.com/search_result?keyword=<encoded keyword>&source=web_search_result_notes&type=51, where <encoded keyword> is the percent-encoded keyword. This URL opens the video-filtered search results page.
  • Search result cards are collected from both /explore/<note_id> and tokenized /search_result/<note_id> links.
  • Detail links are polled briefly after navigation because Xiaohongshu search result cards are rendered asynchronously.
  • Queue mode handles videos, images, failures, retries, and resumption exactly as it does for other sources.
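
The URL construction and link handling described above might look like the sketch below. The helper names (`build_search_url`, `extract_note_id`, `normalize_detail_link`) are hypothetical, and two details are assumptions rather than facts from this spec: that a note ID is simply the path segment after /explore/ or /search_result/, and that normalization means making the link absolute while preserving its query string (the token).

```python
import re
from typing import Optional
from urllib.parse import quote, urljoin, urlsplit

BASE_URL = "https://www.xiaohongshu.com/"

# Matches /explore/<note_id> and /search_result/<note_id>.
# Assumption: the note ID is the single path segment after the prefix.
_NOTE_PATH = re.compile(r"^/(?:explore|search_result)/([^/?#]+)")


def build_search_url(keyword: str) -> str:
    # safe="" percent-encodes every reserved character, so a space
    # becomes %20 and non-ASCII text is encoded as UTF-8 bytes.
    encoded = quote(keyword.strip(), safe="")
    return (
        f"{BASE_URL}search_result?keyword={encoded}"
        "&source=web_search_result_notes&type=51"
    )


def extract_note_id(href: str) -> Optional[str]:
    # Works for relative and absolute links; the query string is
    # ignored because only the path is matched.
    match = _NOTE_PATH.match(urlsplit(href).path)
    return match.group(1) if match else None


def normalize_detail_link(href: str) -> Optional[str]:
    # Assumption: normalization keeps the query (token) and only makes
    # the link absolute. Links without a note ID are rejected.
    if extract_note_id(href) is None:
        return None
    return urljoin(BASE_URL, href)
```

Because the search page itself lives at /search_result with no trailing segment, the pattern requires a segment after the prefix, so the results page is never mistaken for a note link.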

Testing

Unit tests cover search URL encoding, parser defaults, queue-mode CLI plumbing for keyword, /search_result/ note ID extraction, tokenized search link normalization, and async result-link polling.
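
Async result-link polling can be unit-tested without a browser if the poller takes its collaborators as arguments. The following is a sketch under that assumption: `poll_for_links` and its parameters are hypothetical names, and the timeout and interval values are placeholders, not the values the crawler uses.

```python
import time
from typing import Callable, List


def poll_for_links(
    collect: Callable[[], List[str]],
    timeout_s: float = 5.0,      # placeholder value, not from the spec
    interval_s: float = 0.5,     # placeholder value, not from the spec
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call collect() until it returns links or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        links = collect()
        if links:
            return links
        if clock() >= deadline:
            # Cards never rendered in time; the caller decides what to do.
            return []
        sleep(interval_s)
```

In a test, `collect` can be a stub that returns an empty list a few times before returning links, and `clock` and `sleep` can be fakes that advance a counter, so the timeout path runs instantly.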