# XHS Search Source Design
## Goal

Allow the resumable queue downloader to use Xiaohongshu search results as a source, so that a query such as `猫咪` ("cat") or `猫咪 搞笑` ("funny cat") collects and downloads the related video notes.
## Scope

This feature reuses the existing manually logged-in Chrome, queue persistence, page card collection, detail-page video extraction, validation, and human browsing cadence. It does not automate login, bypass verification, or call hidden APIs directly.
## CLI

```bash
./.venv/bin/python XHS.py --source search --keyword 猫咪 --target-videos 100 --queue-file data/search_cat_queue.jsonl
```
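
A minimal sketch of how these flags might be parsed, assuming an `argparse`-based entry point. The names `build_parser` and `parse_args` are hypothetical, only the flags from the command above are modeled, and the real `XHS.py` may define more sources and options. It also shows one way to reject `--source search` when no `--keyword` is given.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the flags shown in the command above are modeled.
    parser = argparse.ArgumentParser(prog="XHS.py")
    parser.add_argument("--source", help="where candidate notes come from, e.g. 'search'")
    parser.add_argument("--keyword", help="search query; quote it in the shell if it contains spaces")
    parser.add_argument("--target-videos", type=int, help="number of videos to download")
    parser.add_argument("--queue-file", help="path of the queue file")
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.source == "search" and not args.keyword:
        # parser.error prints the usage message and exits with status 2.
        parser.error("--source search requires --keyword")
    return args
```

A multi-word query has to be quoted in the shell, for example `--keyword "猫咪 搞笑"`, so that it reaches the parser as a single argument.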
## Behavior

- `--source search` requires `--keyword`.
- The source URL is `https://www.xiaohongshu.com/search_result?keyword=<encoded keyword>&source=web_search_result_notes&type=51`, which opens the video-filtered search results page.
- Search result cards are collected from both `/explore/<note_id>` and tokenized `/search_result/<note_id>` links.
- Detail links are polled briefly after navigation because Xiaohongshu search result cards are rendered asynchronously.
- Queue mode handles videos, images, failures, retries, and resume semantics exactly as it does for other sources.
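
The URL construction and note ID extraction described above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: `build_search_url`, `extract_note_id`, and `NOTE_PATH_RE` are hypothetical names, the keyword is assumed to be percent-encoded as UTF-8 with spaces as `%20`, and note IDs are assumed to be the alphanumeric path segment directly after `/explore/` or `/search_result/`.

```python
from __future__ import annotations

import re
from urllib.parse import quote, urlsplit

SEARCH_URL = (
    "https://www.xiaohongshu.com/search_result"
    "?keyword={keyword}&source=web_search_result_notes&type=51"
)

# Assumption: the note ID is the single alphanumeric segment after the prefix.
NOTE_PATH_RE = re.compile(r"^/(?:explore|search_result)/([0-9A-Za-z]+)/?$")


def build_search_url(keyword: str) -> str:
    # quote(..., safe="") percent-encodes the UTF-8 bytes; a space becomes %20.
    return SEARCH_URL.format(keyword=quote(keyword, safe=""))


def extract_note_id(href: str) -> str | None:
    # Only the path is inspected, so any token in the query string is ignored.
    match = NOTE_PATH_RE.match(urlsplit(href).path)
    return match.group(1) if match else None
```

Matching on the parsed path rather than the raw string means the bare `/search_result?keyword=...` page itself is never mistaken for a note link.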
## Testing

Unit tests cover search URL encoding, parser defaults, queue-mode CLI plumbing for keyword, `/search_result/` note ID extraction, tokenized search link normalization, and async result-link polling.
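
Polling is easier to test deterministically when the clock and the sleep function are injectable. The sketch below is one way to structure it, assuming a synchronous helper. `poll_for_links`, its parameters, and the default timeout and interval are all hypothetical, and the project's real polling code may differ.

```python
import time
from typing import Callable, List


def poll_for_links(
    fetch_links: Callable[[], List[str]],
    timeout: float = 5.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Call fetch_links until it returns a non-empty list or the timeout expires."""
    deadline = clock() + timeout
    while True:
        links = fetch_links()
        # Stop as soon as links appear, or give up once the deadline is reached.
        if links or clock() >= deadline:
            return links
        sleep(interval)
```

A test can pass a fake `fetch_links` that returns empty lists before yielding a link, together with a no-op `sleep`, so the polling path runs without real delays or a browser.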