xhs_video_crawler/docs/superpowers/plans/2026-05-27-xhs-long-queue-downloader.md
2026-05-27 16:30:06 +08:00

# XHS Long Queue Downloader Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add a resumable JSONL queue mode so long Xiaohongshu video download jobs can target large counts like 1000 videos.

**Architecture:** Keep `XHS.py` as the CLI entry point. Add queue record helpers, source URL helpers, discovery/processing orchestration, and CLI flags while reusing existing parsing, download validation, shared Chrome, and human browsing cadence.

**Tech Stack:** Python 3, unittest, JSONL files, DrissionPage, requests.

---

## File Structure
- Modify `XHS.py`: queue dataclass/helpers, source selection, queue orchestration, CLI flags.
- Modify `test_xhs.py`: queue unit tests and CLI plumbing tests.
- Modify `README.md`: long task command examples.
## Task 1: Queue Persistence
- [ ] Write tests for queue load/save, deduping by `note_id`, counting downloaded records, and status updates.
- [ ] Run `python3 -m unittest test_xhs.py -v` and verify failures.
- [ ] Implement `QueueRecord`, `load_queue`, `save_queue`, `merge_note_urls_into_queue`, `count_queue_status`.
- [ ] Run tests and verify pass.
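
The helpers named in this task could look roughly like the sketch below. It is only an illustration under stated assumptions: the `status` and `attempts` fields, the status strings, the `(note_id, url)` input shape, and the write-then-rename save are not specified by this plan and may differ from what ends up in `XHS.py`.

```python
import json
import os
import tempfile
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass
class QueueRecord:
    note_id: str
    url: str
    # Assumed status values: pending, downloaded, skipped_image, failed.
    status: str = "pending"
    attempts: int = 0


def load_queue(path):
    """Read one JSON object per line; a missing file is treated as an empty queue."""
    if not os.path.exists(path):
        return []
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(QueueRecord(**json.loads(line)))
    return records


def save_queue(path, records):
    """Write to a temp file in the same directory, then rename it over the queue file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")
    os.replace(tmp_path, path)


def merge_note_urls_into_queue(records, note_urls):
    """Append notes whose note_id is not queued yet; return how many were added.

    note_urls is assumed to be an iterable of (note_id, url) pairs.
    """
    seen = {record.note_id for record in records}
    added = 0
    for note_id, url in note_urls:
        if note_id not in seen:
            records.append(QueueRecord(note_id=note_id, url=url))
            seen.add(note_id)
            added += 1
    return added


def count_queue_status(records):
    """Return a Counter mapping each status string to its record count."""
    return Counter(record.status for record in records)
```

Rewriting the whole file on each save keeps the format trivial to inspect and resume from; for a queue of about 1000 records the cost is small.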
## Task 2: Source Selection and CLI
- [ ] Write tests for `build_source_url` and parser defaults for `--source`, `--target-videos`, `--queue-file`, `--retry-limit`.
- [ ] Run tests and verify failures.
- [ ] Implement source URL selection and CLI argument plumbing.
- [ ] Run tests and verify pass.
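
As a rough sketch of the source selection and flag plumbing, the code below wires the four flags from this task into `argparse`. The source name `explore`, its URL mapping, the pass-through of full URLs, and every default value (including `--retry-limit 3`) are assumptions made for illustration; the plan does not specify them.

```python
import argparse

# Assumed mapping from a --source name to a discovery page.
# The real set of source names is not given in this plan.
SOURCE_URLS = {
    "explore": "https://www.xiaohongshu.com/explore",
}


def build_source_url(source):
    """Return the page to discover notes from; full URLs pass through unchanged."""
    if source.startswith(("http://", "https://")):
        return source
    try:
        return SOURCE_URLS[source]
    except KeyError:
        raise ValueError(f"unknown source: {source!r}") from None


def build_parser():
    """Build a parser with the queue-mode flags; all defaults here are assumed."""
    parser = argparse.ArgumentParser(prog="XHS.py")
    parser.add_argument("--source", default="explore")
    parser.add_argument("--target-videos", type=int, default=None)
    parser.add_argument("--queue-file", default=None)
    parser.add_argument("--retry-limit", type=int, default=3)
    return parser
```

Keeping the parser in its own function lets the unit tests call `build_parser().parse_args([...])` without launching a browser.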
## Task 3: Queue Processing Orchestration
- [ ] Write tests for the pure queue status transitions: success, skipped image, and failed retry.
- [ ] Run tests and verify failures.
- [ ] Implement queue processing helpers and wire queue mode into `main` when `--queue-file` or `--target-videos` is provided.
- [ ] Run tests and verify pass.
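
One way to keep the transitions pure is a function that takes a record and an outcome and returns a new record, as in the sketch below. The outcome names, the status strings, the `attempts` counter, and the rule that a record stays `pending` until `retry_limit` attempts are used up are all assumptions; `apply_result` and `use_queue_mode` are hypothetical helper names, not ones taken from `XHS.py`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class QueueRecord:
    note_id: str
    url: str
    status: str = "pending"
    attempts: int = 0


def apply_result(record, outcome, retry_limit):
    """Return a new record reflecting one processing attempt (no I/O, no mutation).

    Assumed outcomes: "success", "image" (note has no video), "error".
    """
    if outcome == "success":
        return replace(record, status="downloaded")
    if outcome == "image":
        return replace(record, status="skipped_image")
    if outcome == "error":
        attempts = record.attempts + 1
        # Stay pending so a later pass retries, until the limit is reached.
        status = "failed" if attempts >= retry_limit else "pending"
        return replace(record, status=status, attempts=attempts)
    raise ValueError(f"unknown outcome: {outcome!r}")


def use_queue_mode(args):
    """Queue mode is on when either --queue-file or --target-videos was given."""
    return args.queue_file is not None or args.target_videos is not None
```

Because `apply_result` touches neither the network nor the disk, the three transitions in this task can be tested with plain `unittest` assertions.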
## Task 4: Docs and Verification
- [ ] Update README with 1000-video queue command and resume behavior.
- [ ] Run `python3 -m unittest test_xhs.py test_login_xhs.py -v`.
- [ ] Run a small smoke command with a low `--target-videos` value and short waits, if a browser is available.
- [ ] Commit and push.