xhs_video_crawler/docs/superpowers/plans/2026-05-27-xhs-long-queue-downloader.md
2026-05-27 16:30:06 +08:00

XHS Long Queue Downloader Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add a resumable JSONL queue mode so that long-running Xiaohongshu video download jobs can target large counts (for example, 1000 videos) and pick up where they left off after an interruption.

Architecture: Keep XHS.py as the CLI entry point. Add queue record helpers, source URL helpers, discovery/processing orchestration, and CLI flags while reusing existing parsing, download validation, shared Chrome, and human browsing cadence.

Tech Stack: Python 3, unittest, JSONL files, DrissionPage, requests.


File Structure

  • Modify XHS.py: queue dataclass/helpers, source selection, queue orchestration, CLI flags.
  • Modify test_xhs.py: queue unit tests and CLI plumbing tests.
  • Modify README.md: long task command examples.

Task 1: Queue Persistence

  • Write tests for queue load/save, deduping by note_id, counting downloaded records, and status updates.
  • Run python3 -m unittest test_xhs.py -v and verify that the new tests fail.
  • Implement QueueRecord, load_queue, save_queue, merge_note_urls_into_queue, count_queue_status.
  • Run tests and verify pass.
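
The helpers named in Task 1 could look roughly like the sketch below. The plan only fixes the names QueueRecord, load_queue, save_queue, merge_note_urls_into_queue, and count_queue_status. The record fields, the status strings, the function signatures, and the write-then-rename save strategy are all assumptions made for illustration.

```python
import json
import os
import tempfile
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class QueueRecord:
    # Field names are assumed; the plan only names the class and note_id.
    note_id: str
    url: str
    status: str = "pending"  # assumed states: pending, downloaded, skipped_image, failed
    attempts: int = 0


def load_queue(path: str) -> List[QueueRecord]:
    """Read one JSON object per line; a missing file is treated as an empty queue."""
    if not os.path.exists(path):
        return []
    records = []
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(QueueRecord(**json.loads(line)))
    return records


def save_queue(path: str, records: List[QueueRecord]) -> None:
    """Write to a temp file, then rename, so an interrupted save keeps the old queue."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")
    os.replace(tmp_path, path)


def merge_note_urls_into_queue(records: List[QueueRecord], note_urls: Dict[str, str]) -> int:
    """Append records for unseen note_ids (in place); return how many were added."""
    seen = {record.note_id for record in records}
    added = 0
    for note_id, url in note_urls.items():
        if note_id not in seen:
            records.append(QueueRecord(note_id=note_id, url=url))
            seen.add(note_id)
            added += 1
    return added


def count_queue_status(records: List[QueueRecord]) -> Dict[str, int]:
    """Map each status string to the number of records currently in it."""
    return dict(Counter(record.status for record in records))
```

With this shape, resuming a job amounts to calling load_queue on the existing file, merging newly discovered URLs, and comparing count_queue_status(records).get("downloaded", 0) against the target.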

Task 2: Source Selection and CLI

  • Write tests for build_source_url and parser defaults for --source, --target-videos, --queue-file, --retry-limit.
  • Run tests and verify that the new tests fail.
  • Implement source URL selection and CLI argument plumbing.
  • Run tests and verify pass.
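
As a rough illustration of Task 2, the sketch below pairs a build_source_url helper with an argparse parser for the four flags. Only the helper name and the flag names come from the plan. The accepted --source values, the URL patterns, the keyword parameter, and every default shown here are assumptions and would need to match what XHS.py actually uses.

```python
import argparse
from urllib.parse import quote

# Assumed mapping from source name to a fixed start page.
SOURCE_URLS = {
    "explore": "https://www.xiaohongshu.com/explore",
}


def build_source_url(source: str, keyword: str = "") -> str:
    """Return the page to start discovery from (signature is hypothetical)."""
    if source == "search":
        if not keyword:
            raise ValueError("the search source needs a keyword")
        # Assumed search URL pattern; the keyword is percent-encoded.
        return "https://www.xiaohongshu.com/search_result?keyword=" + quote(keyword)
    if source not in SOURCE_URLS:
        raise ValueError(f"unknown source: {source}")
    return SOURCE_URLS[source]


def build_parser() -> argparse.ArgumentParser:
    """Parser with the four queue-related flags; defaults are assumptions."""
    parser = argparse.ArgumentParser(description="XHS video downloader (sketch)")
    parser.add_argument("--source", default="explore", choices=["explore", "search"])
    parser.add_argument("--target-videos", type=int, default=None)
    parser.add_argument("--queue-file", default=None)
    parser.add_argument("--retry-limit", type=int, default=3)
    return parser
```

Leaving --target-videos and --queue-file at None by default makes it easy to tell whether the user asked for queue mode at all, which matters for the wiring in Task 3.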

Task 3: Queue Processing Orchestration

  • Write tests for pure queue status transitions: success, skipped image, and failed retry.
  • Run tests and verify that the new tests fail.
  • Implement queue processing helpers and wire queue mode into main when --queue-file or --target-videos is provided.
  • Run tests and verify pass.
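
One way to keep the Task 3 transitions pure and easy to unit test is a function that takes a record plus an outcome and returns a new record, with no browser or network involved. The sketch below assumes outcome labels ("success", "image", "failure"), status strings, and helper names (apply_result, should_run_queue_mode) that are not in the plan. The QueueRecord here is a minimal frozen stand-in so that the example runs on its own.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class QueueRecord:
    # Minimal stand-in; field names and status values are assumptions.
    note_id: str
    url: str
    status: str = "pending"
    attempts: int = 0


def apply_result(record: QueueRecord, outcome: str, retry_limit: int) -> QueueRecord:
    """Return a new record reflecting one processing attempt (hypothetical helper)."""
    if outcome == "success":
        return replace(record, status="downloaded")
    if outcome == "image":
        # Image-only notes are never retried.
        return replace(record, status="skipped_image")
    if outcome == "failure":
        attempts = record.attempts + 1
        # Stay pending until the retry budget is used up, then mark failed.
        status = "failed" if attempts >= retry_limit else "pending"
        return replace(record, status=status, attempts=attempts)
    raise ValueError(f"unknown outcome: {outcome}")


def should_run_queue_mode(queue_file: Optional[str], target_videos: Optional[int]) -> bool:
    """Queue mode is selected when either flag was provided (hypothetical helper)."""
    return queue_file is not None or target_videos is not None
```

Because apply_result never mutates its input, a test can assert on both the old and the new record, and main only has to persist whatever the function returns.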

Task 4: Docs and Verification

  • Update README with the 1000-video queue command and a description of resume behavior.
  • Run python3 -m unittest test_xhs.py test_login_xhs.py -v.
  • If a browser is available, run a small smoke command with a low target and short waits.
  • Commit and push.