How Other Link Checkers Do Recursion

3 days ago
  • #link-checkers
  • #web-crawling
  • #recursion-architecture
  • Recursion in link checkers is handled by architecture, not by a clever trick: crawlers are built around a cycle from the start (discovered links feed back into the work queue), unlike lychee's stream-based DAG.
  • Deduplication must happen at enqueue time, before any request is made; this is the key fix for race conditions that lychee previously missed.
  • Termination detection is universally solved with mechanisms like WaitGroup (muffet), joinable queues (LinkChecker), onIdle() promises (linkinator), or drain events (broken-link-checker).
  • Frontier and rate-limiting must be separate components; using a single bounded channel for both causes deadlock.
  • Runtime differences affect how easy this is to build: Node.js's single-threaded event loop simplifies dedup, Go's goroutines simplify concurrency, and Rust's ownership model adds friction.
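
The points above can be made concrete with a small sketch. It is not taken from any of the tools named here. It is a minimal Python illustration that assumes a hypothetical in-memory site (`SITE`) in place of real HTTP requests, and a hypothetical `crawl` helper. It shows three of the ideas together: dedup at enqueue time, termination via a joinable queue (the mechanism the summary attributes to LinkChecker), and an unbounded frontier kept separate from the concurrency limit.

```python
import queue
import threading

# Hypothetical in-memory site (an assumption for illustration):
# page -> links found on it. The graph contains cycles on purpose.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/", "/b", "/c"],
    "/b": ["/a"],
    "/c": ["/", "/missing"],
}


def crawl(start, fetch, workers=4, max_in_flight=2):
    """Crawl from `start`; `fetch(url)` returns a list of links or None if broken."""
    frontier = queue.Queue()  # unbounded frontier: put() never blocks a worker
    limiter = threading.Semaphore(max_in_flight)  # separate cap on concurrent fetches
    lock = threading.Lock()
    seen = {start}
    fetched, broken = [], []

    def enqueue(url):
        # Dedup at enqueue time: check-and-insert is atomic under the lock,
        # so two workers that discover the same URL cannot both queue it.
        with lock:
            if url in seen:
                return
            seen.add(url)
        frontier.put(url)

    def worker():
        while True:
            url = frontier.get()
            try:
                if url is None:  # sentinel: shut this worker down
                    return
                with limiter:  # rate limiting happens here, not in the queue
                    links = fetch(url)
                with lock:
                    fetched.append(url)
                    if links is None:
                        broken.append(url)
                # Children are enqueued BEFORE task_done() runs for the parent,
                # so the unfinished-task count cannot reach zero too early.
                for link in links or []:
                    enqueue(link)
            finally:
                frontier.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    frontier.put(start)
    frontier.join()  # termination: every enqueued URL has been fully processed
    for _ in threads:
        frontier.put(None)
    for t in threads:
        t.join()
    return fetched, broken
```

Calling `crawl("/", SITE.get)` visits each page once despite the cycles and reports `/missing` as broken. If the frontier were instead a bounded queue doubling as the rate limiter, a worker could block in `put()` while every other worker is blocked the same way, which is the deadlock the summary warns about.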