How Other Link Checkers Do Recursion
4 days ago
- #link-checkers
- #web-crawling
- #recursion-architecture
- Recursion in link checkers is a matter of architecture, not a clever trick: other crawlers are built around a cycle from the start, feeding newly discovered links back into their work queue, whereas lychee is built as a stream-based DAG.
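- Deduplication must happen at enqueue time, before any request is made. This ordering is the key fix for race conditions that lychee previously missed: if the seen-check is deferred, every page that links to a URL queues another copy of it, and workers can race to check the same URL.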
- Termination detection (knowing that nothing is queued and nothing is in flight) is solved by every checker examined, each with an explicit mechanism: a WaitGroup in muffet, a joinable queue in LinkChecker, an onIdle() promise in linkinator, or drain events in broken-link-checker.
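- The frontier and the rate limiter must be separate components. Using a single bounded channel for both causes deadlock: once the channel is full, workers block while enqueueing newly discovered links, so no worker is left to drain it.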
- The runtime affects how easy all of this is: Node.js's single-threaded event loop simplifies dedup, because a synchronous check-and-insert cannot be interleaved with other callbacks; Go's goroutines simplify concurrency; and Rust's ownership rules add friction.
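To make the enqueue-time dedup point concrete, here is a minimal sketch in Python using threads. It is not code from lychee or any of the surveyed checkers; the `Frontier` class and the example.com URL are hypothetical. The membership test and the insert share one lock, which is exactly the step a multi-threaded runtime has to guard explicitly and a single-threaded event loop gets for free.

```python
from __future__ import annotations

import threading
from collections import deque


class Frontier:
    """Hypothetical work queue that deduplicates URLs when they are enqueued."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._seen: set[str] = set()
        self._pending: deque[str] = deque()

    def enqueue(self, url: str) -> bool:
        """Queue url unless it was ever queued before; return True if it was added."""
        # The membership test and the insert happen under one lock, so two
        # workers that discover the same link at the same moment cannot both
        # queue it.
        with self._lock:
            if url in self._seen:
                return False
            self._seen.add(url)
            self._pending.append(url)
            return True

    def dequeue(self) -> str | None:
        """Pop the next URL to check, or None if nothing is pending."""
        with self._lock:
            return self._pending.popleft() if self._pending else None


if __name__ == "__main__":
    frontier = Frontier()
    results: list[bool] = []
    results_lock = threading.Lock()

    def discover() -> None:
        # Every thread "finds" the same link, as if on a different page.
        added = frontier.enqueue("https://example.com/a")
        with results_lock:
            results.append(added)

    threads = [threading.Thread(target=discover) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results.count(True))  # 1
```

Because `_seen` is never cleared, a URL that has already been checked and dequeued is still rejected, which is what keeps cycles in the link graph from being re-queued.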
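The joinable-queue approach to termination detection can be sketched with Python's `asyncio.Queue`, whose `task_done()` and `join()` methods track unfinished work. This is an illustration of the idea, not LinkChecker's implementation; the in-memory `SITE` graph and the `crawl` function are hypothetical, and `asyncio.sleep(0)` stands in for the HTTP request. The graph deliberately contains a cycle (`/` to `/a` and back), so the crawl only terminates because of enqueue-time dedup.

```python
from __future__ import annotations

import asyncio

# Hypothetical in-memory "site": page -> links found on that page.
# It contains a cycle (/ -> /a -> /), which the crawler must tolerate.
SITE: dict[str, list[str]] = {
    "/": ["/a", "/b"],
    "/a": ["/", "/c"],
    "/b": ["/c"],
    "/c": [],
}


async def crawl(start: str, workers: int = 3) -> set[str]:
    queue: asyncio.Queue[str] = asyncio.Queue()
    seen: set[str] = {start}
    checked: set[str] = set()
    queue.put_nowait(start)

    async def worker() -> None:
        while True:
            url = await queue.get()
            try:
                await asyncio.sleep(0)  # stand-in for the HTTP request
                checked.add(url)
                for link in SITE.get(url, []):
                    # Dedup at enqueue time. There is no await between the
                    # check and the add, so no other task can interleave here.
                    if link not in seen:
                        seen.add(link)
                        queue.put_nowait(link)
            finally:
                # Mark done only after the children are queued, so the count of
                # unfinished items never hits zero while work is still appearing.
                queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await queue.join()  # returns once every queued URL has been marked done
    for t in tasks:
        t.cancel()  # idle workers are parked in queue.get(); stop them
    await asyncio.gather(*tasks, return_exceptions=True)
    return checked


if __name__ == "__main__":
    print(sorted(asyncio.run(crawl("/"))))  # ['/', '/a', '/b', '/c']
```

The ordering inside the worker is the part that is easy to get wrong: calling `task_done()` before queueing the discovered links could let `join()` return while the crawl still has pages left.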
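The frontier-versus-rate-limiter deadlock can be reproduced in a few lines of asyncio. The sketch below is hypothetical throughout: `children()` models a 15-page site shaped like a binary tree, `run_with_bounded_frontier` uses one bounded queue as both frontier and limiter, and `crawl_separated` uses an unbounded queue plus an `asyncio.Semaphore`. Note that the semaphore caps concurrent requests rather than requests per second; it stands in for whatever rate limiter a real checker uses.

```python
from __future__ import annotations

import asyncio


def children(page: int, size: int = 15) -> list[int]:
    """Hypothetical site: page i links to pages 2i+1 and 2i+2 (a 15-page binary tree)."""
    return [c for c in (2 * page + 1, 2 * page + 2) if c < size]


async def run_with_bounded_frontier(timeout: float = 0.5) -> bool:
    """One bounded queue serves as both frontier and limiter. Returns True if it finished."""
    queue: asyncio.Queue[int] = asyncio.Queue(maxsize=1)
    await queue.put(0)

    async def worker() -> None:
        while True:
            page = await queue.get()
            for child in children(page):
                # Blocks while the queue is full, but this worker is the only consumer.
                await queue.put(child)
            queue.task_done()

    task = asyncio.create_task(worker())
    try:
        await asyncio.wait_for(queue.join(), timeout)
        return True
    except asyncio.TimeoutError:
        return False  # deadlocked: the worker waits in put(), nobody calls get()
    finally:
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)


async def crawl_separated(workers: int = 4, max_in_flight: int = 2) -> tuple[int, int]:
    """Unbounded frontier plus a semaphore that caps concurrent requests."""
    frontier: asyncio.Queue[int] = asyncio.Queue()  # enqueueing never blocks
    limiter = asyncio.Semaphore(max_in_flight)  # limits requests, not the queue
    in_flight = peak = checked = 0
    frontier.put_nowait(0)

    async def worker() -> None:
        nonlocal in_flight, peak, checked
        while True:
            page = await frontier.get()
            async with limiter:  # only the request itself is throttled
                in_flight += 1
                peak = max(peak, in_flight)
                await asyncio.sleep(0.001)  # stand-in for the HTTP request
                in_flight -= 1
            checked += 1
            for child in children(page):  # a tree has no duplicate links
                frontier.put_nowait(child)
            frontier.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await frontier.join()
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return checked, peak


async def main() -> None:
    print("bounded channel finished:", await run_with_bounded_frontier())  # False
    checked, peak = await crawl_separated()
    print(f"separated design checked {checked} pages, peak in flight {peak}")


if __name__ == "__main__":
    asyncio.run(main())
```

The price of the separated design is that the frontier can grow with the number of discovered URLs; enqueue-time dedup keeps that growth to one entry per unique URL, while the limiter independently bounds load on the target servers.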