A peek into Reddit's anti-spam internals
4 hours ago
- #Anti-Samurai Investigation
- #Reddit Spam Detection
- #Internal Moderator Tools
- In 2021, a Reddit moderator accidentally saw internal anti-spam removal reasons due to a temporary Reddit error, revealing details of Reddit's spam detection systems.
- The observed systems include domain removals (banned domains like Tumblr), 'spammit' (percentage-based spam scoring), shadowbans (silent bans), and 'spamurai' (rules and ML-based system).
- Spamurai uses extensive data: Perspective API spam scores, account age, karma, reports, ISP, email domain, user agent, fingerprinting hashes (RHS, TLS), language headers, and referrer info.
- Reddit's anti-spam evolved from older systems (CRM114, Python-based) to newer ones like REV1/spamurai (Lua rules) and REV2/snooron (Flink, image OCR, Python3).
- Reddit used the Perspective API (Google's spam and toxicity detection service), which is shutting down; its spam score is sensitive to minor text changes, so small edits to a post could potentially bypass it.
- Some removals target specific strings (e.g., regex bans on 'UA-' Google Analytics IDs) or inspect linked content, and bans can be triggered by suspicious accounts.
- The information is being shared in 2026 because doing so is now less risky: the Perspective API is ending, and LLMs have changed what spam looks like, likely forcing Reddit to overhaul its systems.
- The author also notes personal updates (x86css blog post, talks, x3ctf event) and the blog's handmade, minimal web design (46kB gzipped).
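
To make the string-targeted removals mentioned above more concrete, here is a minimal Python sketch of a regex check for 'UA-' Google Analytics IDs. The pattern, the digit ranges, and the function names `contains_ua_id` and `find_ua_ids` are assumptions for illustration only; the expression Reddit actually used is not given in the post. The sketch relies on the fact that legacy Universal Analytics tracking IDs have the shape `UA-<account digits>-<property digits>`.

```python
import re

# Hypothetical pattern: legacy Universal Analytics IDs look like "UA-12345678-1".
# The digit ranges are an assumption; Reddit's real rule is not known.
UA_ID_PATTERN = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def contains_ua_id(text: str) -> bool:
    """Return True if the text contains something shaped like a UA- tracking ID."""
    return UA_ID_PATTERN.search(text) is not None


def find_ua_ids(text: str) -> list[str]:
    """Return every substring shaped like a UA- tracking ID, in order of appearance."""
    return UA_ID_PATTERN.findall(text)
```

A check like this could be run over a post body, or over the HTML of a linked page, which would fit the note that some removals inspect linked content. A rule this narrow is easy to reason about, but it only catches the exact ID shape it was written for.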