A peek into Reddit's anti-spam internals

4 hours ago
  • #Anti-Samurai Investigation
  • #Reddit Spam Detection
  • #Internal Moderator Tools
  • In 2021, a Reddit moderator accidentally saw internal anti-spam removal reasons due to a temporary Reddit error, revealing details of Reddit's spam detection systems.
  • The observed systems include domain removals (banned domains like Tumblr), 'spammit' (percentage-based spam scoring), shadowbans (silent bans), and 'spamurai' (a rule- and ML-based system).
  • Spamurai uses extensive data: Perspective API spam scores, account age, karma, reports, ISP, email domain, user agent, fingerprinting hashes (RHS, TLS), language headers, and referrer info.
  • Reddit's anti-spam evolved from older systems (CRM114, Python-based) to newer ones like REV1/spamurai (Lua rules) and REV2/snooron (Flink, image OCR, Python3).
  • The Perspective API (Google's spam and toxicity scoring service) was used by Reddit but is shutting down; its spam score is sensitive to minor text changes, which could allow spammers to bypass it.
  • Some removals target specific strings (e.g., regex bans on 'UA-' Google Analytics IDs) or inspect linked content, and bans can be triggered by suspicious accounts.
  • The author is sharing this in 2026 because disclosure is now less risky: the Perspective API is ending, and LLMs have changed spam, likely forcing Reddit to overhaul its systems.
  • The author also notes personal updates (x86css blog post, talks, x3ctf event) and the blog's handmade, minimal web design (46kB gzipped).
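
The string-targeted removals mentioned above (regex bans on 'UA-' Google Analytics IDs) can be illustrated with a minimal sketch. The pattern and the function name `contains_ua_id` are assumptions made for illustration; the brief does not give the actual expression Reddit used, only that 'UA-' identifiers were matched.

```python
import re

# Assumed pattern: classic Universal Analytics property IDs look like
# "UA-12345678-1". This is a guess at the shape of such a rule, not
# the expression Reddit actually used.
UA_ID = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def contains_ua_id(text: str) -> bool:
    """Return True if the text contains something shaped like a UA tracking ID."""
    return UA_ID.search(text) is not None
```

A rule like this keys on one exact string shape, so it only catches content that includes the identifier verbatim.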