A peek into Reddit's anti-spam internals

4 hours ago

In 2021, a Reddit moderator accidentally saw internal anti-spam removal reasons due to a temporary Reddit error, revealing details of Reddit's spam detection systems.
The observed systems include domain removals (banned domains like Tumblr), 'spammit' (percentage-based spam scoring), shadowbans (silent bans), and 'spamurai' (rules and ML-based system).
Spamurai uses extensive data: Perspective API spam scores, account age, karma, reports, ISP, email domain, user agent, fingerprinting hashes (RHS, TLS), language headers, and referrer info.
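As a rough illustration of how a rules-based system might combine signals like these, here is a minimal sketch. It is not Reddit's implementation: the `Signals` fields, rule names, weights, and thresholds are all invented for the example, and the summary says the real rules are written in Lua, not Python.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical subset of per-post signals; field names are assumptions."""
    spam_score: float        # 0.0-1.0 text spam score from some classifier
    account_age_days: int
    karma: int
    report_count: int


# Each rule is (name, predicate, weight). Every threshold and weight here
# is made up for illustration only.
RULES = [
    ("high_text_spam_score", lambda s: s.spam_score >= 0.8, 0.5),
    ("new_account", lambda s: s.account_age_days < 7, 0.2),
    ("low_karma", lambda s: s.karma < 10, 0.1),
    ("reported", lambda s: s.report_count >= 3, 0.3),
]


def evaluate(signals, threshold=0.6):
    """Return (should_remove, names_of_rules_that_fired)."""
    fired = [(name, weight) for name, pred, weight in RULES if pred(signals)]
    total = sum(weight for _, weight in fired)
    return total >= threshold, [name for name, _ in fired]
```

Returning the names of the rules that fired, not just a verdict, mirrors the idea of a removal reason being attached to each decision.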
Reddit's anti-spam evolved from older systems (CRM114, Python-based) to newer ones like REV1/spamurai (Lua rules) and REV2/snooron (Flink, image OCR, Python3).
Reddit used the Perspective API (Google's toxicity and spam scoring service), which is now shutting down; its spam score is sensitive to minor text changes, which could allow a bypass.
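For context, a spam score request to the Perspective API looks roughly like the sketch below. The helper names `build_request` and `spam_score` are hypothetical, and nothing here reflects how Reddit actually called the service. The payload shape follows the public `comments:analyze` method, where `SPAM` is offered as an experimental attribute; no network call is made.

```python
def build_request(text):
    """Build the JSON body for a Perspective comments:analyze call."""
    return {
        "comment": {"text": text},
        # SPAM is an experimental attribute; availability is an assumption.
        "requestedAttributes": {"SPAM": {}},
    }


def spam_score(response):
    """Extract the summary SPAM score (0.0-1.0) from a parsed response."""
    return response["attributeScores"]["SPAM"]["summaryScore"]["value"]
```

Scoring two near-identical texts this way and comparing the results is one way to observe the sensitivity described above.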
Some removals target specific strings (e.g., regex bans on 'UA-' Google Analytics IDs) or inspect linked content, and bans can be triggered by suspicious accounts.
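A string-targeted rule of this kind could be sketched as follows. The exact pattern Reddit used is not given in the summary, so the regex below is an assumption based on the common `UA-<account>-<property>` shape of Universal Analytics IDs, and `find_tracking_ids` is a hypothetical helper.

```python
import re

# Assumed pattern: "UA-", 4-10 digits, "-", 1-4 digits.
UA_ID = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def find_tracking_ids(text):
    """Return every Universal Analytics style ID found in the text."""
    return UA_ID.findall(text)
```

Applied to the HTML of a linked page, a match on a known tracking ID would tie that page to a spammer who reuses the same analytics account across domains.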
The information is being shared in 2026 because doing so is now less risky: the Perspective API is shutting down, and LLMs have changed spam enough that Reddit will likely have to overhaul its systems.
The author also notes personal updates (x86css blog post, talks, x3ctf event) and the blog's handmade, minimal web design (46kB gzipped).
