I analyzed 571M Amazon reviews to find the most profanity-filled customer rants

6 hours ago
  • #data-processing
  • #amazon-reviews
  • #content-analysis
  • Processed 275 GB of Amazon reviews across 34 categories using a Burla cluster with 1,000 parallel workers.
  • Ranked reviews across seven categories of 'unhinged' content, including profanity, screaming, punctuation bombs, and rants.
  • Classified reviews with rule-based methods rather than LLMs, using word lists and metrics such as caps ratio and review length.
  • Ran three map-reduce passes to refine the results, focusing on hard profanity and censor-aware lexicons.
  • Filtered false positives from proper nouns and idioms, and prioritized angry product rants in the final corpus.
  • Provided an interactive UI with Unhinged Mode to toggle between raw content (with auto-redacted slurs) and sanitized views.
  • Released the pipeline as open source on GitHub; it can be reproduced on any Burla cluster in about 15 minutes.
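
The rule-based scoring and map-reduce ranking summarized above can be sketched roughly as follows. This is a minimal illustration, not the author's pipeline: the tiny `LEXICON`, the set of censor characters, the three-mark threshold for a "punctuation bomb", and every function name are assumptions made for the example. It runs locally with the standard library only; in the described setup the map step would be distributed across cluster workers.

```python
import heapq
import re

# Hypothetical placeholder lexicon (mild words so the example stays printable).
LEXICON = {"damn", "crap", "hell"}

def censor_pattern(word):
    """Match a word even if letters after the first are masked, e.g. d*mn."""
    first = re.escape(word[0])
    rest = "".join(f"[{re.escape(c)}*@#$]" for c in word[1:])
    return first + rest

# Lookarounds keep matches on whole words, so "hello" and "shell" do not hit "hell".
PROFANITY_RE = re.compile(
    r"(?<![A-Za-z])(?:"
    + "|".join(censor_pattern(w) for w in sorted(LEXICON))
    + r")(?![A-Za-z])",
    re.IGNORECASE,
)

# Assumed definition: three or more consecutive ! or ? characters.
PUNCT_BOMB_RE = re.compile(r"[!?]{3,}")

def caps_ratio(text):
    """Fraction of alphabetic characters that are uppercase."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)

def score_review(text):
    """Compute the rule-based signals for one review."""
    return {
        "profanity_hits": len(PROFANITY_RE.findall(text)),
        "caps_ratio": caps_ratio(text),
        "punct_bombs": len(PUNCT_BOMB_RE.findall(text)),
        "length": len(text),
    }

def top_k_profane(chunk, k):
    """Map step: each worker keeps only its k highest-scoring reviews."""
    scored = ((len(PROFANITY_RE.findall(t)), t) for t in chunk)
    return heapq.nlargest(k, scored)

def reduce_top_k(partials, k):
    """Reduce step: merge the per-worker lists into a global top k."""
    return heapq.nlargest(k, (item for part in partials for item in part))

if __name__ == "__main__":
    chunks = [["fine product", "damn it"], ["crap, damn, hell", "ok"]]
    partials = [top_k_profane(chunk, 1) for chunk in chunks]
    print(reduce_top_k(partials, 1))
    print(score_review("This d*mn blender is CRAP!!!"))
```

Because each worker returns only its local top k, the reduce step merges a small number of candidates rather than the full corpus, which is what makes a ranking like this cheap to rerun over several passes.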