I scraped 1.94M Airbnb photos for opium dens, pet cameos, and messy kitchens

7 hours ago
#Data Analysis #Computer Vision #Airbnb Research

  • Analyzed 1.7M photos and reviews from Inside Airbnb's public listing data across 119 cities using CLIP and Claude Haiku Vision.
  • Parallelized processing on Burla with a dynamic cluster scaling to ~1.7K CPU workers and 20 A100 GPUs for image downloads, embeddings, and validation.
  • Flagged suspicious listings with categories like messy rooms resembling opium dens, chaotic kitchens, pet presence, and TVs mounted too high.
  • Implemented a 3-tier review funnel: regex filtering, embedding clustering, and Claude Haiku validation for the weirdest 12K reviews.
  • Used iterative validation to group listings (e.g., by photo brightness) and accepted hypotheses only when the groups showed no overlap.
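
The 3-tier review funnel can be pictured as a cheap filter feeding progressively more expensive stages. The sketch below is a minimal illustration, not the author's code. The keyword pattern, the function names, and the `top_k` cutoff are all hypothetical. Tier 2 swaps the real embedding clustering for a bag-of-words vector ranked by distance from the centroid, and tier 3 takes any callable in place of the Claude Haiku call.

```python
import math
import re
from collections import Counter

# Tier 1: hypothetical keyword pattern (the source does not list the real regexes).
PATTERN = re.compile(r"\b(smell|stain|camera|mold|roach)\w*", re.IGNORECASE)


def tier1_regex(reviews):
    """Keep only reviews that mention at least one keyword."""
    return [r for r in reviews if PATTERN.search(r)]


def embed(text):
    """Stand-in embedding: word counts. A real pipeline would use a learned model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def tier2_weirdest(reviews, top_k):
    """Rank reviews by how far they sit from the centroid; keep the top_k outliers."""
    vectors = [embed(r) for r in reviews]
    centroid = Counter()
    for v in vectors:
        centroid.update(v)
    ranked = sorted(zip(reviews, vectors), key=lambda rv: cosine(rv[1], centroid))
    return [r for r, _ in ranked[:top_k]]


def tier3_validate(reviews, validator):
    """Run the expensive check (an LLM call in the original) on the shortlist only."""
    return [r for r in reviews if validator(r)]


def run_funnel(reviews, validator, top_k=2):
    survivors = tier1_regex(reviews)
    shortlist = tier2_weirdest(survivors, top_k)
    return tier3_validate(shortlist, validator)
```

The ordering is the point of the design: the regex pass costs almost nothing, so the paid validator only ever sees the shortlist (the weirdest 12K reviews in the original write-up).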