I scraped 1.94M Airbnb photos for opium dens, pet cameos, and messy kitchens
5 hours ago
- #Data Analysis
- #Computer Vision
- #Airbnb Research
- Analyzed 1.7M photos and reviews from Inside Airbnb's public listing data across 119 cities using CLIP and Claude Haiku Vision.
- Parallelized processing on Burla with a dynamic cluster scaling to ~1.7K CPU workers and 20 A100 GPUs for image downloads, embeddings, and validation.
- Flagged suspicious listings by category: messy rooms resembling opium dens, chaotic kitchens, pets in frame, and TVs mounted too high.
- Implemented a 3-tier review funnel: regex filtering, embedding clustering, and Claude Haiku validation for the weirdest 12K reviews.
- Validated hypotheses iteratively: grouped listings (e.g., by photo brightness) and accepted a hypothesis only when the groups showed no overlap.
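
The 3-tier review funnel in the summary above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the post. The keyword pattern, the function names, and the `validate` callback are all hypothetical. A bag-of-words vector stands in for real embeddings, and a simple distance-from-centroid ranking stands in for embedding clustering. The final tier is a plain callable where a vision or language model call would go.

```python
import math
import re
from collections import Counter
from typing import Callable, List

# Hypothetical keyword pattern; the post does not list its actual regexes.
SUSPICIOUS = re.compile(r"\b(smell\w*|dirty|camera|mold\w*|broken)\b", re.IGNORECASE)


def regex_filter(reviews: List[str]) -> List[str]:
    """Tier 1: cheap pass that keeps only reviews matching the pattern."""
    return [r for r in reviews if SUSPICIOUS.search(r)]


def embed(text: str) -> Counter:
    """Stand-in embedding: word counts (an assumption, not a real model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def weirdest(reviews: List[str], k: int) -> List[str]:
    """Tier 2: keep the k reviews least similar to the overall centroid."""
    vectors = [embed(r) for r in reviews]
    centroid: Counter = Counter()
    for v in vectors:
        centroid.update(v)
    ranked = sorted(zip(reviews, vectors), key=lambda rv: cosine(rv[1], centroid))
    return [r for r, _ in ranked[:k]]


def funnel(reviews: List[str], validate: Callable[[str], bool], k: int) -> List[str]:
    """Run all three tiers; `validate` is the expensive final check."""
    candidates = regex_filter(reviews)      # tier 1: regex
    shortlist = weirdest(candidates, k)     # tier 2: embedding-based ranking
    return [r for r in shortlist if validate(r)]  # tier 3: model validation
```

The ordering is the point of the design: each tier is more expensive per item than the one before it, so the costly validation step only sees the small shortlist that survives the first two.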