WeSearch

I scraped 1.94M Airbnb photos for opium dens, pet cameos, and messy kitchens

⚡ TL;DR · AI summary

A data analysis project scraped 1.94 million Airbnb photos across 119 cities, starting from Inside Airbnb's public data, and used CLIP and Claude Haiku Vision to flag unusual content such as messy kitchens, pet cameos, and potential opium dens. Processing ran on Burla, on a single cluster that scaled to about 1.7K CPU workers and 20 A100 GPUs for parallel image and review analysis. Results draw on four quarterly snapshots of listings, reviews, and calendar data, with confidence intervals applied to occupancy rates, which serve as a proxy for demand.
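The summary does not say which kind of confidence interval was applied to occupancy rates. As a rough illustration only, a Wilson score interval is one common choice for a proportion such as "nights occupied out of nights observed". The function name `wilson_interval` and the sample counts below are hypothetical, not taken from the project.

```python
import math


def wilson_interval(booked, total, z=1.96):
    """Wilson score interval for an occupancy proportion booked / total.

    z=1.96 corresponds to roughly 95% confidence. This is an assumed
    method; the original project may compute its intervals differently.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = booked / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Hypothetical listing: 60 of 90 calendar nights counted as occupied.
low, high = wilson_interval(60, 90)
print(f"occupancy 60/90, approx. 95% interval: {low:.3f} to {high:.3f}")
```

Unlike the simpler normal-approximation interval, the Wilson interval stays inside [0, 1] even for listings with very few observed nights or with occupancy near 0% or 100%.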

Original article: GitHub
Opening excerpt (first ~120 words)

Burla demo · April 2026

Every Airbnb, looked at all at once.

Every public listing in Inside Airbnb's open dump, 119 cities, 4 quarterly snapshots. We scored 1.7M photos with CLIP (a model that turns an image into a vector you can compare to a text prompt), shortlisted the most suspicious ones, and had Claude Haiku Vision double-check each shortlist. We also scored every review and reranked the weirdest 12K with Haiku. Everything was parallelized on Burla, on a single dynamic cluster that scaled to ~1.7K CPU workers for photo download and CLIP, with 20 A100 GPUs running embedding clusters in parallel on the same cluster.

Listings, reviews, and calendars come straight from public Inside Airbnb dumps.
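The CLIP step described in the excerpt (turn each image into a vector, compare it to a text prompt, shortlist the top matches) can be sketched roughly as follows. This is a minimal illustration that assumes the embeddings have already been computed. The vectors are made-up 3-dimensional toys, and the names `cosine` and `shortlist` are hypothetical, not the project's code.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def shortlist(image_vectors, prompt_vector, k):
    """Return the k (score, photo_id) pairs most similar to the prompt, best first."""
    scored = (
        (cosine(vec, prompt_vector), photo_id)
        for photo_id, vec in image_vectors.items()
    )
    return heapq.nlargest(k, scored)


# Toy embeddings (made up); real CLIP vectors have hundreds of dimensions.
image_vectors = {
    "photo_a": [0.9, 0.1, 0.0],
    "photo_b": [0.1, 0.9, 0.1],
    "photo_c": [0.7, 0.3, 0.1],
}
# Stands in for the embedding of a text prompt such as "a messy kitchen".
prompt_vector = [1.0, 0.0, 0.0]

top = shortlist(image_vectors, prompt_vector, k=2)
for score, photo_id in top:
    print(photo_id, round(score, 3))
```

In a pipeline like the one described, only the shortlisted photos would then go to the slower, more expensive vision model for a second opinion, which keeps the number of vision-model calls small relative to the 1.7M scored photos.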

Excerpt limited to ~120 words for fair-use compliance. The full article is on GitHub.
