Hasty Briefs (beta)

American Data Centers

5 days ago
  • #geospatial
  • #data-centers
  • #data-analysis
  • Business Insider published a video and dataset on America's Data Centers, including locations, ownership, and power/water consumption.
  • The dataset includes 1,240 sites with metadata, mapped using diesel generator permits from every US state.
  • The author's workstation has an AMD Ryzen 9 9950X CPU, 96 GB of DDR5 RAM, and an NVMe SSD.
  • The analysis used Python 3.12.3 and DuckDB, with extensions including H3, JSON, and Spatial.
  • The dataset was converted from JavaScript to line-delimited JSON, then to a Parquet file for efficient analysis.
  • An example record shows the metadata for an Apple hyperscaler site in Arizona, including power consumption and environmental impact metrics.
  • A per-field profile of the 99 columns reports each one's null percentage, unique-value count, and data type.
  • Amazon, Digital Realty, and Equinix are the most common data center brands in the dataset.
  • A heatmap visualizes data center locations, with brighter hexagons indicating higher site density.
  • Hyperscaler analysis shows Amazon leading with 45 sites, followed by Microsoft and Google.
  • A comparison with OpenStreetMap (OSM) reveals gaps in its data center coverage, with only ~900 building footprints tagged.
  • Caterpillar and Cummins are the most common diesel generator manufacturers in the dataset.
  • Potential future work includes AI/OCR on permit PDFs for richer data extraction and modeling.
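
The conversion and field-profiling steps summarized above can be sketched with the Python standard library alone. This is not the author's DuckDB workflow, only a minimal stand-in for the same idea. The `const sites = [...]` file layout, the field names (`brand`, `state`, `mw`), and the sample values are all made-up placeholders, not content from the Business Insider dataset.

```python
import json
from collections import Counter

# Hypothetical stand-in for the JavaScript source file. The real file's
# variable name, layout, and fields are not given in the summary.
SAMPLE_JS = (
    'const sites = ['
    '{"brand": "A", "state": "AZ", "mw": 10.5}, '
    '{"brand": "B", "state": "VA", "mw": null}, '
    '{"brand": "A", "state": null, "mw": 3.0}'
    '];'
)


def js_to_records(js_text: str) -> list[dict]:
    """Extract the array literal from a JS assignment and parse it.

    Assumes the array literal is itself valid JSON (double-quoted keys,
    no trailing commas, no comments).
    """
    start = js_text.index("[")
    end = js_text.rindex("]") + 1
    return json.loads(js_text[start:end])


def to_ndjson(records: list[dict]) -> str:
    """Serialize records as line-delimited JSON, one object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


def profile_fields(records: list[dict]) -> dict[str, dict]:
    """Report null percentage and distinct non-null value count per field."""
    fields = sorted({key for r in records for key in r})
    total = len(records)
    profile = {}
    for field in fields:
        non_null = [r[field] for r in records if r.get(field) is not None]
        profile[field] = {
            "null_pct": round(100 * (total - len(non_null)) / total, 1),
            # Serialize values so unhashable ones (lists, dicts) can be counted.
            "unique": len({json.dumps(v, sort_keys=True) for v in non_null}),
        }
    return profile


def top_values(records: list[dict], field: str, n: int = 3) -> list[tuple]:
    """Return the n most common non-null values of a field with their counts."""
    counts = Counter(r[field] for r in records if r.get(field) is not None)
    return counts.most_common(n)


if __name__ == "__main__":
    records = js_to_records(SAMPLE_JS)
    print(to_ndjson(records))
    for name, stats in profile_fields(records).items():
        print(name, stats)
    print(top_values(records, "brand"))
```

A columnar format such as Parquet, which the summary says was the final target, would need a third-party library (for example DuckDB or PyArrow); that step is left out here to keep the sketch dependency-free.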