American Data Centers
5 days ago
- #geospatial
- #data-centers
- #data-analysis
- Business Insider published a video and dataset on America's Data Centers, including locations, ownership, and power/water consumption.
- The dataset includes 1,240 sites with metadata, mapped using diesel generator permits from every US state.
- The author's workstation has an AMD Ryzen 9 9950X CPU, 96 GB of DDR5 RAM, and a fast NVMe SSD.
- Python 3.12.3 and DuckDB were used for the analysis, with DuckDB extensions including H3, JSON, and Spatial.
- The dataset was converted from JavaScript to line-delimited JSON, then to a Parquet file for efficient analysis.
- Example record analysis shows detailed metadata for an Apple hyperscaler site in Arizona, including power consumption and environmental impact metrics.
- A per-field analysis covers the dataset's 99 columns, reporting null percentages, unique-value counts, and data types for each.
- Amazon, Digital Realty, and Equinix are the most common data center brands in the dataset.
- A heatmap visualizes data center locations, with brighter hexagons indicating higher site density.
- Hyperscaler analysis shows Amazon leading with 45 sites, followed by Microsoft and Google.
- A comparison with OpenStreetMap (OSM) data reveals gaps in OSM's data center coverage, with only ~900 building footprints tagged.
- An analysis of diesel generator manufacturers identifies Caterpillar and Cummins as the most common brands.
- Potential future work includes applying AI/OCR to the permit PDFs to extract richer data for modeling.
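
The conversion from JavaScript to line-delimited JSON can be sketched with the Python standard library alone. This is a minimal sketch, not the author's code: it assumes the JavaScript file is a single assignment of a JSON-compatible array literal to a variable (for example `const sites = [...];`). The function name `js_array_to_ndjson` and the sample fields are hypothetical.

```python
import json


def js_array_to_ndjson(js_source: str) -> str:
    """Convert `<anything> = [ ... ];` into line-delimited JSON.

    Assumption: the array literal is valid JSON (double-quoted keys,
    no trailing commas, no comments). Looser JavaScript would need a
    real parser instead of json.loads.
    """
    start = js_source.index("[")      # first bracket opens the array
    end = js_source.rindex("]") + 1   # last bracket closes it
    records = json.loads(js_source[start:end])
    # One compact JSON object per line, as NDJSON readers expect.
    return "\n".join(json.dumps(rec, sort_keys=True) for rec in records)


# Hypothetical two-record input; the field names are illustrative only.
sample_js = (
    'const sites = [{"brand": "Apple", "state": "AZ"}, '
    '{"brand": "Equinix", "state": "VA"}];'
)
ndjson = js_array_to_ndjson(sample_js)
print(ndjson)
```

From there, DuckDB's `read_json_auto` function and a `COPY ... TO ... (FORMAT PARQUET)` statement are one way to produce the Parquet file; the summary does not record the exact commands the author ran.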
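
The per-field analysis (null percentage, unique-value count, and data type for each column) can be illustrated in plain Python. This is a hedged sketch rather than the author's DuckDB query: `profile_fields` and the sample records are made up for illustration, and a missing key is treated the same as an explicit null.

```python
def profile_fields(records):
    """Return {field: (null_pct, unique_count, type_names)} for a list of dicts."""
    fields = sorted({key for rec in records for key in rec})
    profile = {}
    for field in fields:
        values = [rec.get(field) for rec in records]  # missing key -> None
        non_null = [v for v in values if v is not None]
        null_pct = 100.0 * (len(values) - len(non_null)) / len(values)
        # repr() lets unhashable values (lists, dicts) take part in the count.
        unique_count = len({repr(v) for v in non_null})
        type_names = sorted({type(v).__name__ for v in non_null})
        profile[field] = (round(null_pct, 1), unique_count, type_names)
    return profile


# Made-up records; the field names and values are not from the dataset.
sample = [
    {"brand": "Amazon", "state": "VA", "power_mw": 30.0},
    {"brand": "Amazon", "state": "OR", "power_mw": None},
    {"brand": "Equinix", "state": "VA"},
    {"brand": "Digital Realty", "state": "TX", "power_mw": 12.5},
]
for name, stats in profile_fields(sample).items():
    print(name, stats)
```

On a 99-column table, the same three statistics per column quickly show which fields are too sparse to analyze and which behave like categories.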