Hasty Briefs

Building a 30 PB storage cluster in the heart of SF

17 hours ago
  • #data-storage
  • #cost-optimization
  • #machine-learning
  • Built a storage cluster in downtown SF to store 90 million hours of video data for pretraining models.
  • Cost savings: $354k/year in-house vs. $12M/year on AWS, a ~40x reduction.
  • Unique data use case: ML training data doesn't need the high redundancy or availability that enterprise data does.
  • Storage setup: 30 PB across 2,400 HDDs in 100 DS4246 chassis, plus 10 CPU head nodes (rough capacity and cost math after this list).
  • Software: a simple 200-line Rust writer, nginx for reads, and SQLite for metadata (see the sketch after this list).
  • Cost breakdown: $29.5k/month total (including depreciation) vs. $1.13M/month on AWS.
  • Lessons learned: Simplicity was key; avoided complex solutions like Ceph or MinIO.
  • Challenges: Physical setup (screwing in 2,400 HDDs), networking compatibility, and debugging.
  • Recommendations: Use SAS drives, overprovision the network, and ensure good cable management.
  • Future improvements: Higher-density setups with 90-drive Supermicro SuperServers.
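
To sanity-check the headline figures, here is some back-of-the-envelope arithmetic using only the numbers in this summary:

$$
\begin{aligned}
100 \text{ shelves} \times 24 \text{ bays} &= 2{,}400 \text{ HDDs}, &\qquad 30\,\text{PB} \div 2{,}400 &\approx 12.5\,\text{TB per drive},\\
\$12\text{M/yr} \div \$354\text{k/yr} &\approx 34\times, &\qquad \$1.13\text{M/mo} \div \$29.5\text{k/mo} &\approx 38\times.
\end{aligned}
$$

Both cost ratios land in the neighborhood of the ~40x reduction quoted above, and ~12.5 TB per drive is in line with commodity high-capacity HDDs.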
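The software bullet maps onto very little code. Below is a minimal sketch of what such a write path could look like, assuming a `rusqlite`-backed metadata table and hypothetical per-drive mount points; the schema, paths, and placement policy are illustrative guesses, not the author's actual 200-line service.

```rust
// Minimal sketch of a write path: put an incoming blob on one of the
// JBOD-backed mount points, then record its location in SQLite.
// Everything here (schema, paths, placement policy) is an assumption
// made for illustration.
use rusqlite::{params, Connection};
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::path::PathBuf;

/// Choose a mount point by hashing the object key so writes spread across drives.
fn pick_mount(key: &str, mounts: &[&str]) -> PathBuf {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    PathBuf::from(mounts[(h.finish() as usize) % mounts.len()])
}

/// Write the blob to disk, then index its path and size in the metadata DB.
fn store_blob(db: &Connection, key: &str, data: &[u8], mounts: &[&str]) -> rusqlite::Result<PathBuf> {
    let dir = pick_mount(key, mounts);
    fs::create_dir_all(&dir).expect("create storage dir");
    let path = dir.join(key);
    fs::write(&path, data).expect("disk write failed");
    db.execute(
        "INSERT INTO objects (key, path, bytes) VALUES (?1, ?2, ?3)",
        params![key, path.display().to_string(), data.len() as i64],
    )?;
    Ok(path)
}

fn main() -> rusqlite::Result<()> {
    let db = Connection::open("metadata.sqlite")?;
    db.execute(
        "CREATE TABLE IF NOT EXISTS objects (key TEXT PRIMARY KEY, path TEXT, bytes INTEGER)",
        [],
    )?;
    // Stand-ins for per-drive (or per-shelf) mount points on the head nodes.
    let mounts = ["data/jbod00", "data/jbod01"];
    store_blob(&db, "clip-000001.mp4", b"...video bytes...", &mounts)?;
    println!("stored and indexed");
    Ok(())
}
```

On the read side nothing custom is needed: nginx can serve the stored files straight off those paths, which lines up with the "nginx for reading" bullet.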