Gorilla: A fast, scalable, in-memory time series database (2016)

  • #Time Series Database
  • #Facebook Infrastructure
  • #Data Compression
  • Gorilla is an in-memory time series database used by Facebook to monitor system health and performance, enabling quick identification and debugging of production issues.
  • It was designed to handle massive scale: storing 2 billion unique time series, ingesting 700 million data points per minute, and serving more than 40,000 queries per second at peak, with reads completing in under one millisecond.
  • To fit 26 hours of data in memory, Gorilla employs a novel compression algorithm, achieving an average 12x size reduction by compressing timestamps and values separately based on their predictability.
  • Each time series is identified by a unique string key, and the architecture shards data on those keys so that every series maps to a single host. This allows horizontal scaling by adding hosts, and the shared-nothing design keeps the system simple and fault tolerant.
  • Gorilla compresses timestamps by storing deltas of deltas rather than raw values; because data points arrive at regular intervals, about 96% of timestamps compress to a single bit. It compresses values by XORing each one with the previous value, so identical or nearly identical values need very few bits.
  • Tools built on Gorilla, such as a correlation engine based on the Pearson product-moment correlation coefficient (PPMCC), help automate root-cause analysis by finding time series that correlate with a given one, which speeds up problem diagnosis.
  • Key takeaways from the paper include prioritizing recent data for urgent issues, emphasizing low read latency for advanced tooling, and valuing high availability over resource efficiency.
  • Originally an internal Facebook system, Gorilla has been open-sourced as Beringei on GitHub under Facebook Incubator.
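
The key-based sharding described above can be sketched as follows. This is a minimal illustration, not Gorilla's actual shard assignment: the function names, the choice of SHA-256, and the list-based shard-to-host table are all assumptions made for the example.

```python
import hashlib


def shard_for_key(key: str, num_shards: int) -> int:
    """Map a time series key to a shard number (hypothetical scheme).

    A stable hash is used so the same key always lands on the same shard,
    regardless of process or machine.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def host_for_key(key: str, shard_to_host: list) -> str:
    """Look up the host that owns a key's shard (assumed one host per shard)."""
    return shard_to_host[shard_for_key(key, len(shard_to_host))]


hosts = ["host-a", "host-b", "host-c"]  # hypothetical host names
owner = host_for_key("web.frontend.cpu_idle", hosts)
```

Because each key maps to exactly one shard, a host can serve its series without coordinating with other hosts, which is the property a shared-nothing design relies on.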
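
To make the timestamp scheme concrete, here is a rough sketch of delta-of-delta encoding. The prefix codes and value ranges follow the variable-length scheme described in the Gorilla paper; how a value is packed inside its bucket (an offset from the bucket's lower bound) and the use of bit strings instead of a real bit stream are simplifications assumed for this example. The block header and the first timestamp are not handled.

```python
def delta_of_deltas(timestamps):
    """Return the delta-of-delta for every timestamp from the third onward."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]


# (lowest value, highest value, prefix bits, payload width in bits)
BUCKETS = [
    (-63, 64, "10", 7),
    (-255, 256, "110", 9),
    (-2047, 2048, "1110", 12),
]


def encode_dod(dod):
    """Encode one delta-of-delta as a string of '0'/'1' characters."""
    if dod == 0:
        return "0"  # the common case: the interval did not change
    for low, high, prefix, nbits in BUCKETS:
        if low <= dod <= high:
            # Assumed packing: store the offset from the bucket's lower bound.
            return prefix + format(dod - low, f"0{nbits}b")
    # Fallback: 32-bit two's-complement payload.
    return "1111" + format(dod & 0xFFFFFFFF, "032b")


# Points roughly every 60 seconds, with one arriving a second late.
stamps = [0, 60, 120, 180, 241, 300]
encoded = [encode_dod(d) for d in delta_of_deltas(stamps)]
```

With perfectly regular intervals every delta-of-delta is zero, so each timestamp after the second costs one bit; only the jittered points fall into the wider buckets.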
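
The XOR step for values can be illustrated in a similar way. This sketch only shows the XOR of consecutive 64-bit float bit patterns and the resulting window of meaningful bits; the control bits and bit-stream layout that Gorilla wraps around this are left out, and the function names are made up for the example.

```python
import struct


def float_bits(x: float) -> int:
    """Return the IEEE 754 double-precision bit pattern of x as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_with_previous(values):
    """XOR each value's bit pattern with that of the value before it."""
    bits = [float_bits(v) for v in values]
    return [a ^ b for a, b in zip(bits, bits[1:])]


def meaningful_window(xor: int):
    """Return (leading zeros, trailing zeros, meaningful bit count).

    Only defined for a nonzero 64-bit XOR result; a zero result means the
    value repeated and needs no payload at all.
    """
    leading = 64 - xor.bit_length()
    trailing = (xor & -xor).bit_length() - 1
    return leading, trailing, 64 - leading - trailing


xors = xor_with_previous([12.0, 12.0, 24.0])
window = meaningful_window(xors[1])
```

A repeated value XORs to zero, and values that differ only slightly share sign, exponent, and most mantissa bits, so the nonzero part of the XOR is short and is all that needs to be stored.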
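
For the correlation engine, the underlying measure is straightforward to sketch. The code below computes the PPMCC between two equal-length series and ranks candidate series against a target. Ranking by absolute value, returning 0.0 for a constant series, and the function and series names are assumptions of this example rather than details taken from the summary.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(target, candidates):
    """Return candidate names, most strongly correlated with target first."""
    scores = {name: pearson(target, series) for name, series in candidates.items()}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)


ranking = rank_by_correlation(
    [1, 2, 3, 4],
    {"noise": [1, -1, 1, -1], "cpu": [2, 4, 6, 8]},  # hypothetical series
)
```

Running this kind of comparison across a very large number of series is only practical when each read is cheap, which is why the summary ties this tooling to low read latency.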