Hasty Briefs (beta)

Kafka's Broken Promise: There Is No Goldilocks Log

13 hours ago
  • #data-logging
  • #open-source
  • #distributed-systems
  • Log systems fall into two categories, funneling and routing, and each demands different nonfunctional characteristics, so a single system like Kafka is ill-suited to serve both.
  • Kafka excels as a funnel for high-throughput, keyless data transfer but is a poor router due to high read amplification when accessing specific keys within large partitions.
  • OpenData Log is designed for routing with key-oriented storage, enabling efficient reads for millions of keys via an LSM tree architecture and object-native durability on storage like S3.
  • OpenData Log uses segmented LSM trees keyed by (key, sequence) for efficient data expiration and access, and it scales horizontally through range-scoped readers, so rescaling requires no data shuffling.
  • Because OpenData Log keeps its data in object storage, operating costs stay low: the estimate is around $224/month for a single node handling significant ingestion and read concurrency. The tradeoff is that latency scales with the ingestion rate.
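
The read-amplification point and the (key, sequence) layout can be made concrete with a toy model. The sketch below is not OpenData Log's or Kafka's actual code: every function name, the record shape, and the split-point scheme for range-scoped readers are assumptions made purely for illustration. It contrasts reading one key from an arrival-ordered partition with reading it from a run sorted by (key, sequence).

```python
import bisect


def make_records(num_keys, per_key):
    """Build hypothetical records interleaved across keys, in arrival order."""
    records = []
    seq = 0
    for _ in range(per_key):
        for k in range(num_keys):
            records.append((f"key-{k:04d}", seq, f"payload-{seq}"))
            seq += 1
    return records


def scan_partition(records, key):
    """Funnel-style layout: arrival order, so one key's read scans every record."""
    hits, touched = [], 0
    for rec_key, seq, payload in records:
        touched += 1
        if rec_key == key:
            hits.append((seq, payload))
    return hits, touched


def build_sorted_run(records):
    """Router-style layout: one run sorted by (key, sequence), as in an LSM tree."""
    return sorted(records, key=lambda r: (r[0], r[1]))


def read_key(sorted_run, key, from_seq=0):
    """Seek to (key, from_seq) by binary search, then read that key's entries."""
    start = bisect.bisect_left(sorted_run, (key, from_seq))
    i = start
    hits = []
    while i < len(sorted_run) and sorted_run[i][0] == key:
        hits.append((sorted_run[i][1], sorted_run[i][2]))
        i += 1
    # Entries read sequentially; the binary-search probes are not counted.
    return hits, i - start


def reader_for(key, split_points):
    """Assumed scheme: sorted split points map each key to a reader index."""
    return bisect.bisect_right(split_points, key)


if __name__ == "__main__":
    records = make_records(num_keys=100, per_key=10)
    _, scanned = scan_partition(records, "key-0042")
    _, seeked = read_key(build_sorted_run(records), "key-0042")
    print(f"partition scan touched {scanned} records; keyed read touched {seeked}")
```

With 100 keys and 10 records per key, the partition scan touches all 1,000 records to return 10, while the keyed read touches only those 10. In this toy model, changing the number of readers only changes `split_points`; the sorted data itself is not rewritten, which is one way to picture rescaling without data shuffling.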