Kafka's Broken Promise: There Is No Goldilocks Log
11 hours ago
- #data-logging
- #open-source
- #distributed-systems
- Log workloads fall into two categories, funneling and routing, whose nonfunctional requirements differ enough that a single system such as Kafka is ill-suited to serve both.
- Kafka excels as a funnel for high-throughput, keyless data transfer but is a poor router due to high read amplification when accessing specific keys within large partitions.
- OpenData Log is designed for routing with key-oriented storage, enabling efficient reads for millions of keys via an LSM tree architecture and object-native durability on storage like S3.
- OpenData Log uses segmented LSM trees keyed by (key, sequence), which makes data expiration and per-key access efficient; it scales horizontally through range-scoped readers, so rescaling does not require shuffling data.
- OpenData Log's operating costs benefit from object storage: estimates are around $224/month for a single node handling significant ingestion and read concurrency. Tradeoffs include latency that scales with ingestion rate.
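
The read-amplification point above can be made concrete with a small sketch. This is not OpenData Log's or Kafka's actual code: the record layout, the function names (`funnel_read`, `build_keyed_index`, `keyed_read`), and the in-memory sorted list standing in for an LSM tree are all assumptions made for illustration. It compares how many records are touched when reading one key from an arrival-ordered partition versus from an index sorted by (key, sequence).

```python
import bisect

# Hypothetical workload: 10,000 records spread over 1,000 keys.
# Each record is (key, sequence, payload); sequence is the arrival order.
records = [(f"user-{i % 1000}", i, f"event-{i}") for i in range(10_000)]


def funnel_read(partition, key):
    """Arrival-ordered partition: every record is scanned and filtered by key."""
    hits, scanned = [], 0
    for k, seq, payload in partition:
        scanned += 1
        if k == key:
            hits.append((seq, payload))
    return hits, scanned


def build_keyed_index(recs):
    """Stand-in for key-oriented storage: records sorted by (key, sequence)."""
    return sorted(recs, key=lambda r: (r[0], r[1]))


def keyed_read(index, key):
    """Seek to the first record for `key`, then read until the key changes."""
    # (key,) sorts before every (key, seq, payload) tuple with the same key.
    start = bisect.bisect_left(index, (key,))
    hits, touched = [], 0
    for i in range(start, len(index)):
        k, seq, payload = index[i]
        touched += 1
        if k != key:
            break  # first record past the key's range
        hits.append((seq, payload))
    return hits, touched


index = build_keyed_index(records)
funnel_hits, scanned = funnel_read(records, "user-42")
keyed_hits, touched = keyed_read(index, "user-42")
print(f"matching records: {len(keyed_hits)}")
print(f"funnel-style scan touched {scanned} records")
print(f"key-ordered read touched {touched} records")
```

Both reads return the same records in sequence order; the difference is how much unrelated data is touched along the way. In a real system that cost is paid in bytes fetched from disk or object storage rather than in list iterations, but the ratio between the two counts is the amplification the summary refers to.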