What is Apache Kafka and how does it work?
- #stream-processing
- #data-engineering
- #apache-kafka
- Apache Kafka is an open-source, distributed, durable, scalable, fault-tolerant pub/sub messaging system with stream processing and rich integration capabilities.
- It uses a log data structure for sequential append-only storage, with topics as logical data separators and partitions for sharding to enable parallelism.
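The log/topic/partition model above can be sketched in a few lines. This is an illustrative toy, not Kafka's implementation: the `Topic` class and the use of `md5` are assumptions (Kafka's default partitioner actually hashes keys with murmur2), but the invariant it demonstrates is real — same key, same partition, append-only ordering.

```python
from hashlib import md5

class Topic:
    """Toy model: a topic is a set of append-only partition logs."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        # Each partition is a sequential, append-only log.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: bytes, value: bytes) -> tuple[int, int]:
        # Messages with the same key land in the same partition,
        # preserving per-key ordering (md5 here is illustrative;
        # Kafka's default partitioner uses murmur2).
        p = int(md5(key).hexdigest(), 16) % len(self.partitions)
        self.partitions[p].append((key, value))
        offset = len(self.partitions[p]) - 1  # position in the log = offset
        return p, offset

topic = Topic("orders", num_partitions=3)
p1, o1 = topic.append(b"user-42", b"created")
p2, o2 = topic.append(b"user-42", b"paid")
assert p1 == p2 and o2 == o1 + 1  # same key -> same partition, consecutive offsets
```

Sharding a topic this way is what lets Kafka parallelize: independent partitions can live on different brokers and be consumed concurrently.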
- Messages are key-value pairs stored as raw bytes, requiring client-side serialization and deserialization; offsets provide unique ordering within partitions.
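Because brokers store only raw bytes, producers and consumers must agree on a wire format out of band. A minimal JSON serializer/deserializer pair, assuming JSON as the format (Avro and Protobuf are common production choices):

```python
import json

def serialize(obj) -> bytes:
    # Producer side: turn a structured value into the raw bytes Kafka stores.
    return json.dumps(obj).encode("utf-8")

def deserialize(raw: bytes):
    # Consumer side: recover the structure from the stored bytes.
    return json.loads(raw.decode("utf-8"))

record = {"order_id": 7, "status": "paid"}
raw = serialize(record)
assert isinstance(raw, bytes)
assert deserialize(raw) == record  # round-trips losslessly
```

If the two sides disagree on the format, the broker won't notice — it never inspects payloads — which is exactly the gap a Schema Registry (mentioned below) is meant to close.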
- Kafka operates as a distributed system with brokers, replication for fault tolerance, and a single-leader model per partition for consistency, using KRaft (which replaces ZooKeeper) for metadata consensus.
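The single-leader replication model can be sketched as a toy simulation. The class names and the simplifying assumption that every follower is always in-sync are mine; the core idea is Kafka's: writes go to the partition leader, followers copy its log, and only records replicated to all in-sync replicas count as committed (the boundary Kafka calls the high watermark).

```python
class Replica:
    def __init__(self):
        self.log = []

class Partition:
    """Toy single-leader replication; all replicas assumed in-sync."""

    def __init__(self, replication_factor: int):
        self.leader = Replica()
        self.followers = [Replica() for _ in range(replication_factor - 1)]
        self.high_watermark = 0  # offsets below this are committed

    def produce(self, record: bytes):
        # All writes go through the leader.
        self.leader.log.append(record)

    def replicate(self):
        # Followers fetch from the leader; the high watermark advances to
        # the smallest log end offset across all in-sync replicas.
        for f in self.followers:
            f.log = list(self.leader.log)
        replicas = [self.leader] + self.followers
        self.high_watermark = min(len(r.log) for r in replicas)

p = Partition(replication_factor=3)
p.produce(b"a")
p.produce(b"b")
assert p.high_watermark == 0  # written to the leader, but not yet committed
p.replicate()
assert p.high_watermark == 2  # committed once all replicas have caught up
```

Consumers only ever read up to the high watermark, which is why a leader failure can't expose data that might be lost.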
- Features include data retention for replayability, tiered storage for cost efficiency, consumer groups for coordinated reading, and transactions for exactly-once processing.
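Consumer-group coordination boils down to one invariant: each partition is owned by exactly one consumer in the group, so the group reads in parallel with no overlap. A sketch of a round-robin-style assignment (an assumption for illustration; Kafka ships several assignor strategies, including range, round-robin, and sticky):

```python
def assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Deal partitions to consumers round-robin, one owner per partition."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

result = assign(partitions=[0, 1, 2, 3, 4, 5], consumers=["c1", "c2", "c3"])
assert result == {"c1": [0, 3], "c2": [1, 4], "c3": [2, 5]}

# Invariant: every partition has exactly one owner.
owned = sorted(p for ps in result.values() for p in ps)
assert owned == [0, 1, 2, 3, 4, 5]
```

When a consumer joins or leaves, the group rebalances by recomputing this assignment, which is also why partition count caps a group's useful parallelism.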
- Extended components include Kafka Streams for stream processing, Kafka Connect for system integrations, and Schema Registry for data structure management.
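To make the Kafka Streams idea concrete, here is a toy version of its canonical word-count example: consume records, transform them, and maintain aggregated state. This plain function is only a sketch; real Kafka Streams keeps such state in fault-tolerant, changelogged state stores and emits each update downstream.

```python
from collections import Counter

def word_count(stream_of_lines):
    """Toy aggregation in the spirit of the Kafka Streams word-count demo."""
    counts = Counter()
    for line in stream_of_lines:
        # Transform step: split each record into words.
        for word in line.lower().split():
            # Aggregate step: update the running count table.
            counts[word] += 1
    return counts

incoming = ["Kafka streams Kafka", "streams process records"]
table = word_count(incoming)
assert table["kafka"] == 2
assert table["streams"] == 2
```

The key difference from a batch job is that the real thing runs continuously over an unbounded topic, emitting updated counts as new records arrive.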
- Use Kafka when you need high durability, high availability, large read fan-out, or event replay; avoid it for simple async tasks, strict queue semantics (per-message acknowledgment and redelivery), ultra-low-latency request-response, or small-scale workloads where its operational overhead isn't justified.