Hasty Briefs (beta)

B-Trees: Why Every Database Uses Them

2 hours ago
  • #data-structures
  • #database
  • #performance
  • B-Trees are fundamental data structures used by databases to efficiently find data on disk, where disk access is much slower than memory access.
  • Binary Search Trees (BSTs) are inefficient on disk: each node holds a single key, so a lookup touches about log2(n) nodes even when the tree is balanced, and each node access can require a separate disk seek.
  • B-Trees solve BST limitations by having high fanout (many children per node), reducing tree height and minimizing disk seeks.
  • B-Trees are self-balancing, with nodes designed to fit within disk blocks (typically 4KB to 16KB), optimizing for disk I/O.
  • B-Tree operations (lookup, insert, delete) visit O(log n) nodes, with the logarithm taken to the base of the fanout, so they stay efficient for large datasets.
  • Major databases like MySQL, PostgreSQL, SQLite, and MongoDB use B-Trees (or B+-Trees) for indexing due to their performance benefits.
  • B-Trees support range queries efficiently because keys are kept in sorted order; in the B+-Tree variant, leaf nodes are also linked, so a scan can walk from leaf to leaf.
  • Trade-offs of B-Trees include write amplification during splits and merges, and memory overhead for caching nodes.
  • Alternatives like LSM-Trees are better suited to write-heavy workloads, while hash indexes or skip lists are often preferred for in-memory databases.
  • B-Trees remain dominant for disk-based storage due to their balance of performance, efficiency, and support for range queries.
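
To make the fanout argument in the points above concrete, here is a minimal Python sketch. It is an illustration, not how any particular database implements its index. The names `levels_needed`, `Node`, and `search` are hypothetical, and the fanout of 100 is an assumed round number. `levels_needed` gives a rough estimate of tree height (the smallest number of levels such that fanout raised to that power covers all keys), and `search` counts how many nodes a lookup visits, standing in for disk reads.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


def levels_needed(num_keys: int, fanout: int) -> int:
    """Rough height estimate: smallest L with fanout**L >= num_keys.

    Uses integer arithmetic to avoid floating-point rounding in log().
    """
    levels = 0
    capacity = 1
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


@dataclass
class Node:
    """A simplified B-Tree node: sorted keys plus child pointers.

    `children` is empty for a leaf; otherwise it has len(keys) + 1 entries.
    """
    keys: list
    children: list = field(default_factory=list)


def search(node: Node, key) -> tuple:
    """Return (found, nodes_visited) for a lookup starting at `node`."""
    visited = 0
    while True:
        visited += 1  # each visited node stands in for one disk read
        i = bisect_left(node.keys, key)  # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited
        if not node.children:  # reached a leaf without finding the key
            return False, visited
        node = node.children[i]  # descend into the subtree that can hold key


# Height estimate for one million keys: binary tree vs. assumed fanout of 100.
bst_levels = levels_needed(1_000_000, 2)
btree_levels = levels_needed(1_000_000, 100)
print(bst_levels, btree_levels)

# A tiny hand-built tree: one root with two keys and three leaf children.
root = Node(
    keys=[10, 20],
    children=[Node([1, 5]), Node([12, 15]), Node([25, 30])],
)
print(search(root, 15))
print(search(root, 13))
```

Under these assumptions, one million keys need about 20 levels with a fanout of 2 but only about 3 with a fanout of 100, which is the reduction in disk seeks the summary describes. Real nodes also carry page headers and pointers, and insertion and deletion need split and merge logic that this sketch omits.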