B-Trees: Why Every Database Uses Them

2 hours ago


B-Trees are fundamental data structures used by databases to efficiently find data on disk, where disk access is much slower than memory access.
Binary Search Trees (BSTs) perform poorly on disk: each node holds a single key, so a lookup visits about log2(n) nodes, and each visit can cost a separate disk seek.
B-Trees fix this with high fanout (many keys and children per node), which keeps the tree shallow and the number of disk seeks per lookup small.
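To make the height difference concrete, the sketch below counts the levels an idealized, completely full tree needs for a given number of keys. It is a back-of-the-envelope model with assumed numbers (fanout 2 for a binary tree versus a hypothetical fanout of 100 for a B-Tree), and `tree_height` is an illustrative helper, not a library function. Real nodes are only partly full, so real trees are somewhat taller.

```python
def tree_height(n_keys: int, fanout: int) -> int:
    """Smallest number of levels h with fanout**h >= n_keys (idealized full tree)."""
    # Each extra level multiplies the number of reachable keys by the fanout.
    height, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout
        height += 1
    return height


for n in (1_000_000, 1_000_000_000):
    bst = tree_height(n, 2)      # binary tree: two children per node
    btree = tree_height(n, 100)  # hypothetical B-Tree with 100 children per node
    print(f"{n:>13,} keys: BST ~{bst} levels, B-Tree ~{btree} levels")
```

For a million keys this gives about 20 levels for the binary tree and 3 for the B-Tree; every level can mean one disk read when the node is not already cached.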
B-Trees are self-balancing (all leaves sit at the same depth), and each node is sized to fit one disk page, typically 4 KB to 16 KB, so reading a node costs a single I/O.
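Fanout follows directly from the page size. Under the simplifying assumption of fixed-size keys and child pointers (real engines use variable-length keys, slot directories, and often prefix compression), an internal node with f children stores f pointers and f - 1 separator keys, so f is the largest value that still fits in one page. The helper `max_fanout` and the byte sizes passed to it are hypothetical.

```python
def max_fanout(page_size: int, key_size: int, pointer_size: int, header_size: int = 0) -> int:
    """Largest f such that header + f pointers + (f - 1) keys fit in one page."""
    # header + f*pointer + (f - 1)*key <= page  =>  f <= (page - header + key) / (key + pointer)
    return (page_size - header_size + key_size) // (key_size + pointer_size)


# Hypothetical layout: 8-byte integer keys and 8-byte child page pointers, no header.
for page in (4096, 8192, 16384):
    print(f"{page // 1024} KiB page -> up to {max_fanout(page, 8, 8)} children per node")
```

With these assumed sizes, a 4 KiB page holds up to 256 children and a 16 KiB page up to 1024; wider keys or a page header reduce the fanout.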
Lookup, insert, and delete all run in O(log n) time; more precisely, they touch O(log_B n) nodes, where B is the fanout, so they stay fast even on very large datasets.
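The lookup and insert paths can be sketched as a minimal in-memory B-Tree in the classic textbook formulation with minimum degree t, where every non-root node holds between t - 1 and 2t - 1 keys. This is an illustration under assumptions, not any database's implementation: nodes live in memory rather than on disk pages, keys are plain integers, and deletion (which needs borrowing and merging) is omitted. The class and method names are hypothetical.

```python
from bisect import bisect_left


class Node:
    def __init__(self, leaf: bool):
        self.leaf = leaf
        self.keys: list[int] = []          # sorted keys stored in this node
        self.children: list["Node"] = []   # len(keys) + 1 children for internal nodes


class BTree:
    """Minimal B-Tree with minimum degree t (non-root nodes hold t-1 .. 2t-1 keys)."""

    def __init__(self, t: int = 3):
        self.t = t
        self.root = Node(leaf=True)

    def search(self, key: int) -> bool:
        node = self.root
        while True:
            i = bisect_left(node.keys, key)  # binary search within one node (one page)
            if i < len(node.keys) and node.keys[i] == key:
                return True
            if node.leaf:
                return False
            node = node.children[i]          # descend one level: one more page read on disk

    def insert(self, key: int) -> None:
        if len(self.root.keys) == 2 * self.t - 1:
            # Full root: split it under a new root, so the tree grows by one level at the top.
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        self._insert_nonfull(self.root, key)

    def _split_child(self, parent: Node, i: int) -> None:
        # Move the median key of the full child up into parent; halves become siblings.
        t = self.t
        full = parent.children[i]
        right = Node(leaf=full.leaf)
        median = full.keys[t - 1]
        right.keys = full.keys[t:]
        full.keys = full.keys[: t - 1]
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        parent.keys.insert(i, median)
        parent.children.insert(i + 1, right)

    def _insert_nonfull(self, node: Node, key: int) -> None:
        # Split full children on the way down so there is always room for the key.
        while not node.leaf:
            i = bisect_left(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        node.keys.insert(bisect_left(node.keys, key), key)


tree = BTree(t=3)
for k in [(i * 37) % 1000 for i in range(1000)]:  # 0..999 in a scrambled order
    tree.insert(k)
print(tree.search(421), tree.search(5000))
```

Each iteration of the loop in `search` is one node visit, so the work is proportional to the tree height. `insert` splits full nodes on the way down, which keeps every leaf at the same depth: the tree only grows taller when the root itself splits.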
Major databases, including MySQL (InnoDB), PostgreSQL, SQLite, and MongoDB (WiredTiger), use B-Trees or the closely related B+-Trees for indexing because of these properties.
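B-Trees handle range queries well because keys are stored in sorted order. B+-Trees go further by linking their leaf nodes, so a range scan descends the tree once to find the first key and then follows sibling pointers.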
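A range scan over linked leaves can be sketched as follows. It models only the leaf level of a B+-Tree: `Leaf`, `build_leaf_chain`, and `range_scan` are hypothetical names, and a binary search over leaf boundaries stands in for the root-to-leaf descent a real tree would perform.

```python
from bisect import bisect_left
from typing import Iterator, Optional


class Leaf:
    """A B+-Tree leaf: sorted keys plus a pointer to the next leaf on the right."""

    def __init__(self, keys: list[int]):
        self.keys = keys
        self.next: Optional["Leaf"] = None


def build_leaf_chain(sorted_keys: list[int], leaf_size: int) -> list[Leaf]:
    leaves = [Leaf(sorted_keys[i:i + leaf_size]) for i in range(0, len(sorted_keys), leaf_size)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right  # sibling link used by range scans
    return leaves


def range_scan(leaves: list[Leaf], lo: int, hi: int) -> Iterator[int]:
    """Yield every key k with lo <= k <= hi in ascending order."""
    # Stand-in for the root-to-leaf descent: first leaf whose largest key is >= lo.
    idx = bisect_left([leaf.keys[-1] for leaf in leaves], lo)
    if idx == len(leaves):
        return
    leaf: Optional[Leaf] = leaves[idx]
    i = bisect_left(leaf.keys, lo)
    while leaf is not None:
        while i < len(leaf.keys):
            if leaf.keys[i] > hi:
                return
            yield leaf.keys[i]
            i += 1
        leaf = leaf.next  # hop to the sibling instead of going back up the tree
        i = 0


keys = list(range(0, 100, 3))  # 0, 3, 6, ..., 99
leaves = build_leaf_chain(keys, leaf_size=4)
print(list(range_scan(leaves, 10, 25)))  # [12, 15, 18, 21, 24]
```

After the single descent, the scan reads only the leaves that overlap the requested range, which is why range queries on B+-Trees cost little more than a point lookup plus the size of the result.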
Trade-offs include write amplification (a small change rewrites a whole page, and splits or merges rewrite several pages) and the memory needed to cache frequently accessed nodes.
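Write amplification can be estimated as bytes physically written divided by bytes logically changed. The sizes below are assumptions for illustration (a 100-byte row, 8 KiB pages), `write_amplification` is a hypothetical helper, and the estimate ignores write-ahead log and journal writes, which add to the real figure.

```python
PAGE_SIZE = 8192  # assumed page size in bytes
ROW_SIZE = 100    # assumed size of the changed row in bytes


def write_amplification(logical_bytes: int, pages_written: int, page_size: int = PAGE_SIZE) -> float:
    """Ratio of bytes written to storage to bytes the application actually changed."""
    return pages_written * page_size / logical_bytes


# In-place update: the whole page holding the row is rewritten.
update_wa = write_amplification(ROW_SIZE, pages_written=1)
# Insert that splits a leaf: old leaf + new sibling leaf + parent (gains a separator key).
split_wa = write_amplification(ROW_SIZE, pages_written=3)
print(f"update: {update_wa:.2f}x, leaf split: {split_wa:.2f}x")
```

Under these assumptions, updating one row writes about 82 times more data than it changed, and an insert that splits a leaf pushes that to about 246; a split that cascades into the parent costs more still.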
Alternatives exist: LSM-Trees often suit write-heavy workloads better, while hash indexes and skip lists are common in in-memory databases (though hash indexes cannot serve range queries).
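To show why LSM-Trees favor writes, here is a toy sketch: writes land in an in-memory memtable and are flushed as immutable sorted runs, so the write path never rewrites a page in place. Everything here is a simplifying assumption (the `TinyLSM` name, the tiny memtable limit, string keys); real LSM engines add a write-ahead log, compaction that merges runs, tombstones for deletes, and Bloom filters to skip runs on reads.

```python
from bisect import bisect_left
from typing import Optional


class TinyLSM:
    """Toy LSM-Tree: buffered writes, flushed as immutable sorted runs."""

    def __init__(self, memtable_limit: int = 4):
        self.memtable_limit = memtable_limit
        self.memtable: dict[str, str] = {}
        self.runs: list[list[tuple[str, str]]] = []  # each run sorted by key; newest last

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value  # buffered in memory, no page rewrite
        if len(self.memtable) >= self.memtable_limit:
            # Flush: one sequential write of a new sorted run; older runs are never modified.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newer runs shadow older ones
            i = bisect_left(run, (key,))  # (key,) sorts just before any (key, value)
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


db = TinyLSM(memtable_limit=2)
for k, v in [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4"), ("d", "5")]:
    db.put(k, v)
print(db.get("a"), db.get("b"), db.get("z"))  # 3 2 None
```

The cost moves to the read path: a lookup may have to check several runs, which is why real engines compact runs in the background and use Bloom filters.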
B-Trees remain dominant for disk-based storage due to their balance of performance, efficiency, and support for range queries.

Hasty Briefs (beta)