B-Trees: Why Every Database Uses Them
2 hours ago
- #data-structures
- #database
- #performance
- B-Trees are fundamental data structures used by databases to efficiently find data on disk, where disk access is much slower than memory access.
- Binary Search Trees (BSTs) perform poorly on disk: with only two children per node the tree is tall, and each node visited on the way down can cost a separate disk read.
- B-Trees solve BST limitations by having high fanout (many children per node), reducing tree height and minimizing disk seeks.
- B-Trees are self-balancing, and each node is sized to fit a disk block or database page (typically 4KB to 16KB), so reading one node costs one disk I/O.
- B-Tree operations (lookup, insert, delete) take O(log n) time, and the number of nodes visited grows with the logarithm of the key count taken to the base of the fanout, so they stay efficient for large datasets.
- Major databases like MySQL, PostgreSQL, SQLite, and MongoDB use B-Trees (or B+-Trees) for indexing due to their performance benefits.
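- B-Trees support range queries efficiently because keys are kept in sorted order; B+-Trees additionally link their leaf nodes, so a scan can move from one leaf to the next without going back up the tree.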
- Trade-offs of B-Trees include write amplification during splits and merges, and memory overhead for caching nodes.
- Alternatives like LSM-Trees are often better suited to write-heavy workloads, while hash indexes or skip lists are common choices for in-memory databases.
- B-Trees remain dominant for disk-based storage due to their balance of performance, efficiency, and support for range queries.
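
The fanout argument in the points above can be made concrete with a little arithmetic. The sketch below is a rough illustration, not a model of any particular database: the function names, the 8 KB page, and the 8-byte key and pointer sizes are all assumptions chosen for the example, and it ignores page headers and partially filled nodes, so real trees will be somewhat taller.

```python
def estimated_fanout(page_size_bytes: int, key_bytes: int, pointer_bytes: int) -> int:
    """Rough number of (key, child pointer) pairs that fit in one page.

    Assumes fixed-size keys and ignores page headers (a simplification).
    """
    return page_size_bytes // (key_bytes + pointer_bytes)


def levels_needed(num_keys: int, fanout: int) -> int:
    """Smallest number of levels whose capacity (fanout ** levels) covers num_keys.

    Uses integer arithmetic to avoid floating-point rounding in logarithms.
    Assumes every node is completely full, which is the best case.
    """
    if num_keys < 1 or fanout < 2:
        raise ValueError("num_keys must be >= 1 and fanout must be >= 2")
    levels = 1
    capacity = fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    # Assumed sizes for illustration: 8 KB page, 8-byte key, 8-byte pointer.
    fanout = estimated_fanout(8192, 8, 8)
    one_billion = 1_000_000_000
    print("fanout per node:", fanout)
    print("high-fanout levels:", levels_needed(one_billion, fanout))
    print("binary tree levels:", levels_needed(one_billion, 2))
```

Under these assumptions a node holds about 512 entries, so one billion keys fit in 4 levels, while a perfectly balanced binary tree needs 30. If every level costs one disk read, that is the difference between roughly 4 and roughly 30 reads per lookup.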