How does B-tree make your queries fast?
2 days ago
- #database
- #indexing
- #performance
- B-tree is a data structure that enables efficient searching through large amounts of data and is widely used in modern databases.
- B-tree differs from Binary Search Tree (BST) by allowing multiple values per node and being self-balancing, which optimizes performance.
- Hardware considerations like CPU caches, RAM, and disk storage influence the design of B-tree to minimize random access and maximize sequential access.
- Sequential access is significantly faster than random access, especially on HDDs, making B-tree's design advantageous for disk-based storage.
- B-tree reduces tree height by packing multiple values into nodes (pages), which decreases the number of disk reads required for queries.
- Postgres uses 8kB pages, allowing ~375 values per page, enabling efficient storage and retrieval even for large datasets (e.g., 1 billion rows in 4 levels).
- B-tree is self-balancing, ensuring consistent performance during insertions by splitting nodes and promoting values to parent nodes as needed.
- Postgres optimizes splits to minimize free space, enhancing storage efficiency.
- B-tree's design, focusing on hardware efficiency and self-balancing, has made it a durable and effective indexing structure for decades.