Sorted string tables (SST) from first principles
4 months ago
- #database
- #storage
- #performance
- SSDs and memory: Understanding how data is fetched from SSDs to memory is crucial for database performance.
- Pages on storage devices: SSDs read data in 4KB pages, making spatial and temporal locality important for minimizing read amplification.
- Mutability on SSDs: SSDs prefer immutable data structures due to the overhead of rewriting data at the block level.
- Storing data durably on SSDs: Immutable, batch-aligned data structures like logs and SSTs are ideal for SSDs.
- Sorted String Tables (SSTs): SSTs store data sorted by key, enabling efficient binary searches and leveraging spatial locality.
- Data blocks: SST data blocks are sorted by key and encoded for efficient disk storage and retrieval.
- Indexes: SSTs use indexes to quickly locate data blocks, reducing the need to scan entire files.
- Filter indexes: Bloom filters and min/max filters help determine if a key exists in an SST before accessing data blocks.
- Metadata block: The metadata block stores critical information like the locations of index and filter blocks.
- SSTs as general-purpose building blocks: SSTs can model various data access patterns through clever key encoding.
- Key encoding strategies: Techniques like key namespacing, composite keys, and secondary indexes enable flexible data modeling with SSTs.
- Why SSTs matter: SSTs align with SSD and object storage advantages, making them a foundational primitive for many database systems.