Hasty Briefsbeta

Bilingual

Sorted string tables (SST) from first principles

4 months ago
  • #database
  • #storage
  • #performance
  • SSDs and memory: Understanding how data is fetched from SSDs to memory is crucial for database performance.
  • Pages on storage devices: SSDs read data in 4KB pages, making spatial and temporal locality important for minimizing read amplification.
  • Mutability on SSDs: SSDs prefer immutable data structures due to the overhead of rewriting data at the block level.
  • Storing data durably on SSDs: Immutable, batch-aligned data structures like logs and SSTs are ideal for SSDs.
  • Sorted String Tables (SSTs): SSTs store data sorted by key, enabling efficient binary searches and leveraging spatial locality.
  • Data blocks: SST data blocks are sorted by key and encoded for efficient disk storage and retrieval.
  • Indexes: SSTs use indexes to quickly locate data blocks, reducing the need to scan entire files.
  • Filter indexes: Bloom filters and min/max filters help determine if a key exists in an SST before accessing data blocks.
  • Metadata block: The metadata block stores critical information like the locations of index and filter blocks.
  • SSTs as general-purpose building blocks: SSTs can model various data access patterns through clever key encoding.
  • Key encoding strategies: Techniques like key namespacing, composite keys, and secondary indexes enable flexible data modeling with SSTs.
  • Why SSTs matter: SSTs align with SSD and object storage advantages, making them a foundational primitive for many database systems.