CedarDB: The fastest database you've never heard of


#database
#modern-hardware
#performance

CedarDB is a modern database designed from scratch to exploit contemporary hardware, unlike traditional databases such as Postgres and MySQL, which were designed over 30 years ago.
CedarDB grew out of the Umbra research project at the Technical University of Munich (TUM). It focuses on performance optimizations for modern multi-core CPUs and large memory capacities.
Key innovations include an advanced query optimizer that can unnest deeply nested SQL queries, such as correlated subqueries. This can reduce execution time from minutes to seconds.
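The core idea of unnesting can be shown on a small correlated subquery: `SELECT * FROM orders o WHERE amount > (SELECT AVG(amount) FROM orders i WHERE i.customer_id = o.customer_id)`. The sketch below assumes a hypothetical in-memory `orders` list of `(customer_id, amount)` pairs and compares two strategies. The naive version re-runs the inner query for every outer row. The decorrelated version computes a single GROUP BY and then probes it like a hash join. This is only the textbook rewrite for one query shape; CedarDB's optimizer is described as handling far more general nesting.

```python
from collections import defaultdict

# Hypothetical table: (customer_id, amount)
orders = [(1, 10.0), (1, 30.0), (2, 5.0), (2, 7.0), (2, 30.0), (3, 8.0)]

def nested(orders):
    """Naive evaluation of the correlated subquery: the inner AVG is
    recomputed for every outer row, which is O(n^2) work."""
    result = []
    for cust, amount in orders:
        inner = [a for c, a in orders if c == cust]
        if amount > sum(inner) / len(inner):
            result.append((cust, amount))
    return result

def unnested(orders):
    """Decorrelated evaluation: one GROUP BY pass builds per-customer
    averages, then each row probes that table (a hash join), O(n) work."""
    sums, counts = defaultdict(float), defaultdict(int)
    for cust, amount in orders:
        sums[cust] += amount
        counts[cust] += 1
    avg = {c: sums[c] / counts[c] for c in sums}
    return [(c, a) for c, a in orders if a > avg[c]]

assert nested(orders) == unnested(orders)
print(unnested(orders))  # → [(1, 30.0), (2, 30.0)]
```

Both functions return the same rows. However, the rewritten form touches each row a constant number of times, which is what turns quadratic runtimes into near-linear ones on large tables.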
CedarDB compiles each SQL query into machine code. This eliminates interpretation overhead and significantly speeds up query execution.
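The benefit of code generation over interpretation can be illustrated by analogy in Python. CedarDB emits machine code, whereas this sketch generates Python source and compiles it to bytecode. The hypothetical predicate tree stands for `WHERE price > 100 AND qty < 5`. The interpreter dispatches on node types for every row. The generated function contains only the specialised comparison.

```python
import operator

# Hypothetical predicate tree for: WHERE price > 100 AND qty < 5
# Nodes are ("col", name), ("const", value), or (op, left, right).
PREDICATE = ("and",
             (">", ("col", "price"), ("const", 100)),
             ("<", ("col", "qty"), ("const", 5)))

OPS = {">": operator.gt, "<": operator.lt}

def interpret(node, row):
    """Tree-walking interpreter: re-dispatches on node type for every row."""
    kind = node[0]
    if kind == "col":
        return row[node[1]]
    if kind == "const":
        return node[1]
    if kind == "and":
        return interpret(node[1], row) and interpret(node[2], row)
    return OPS[kind](interpret(node[1], row), interpret(node[2], row))

def to_source(node):
    """Translate the tree into a Python expression string (done once)."""
    kind = node[0]
    if kind == "col":
        return f"row[{node[1]!r}]"
    if kind == "const":
        return repr(node[1])
    if kind == "and":
        return f"({to_source(node[1])} and {to_source(node[2])})"
    return f"({to_source(node[1])} {kind} {to_source(node[2])})"

def compile_predicate(node):
    """Generate and compile a specialised function with no per-row dispatch."""
    src = f"def pred(row):\n    return {to_source(node)}\n"
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["pred"]

rows = [{"price": 150, "qty": 2}, {"price": 50, "qty": 1}, {"price": 200, "qty": 9}]
pred = compile_predicate(PREDICATE)
assert [interpret(PREDICATE, r) for r in rows] == [pred(r) for r in rows]
print([pred(r) for r in rows])  # → [True, False, False]
```

The translation cost is paid once per query, while the per-row cost shrinks. In a real engine, this trade-off only pays off once a query processes enough rows.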
Morsel-driven parallelism keeps all CPU cores busy by dynamically distributing small chunks of data ('morsels') among them.
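A minimal sketch of morsel-driven scheduling follows. Workers repeatedly claim the next fixed-size slice of the input from a shared cursor until nothing is left. A fast worker therefore simply processes more morsels, and no core sits idle while others still have work. `MORSEL_SIZE` and `parallel_sum` are hypothetical names. The lock stands in for the atomic fetch-and-add a native engine would use. Python's GIL also means this demonstrates the scheduling pattern, not a real speedup.

```python
import threading

MORSEL_SIZE = 1_000  # hypothetical; real systems tune morsel size

def parallel_sum(data, num_workers=4):
    """Sum `data` by letting workers claim morsels from a shared cursor."""
    next_start = 0
    lock = threading.Lock()
    partial = [0] * num_workers  # one thread-local result slot per worker

    def worker(wid):
        nonlocal next_start
        while True:
            with lock:  # claim the next morsel (atomic fetch-add in a real engine)
                start = next_start
                next_start += MORSEL_SIZE
            if start >= len(data):
                return  # input exhausted
            partial[wid] += sum(data[start:start + MORSEL_SIZE])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)  # merge the per-worker results

data = list(range(10_500))
print(parallel_sum(data))  # → 55119750
```

Work is assigned at run time rather than split into equal partitions up front, so skewed data or a slow core does not leave the remaining cores waiting.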
The database has a modern buffer manager designed for multi-threaded environments. It uses pointer swizzling to avoid global locks and make full use of storage bandwidth.
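Pointer swizzling can be sketched as a reference ("swip") that holds either a page id, when the page is on disk, or a direct pointer to the in-memory page. Accessing a swizzled swip just follows the pointer, with no lookup in a shared page table and no global lock. Only the first access pays for loading the page. Every class and method name below is hypothetical. Native systems typically pack both states into a single 64-bit word with a tag bit, and they also need per-page latching and dirty-page write-back, which this sketch omits.

```python
class Page:
    def __init__(self, page_id, data):
        self.page_id = page_id
        self.data = data

class Swip:
    """Holds an int page id (unswizzled) or a Page object (swizzled)."""
    def __init__(self, page_id):
        self.target = page_id

    def is_swizzled(self):
        return isinstance(self.target, Page)

class BufferManager:
    def __init__(self, disk):
        self.disk = disk  # hypothetical backing store: page_id -> bytes
        self.loads = 0    # number of simulated disk reads

    def fix(self, swip):
        # Hot path: already in memory, follow the pointer directly.
        if swip.is_swizzled():
            return swip.target
        # Cold path: load the page, then overwrite the id with the pointer.
        page = Page(swip.target, self.disk[swip.target])
        self.loads += 1
        swip.target = page
        return page

    def unswizzle(self, swip):
        # On eviction, turn the pointer back into a page id.
        swip.target = swip.target.page_id

bm = BufferManager({7: b"hello"})
s = Swip(7)
bm.fix(s)
bm.fix(s)
print(bm.loads)  # → 1
```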
CedarDB's architecture is built to anticipate future hardware changes, with pluggable interfaces for new storage types and workloads, such as vector databases.
Adaptive Query Execution lets CedarDB start executing a query immediately with a quickly generated, less optimized version of its code while a more heavily optimized version is compiled in the background.
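The switch-over can be sketched as follows. Compilation of the "optimized" code is submitted to a background thread. The query begins processing morsels with the cheap version immediately and swaps in the optimized function as soon as it is ready, never blocking on it. The function names, the simulated 50 ms compile time, and the per-morsel delay are all hypothetical stand-ins.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def cheap_version(morsel):
    """Available immediately (e.g. interpreted or lightly compiled code)."""
    return sum(x * 2 for x in morsel)

def optimized_version(morsel):
    """Stand-in for the output of an expensive optimizing compiler."""
    return 2 * sum(morsel)

def expensive_compile():
    time.sleep(0.05)  # simulated compilation latency
    return optimized_version

def run_adaptive(morsels):
    used = []  # which implementation processed each morsel
    with ThreadPoolExecutor(max_workers=1) as pool:
        compiled = pool.submit(expensive_compile)  # compile in the background
        total = 0
        for morsel in morsels:
            # Use the optimized code once it is ready; otherwise keep going.
            fn = compiled.result() if compiled.done() else cheap_version
            used.append(fn.__name__)
            total += fn(morsel)
            time.sleep(0.002)  # simulated per-morsel work
    return total, used

morsels = [list(range(i, i + 1000)) for i in range(0, 100_000, 1000)]
total, used = run_adaptive(morsels)
print(total, used.count("cheap_version"), used.count("optimized_version"))
```

Both versions compute the same result, so switching mid-query is safe. Short queries finish before compilation completes and never pay for it, while long queries benefit from the faster code.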
The database maintains sophisticated statistics for its optimizer. These enable early data filtering and accurate estimates for operations such as GROUP BY and MIN aggregations.
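One common form of such statistics is a per-block minimum and maximum (often called a zone map). It lets a scan skip blocks that cannot match a filter, and it can answer `MIN` without reading the values at all. The source does not say which statistics CedarDB keeps, so the sketch below illustrates the general technique only. `Block`, `make_blocks`, and the block size are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Block:
    values: list
    lo: int  # per-block minimum, computed when the block is written
    hi: int  # per-block maximum

def make_blocks(column, block_size):
    blocks = []
    for i in range(0, len(column), block_size):
        chunk = column[i:i + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks

def scan_greater_than(blocks, threshold):
    """Return values > threshold, skipping blocks whose max rules them out."""
    out, skipped = [], 0
    for b in blocks:
        if b.hi <= threshold:
            skipped += 1  # no value in this block can satisfy the filter
            continue
        out.extend(v for v in b.values if v > threshold)
    return out, skipped

def column_min(blocks):
    """MIN answered from block metadata alone."""
    return min(b.lo for b in blocks)

column = [3, 1, 4, 1, 5, 9, 2, 6, 50, 53, 58, 97]
blocks = make_blocks(column, 4)
print(scan_greater_than(blocks, 40))  # → ([50, 53, 58, 97], 2)
print(column_min(blocks))             # → 1
```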
CedarDB is positioned as a 'beyond main memory' system. It runs at in-memory speed when the data fits in RAM and degrades gracefully, rather than failing, when the data exceeds RAM capacity.
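Graceful degradation comes from caching hot pages in memory and fetching cold pages from storage only when needed. The sketch below uses a plain LRU buffer pool with a skewed, hypothetical access pattern in which 90% of accesses go to 10% of the pages. When all pages fit, nearly every access is a hit. As the data grows past the pool's capacity, the hit rate drops gradually rather than collapsing. LRU is a simple stand-in chosen for illustration; the source does not state CedarDB's replacement policy.

```python
import random
from collections import OrderedDict

class LRUBufferPool:
    """Fixed-capacity page cache that evicts the least-recently-used page."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.hits = self.misses = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            self.hits += 1
            return self.pages[page_id]
        self.misses += 1
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)   # evict the LRU page
        self.pages[page_id] = f"page-{page_id}"  # simulated disk read
        return self.pages[page_id]

def hit_rate(capacity, num_pages, accesses=20_000, seed=0):
    """Hit rate for a skewed workload: 90% of accesses target 10% of pages."""
    rng = random.Random(seed)
    hot = max(1, num_pages // 10)
    pool = LRUBufferPool(capacity)
    for _ in range(accesses):
        pid = rng.randrange(hot) if rng.random() < 0.9 else rng.randrange(num_pages)
        pool.get(pid)
    return pool.hits / accesses

for num_pages in (100, 400, 1000):
    print(num_pages, round(hit_rate(capacity=100, num_pages=num_pages), 3))
```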
