CedarDB: The fastest database you've never heard of
11 days ago
- #database
- #modern-hardware
- #performance
- CedarDB is a modern database designed from scratch to leverage contemporary hardware, unlike traditional databases such as Postgres and MySQL, which were built over 30 years ago.
- Originating from the Umbra research project at TUM, CedarDB focuses on performance optimizations for modern multi-core CPUs and large memory capacities.
- Key innovations include an advanced query optimizer that can unnest deeply nested SQL statements, which can reduce the execution time of such queries from minutes to seconds.
- CedarDB employs code generation for SQL queries, converting them into machine code to eliminate interpretation overhead, significantly speeding up query execution.
- Morsel-driven parallelism ensures efficient CPU core utilization by dynamically distributing small data chunks ('morsels') among cores, keeping all cores busy.
- The database features a modern buffer manager designed for multi-threaded environments, using pointer swizzling to avoid global locks and maximize storage bandwidth.
- CedarDB's architecture is built to anticipate future hardware changes, with pluggable interfaces for new storage types and workloads, such as vector databases.
- Adaptive Query Execution allows CedarDB to start executing a query immediately with a basic version of its plan while preparing more heavily optimized versions in the background.
- The database supports sophisticated statistics for the optimizer, enabling early data filtering and accurate estimates for operations like GROUP BY and MIN aggregations.
- CedarDB is positioned as a 'beyond main memory' system: it is optimized for in-memory speeds, but its performance degrades gracefully when data exceeds RAM capacity.
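
The morsel-driven parallelism point above can be made concrete with a small sketch. This is an illustration of the general idea only, not CedarDB's implementation: the function name `morsel_sum`, the `MORSEL_SIZE` constant, and the use of Python threads are all assumptions made for this example. Workers repeatedly claim the next small chunk from a shared cursor, so a worker that finishes early simply picks up more morsels instead of sitting idle.

```python
import threading

MORSEL_SIZE = 1000  # hypothetical chunk size, chosen only for illustration


def morsel_sum(values, num_workers=4, morsel_size=MORSEL_SIZE):
    """Sum `values` by letting workers claim fixed-size morsels on demand.

    Returns (total, morsels_done), where morsels_done[i] is the number of
    morsels that worker i processed.
    """
    next_start = 0               # shared cursor into the input
    cursor_lock = threading.Lock()  # stands in for an atomic fetch-and-add
    partials = [0] * num_workers
    morsels_done = [0] * num_workers

    def worker(worker_id):
        nonlocal next_start
        while True:
            # Claim the next morsel; the critical section only moves the cursor.
            with cursor_lock:
                start = next_start
                if start >= len(values):
                    return  # no work left
                next_start = start + morsel_size
            # Process the claimed morsel outside the lock.
            partials[worker_id] += sum(values[start:start + morsel_size])
            morsels_done[worker_id] += 1

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials), morsels_done


if __name__ == "__main__":
    total, per_worker = morsel_sum(list(range(10_000)))
    print(total, per_worker)
```

Because work is claimed dynamically rather than split up front, the per-worker morsel counts can differ from run to run; only the total is fixed. CPython threads do not run this arithmetic in parallel, so the sketch shows the scheduling pattern rather than an actual speedup.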