DuckLake v1.0
2 days ago
- #database
- #lakehouse
- #data-lake
- DuckLake v1.0 is a production-ready lakehouse format specification that stores metadata in a database rather than in files scattered across object storage.
- The DuckDB ducklake extension serves as the reference implementation, supporting SQLite, PostgreSQL, and DuckDB as catalogs, and is now among DuckDB's top-10 core extensions.
- Key features in v1.0 include data inlining for small operations, sorted tables for performance, bucket partitioning, geometry and variant type support, and experimental deletion vectors.
- Community adoption includes clients for Apache DataFusion, Apache Spark, Trino, and Pandas, with production use at dozens of companies and a hosted service from MotherDuck.
- Future plans for DuckLake v1.1 include variant inlining and Puffin files that hold multiple deletion vectors, while v2.0 may focus on Git-like branching, permission-based roles, and incremental materialized views.
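
To make the first point concrete, here is a minimal sketch of what "metadata in a database" can look like, using Python's built-in `sqlite3` module. The `snapshot` and `data_file` tables and the `commit` and `files_at` helpers are hypothetical and heavily simplified for illustration; they are not the schema defined by the DuckLake specification. The idea shown is only that each change becomes one transactional snapshot row plus file bookkeeping, so any past snapshot can be resolved with a plain SQL query.

```python
import sqlite3

# Hypothetical, simplified catalog for illustration only.
# The real DuckLake specification defines its own tables and columns.
def open_catalog():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE snapshot (
            snapshot_id INTEGER PRIMARY KEY,
            note        TEXT
        );
        CREATE TABLE data_file (
            path           TEXT NOT NULL,
            begin_snapshot INTEGER NOT NULL,
            end_snapshot   INTEGER          -- NULL while the file is still live
        );
    """)
    return con

def commit(con, note, add=(), remove=()):
    """Record one snapshot and its file changes in a single transaction."""
    with con:  # commits on success, rolls back if an exception is raised
        cur = con.execute("INSERT INTO snapshot (note) VALUES (?)", (note,))
        sid = cur.lastrowid
        con.executemany(
            "INSERT INTO data_file VALUES (?, ?, NULL)",
            [(path, sid) for path in add],
        )
        con.executemany(
            "UPDATE data_file SET end_snapshot = ? "
            "WHERE path = ? AND end_snapshot IS NULL",
            [(sid, path) for path in remove],
        )
    return sid

def files_at(con, sid):
    """Return the data files visible as of snapshot `sid`."""
    rows = con.execute(
        "SELECT path FROM data_file "
        "WHERE begin_snapshot <= ? "
        "AND (end_snapshot IS NULL OR end_snapshot > ?) "
        "ORDER BY path",
        (sid, sid),
    )
    return [row[0] for row in rows]

con = open_catalog()
s1 = commit(con, "initial load", add=["part-0.parquet", "part-1.parquet"])
s2 = commit(con, "compaction", add=["part-2.parquet"],
            remove=["part-0.parquet", "part-1.parquet"])

print(files_at(con, s1))  # ['part-0.parquet', 'part-1.parquet']
print(files_at(con, s2))  # ['part-2.parquet']
```

Because both snapshots live in the same database, reading an older version is just a query with a different snapshot id, with no manifest files to list or parse in object storage.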