pg_lake: Postgres with Iceberg and data lake access
5 months ago
- #Data Lake
- #PostgreSQL
- #Iceberg
- pg_lake integrates Iceberg and data lake files into Postgres, enabling it to function as a standalone lakehouse system.
- Supports transactions and fast queries on Iceberg tables, and works directly with raw data files in object stores like S3.
- Allows creating and modifying Iceberg tables from PostgreSQL with transactional guarantees.
- Enables querying and importing data from Parquet, CSV, JSON, and Iceberg files stored in S3 or S3-compatible object stores.
- Supports exporting query results back to S3 in Parquet, CSV, or JSON formats using COPY commands.
- Reads geospatial formats such as GeoJSON and Shapefiles, and handles gzip (.gz) and zstd (.zst) compressed files.
- Features a built-in map type for semi-structured or key-value data.
- Lets a single SQL query combine regular Postgres (heap) tables, Iceberg tables, and external files, with full transactional guarantees.
- Infers table columns and types from external data sources such as Iceberg, Parquet, JSON, and CSV files.
- Leverages DuckDB’s query engine for fast execution within Postgres.
- Offers two setup methods: Docker for quick testing, or building from source for manual setup and development.
- A setup consists of the PostgreSQL extensions, the pgduck_server application, and S3-compatible storage.
- pgduck_server is a standalone process that executes queries with DuckDB and can be reached with psql on port 5332.
- pgduck_server can be configured with a memory limit, an init file path, and a cache directory.
- Relies on DuckDB's secrets manager for object-store credentials, with support for AWS and GCP.
- Creates Iceberg tables via a `USING iceberg` clause on `CREATE TABLE`; they can then be queried directly.
- COPY works in both directions: `COPY ... FROM` imports Parquet, CSV, or JSON files into tables, and `COPY ... TO` exports tables or query results in those formats.
- Modular design with components like pg_lake_iceberg, pg_lake_table, pg_lake_copy, and pg_lake_engine.
- Developed at Crunchy Data, which was later acquired by Snowflake, and open-sourced as pg_lake in 2025.
- Depends on the third-party projects Apache Avro and DuckDB, both of which are patched during the build.
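To make the schema-inference bullet concrete, here is a deliberately simplified Python sketch of the general idea. It scans a sample of CSV rows and keeps, for each column, the narrowest Postgres type that every non-empty value parses as. This is not pg_lake's implementation. The four-type list, the `infer_csv_schema` and `fits` helpers, and the treatment of empty fields as NULL are all illustrative assumptions.

```python
import csv
import io

# Candidate Postgres types, narrowest first. This short list is a simplifying
# assumption; real sniffers also detect dates, timestamps, numerics, etc.
TYPE_LADDER = ["boolean", "bigint", "double precision", "text"]


def fits(value: str, pg_type: str) -> bool:
    """Return True if a non-empty CSV field can be parsed as pg_type."""
    if pg_type == "boolean":
        return value.lower() in ("true", "false")
    if pg_type == "bigint":
        try:
            int(value)
        except ValueError:
            return False
        return True
    if pg_type == "double precision":
        try:
            float(value)
        except ValueError:
            return False
        return True
    return True  # text accepts any value


def infer_csv_schema(text: str, sample_rows: int = 1000) -> list:
    """Infer (column, type) pairs from CSV text with a header row (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Every column starts with all candidate types; each value rules some out.
    candidates = [set(TYPE_LADDER) for _ in header]
    seen = [False] * len(header)  # whether the column had any non-empty value
    for row_number, row in enumerate(reader):
        if row_number >= sample_rows:
            break  # only a sample of rows is inspected
        for col, value in enumerate(row[: len(header)]):
            if value == "":
                continue  # empty fields are treated as NULL and rule nothing out
            seen[col] = True
            candidates[col] = {t for t in candidates[col] if fits(value, t)}
    schema = []
    for name, types, was_seen in zip(header, candidates, seen):
        # Pick the narrowest surviving type; "text" always survives.
        narrowest = next(t for t in TYPE_LADDER if t in types) if was_seen else "text"
        schema.append((name, narrowest))
    return schema


sample = "id,price,active,name\n1,9.99,true,apple\n2,12,false,pear\n"
for column, pg_type in infer_csv_schema(sample):
    print(column, pg_type)
```

Keeping a set of candidate types per column, rather than widening along a single ladder, avoids mistakes such as typing a column that contains both `true` and `1` as `bigint`.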
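The bullets on file formats, compression, and COPY suggest a common pattern: derive a file's format and compression from its name. The Python sketch below does this for the formats listed above and then renders a `COPY ... TO` statement that spells out the format explicitly. The suffix tables, the `detect_format` and `copy_to_statement` helpers, the bucket and table names, and the `WITH (format '...')` option spelling are all assumptions for illustration. Consult the pg_lake documentation for the COPY options it actually accepts and for how it detects formats.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Hypothetical suffix tables for illustration, not taken from pg_lake's source.
FORMATS = {".parquet": "parquet", ".csv": "csv", ".json": "json"}
COMPRESSIONS = {".gz": "gzip", ".zst": "zstd"}


def detect_format(url: str) -> Tuple[str, Optional[str]]:
    """Return (format, compression) inferred from the file suffixes of a URL."""
    suffixes = [s.lower() for s in PurePosixPath(url).suffixes]
    compression = None
    if suffixes and suffixes[-1] in COMPRESSIONS:
        # A trailing .gz/.zst names the compression; the suffix before it names the format.
        compression = COMPRESSIONS[suffixes.pop()]
    if not suffixes or suffixes[-1] not in FORMATS:
        raise ValueError(f"cannot infer file format from {url!r}")
    return FORMATS[suffixes[-1]], compression


def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def copy_to_statement(query: str, url: str) -> str:
    """Render a COPY statement that exports a query result to an object-store URL.

    Only the format is emitted; the detected compression is ignored here because
    the exact option pg_lake uses for it is not assumed.
    """
    fmt, _compression = detect_format(url)
    return f"COPY ({query}) TO {sql_literal(url)} WITH (format {sql_literal(fmt)});"


print(detect_format("s3://my-bucket/events/2025-01.csv.gz"))
print(copy_to_statement("SELECT * FROM events", "s3://my-bucket/export/events.parquet"))
```

Checking the compression suffix first and then reading the format from the suffix before it means names like `events.csv.gz` and `events.json.zst` are handled without a separate table for every combination.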
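Because credentials come from DuckDB's secrets manager, access to S3, S3-compatible stores, or Google Cloud Storage is configured with DuckDB `CREATE SECRET` statements. The Python helpers below (`s3_secret_sql` and `gcs_secret_sql` are hypothetical names) render such statements using DuckDB's `TYPE s3` and `TYPE gcs` secret parameters. It is an assumption, not something stated above, that pg_lake expects these statements to run in a particular place, such as the init file pgduck_server can be pointed at. Check the pg_lake documentation before relying on that. All names and key values shown are placeholders.

```python
from typing import Optional


def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def s3_secret_sql(name: str, key_id: Optional[str] = None, secret: Optional[str] = None,
                  region: Optional[str] = None, endpoint: Optional[str] = None,
                  use_ssl: bool = True) -> str:
    """Render a DuckDB CREATE SECRET statement for S3 or an S3-compatible store.

    Without explicit keys, the statement uses DuckDB's credential_chain provider,
    which resolves AWS credentials through the AWS SDK's default chain.
    `name` is assumed to be a plain SQL identifier.
    """
    if (key_id is None) != (secret is None):
        raise ValueError("key_id and secret must be given together")
    params = ["TYPE s3"]
    if key_id is None:
        params.append("PROVIDER credential_chain")
    else:
        params += [f"KEY_ID {sql_literal(key_id)}", f"SECRET {sql_literal(secret)}"]
    if region:
        params.append(f"REGION {sql_literal(region)}")
    if endpoint:
        # Self-hosted S3-compatible stores typically need path-style URLs.
        params += [f"ENDPOINT {sql_literal(endpoint)}", "URL_STYLE 'path'"]
    if not use_ssl:
        params.append("USE_SSL false")
    return f"CREATE SECRET {name} ({', '.join(params)});"


def gcs_secret_sql(name: str, hmac_key_id: str, hmac_secret: str) -> str:
    """Render a DuckDB CREATE SECRET statement for Google Cloud Storage HMAC keys."""
    return (f"CREATE SECRET {name} (TYPE gcs, KEY_ID {sql_literal(hmac_key_id)}, "
            f"SECRET {sql_literal(hmac_secret)});")


# Placeholder names and values; substitute your own.
print(s3_secret_sql("lake_s3", region="us-east-1"))
print(s3_secret_sql("local_minio", key_id="my-key-id", secret="my-secret",
                    endpoint="localhost:9000", use_ssl=False))
```

Generating the statements instead of writing them by hand keeps the quoting consistent. It also makes the credential-chain path the default, so keys do not have to be embedded in a file unless a store requires them.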