Why PostHog rebuilt its data warehouse on DuckDB instead of ClickHouse
8 hours ago
- #DuckDB
- #data-warehouse
- #PostHog
- The original data warehouse was meant to let small companies delay hiring a data engineer.
- As companies grew, their data engineers found PostHog limiting and exported data to external warehouses, reducing PostHog to middleware.
- Multi-tenancy was one challenge: the shared ClickHouse clusters handled long-running queries poorly and were hard to scale.
- ClickHouse struggled as a general-purpose warehouse because it lacks a cost-based query optimizer, its S3/Delta Lake/Iceberg support is immature, and query behavior is inconsistent across versions.
- Missing data tooling, such as dbt integration, and limitations in HogQL held back sophisticated data teams.
- PostHog rebuilt the warehouse on DuckDB, with single-tenant instances, lifecycle management, and Postgres wire protocol compatibility.
- Features include DuckHog for local compute and DuckLake for separating storage from compute using S3.
- The new setup mirrors all PostHog event data to S3 automatically, integrates external data sources, and supports various data tools.
- A unified warehouse provides complete business context for AI agents, enabling advanced insights and agentic workflows.
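
Postgres wire protocol compatibility matters because existing Postgres clients, drivers, and BI tools can connect without a custom connector. As a rough illustration of what such a client sends first, here is a minimal sketch of the protocol 3.0 StartupMessage framing. The helper names and the `analyst` / `warehouse` values are hypothetical placeholders, and nothing here is taken from PostHog's implementation.

```python
import struct

# Protocol version 3.0 is encoded as (major << 16) | minor.
PROTOCOL_VERSION_3_0 = (3 << 16) | 0


def build_startup_message(params: dict[str, str]) -> bytes:
    """Frame a Postgres StartupMessage (hypothetical helper for illustration).

    Layout: Int32 total length (including itself), Int32 protocol version,
    then NUL-terminated key/value strings, then one final NUL byte.
    """
    body = struct.pack("!i", PROTOCOL_VERSION_3_0)
    for key, value in params.items():
        body += key.encode("utf-8") + b"\x00"
        body += value.encode("utf-8") + b"\x00"
    body += b"\x00"  # terminator after the last key/value pair
    return struct.pack("!i", len(body) + 4) + body


def parse_startup_message(message: bytes) -> tuple[int, dict[str, str]]:
    """Decode a StartupMessage back into (protocol version, parameters)."""
    length, version = struct.unpack("!ii", message[:8])
    if length != len(message):
        raise ValueError("length prefix does not match message size")
    raw = message[8:-1]  # drop the final terminator byte
    parts = raw.split(b"\x00")[:-1] if raw else []
    keys = [p.decode("utf-8") for p in parts[0::2]]
    values = [p.decode("utf-8") for p in parts[1::2]]
    return version, dict(zip(keys, values))


if __name__ == "__main__":
    # Placeholder credentials; a real client would send this over a TCP socket.
    msg = build_startup_message({"user": "analyst", "database": "warehouse"})
    print(msg.hex())
    print(parse_startup_message(msg))
```

Any server that accepts this framing and answers with the standard authentication and query messages looks like Postgres to a client, which is why tools built for Postgres can be pointed at such a warehouse.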