Hasty Briefs (beta)

Why PostHog rebuilt its data warehouse on DuckDB instead of ClickHouse

8 hours ago
  • #DuckDB
  • #data-warehouse
  • #PostHog
  • The original data warehouse aimed to let small companies delay hiring a data engineer.
  • As companies grew, their data engineers found PostHog limiting and exported data to external warehouses, reducing PostHog to middleware.
  • Multi-tenancy was a challenge: the shared ClickHouse clusters did not support long-running queries and were hard to scale.
  • ClickHouse struggled as a general-purpose warehouse because it lacks a cost-based query optimizer, its S3, Delta Lake, and Iceberg support is immature, and query behavior is inconsistent across versions.
  • Missing data tooling, such as dbt integration, and the limitations of HogQL hindered sophisticated data teams.
  • PostHog rebuilt the warehouse on DuckDB, with single-tenant instances, lifecycle management, and Postgres wire protocol compatibility.
  • Features include DuckHog for local compute and DuckLake for separating storage from compute using S3.
  • The new setup mirrors all PostHog event data to S3 automatically, integrates external data sources, and supports various data tools.
  • A unified warehouse provides complete business context for AI agents, enabling advanced insights and agentic workflows.
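
The point above about a cost-based query optimizer can be made concrete with a toy sketch. It is not how DuckDB or ClickHouse plan queries; the table names, row counts, and selectivities are hypothetical, and the cost model (sum of intermediate result sizes for a left-deep join) is a deliberately simple assumption. It only shows why choosing a join order from estimated costs matters for general-purpose warehouse queries.

```python
from itertools import permutations

# Hypothetical row counts for three tables (illustrative numbers only).
ROWS = {"events": 1_000_000, "persons": 50_000, "plans": 10}

# Hypothetical join selectivities: the fraction of the cross product
# of two tables that survives their join predicate.
SELECTIVITY = {
    frozenset({"events", "persons"}): 1 / 50_000,
    frozenset({"persons", "plans"}): 1 / 10,
}


def plan_cost(order):
    """Estimate cost as the sum of intermediate result sizes (left-deep join)."""
    joined = {order[0]}
    rows = ROWS[order[0]]
    cost = 0.0
    for table in order[1:]:
        # Combine every predicate linking the new table to tables already joined.
        # Pairs with no predicate default to 1.0, i.e. a cross product.
        sel = 1.0
        for other in joined:
            sel *= SELECTIVITY.get(frozenset({table, other}), 1.0)
        rows = rows * ROWS[table] * sel
        cost += rows
        joined.add(table)
    return cost


def best_order(tables):
    """Exhaustively pick the join order with the lowest estimated cost."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    for order in permutations(ROWS):
        print(order, round(plan_cost(order)))
    print("chosen:", best_order(ROWS))
```

Under these assumed statistics, the cheapest plans join the two small tables first, and the worst plans start with a cross product between `events` and `plans`. An engine without cost estimates has to rely on the order in which the query was written, which is one reason ad hoc analytical SQL can behave unpredictably.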