Hasty Briefsbeta

Bilingual

Good CTE, Bad CTE

a day ago
  • #query-optimization
  • #CTE
  • #PostgreSQL
  • CTEs (Common Table Expressions) are widely used beyond basic SQL, primarily for their perceived imperative logic, offering a remedy for complex queries by suggesting execution order.
  • Before PostgreSQL 12, CTEs acted as optimization fences, forcing materialization and blocking predicate pushdown, index usage, and other planner optimizations.
  • PostgreSQL 12 introduced CTE inlining, where non-recursive, side-effect-free, singly-referenced CTEs are inlined by default, allowing normal optimizations like predicate pushdown and index usage.
  • CTEs are materialized in cases like multiple references, recursive CTEs, data-modifying statements (INSERT/UPDATE/DELETE), VOLATILE functions (e.g., random(), clock_timestamp()), and FOR UPDATE/SHARE clauses.
  • Materialized CTEs can lead to poor statistics propagation (a 'statistics black hole'), causing inaccurate row estimates and suboptimal join plans, though PostgreSQL 17 improved statistics propagation.
  • Explicit keywords MATERIALIZED and NOT MATERIALIZED allow overriding the planner's inlining decisions, but forcing inlining on multiply-referenced CTEs may cause redundant work.
  • Common pitfalls include using CTEs with GROUP BY or LEFT JOIN for boolean checks, which can block filter pushdown; correlated EXISTS subqueries are often more efficient.
  • Writable CTEs enable atomic data-modifying operations with RETURNING, but changes are not visible to other CTEs reading the same table—only via RETURNING.
  • Recursive CTEs are always materialized and operate iteratively; use UNION ALL for performance (but risk infinite loops in cyclic graphs) or UNION for deduplication and cycle safety.
  • Alternatives to CTEs include temporary tables for large intermediate results needing indexes/statistics, and the ltree extension for hierarchical queries in trees without cycles.