Good CTE, Bad CTE
a day ago
- #query-optimization
- #CTE
- #PostgreSQL
- CTEs (Common Table Expressions) are widely used beyond basic SQL, primarily for their perceived imperative logic, offering a remedy for complex queries by suggesting execution order.
- Before PostgreSQL 12, CTEs acted as optimization fences, forcing materialization and blocking predicate pushdown, index usage, and other planner optimizations.
- PostgreSQL 12 introduced CTE inlining, where non-recursive, side-effect-free, singly-referenced CTEs are inlined by default, allowing normal optimizations like predicate pushdown and index usage.
- CTEs are materialized in cases like multiple references, recursive CTEs, data-modifying statements (INSERT/UPDATE/DELETE), VOLATILE functions (e.g., random(), clock_timestamp()), and FOR UPDATE/SHARE clauses.
- Materialized CTEs can lead to poor statistics propagation (a 'statistics black hole'), causing inaccurate row estimates and suboptimal join plans, though PostgreSQL 17 improved statistics propagation.
- Explicit keywords MATERIALIZED and NOT MATERIALIZED allow overriding the planner's inlining decisions, but forcing inlining on multiply-referenced CTEs may cause redundant work.
- Common pitfalls include using CTEs with GROUP BY or LEFT JOIN for boolean checks, which can block filter pushdown; correlated EXISTS subqueries are often more efficient.
- Writable CTEs enable atomic data-modifying operations with RETURNING, but changes are not visible to other CTEs reading the same table—only via RETURNING.
- Recursive CTEs are always materialized and operate iteratively; use UNION ALL for performance (but risk infinite loops in cyclic graphs) or UNION for deduplication and cycle safety.
- Alternatives to CTEs include temporary tables for large intermediate results needing indexes/statistics, and the ltree extension for hierarchical queries in trees without cycles.