Elastic-style faceted search from PostgreSQL
4 months ago
- #Performance
- #PostgreSQL
- #Search
- ParadeDB brings Elasticsearch-style faceting directly into PostgreSQL and reports faceted search that is 14x faster.
- Faceted search allows users to filter and explore search results by attributes (facets), enhancing search functionality beyond simple text matching.
- Traditional PostgreSQL approaches to faceting are inefficient, requiring multiple index scans or full scans, leading to performance degradation with large datasets.
- ParadeDB's solution leverages window functions and a custom `pdb.agg()` function to perform search and faceting in a single pass, significantly improving performance.
- Performance benchmarks show ParadeDB's faceting maintains consistent speed even with large result sets, outperforming manual faceting by an order of magnitude.
- The `pdb.agg()` function supports several aggregation types (`terms`, `histogram`, `date_histogram`) and can be sped up further by disabling MVCC when approximate counts are acceptable.
- ParadeDB integrates with PostgreSQL through custom scan APIs and planner hooks, so that both search and aggregation execute efficiently within PostgreSQL's framework.
- The underlying search library, Tantivy, uses columnar storage for fast per-document value lookups, enabling efficient aggregation without reconstructing full rows.
- ParadeDB's approach combines PostgreSQL's ACID guarantees with the performance and flexibility of modern search engines, simplifying search infrastructure management.
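
The single-pass idea in the summary can be illustrated with a toy sketch. This is not ParadeDB's or Tantivy's implementation: the `COLUMNS` layout, the `search_with_facets` function, and the sample data are all hypothetical, chosen only to show how facet counts can be accumulated during the same scan that collects matching documents, reading each facet value from a per-field column by document id instead of rebuilding full rows.

```python
from collections import Counter

# Hypothetical columnar layout: one list per field, indexed by document id.
COLUMNS = {
    "description": [
        "red running shoes",
        "blue running shoes",
        "leather boots",
        "trail running shoes",
    ],
    "category": ["footwear", "footwear", "boots", "footwear"],
    "brand": ["acme", "zenith", "acme", "acme"],
    "rating": [4, 5, 3, 2],
}


def search_with_facets(columns, term, facet_fields, limit):
    """Return (top doc ids by rating, facet counts) from one scan of the matches."""
    facets = {field: Counter() for field in facet_fields}
    matches = []
    for doc_id, text in enumerate(columns["description"]):
        if term not in text.split():
            continue
        matches.append(doc_id)
        # Facet values are looked up in the column by doc id;
        # no full row is assembled for the aggregation.
        for field in facet_fields:
            facets[field][columns[field][doc_id]] += 1
    # Only the top `limit` hits are returned, but the facet counts
    # cover every matching document.
    top = sorted(matches, key=lambda d: columns["rating"][d], reverse=True)[:limit]
    return top, {field: dict(counts) for field, counts in facets.items()}


hits, facets = search_with_facets(COLUMNS, "running", ["category", "brand"], limit=2)
print(hits)
print(facets)
```

The point of the sketch is the shape of the loop: the "manual" alternative would run the match once for the result page and again for each facet, whereas here one scan feeds both the ranked hits and every counter.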