ClickHouse Full-Text Search
16 days ago
- #full-text-search
- #database
- #performance
- ClickHouse has completely rebuilt its full-text search (FTS) to be faster, leaner, and fully integrated with its columnar database design.
- The new FTS implementation uses inverted indexes, finite state transducers (FSTs), and Roaring bitmaps for efficient storage and retrieval.
- Key improvements include a cleaner API, new tokenizers (e.g., 'split' for semi-structured text), and a smaller memory and disk footprint through PFOR (patched frame-of-reference) and Zstd compression.
- Bloom filters are now used as pre-filters to reduce I/O and CPU overhead during searches.
- New search functions, 'searchAny' and 'searchAll', provide more intuitive and flexible querying capabilities.
- A major optimization allows row-level filtering without reading the text column, improving performance by up to 10x.
- FTS is now compatible with ClickHouse Cloud and supports the packed part format.
- Benchmarks show significant performance gains in both cold and hot runs, especially for frequent tokens.
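
To make the inverted-index idea and the any/all query semantics concrete, here is a minimal Python sketch. It is not ClickHouse's implementation: a plain dict of Python sets stands in for the FST dictionary and Roaring bitmap posting lists, the tokenizer rule is an assumption, and `search_any` and `search_all` are hypothetical helpers that only mimic what 'searchAny' and 'searchAll' are described as doing.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase, then split on any non-alphanumeric character.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def build_index(rows):
    # Map each token to the set of row ids that contain it (a posting list).
    index = defaultdict(set)
    for row_id, text in enumerate(rows):
        for token in tokenize(text):
            index[token].add(row_id)
    return index


def search_any(index, tokens):
    # Rows containing at least one token: union of the posting lists.
    result = set()
    for token in tokens:
        result |= index.get(token, set())
    return sorted(result)


def search_all(index, tokens):
    # Rows containing every token: intersection of the posting lists.
    postings = [index.get(token, set()) for token in tokens]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


ROWS = [
    "GET /api/users 200",
    "POST /api/login 401",
    "GET /health 200",
]
INDEX = build_index(ROWS)

print(search_any(INDEX, ["login", "health"]))  # rows with either token
print(search_all(INDEX, ["get", "200"]))       # rows with both tokens
```

Both helpers return row ids from the index alone and never look at `ROWS` again, which is the same idea as filtering rows without reading the text column.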
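
The Bloom filter pre-filtering mentioned in the summary can be sketched in Python as follows. This is an illustration under stated assumptions, not the ClickHouse code: the `BloomFilter` class, its sizes, the SHA-256 based hashing, and the notion of a "block" of tokens are all hypothetical. The point is only that a small filter per block can rule a block out cheaply before any posting list is read.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; a Python int serves as the bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, token):
        # Derive num_hashes bit positions by salting the token with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, token):
        for pos in self._positions(token):
            self.bits |= 1 << pos

    def might_contain(self, token):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(token))


def build_filters(blocks):
    # One filter per block of tokens (hypothetical unit of index storage).
    filters = []
    for tokens in blocks:
        bloom = BloomFilter()
        for token in tokens:
            bloom.add(token)
        filters.append(bloom)
    return filters


def candidate_blocks(filters, token):
    # Only blocks whose filter might contain the token need to be read.
    return [i for i, bloom in enumerate(filters) if bloom.might_contain(token)]


BLOCKS = [["error", "timeout"], ["login", "ok"]]
FILTERS = build_filters(BLOCKS)
print(candidate_blocks(FILTERS, "error"))
```

A Bloom filter can report false positives but never false negatives, so a skipped block is guaranteed not to contain the token, while a candidate block still has to be checked against the real index.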