Hasty Briefs (beta)

ClickHouse Full-Text Search

16 days ago
Tags: #full-text-search #database #performance

  • ClickHouse has completely rebuilt its full-text search (FTS) to be faster, leaner, and fully integrated with its columnar database design.
  • The new FTS implementation is built on inverted indexes: finite state transducers (FSTs) store the token dictionary compactly, and Roaring bitmaps store each token's posting list of matching rows.
  • Key improvements include a cleaner API, new tokenizers (e.g., 'split' for semi-structured text), and a smaller memory and disk footprint thanks to PFOR (Patched Frame of Reference) and Zstd compression.
  • Bloom filters now act as pre-filters that cheaply rule out absent tokens, reducing I/O and CPU overhead during searches.
  • New search functions, 'searchAny' (rows containing at least one of the given tokens) and 'searchAll' (rows containing all of them), make queries more intuitive and flexible.
  • A major optimization allows row-level filtering without reading the text column, improving performance by up to 10x.
  • The FTS is now compatible with ClickHouse Cloud and supports packed part formats.
  • Benchmarks show significant performance gains in both cold and hot runs, especially for searches on frequent tokens.
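The inverted-index design can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not ClickHouse code: a sorted token list searched by binary search stands in for the FST dictionary, a Python integer used as a bitset stands in for a Roaring bitmap, and the tokenizer and all function names are hypothetical.

```python
import re
from bisect import bisect_left

# Simplified stand-ins, not ClickHouse internals:
#   - a sorted token list searched with bisect plays the role of the FST dictionary
#   - a Python int used as a bitset plays the role of a Roaring bitmap posting list

def tokenize(text):
    """Split on runs of non-alphanumeric ASCII characters."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def build_index(rows):
    """Return (sorted tokens, parallel list of posting-list bitmaps)."""
    postings = {}
    for row_id, text in enumerate(rows):
        for token in set(tokenize(text)):
            postings[token] = postings.get(token, 0) | (1 << row_id)
    tokens = sorted(postings)
    return tokens, [postings[t] for t in tokens]

def lookup(index, token):
    """Binary-search the sorted dictionary; return the token's bitmap (0 if absent)."""
    tokens, bitmaps = index
    i = bisect_left(tokens, token)
    if i < len(tokens) and tokens[i] == token:
        return bitmaps[i]
    return 0

def row_ids(bitmap):
    """Expand a bitmap into the list of row ids whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

index = build_index(["error: disk full", "warning: disk slow", "error: timeout"])
print(row_ids(lookup(index, "error")))  # [0, 2]
print(row_ids(lookup(index, "disk")))   # [0, 1]
print(row_ids(lookup(index, "panic")))  # []
```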
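The brief does not spell out how the 'split' tokenizer behaves. The sketch below assumes it breaks text only on a configurable list of separator strings and contrasts that with breaking on every non-alphanumeric character; both functions are hypothetical Python illustrations, not ClickHouse's tokenizers. On a semi-structured log line, the separator-based version keeps values such as an IP address intact as one token.

```python
import re

def split_non_alnum(text):
    """Break on every non-alphanumeric ASCII character."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def split_by_separators(text, separators):
    """Hypothetical separator-based tokenizer: break only on the given strings."""
    if not separators:
        return [text] if text else []
    pattern = "|".join(re.escape(s) for s in separators)
    return [t for t in re.split(pattern, text) if t]

line = "user=alice;ip=10.0.0.1;status=500"
print(split_non_alnum(line))
# ['user', 'alice', 'ip', '10', '0', '0', '1', 'status', '500']
print(split_by_separators(line, [";"]))
# ['user=alice', 'ip=10.0.0.1', 'status=500']
```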
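PFOR (Patched Frame of Reference) is a general integer-compression technique suited to sorted lists such as posting lists. The toy sketch below delta-encodes sorted row ids, packs each delta into a fixed number of bits, and keeps the few deltas that do not fit as separately stored exceptions (the "patches"). It illustrates the idea only: the function names and the explicit bit_width parameter are assumptions, and this is not ClickHouse's on-disk format (a real encoder would pick the bit width automatically and pack bytes rather than a Python int).

```python
def pfor_encode(row_ids, bit_width):
    """Delta-encode sorted row ids, then pack each delta into `bit_width` bits.
    Deltas too large for the slot are stored separately as exceptions."""
    deltas, prev = [], 0
    for r in row_ids:
        deltas.append(r - prev)
        prev = r
    limit = 1 << bit_width
    packed, exceptions = 0, {}
    for i, d in enumerate(deltas):
        if d < limit:
            packed |= d << (i * bit_width)
        else:
            exceptions[i] = d  # slot i stays 0 in the packed buffer
    return len(deltas), packed, exceptions

def pfor_decode(count, packed, exceptions, bit_width):
    """Unpack each slot (or its exception) and undo the delta encoding."""
    mask = (1 << bit_width) - 1
    out, prev = [], 0
    for i in range(count):
        d = exceptions.get(i, (packed >> (i * bit_width)) & mask)
        prev += d
        out.append(prev)
    return out

ids = [3, 5, 6, 9, 10, 200, 201]
count, packed, exceptions = pfor_encode(ids, bit_width=2)
print(exceptions)  # {5: 190}
print(pfor_decode(count, packed, exceptions, bit_width=2) == ids)  # True
```

With 2-bit slots, six of the seven deltas fit and only the jump from 10 to 200 is stored as an exception.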
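A Bloom filter answers either "definitely absent" or "possibly present", so placing one in front of the token dictionary lets a search skip dictionary reads for tokens that cannot match. The sketch below is a minimal, generic Bloom filter plus a hypothetical lookup_with_prefilter helper; the sizes (1024 bits, 3 hashes) and the SHA-256-based hashing are arbitrary choices for illustration, not ClickHouse's parameters.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: each token sets k bit positions in an m-bit array."""

    def __init__(self, m_bits=1024, k=3):
        self.m = m_bits
        self.k = k
        self.bits = 0  # Python int used as the bit array

    def _positions(self, token):
        # Derive k positions by hashing the token with k different prefixes.
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.m

    def add(self, token):
        for p in self._positions(token):
            self.bits |= 1 << p

    def might_contain(self, token):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> p) & 1 for p in self._positions(token))


def lookup_with_prefilter(token, bloom, dictionary):
    """Consult the (expensive) dictionary only if the Bloom filter allows it.
    Returns (row_ids, dictionary_was_read)."""
    if not bloom.might_contain(token):
        return set(), False
    return dictionary.get(token, set()), True


# Hypothetical token dictionary: token -> set of matching row ids.
dictionary = {"error": {0, 2}, "disk": {0, 1}}
bloom = BloomFilter()
for token in dictionary:
    bloom.add(token)

print(lookup_with_prefilter("error", bloom, dictionary))  # ({0, 2}, True)
```

An absent token such as "panic" yields an empty row set either way; the Bloom filter's job is only to make the dictionary read unnecessary in the common case.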
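Assuming searchAny matches rows that contain at least one of the query tokens and searchAll matches rows that contain every one of them, the two functions reduce to the union and the intersection of the tokens' posting lists. The minimal Python sketch below shows only that semantics over a hypothetical in-memory postings map; search_any and search_all are illustrative names, not the SQL functions themselves.

```python
def search_any(postings, tokens):
    """Rows containing at least one of the tokens: union of posting lists."""
    return set().union(*(postings.get(t, set()) for t in tokens))

def search_all(postings, tokens):
    """Rows containing every token: intersection of posting lists."""
    sets = [postings.get(t, set()) for t in tokens]
    # An empty token list is treated as "no match" here (an arbitrary choice).
    return set.intersection(*sets) if sets else set()

# Hypothetical posting lists: token -> set of row ids.
postings = {"error": {0, 2, 5}, "disk": {0, 1, 5}, "timeout": {2}}
print(sorted(search_any(postings, ["disk", "timeout"])))  # [0, 1, 2, 5]
print(sorted(search_all(postings, ["error", "disk"])))    # [0, 5]
print(search_all(postings, ["error", "panic"]))           # set()
```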
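The idea behind filtering without reading the text column can be illustrated with a toy data part: when a row-level index decides the predicate exactly, a query that only needs the matching rows (here, a count) can be answered from the posting lists alone. The Part class, its whitespace tokenization, and the read counter are assumptions made for illustration, not ClickHouse internals.

```python
class Part:
    """Toy data part holding a text column and a token -> row-id index."""

    def __init__(self, texts):
        self._texts = texts
        self.column_reads = 0  # how many times the text column was loaded
        self.index = {}
        for row_id, text in enumerate(texts):
            for token in set(text.split()):
                self.index.setdefault(token, set()).add(row_id)

    def read_column(self):
        self.column_reads += 1
        return self._texts

    def count_any_via_scan(self, tokens):
        """Baseline: load the column and tokenize every row."""
        wanted = set(tokens)
        return sum(1 for text in self.read_column() if wanted & set(text.split()))

    def count_any_via_index(self, tokens):
        """Index-only path: the posting lists alone decide which rows match."""
        rows = set().union(*(self.index.get(t, set()) for t in tokens))
        return len(rows)


part = Part(["disk full", "disk slow", "timeout"])
via_index = part.count_any_via_index(["disk"])
reads_after_index = part.column_reads  # 0: the column was never loaded
via_scan = part.count_any_via_scan(["disk"])
reads_after_scan = part.column_reads   # 1: the scan had to load it
print(via_index, via_scan)  # 2 2
```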