How TimescaleDB compresses time-series data
5 hours ago
- #compression
- #timescaledb
- #postgresql
- TimescaleDB achieves up to 98% compression for time-series data using hypercore, a hybrid row-columnar engine with specialized algorithms like delta encoding and run-length encoding.
- TimescaleDB compression targets patterns across rows, whereas PostgreSQL's TOAST only compresses individual large values; this is why numeric and timestamp columns can reach much higher ratios (e.g., 10-100×).
- During compression, rows are grouped into batches of ~1000 rows and stored in columnar format, which reduces I/O and enables vectorized execution, improving both storage footprint and query performance.
- Two key parameters, segmentby and orderby, determine how rows are grouped and sorted before compression, which is crucial for both compression efficiency and query filtering; segmentby columns should ideally have 100-10,000 unique values per chunk.
- Compression typically speeds up queries like range scans and aggregations but may slow down point lookups or updates on compressed data.
- Implementation involves enabling compression with suitable settings (e.g., segmentby='machine_id' and orderby='time DESC') and adding a compression policy that automatically compresses older chunks.
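
To make the cross-row idea from the points above concrete, here is a minimal Python sketch of delta encoding followed by run-length encoding on one batch of regularly spaced timestamps. It is an illustration only and not TimescaleDB's actual on-disk format; the function names, the 10-second sampling interval, and the 1000-row batch are assumptions chosen for the example.

```python
from itertools import accumulate, groupby


def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode: a running sum restores the original values."""
    return list(accumulate(deltas))


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical batch: 1000 readings from one device, one every 10 seconds.
timestamps = [1_700_000_000 + 10 * i for i in range(1000)]

deltas = delta_encode(timestamps)   # first timestamp, then 999 copies of 10
runs = run_length_encode(deltas)    # two (value, count) pairs describe the whole batch

print(len(timestamps), "values ->", len(runs), "runs:", runs)
```

Because every row in the batch differs from its neighbor by the same amount, the whole column collapses to two pairs. A per-value scheme such as TOAST never sees this regularity, since each timestamp taken alone looks like an arbitrary integer. The same reasoning suggests why sorting with orderby matters: the deltas only become repetitive when neighboring rows in a batch are close in time.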