How VictoriaLogs Stores Your Logs in a Columnar Layout
4 days ago
- #Columnar Database
- #Log Storage
- #VictoriaLogs
- After ingesting logs via various protocols, VictoriaLogs normalizes them into a single internal format: a timestamp, named fields, and a stream identity.
- Logs that share the same stream field values form a single stream. Grouping a stream's logs together on disk improves both compression and query performance.
- Incoming logs are accumulated in memory, flushed into searchable parts about once per second, and sharded across CPU cores to parallelize ingestion.
- Logs are partitioned by UTC date for efficient retention and querying, with each partition containing both in-memory and on-disk parts.
- Parts come in three flavors: in-memory, small, and big. Each part is an immutable, self-contained bundle of logs, a design that enhances durability and reduces disk I/O.
- Parts are stored in a columnar layout where each field is kept in its own column, so queries read only the relevant columns and similar values compress well together.
- Inside parts, logs are organized into blocks per stream and sorted by time, with block headers enabling quick skipping of irrelevant blocks during queries.
- Key files in a part include metadata.json, timestamps.bin, column-specific shard files (values.binN, bloom.binN), and index files (index.bin, metaindex.bin) that enable targeted data access.
- Bloom filters act as a fast pre-filter: they report whether a token might exist in a block, so blocks that cannot match are skipped without reading the actual log data.
- A two-level indexing system (metaindex.bin and index.bin) allows VictoriaLogs to efficiently locate relevant blocks based on stream and time range without scanning entire datasets.
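
To make the points above concrete, here is a toy Python model of the layout. It is a sketch under stated assumptions, not VictoriaLogs' actual code or on-disk format: the names `partition_key`, `build_blocks`, and `search` are hypothetical, timestamps are plain Unix seconds, the `msg` field name is only illustrative, and a Python set stands in for a real Bloom filter (so, unlike a Bloom filter, it never reports false positives).

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(ts):
    """Map a Unix timestamp (seconds) to a per-day partition name in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d")


def tokenize(text):
    """Split a value into lowercase word tokens (a deliberate simplification)."""
    return set(text.lower().split())


def build_blocks(logs, stream_fields):
    """Group logs by stream, sort each group by time, and store it column by column."""
    groups = defaultdict(list)
    for log in logs:
        # The stream identity is the tuple of (name, value) for the stream fields.
        stream = tuple((name, log["fields"].get(name, "")) for name in stream_fields)
        groups[stream].append(log)

    blocks = []
    for stream, rows in sorted(groups.items()):
        rows.sort(key=lambda r: r["ts"])
        names = sorted({n for r in rows for n in r["fields"]})
        columns = {n: [r["fields"].get(n, "") for r in rows] for n in names}
        blocks.append({
            # Header: enough information to skip the block without reading columns.
            "stream": stream,
            "min_ts": rows[0]["ts"],
            "max_ts": rows[-1]["ts"],
            "timestamps": [r["ts"] for r in rows],
            "columns": columns,
            # Stand-in for per-column Bloom filters: the set of tokens per column.
            "tokens": {n: set().union(*(tokenize(v) for v in vals))
                       for n, vals in columns.items()},
        })
    return blocks


def search(blocks, field, token, start, end):
    """Return matching (ts, value) pairs and the number of blocks actually scanned."""
    hits, scanned = [], 0
    for block in blocks:
        if block["max_ts"] < start or block["min_ts"] > end:
            continue  # header shows no overlap with the requested time range
        if token not in block["tokens"].get(field, set()):
            continue  # token filter shows the block cannot contain the token
        scanned += 1
        # Only the timestamps and the one requested column are read.
        for ts, value in zip(block["timestamps"], block["columns"][field]):
            if start <= ts <= end and token in tokenize(value):
                hits.append((ts, value))
    return hits, scanned


logs = [
    {"ts": 1700000000, "fields": {"app": "api", "host": "a", "msg": "request failed timeout"}},
    {"ts": 1700000005, "fields": {"app": "api", "host": "a", "msg": "request ok"}},
    {"ts": 1700000002, "fields": {"app": "db", "host": "b", "msg": "checkpoint complete"}},
]
blocks = build_blocks(logs, stream_fields=["app", "host"])
hits, scanned = search(blocks, "msg", "timeout", 1700000000, 1700000010)
print(partition_key(logs[0]["ts"]), hits, scanned)
```

In this model the `db` block is rejected by its token set before any of its column values are touched, and the query never looks at the `app` or `host` columns at all. That is the same shape of saving the columnar layout, block headers, and Bloom filters aim for, just without compression, files, or the two-level index.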