How VictoriaLogs Stores Your Logs in a Columnar Layout

4 days ago

#Columnar Database
#Log Storage
#VictoriaLogs

After ingesting logs through any of its supported protocols, VictoriaLogs normalizes every entry into a single internal format: a timestamp, a set of named fields, and a stream identity.
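Below is a minimal Python sketch of that normalization step. It assumes a JSON log line whose timestamp sits in the _time field (VictoriaLogs' special field names include _time and _msg). It also assumes a caller-supplied list of stream field names, which VictoriaLogs clients can set, for example via the _stream_fields parameter. The LogEntry type and normalize function are hypothetical illustrations, not VictoriaLogs' actual code. Nested JSON objects are not flattened here, and every value is kept as a string.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

@dataclass
class LogEntry:
    timestamp_ns: int   # event time as nanoseconds since the Unix epoch (UTC)
    fields: dict        # all remaining named fields, e.g. _msg, host, app
    stream: tuple       # sorted (name, value) pairs identifying the stream

def normalize(raw_json: str, stream_field_names: list) -> LogEntry:
    """Hypothetical normalizer: one JSON log line -> LogEntry."""
    record = json.loads(raw_json)
    # Assumption: _time holds an RFC 3339 timestamp such as 2024-05-01T12:00:00Z.
    ts = datetime.fromisoformat(record.pop("_time").replace("Z", "+00:00"))
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # treat naive timestamps as UTC
    delta = ts - EPOCH
    timestamp_ns = (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000
    # Every value is kept as a string in this sketch; nested objects are not flattened.
    fields = {name: str(value) for name, value in record.items()}
    stream = tuple(sorted((name, fields.get(name, "")) for name in stream_field_names))
    return LogEntry(timestamp_ns, fields, stream)
```

For example, normalizing {"_time": "2024-05-01T12:00:00Z", "_msg": "disk full", "host": "db-1"} with stream fields ["host"] yields a timestamp of 1714564800000000000 ns and the stream (("host", "db-1"),).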
Logs that share the same values for their stream fields belong to a single stream. VictoriaLogs stores each stream's logs together on disk, which improves both compression and query performance.
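The hypothetical sketch below shows one way to derive a canonical stream key from the stream fields and group logs by it. The key format mimics how VictoriaLogs displays the _stream field, e.g. {app="pg",host="db-1"}. Sorting the pairs is this sketch's own choice, so that field order does not matter. The real storage engine works with its own internal stream ID rather than a string like this.

```python
from collections import defaultdict

def stream_key(fields: dict, stream_field_names: list) -> str:
    """Hypothetical canonical key such as {app="pg",host="db-1"}."""
    # Sorting makes the key independent of the order in which fields arrive.
    pairs = sorted((name, fields.get(name, "")) for name in stream_field_names)
    return "{" + ",".join(f'{name}="{value}"' for name, value in pairs) + "}"

def group_by_stream(logs: list, stream_field_names: list) -> dict:
    """Collect records so that each stream's logs sit next to each other."""
    groups = defaultdict(list)
    for log in logs:
        groups[stream_key(log, stream_field_names)].append(log)
    return dict(groups)
```

Two records with the same host and app values land in the same group even if their fields arrive in a different order, which is what allows a stream's logs to be stored contiguously.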
Incoming logs are buffered in memory, sharded across CPU cores to parallelize ingestion, and flushed into searchable parts about once per second.
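A simplified model of that buffering follows. It assumes round-robin assignment of rows to one shard per CPU core and a one-second flush interval. ShardedBuffer and its methods are hypothetical, and a Python object cannot pin work to cores, so the shards here only illustrate the data layout. The clock is injectable so that flushing can be exercised without sleeping.

```python
import os
import time

class ShardedBuffer:
    """Hypothetical ingestion buffer: one in-memory shard per CPU core, flushed on an interval."""

    def __init__(self, num_shards=None, flush_interval=1.0, clock=time.monotonic):
        self.num_shards = num_shards or os.cpu_count() or 1
        self.shards = [[] for _ in range(self.num_shards)]
        self.flush_interval = flush_interval  # seconds between flushes (about 1 s above)
        self.clock = clock                    # injectable clock for testing
        self.last_flush = clock()
        self.parts = []                       # flushed, searchable "parts" (tuples of rows)
        self._next = 0

    def add(self, row) -> None:
        # Round-robin assignment is an assumption; a real system would let each
        # core write to its own shard to avoid contention.
        self.shards[self._next].append(row)
        self._next = (self._next + 1) % self.num_shards
        self.maybe_flush()

    def maybe_flush(self) -> None:
        now = self.clock()
        if now - self.last_flush < self.flush_interval:
            return
        for i, shard in enumerate(self.shards):
            if shard:
                self.parts.append(tuple(shard))  # freeze the shard into an immutable part
                self.shards[i] = []
        self.last_flush = now
```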
Logs are partitioned by UTC date, so retention can drop whole days at once and queries can skip days outside their time range. Each partition contains both in-memory and on-disk parts.
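The following sketch maps timestamps to per-day UTC partitions, applies retention by dropping whole partitions, and picks the partitions a time-range query needs to open. The YYYYMMDD partition naming and all helper functions are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone

def partition_name(timestamp_ns: int) -> str:
    """UTC day of a nanosecond timestamp, e.g. '20240501' (naming is an assumption)."""
    moment = datetime.fromtimestamp(timestamp_ns // 1_000_000_000, tz=timezone.utc)
    return moment.strftime("%Y%m%d")

def partition(logs):
    """Group (timestamp_ns, message) pairs into per-day partitions."""
    parts = defaultdict(list)
    for ts, msg in logs:
        parts[partition_name(ts)].append((ts, msg))
    return dict(parts)

def apply_retention(partitions: dict, today: date, retention_days: int) -> dict:
    """Keep partitions dated on or after today - retention_days; older days are dropped whole."""
    cutoff = (today - timedelta(days=retention_days)).strftime("%Y%m%d")
    return {name: rows for name, rows in partitions.items() if name >= cutoff}

def partitions_for_range(partitions: dict, start_ns: int, end_ns: int) -> list:
    """Names of the partitions a query over [start_ns, end_ns] has to open."""
    first, last = partition_name(start_ns), partition_name(end_ns)
    return sorted(name for name in partitions if first <= name <= last)
```

Because retention works on whole days, expiring old data means discarding complete partitions rather than rewriting parts.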
Parts come in three flavors: in-memory, small, and big. Each is an immutable, self-contained bundle of logs, a design that enhances durability and reduces disk I/O.
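To make the immutability concrete, here is a hypothetical Part type whose contents cannot be modified after construction. The merge helper, an assumption not described above, shows how combining data produces a new part instead of editing an existing one.

```python
from dataclasses import FrozenInstanceError, dataclass
from enum import Enum

class PartKind(Enum):
    IN_MEMORY = "inmemory"
    SMALL = "small"
    BIG = "big"

@dataclass(frozen=True)
class Part:
    kind: PartKind
    rows: tuple   # a tuple rather than a list, so the rows cannot change in place

def merge(parts: list, kind: PartKind) -> Part:
    """Hypothetical: build a new part from several parts; the inputs stay untouched."""
    rows = tuple(sorted(row for part in parts for row in part.rows))
    return Part(kind, rows)
```

Assigning to a field of a Part raises dataclasses.FrozenInstanceError, so any reader holding a reference to a part always sees the same data.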
Parts use a columnar layout in which each field is stored in its own column, so queries read only the columns they need and similar values stored side by side compress well.
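The columnar idea can be sketched in a few lines: pivot row-oriented logs into one list per field, answer a query by reading only the requested columns, and compress a single column on its own. The helper names are hypothetical. Filling missing fields with empty strings is an assumption of this sketch, and zlib merely stands in for whatever codec the real storage uses.

```python
import zlib

def to_columns(rows: list) -> dict:
    """Pivot row-oriented logs into one list of values per field name."""
    names = sorted({name for row in rows for name in row})
    # Assumption: a field missing from a row is represented as an empty string.
    return {name: [row.get(name, "") for row in rows] for name in names}

def select(columns: dict, wanted: list) -> dict:
    """Return only the requested columns; the others are never read."""
    return {name: columns[name] for name in wanted if name in columns}

def compressed_size(values: list) -> int:
    """Size in bytes of one column after zlib compression (a stand-in codec)."""
    return len(zlib.compress("\n".join(values).encode()))
```

A column like level, where the same few values repeat, shrinks to a small fraction of its raw size, which is the effect the columnar layout exploits.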
Inside a part, logs are organized into blocks, each holding logs from a single stream sorted by time. Block headers let queries quickly skip blocks that cannot match.
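Here is a hypothetical sketch of per-stream, time-sorted blocks with headers. Each header records its block's stream and its minimum and maximum timestamps, so scan() can rule out blocks without touching their data. The block size of four rows is arbitrary and far smaller than a real block would be.

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass
class BlockHeader:
    stream: str
    min_ts: int
    max_ts: int
    rows: int

@dataclass
class Block:
    header: BlockHeader
    timestamps: list
    messages: list

def build_blocks(logs: list, max_rows: int = 4) -> list:
    """logs: (stream, ts, msg) tuples. Sort by stream, then time, and cut into blocks."""
    blocks = []
    for stream, items in groupby(sorted(logs), key=lambda r: r[0]):
        items = list(items)
        for i in range(0, len(items), max_rows):
            chunk = items[i:i + max_rows]
            ts = [t for _, t, _ in chunk]  # already sorted by time
            header = BlockHeader(stream, ts[0], ts[-1], len(chunk))
            blocks.append(Block(header, ts, [m for _, _, m in chunk]))
    return blocks

def scan(blocks: list, stream: str, start: int, end: int):
    """Return (matching messages, number of blocks whose data was actually read)."""
    out, opened = [], 0
    for block in blocks:
        h = block.header
        if h.stream != stream or h.max_ts < start or h.min_ts > end:
            continue  # ruled out from the header alone
        opened += 1
        out.extend(m for t, m in zip(block.timestamps, block.messages) if start <= t <= end)
    return out, opened
```

With ten rows per stream and four rows per block, a query for one stream over a two-unit window opens a single block out of six.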
Key files in a part include metadata.json, timestamps.bin, column-specific shard files (values.binN, bloom.binN), and index files (index.bin, metaindex.bin) that enable targeted data access.
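The text does not say how columns are assigned to the N shards. The sketch below assumes, purely for illustration, that each column goes to shard N by hashing its name, with zlib.crc32 as an arbitrary stand-in hash. Both helper functions are hypothetical, and the list covers only the key files named above, not every file a real part contains.

```python
import zlib

def shard_for_column(column: str, num_shards: int) -> int:
    """Hypothetical: choose the shard N for a column by hashing its name."""
    return zlib.crc32(column.encode()) % num_shards

def key_part_files(columns: list, num_shards: int) -> list:
    """Key files named in the text; a real part holds more than these."""
    files = ["metadata.json", "timestamps.bin", "index.bin", "metaindex.bin"]
    for n in sorted({shard_for_column(c, num_shards) for c in columns}):
        # Under this assumption, values and Bloom filters of all columns in shard n share two files.
        files += [f"values.bin{n}", f"bloom.bin{n}"]
    return files
```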
Bloom filters act as a fast pre-filter. They report whether a token might be present in a block (false positives are possible, false negatives are not), so blocks that definitely lack the token are skipped without reading their log data.
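A minimal Bloom filter illustrates the pre-filtering. The bit-array size, the number of hash functions, deriving bit positions from a SHA-256 digest, and the regex tokenizer are all choices made for this sketch rather than VictoriaLogs' actual parameters.

```python
import hashlib
import re

class BloomFilter:
    """Minimal Bloom filter; size and hash count are arbitrary sketch parameters."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        assert num_bits % 8 == 0 and 1 <= num_hashes <= 8
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, token: str):
        # Slice one SHA-256 digest into num_hashes 4-byte values, one per "hash function".
        digest = hashlib.sha256(token.encode()).digest()
        for i in range(self.num_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "little") % self.num_bits

    def add(self, token: str) -> None:
        for p in self._positions(token):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, token: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(token))

def tokenize(text: str) -> list:
    """Split text into word tokens (a simplistic stand-in tokenizer)."""
    return re.findall(r"\w+", text)

def block_filter(messages: list) -> BloomFilter:
    """Build one filter over every token in a block's messages."""
    bf = BloomFilter()
    for message in messages:
        for token in tokenize(message):
            bf.add(token)
    return bf

def blocks_to_read(filters: list, word: str) -> list:
    """Indices of blocks that might contain word; all others can be skipped."""
    return [i for i, bf in enumerate(filters) if bf.might_contain(word)]
```

Because a Bloom filter never yields false negatives, skipping a block whose filter rejects a token is always safe; an occasional false positive only costs one unnecessary block read.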
A two-level index (metaindex.bin and index.bin) lets VictoriaLogs locate the blocks relevant to a query's streams and time range without scanning the entire dataset.
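The two-level lookup can be sketched as follows. It assumes that each metaindex entry summarizes one index block by its first stream and overall time range, and that each index block lists the headers of its data blocks. The types and functions are hypothetical; on disk these entries would be binary records rather than in-memory Python lists.

```python
from bisect import bisect_left
from dataclasses import dataclass

@dataclass
class BlockHeader:            # one entry of a hypothetical index block (index.bin)
    stream: str
    min_ts: int
    max_ts: int
    offset: int               # where the block's data would start in the column files

@dataclass
class IndexBlockSummary:      # one entry of a hypothetical metaindex.bin
    first_stream: str
    min_ts: int
    max_ts: int
    headers: list             # stands in for the index block stored on disk

def build_metaindex(headers: list, per_index_block: int = 2) -> list:
    """Sort block headers by (stream, min_ts) and summarize each group of them."""
    headers = sorted(headers, key=lambda h: (h.stream, h.min_ts))
    meta = []
    for i in range(0, len(headers), per_index_block):
        chunk = headers[i:i + per_index_block]
        meta.append(IndexBlockSummary(chunk[0].stream,
                                      min(h.min_ts for h in chunk),
                                      max(h.max_ts for h in chunk),
                                      chunk))
    return meta

def find_blocks(meta: list, stream: str, start: int, end: int):
    """Return (offsets of matching blocks, number of index blocks actually read)."""
    firsts = [m.first_stream for m in meta]
    # The stream can begin inside the last index block whose first stream sorts before it.
    i = max(bisect_left(firsts, stream) - 1, 0)
    offsets, index_blocks_read = [], 0
    for m in meta[i:]:
        if m.first_stream > stream:
            break                 # index blocks are sorted by stream: nothing later matches
        if m.max_ts < start or m.min_ts > end:
            continue              # skipped using the metaindex entry alone
        index_blocks_read += 1    # only now would the index block be read from disk
        offsets += [h.offset for h in m.headers
                    if h.stream == stream and h.max_ts >= start and h.min_ts <= end]
    return offsets, index_blocks_read
```

The small metaindex narrows the search to a few index blocks, and only those index blocks are consulted to find individual data blocks.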

