Nobody ever got fired for using a struct
3 days ago
- #Rust
- #Serialization
- #Performance Optimization
- Structs are the default way to group related variables, but a struct with many optional fields can become a performance problem once it has to be serialized.
- A customer reported a performance problem where a new use case processed similar data but ran much slower than existing pipelines.
- Feldera compiles SQL tables into Rust structs, with each row becoming a struct. The example showed a struct with hundreds of optional fields.
- Rust's in-memory struct layout is efficient: niche optimizations let an `Option<String>` occupy the same space as a `String`. Serialization can lose that advantage.
- Using rkyv for serialization, the archived struct layout can be less efficient, especially with many optional fields.
- The main issue was that `Option<ArchivedString>` requires an explicit discriminant, increasing the size of serialized data.
- The fix stores a bitmap that records which fields are `None`, so the serialized struct no longer pays a per-field `Option` cost.
- For sparse rows, a layout with relative pointers was introduced to skip `NULL` fields entirely, further optimizing storage.
- The optimization reduced the serialized row size by roughly 2x in the problematic workload, improving disk IO and throughput.
- Key takeaway: Rust structs assume most fields exist, while SQL tables assume any field might be missing, so serializing SQL rows as structs calls for a layout tailored to sparse data.
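
To make the bitmap idea concrete, here is a minimal sketch in plain Rust using only the standard library. It is not rkyv's archive format and not Feldera's implementation: the `Row` struct, the three-column schema, and the `encode`/`decode` helpers are hypothetical names invented for illustration, and the byte layout (one bitmap byte, then a length-prefixed string per present field) is an assumption chosen for simplicity. It also checks the in-memory claim that `Option<String>` costs nothing extra, which holds on current compilers but is not a formal language guarantee.

```rust
use std::mem::size_of;

/// Hypothetical row with three nullable text columns (illustration only).
#[derive(Debug, PartialEq)]
pub struct Row {
    pub a: Option<String>,
    pub b: Option<String>,
    pub c: Option<String>,
}

/// In memory, `None` is stored in a niche of `String`, so the `Option`
/// wrapper adds no bytes. True on current rustc, not a language guarantee.
pub fn option_is_free_in_memory() -> bool {
    size_of::<Option<String>>() == size_of::<String>()
}

/// Toy encoding: one bitmap byte (bit i set means field i is `None`),
/// followed by a little-endian u32 length and UTF-8 bytes per present field.
pub fn encode(row: &Row) -> Vec<u8> {
    let fields = [&row.a, &row.b, &row.c];
    let mut bitmap = 0u8;
    let mut body = Vec::new();
    for (i, field) in fields.iter().enumerate() {
        match field {
            None => bitmap |= 1 << i,
            Some(s) => {
                body.extend_from_slice(&(s.len() as u32).to_le_bytes());
                body.extend_from_slice(s.as_bytes());
            }
        }
    }
    let mut out = vec![bitmap];
    out.extend_from_slice(&body);
    out
}

/// Size of a naive encoding that spends one tag byte per field instead of
/// one bit, used here only as a point of comparison.
pub fn tagged_len(row: &Row) -> usize {
    [&row.a, &row.b, &row.c]
        .iter()
        .map(|f| 1 + f.as_ref().map_or(0, |s| 4 + s.len()))
        .sum()
}

/// Inverse of `encode`. Returns `None` on truncated or non-UTF-8 input.
pub fn decode(bytes: &[u8]) -> Option<Row> {
    let (&bitmap, mut rest) = bytes.split_first()?;
    let mut fields: [Option<String>; 3] = [None, None, None];
    for (i, slot) in fields.iter_mut().enumerate() {
        if bitmap & (1 << i) != 0 {
            continue; // field is None: nothing was written for it
        }
        if rest.len() < 4 {
            return None;
        }
        let (len_bytes, tail) = rest.split_at(4);
        let len = u32::from_le_bytes([
            len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3],
        ]) as usize;
        if tail.len() < len {
            return None;
        }
        let (text, tail) = tail.split_at(len);
        *slot = Some(String::from_utf8(text.to_vec()).ok()?);
        rest = tail;
    }
    let [a, b, c] = fields;
    Some(Row { a, b, c })
}
```

Calling `encode` on a row whose only present field is `a` yields the bitmap byte plus that one string, and `decode` restores the original row. The per-row saving over `tagged_len` is small with three columns, but it grows with the number of mostly-`None` fields, which is the situation the post describes.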