Nobody ever got fired for using a struct

3 days ago
  • #Rust
  • #Serialization
  • #Performance Optimization
  • Structs are the default way to group related variables, but their serialized layout can become a performance problem when most fields are optional.
  • A customer reported a performance problem where a new use case processed similar data but ran much slower than existing pipelines.
  • Feldera compiles SQL tables into Rust structs, with each row becoming a struct. The example showed a struct with hundreds of optional fields.
  • Rust's in-memory struct layout is compact thanks to niche optimization (an `Option<String>` takes no more space than a `String`), but serialization can reintroduce the overhead.
  • With rkyv serialization, the archived struct layout does not keep that in-memory compactness, and the cost grows with the number of optional fields.
  • The main issue was that `Option<ArchivedString>` requires an explicit discriminant, increasing the size of serialized data.
  • A solution was implemented using a bitmap to track `None` fields, reducing the serialized size by avoiding `Option` overhead.
  • For sparse rows, a layout with relative pointers was introduced to skip `NULL` fields entirely, further optimizing storage.
  • The optimization reduced the serialized row size by roughly 2x in the problematic workload, improving disk IO and throughput.
  • Key takeaway: Rust structs assume most fields exist, while SQL tables assume fields might not exist, requiring tailored optimizations for serialization.
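
The bitmap idea in the summary can be sketched in a few lines of plain Rust. This is a toy model under stated assumptions, not rkyv's archive format and not Feldera's implementation: the `Row` type, the 8-byte values standing in for archived fields, and the functions `encode_tagged`, `encode_bitmap`, and `decode_bitmap` are all hypothetical names introduced for illustration. It also locates present values by walking the bitmap rather than through relative pointers as the summary describes for sparse rows.

```rust
use std::mem::size_of;

/// Hypothetical row type: each column is a nullable 8-byte value.
type Row = Vec<Option<u64>>;

/// Baseline: every field gets a one-byte tag plus a fixed 8-byte slot,
/// even when the field is None.
fn encode_tagged(row: &Row) -> Vec<u8> {
    let mut out = Vec::with_capacity(row.len() * 9);
    for field in row {
        out.push(field.is_some() as u8);
        out.extend_from_slice(&field.unwrap_or(0).to_le_bytes());
    }
    out
}

/// Bitmap layout: one presence bit per field, followed by the present values only.
fn encode_bitmap(row: &Row) -> Vec<u8> {
    let mut out = vec![0u8; (row.len() + 7) / 8];
    for (i, field) in row.iter().enumerate() {
        if let Some(v) = field {
            out[i / 8] |= 1 << (i % 8);
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
    out
}

/// Inverse of `encode_bitmap`; `n` is the number of columns in the schema.
fn decode_bitmap(bytes: &[u8], n: usize) -> Row {
    let (bitmap, mut values) = bytes.split_at((n + 7) / 8);
    (0..n)
        .map(|i| {
            if bitmap[i / 8] & (1 << (i % 8)) == 0 {
                return None;
            }
            let (head, rest) = values.split_at(8);
            values = rest;
            let mut buf = [0u8; 8];
            buf.copy_from_slice(head);
            Some(u64::from_le_bytes(buf))
        })
        .collect()
}

fn main() {
    // Niche optimization on current compilers: in memory, wrapping a String
    // in Option does not make it any larger.
    assert_eq!(size_of::<Option<String>>(), size_of::<String>());

    // 200 columns, only every 20th one populated.
    let row: Row = (0..200u64).map(|i| (i % 20 == 0).then_some(i)).collect();
    let tagged = encode_tagged(&row);
    let bitmap = encode_bitmap(&row);
    assert_eq!(decode_bitmap(&bitmap, row.len()), row);
    println!("tagged: {} bytes, bitmap: {} bytes", tagged.len(), bitmap.len());
}
```

In this toy model the 200-column row with 10 populated fields takes 1800 bytes with per-field tags and 105 bytes with the bitmap layout. Those numbers only illustrate why sparse rows benefit; they say nothing about the roughly 2x reduction reported for the real workload.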