Hasty Briefs (beta)

Improving performance of original dav1d video decoder

a year ago
  • #performance
  • #memory-alignment
  • #optimization
  • Optimized memory organization in CPU cachelines by reducing structure sizes to 64 bytes or less.
  • Manually restricted enums to explicit values so that each fits into 1 byte, reducing space usage.
  • Narrowed 'int' fields in structures to 'uint16_t' (2 bytes) to reduce wasted memory.
  • Used 'pahole' to identify and optimize holes in structures, improving cache efficiency.
  • Achieved performance improvements: ~3% for 1080p and ~1% for 4K.
  • Reduced the size of 'Dav1dFrameContext' from 5648 bytes to 5384 bytes, saving 4 cachelines.
  • Benchmarked using 'hyperfine' on old and new servers, showing consistent performance gains.
  • Highlighted the importance of data alignment and structure optimization for 64-bit processors.
  • Emphasized the practicality of optimizing existing C/C++ projects over rewriting in new languages like Rust.
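
The structure-shrinking points above can be illustrated with a minimal C sketch. It is not code from dav1d: the struct names, field names, and the 'BlockKind' enum are hypothetical, and the 64-byte cacheline and 4-byte 'int'/enum sizes are assumptions that hold on typical x86-64 builds. Only the two 'Dav1dFrameContext' sizes come from the summary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define CACHELINE 64 /* assumed cacheline size in bytes (typical on x86-64) */

/* Hypothetical enum; a plain enum usually occupies 4 bytes. */
enum BlockKind { KIND_INTRA, KIND_INTER, KIND_SKIP };

/* Before: small fields interleaved with ints leave padding holes. */
struct Loose {
    uint8_t        flag;   /* 1 byte, usually followed by 3 bytes of padding */
    int            width;  /* usually 4 bytes */
    uint8_t        mode;   /* 1 byte, usually followed by 3 bytes of padding */
    int            height; /* usually 4 bytes */
    enum BlockKind kind;   /* usually 4 bytes for three possible values */
};

/* After: ints narrowed to uint16_t, enum stored in 1 byte, fields ordered
 * from largest to smallest so no interior holes remain. This assumes the
 * values really fit in 16 bits, which must be checked per field. */
struct Tight {
    uint16_t width;
    uint16_t height;
    uint8_t  flag;
    uint8_t  mode;
    uint8_t  kind; /* holds an enum BlockKind value */
};

/* Compile-time guard: the compact struct must fit in one cacheline. */
static_assert(sizeof(struct Tight) <= CACHELINE, "Tight exceeds a cacheline");

/* Number of cachelines needed to hold an object of the given size. */
static size_t cachelines_for(size_t bytes) {
    return (bytes + CACHELINE - 1) / CACHELINE;
}

/* Cachelines saved when an object shrinks from old_size to new_size. */
static size_t cachelines_saved(size_t old_size, size_t new_size) {
    return cachelines_for(old_size) - cachelines_for(new_size);
}

/* Prints the layout comparison and the cacheline arithmetic. */
static void print_layout_report(void) {
    printf("Loose: %zu bytes, Tight: %zu bytes\n",
           sizeof(struct Loose), sizeof(struct Tight));
    printf("5648 -> 5384 bytes saves %zu cachelines\n",
           cachelines_saved(5648, 5384));
}
```

Calling 'print_layout_report()' from a 'main' function shows the sizes for the current compiler and target. With 64-byte lines, 5648 bytes span 89 cachelines and 5384 bytes span 85, which matches the 4 cachelines quoted above. To inspect holes in a real build, 'pahole -C <struct name> <object file>' prints the layout of one structure from its debug information.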