Improving performance of original dav1d video decoder
a year ago
- #performance
- #memory-alignment
- #optimization
- Reorganized structure layouts to make better use of 64-byte CPU cachelines, shrinking structures to 64 bytes or less where possible so each fits in a single cacheline.
- Assigned explicit, small values to enums so they fit into 1 byte instead of the usual 4-byte int, saving space.
- Narrowed 'int' fields (4 bytes) in structures to 'uint16_t' (2 bytes) where the value range allowed, reducing wasted memory.
- Used 'pahole' to locate padding holes in structures and eliminate them, improving cache efficiency.
- Achieved performance improvements: ~3% for 1080p and ~1% for 4K.
- Reduced the size of 'Dav1dFrameContext' from 5648 bytes to 5384 bytes, saving 4 cachelines (89 vs. 85 lines at 64 bytes per line).
- Benchmarked using 'hyperfine' on old and new servers, showing consistent performance gains.
- Highlighted the importance of data alignment and structure optimization for 64-bit processors.
- Emphasized that optimizing existing C/C++ projects can be more practical than rewriting them in newer languages such as Rust.