Cache Conscious Hash Maps
a year ago
- #rust
- #data-structures
- #performance
- The article discusses the process of building a cache-conscious hash map to understand profiling and performance tooling better.
- Linux is recommended for profiling due to its superior tooling like 'perf', while OSX is discouraged because of its restrictive environment and lack of access to hardware counters.
- The hash map implementation involves two main collision handling methods: chaining (using linked lists) and open addressing (finding the next empty slot).
- Open addressing is expected to have better cache performance due to contiguous memory access, but initial profiling shows that the performance gains come from executing fewer instructions rather than better cache utilization.
- Understanding CPU cache hierarchy (L1, L2, L3) and cache lines is crucial for optimizing data structures to minimize cache misses.
- Memory layout significantly impacts performance; compact data structures that fit more elements into cache lines can reduce cache misses and improve execution time.
- Using primitive types like 'u64' instead of 'String' improves cache performance by reducing pointer chasing and fitting more entries into cache lines.
- A compact memory layout variation of open addressing, which separates status bits from data, shows a 50% improvement in cache miss rates and a 1.5x improvement in processing time.
- The conclusion emphasizes that performance optimization is not just about data structure choice but also about how data is laid out in memory and how many elements fit into cache lines.