How much slower is random access, really?
10 months ago
- #memory-access
- #benchmarking
- #performance
- Random access performance is significantly impacted by cache locality, with first-to-last order being faster than random order.
- Performance differences become noticeable when arrays exceed the L3 cache size (around 1 million elements).
- First-to-last order averages about 1 nanosecond per element on a MacBook and 0.5 nanoseconds on a Linux desktop.
- Random order is 4x slower on MacBook and 8-16x slower on Linux for arrays fitting in RAM but exceeding L3 cache.
- For arrays larger than RAM, performance degrades sharply, with random order becoming 50x slower on Linux.
- Fisher-Yates shuffle is inefficient for large datasets; a two-pass shuffle is recommended.
- Memory-mapped files do not significantly improve performance for large datasets, with macOS and Linux showing different behaviors.
- Direct file reading can be more efficient than memory-mapping, especially on Linux.