Beating the L1 cache with value speculation (2021)
- #Branch Prediction
- #CPU Optimization
- #Performance Tuning
- Value speculation leverages the branch predictor to check a guessed value cheaply, breaking a data dependency and increasing instruction-level parallelism.
- The bottleneck in traversing a linked list is often not cache misses but the serial dependency on each `node->next` load: even an L1 hit takes a few cycles, and each iteration must wait for the previous load to finish.
- The trick is to guess that the next node sits immediately after the current one in memory (true when nodes are allocated contiguously) and correct the guess only when it is wrong, letting the branch predictor hide the cost of the check.
- Modern CPUs combine out-of-order execution, branch prediction, and caching to run many instructions in parallel, but data dependencies limit how much of that parallelism is actually available.
- Performance improvements vary with dataset size, showing significant gains when data fits entirely in CPU caches.
- The compiler can prove the speculated pointer always equals the real one and optimize the trick away, so careful coding is required to preserve it.
- Manual assembly optimization can further improve performance by reducing the number of instructions per loop iteration.
- The final optimized function approaches peak throughput by minimizing the serial dependence on cache loads and maximizing instruction-level parallelism.
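The core trick described above can be sketched in C. This is a minimal illustration, not the article's exact code: `Node`, `sum_baseline`, and `sum_speculative` are names chosen here, and the speculation assumes the list's nodes were allocated contiguously so that `node + 1` is usually the real next node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node layout: a value plus a pointer to the next node. */
typedef struct Node {
    uint64_t value;
    struct Node *next;
} Node;

/* Baseline traversal: each iteration's load of node->next must complete
 * before the next iteration can start, so even L1 hits serialize the loop. */
uint64_t sum_baseline(Node *node) {
    uint64_t sum = 0;
    while (node) {
        sum += node->value;
        node = node->next;
    }
    return sum;
}

/* Value speculation: guess that the next node is node + 1 (true when nodes
 * are laid out contiguously). The guess breaks the load-to-load dependency;
 * the rarely-taken branch corrects the few cases where the guess is wrong. */
uint64_t sum_speculative(Node *node) {
    uint64_t sum = 0;
    while (node) {
        sum += node->value;
        Node *guess = node + 1;      /* speculated next node */
        if (guess != node->next)     /* almost never taken on contiguous data */
            guess = node->next;      /* fall back to the real pointer */
        node = guess;
    }
    return sum;
}
```

Both functions return the same sum; the speculative version only changes how quickly the CPU can overlap successive iterations.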
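On the compiler-defeating point: written naively, the speculation is dead code to the optimizer, since after the `if` the guessed pointer provably equals `node->next`. One GCC/Clang-specific way to keep the trick alive (a sketch under that assumption, not necessarily the article's exact fix) is an empty inline-asm barrier that makes the guessed register opaque:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Node {
    uint64_t value;
    struct Node *next;
} Node;

uint64_t sum_speculative_opaque(Node *node) {
    uint64_t sum = 0;
    while (node) {
        sum += node->value;
        Node *guess = node + 1;
        /* Empty extended asm with a "+r" constraint: the compiler must treat
         * guess as modified by unknown code, so it cannot prove the comparison
         * below is redundant and fold the speculation back into a plain load. */
        __asm__("" : "+r"(guess));
        if (guess != node->next)
            guess = node->next;
        node = guess;
    }
    return sum;
}
```

The barrier compiles to zero instructions; it only constrains what the optimizer may assume about `guess`.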