Read Locks Are Not Your Friends
a day ago
- #Rust
- #Performance
- #Concurrency
- RwLock was ~5× slower than Mutex for a read-heavy cache workload due to atomic contention and cache-line ping-pong.
- The experiment was conducted on Apple Silicon M4 (10 cores, 16GB RAM) using Rust 1.92.0 and parking_lot::RwLock.
- Even though .read() is called, a write still occurs at the hardware level: the lock must atomically increment its reader count, and that write is what causes cache-line ping-pong.
- Modern CPUs move data between cores in 64-byte chunks called cache lines; when several cores modify the same atomic counter, the line holding it bounces from cache to cache.
- In extremely fast operations like cache lookups, threads spend more time fighting for ownership of the reader-count variable than performing the lookup.
- An exclusive lock (a plain Mutex) is less noisy on the hardware bus, because it avoids the stampede of cores all trying to modify the same atomic reader counter simultaneously.
- Beware of short critical sections; if work inside a lock takes only a few nanoseconds, RwLock overhead may outweigh concurrency benefits.
- Profile on real hardware with tools like perf or cargo-flamegraph to identify cache contention.
- Consider sharding to split the cache into multiple buckets, reducing lock contention and increasing parallel operations.
- Read locks are beneficial for longer read sections or when writes are rare, but for extremely short reads, a Mutex may be better.
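The hidden write behind .read() can be sketched with a toy reader-counting lock (an illustrative simplification, not parking_lot's actual implementation): even the "read" path does an atomic read-modify-write, which must gain exclusive ownership of the cache line holding the counter.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Toy sketch of a reader-writer lock's read path. The point:
// acquiring a "read" lock increments a shared counter, so every
// reader performs a hardware-level write to the same cache line.
struct ToyRwLock {
    readers: AtomicUsize, // written on every read_acquire()
}

impl ToyRwLock {
    fn new() -> Self {
        ToyRwLock { readers: AtomicUsize::new(0) }
    }

    // An atomic fetch_add is a read-modify-write: the core must own
    // the cache line exclusively before it can bump the count.
    fn read_acquire(&self) -> usize {
        self.readers.fetch_add(1, Ordering::Acquire) + 1
    }

    fn read_release(&self) {
        self.readers.fetch_sub(1, Ordering::Release);
    }
}

fn main() {
    let lock = ToyRwLock::new();
    let active = lock.read_acquire();
    println!("active readers: {}", active); // prints "active readers: 1"
    lock.read_release();
}
```

With many cores calling read_acquire at once, this single counter becomes the contended line the post describes.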
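A minimal micro-benchmark in the spirit of the experiment, using only std (the post used parking_lot; thread and iteration counts here are illustrative, not the original setup). Each worker hammers a tiny read-only HashMap lookup through a Mutex and then an RwLock, so lock overhead dominates the few-nanosecond critical section:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};
use std::thread;
use std::time::Instant;

// Each worker does `iters` tiny lookups; the critical section is a
// single HashMap read, so almost all time goes to the lock itself.
fn sum_lookups_mutex(map: Arc<Mutex<HashMap<u32, u32>>>, threads: usize, iters: usize) -> u64 {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let map = Arc::clone(&map);
            thread::spawn(move || {
                let mut sum = 0u64;
                for i in 0..iters {
                    sum += u64::from(*map.lock().unwrap().get(&((i % 8) as u32)).unwrap());
                }
                sum
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn sum_lookups_rwlock(map: Arc<RwLock<HashMap<u32, u32>>>, threads: usize, iters: usize) -> u64 {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let map = Arc::clone(&map);
            thread::spawn(move || {
                let mut sum = 0u64;
                for i in 0..iters {
                    // .read() still does an atomic write to the reader count.
                    sum += u64::from(*map.read().unwrap().get(&((i % 8) as u32)).unwrap());
                }
                sum
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let mut m = HashMap::new();
    for k in 0..8u32 {
        m.insert(k, k * 10);
    }
    let t = Instant::now();
    sum_lookups_mutex(Arc::new(Mutex::new(m.clone())), 8, 100_000);
    println!("Mutex:  {:?}", t.elapsed());
    let t = Instant::now();
    sum_lookups_rwlock(Arc::new(RwLock::new(m)), 8, 100_000);
    println!("RwLock: {:?}", t.elapsed());
}
```

Exact ratios depend on the CPU and lock implementation; the ~5× figure came from the Apple Silicon M4 run described above.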
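The sharding suggestion can be sketched as a small cache split across independently locked buckets. This is a hypothetical design (the type and shard count are invented for illustration): a key's hash picks its shard, so threads working on different shards never contend on the same lock or cache line.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

// Hypothetical sharded cache: N independent Mutex-guarded buckets.
// Contention drops roughly in proportion to the number of shards.
struct ShardedCache<K, V> {
    shards: Vec<Mutex<HashMap<K, V>>>,
}

impl<K: Hash + Eq, V: Clone> ShardedCache<K, V> {
    fn new(num_shards: usize) -> Self {
        ShardedCache {
            shards: (0..num_shards).map(|_| Mutex::new(HashMap::new())).collect(),
        }
    }

    // Hash the key to choose which bucket (and which lock) it lives in.
    fn shard_for(&self, key: &K) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() as usize) % self.shards.len()
    }

    fn insert(&self, key: K, value: V) {
        let idx = self.shard_for(&key);
        self.shards[idx].lock().unwrap().insert(key, value);
    }

    fn get(&self, key: &K) -> Option<V> {
        let idx = self.shard_for(key);
        self.shards[idx].lock().unwrap().get(key).cloned()
    }
}

fn main() {
    let cache = ShardedCache::new(16);
    cache.insert("answer", 42);
    println!("{:?}", cache.get(&"answer")); // prints "Some(42)"
}
```

Crates like dashmap use this idea in production; the sketch above just makes the mechanism explicit.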