Hasty Briefs

Read Locks Are Not Your Friends

a day ago
  • #Rust
  • #Performance
  • #Concurrency
  • RwLock was ~5× slower than Mutex for a read-heavy cache workload due to atomic contention and cache-line ping-pong.
  • The experiment was conducted on Apple Silicon M4 (10 cores, 16GB RAM) using Rust 1.92.0 and parking_lot::RwLock.
  • Even a .read() call performs a write at the hardware level: every reader must atomically increment and decrement the shared reader count, causing cache-line ping-pong.
  • Modern CPUs move data between cores in 64-byte chunks called cache lines; when multiple cores try to modify the same atomic counter, its cache line bounces between their caches, creating contention.
  • In extremely fast operations like cache lookups, threads spend more time fighting for ownership of the reader-count variable than performing the lookup.
  • An exclusive lock (a plain Mutex) is less noisy on the hardware bus: it is acquired with a single atomic exchange, avoiding the stampede of cores all modifying the same reader-count variable.
  • Beware of short critical sections: if the work inside a lock takes only a few nanoseconds, the RwLock's bookkeeping overhead can outweigh its concurrency benefits.
  • Profile the hardware using tools like perf or cargo-flamegraph to identify cache contention.
  • Consider sharding to split the cache into multiple buckets, reducing lock contention and increasing parallel operations.
  • Read locks are beneficial for larger read sections or when writes are rare, but for extremely small reads, Mutex may be better.
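The read-heavy contention pattern above can be reproduced with a minimal micro-benchmark. This is a sketch, not the article's harness: it uses std's `Mutex` and `RwLock` rather than `parking_lot` so it compiles without external crates, and the thread and iteration counts are arbitrary. Absolute numbers will vary by machine; the point is that both variants pay per-acquisition atomic traffic when the guarded lookup is only a few nanoseconds of work.

```rust
use std::sync::{Arc, Mutex, RwLock};
use std::thread;
use std::time::{Duration, Instant};

// Spawn `threads` workers that each perform `iters` tiny locked lookups,
// and return the total wall-clock time for the run.
fn bench<F>(name: &str, iters: u64, threads: usize, op: Arc<F>) -> Duration
where
    F: Fn() -> u64 + Send + Sync + 'static,
{
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let op = Arc::clone(&op);
            thread::spawn(move || {
                let mut sum = 0u64;
                for _ in 0..iters {
                    // The critical section is a single indexed read:
                    // far cheaper than the lock traffic around it.
                    sum = sum.wrapping_add(op());
                }
                sum
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let elapsed = start.elapsed();
    println!("{name}: {elapsed:?}");
    elapsed
}

fn main() {
    let table: Vec<u64> = (0..64).collect();
    let mutexed = Arc::new(Mutex::new(table.clone()));
    let rwlocked = Arc::new(RwLock::new(table));
    let (iters, threads) = (100_000, 8);

    bench("Mutex ", iters, threads, Arc::new(move || mutexed.lock().unwrap()[3]));
    bench("RwLock", iters, threads, Arc::new(move || rwlocked.read().unwrap()[3]));
}
```

Swapping in `parking_lot::{Mutex, RwLock}` (drop the `.unwrap()` calls) reproduces the article's setup more closely.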
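The cache-line point can be made concrete with `repr(align(64))`. This sketch shows the standard padding trick for keeping a hot counter on its own 64-byte line so it cannot false-share with a neighbor; the 64-byte figure matches the cache-line size described above but is architecture-dependent in general, and the struct name is illustrative.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::AtomicU64;

// Aligning the counter to 64 bytes guarantees it occupies a full cache
// line by itself, so updates to an adjacent counter cannot invalidate it.
#[repr(align(64))]
struct PaddedCounter {
    value: AtomicU64,
}

fn main() {
    // Padding inflates the 8-byte atomic to a full cache line.
    assert_eq!(size_of::<AtomicU64>(), 8);
    assert_eq!(size_of::<PaddedCounter>(), 64);
    assert_eq!(align_of::<PaddedCounter>(), 64);
    println!("padded counter occupies one cache line");
}
```

Note that padding helps with *false* sharing between distinct variables; an RwLock's single reader count is *true* sharing, which padding cannot fix, hence the article's move to a Mutex or to sharding.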
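The sharding suggestion can be sketched as follows. This is a hypothetical illustration, not the article's code: the key's hash picks one of several independently locked maps, so lookups for different keys usually take different locks (and touch different cache lines). `ShardedCache`, the shard count, and the key/value types are all assumptions made for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

const SHARD_COUNT: usize = 16;

// Each bucket is a separate Mutex-protected map, so contention is split
// roughly SHARD_COUNT ways instead of serializing on one lock.
struct ShardedCache {
    shards: Vec<Mutex<HashMap<String, u64>>>,
}

impl ShardedCache {
    fn new() -> Self {
        Self {
            shards: (0..SHARD_COUNT).map(|_| Mutex::new(HashMap::new())).collect(),
        }
    }

    // Hash the key and map it onto one of the shards.
    fn shard_for(&self, key: &str) -> &Mutex<HashMap<String, u64>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % SHARD_COUNT]
    }

    fn insert(&self, key: &str, value: u64) {
        self.shard_for(key).lock().unwrap().insert(key.to_owned(), value);
    }

    fn get(&self, key: &str) -> Option<u64> {
        self.shard_for(key).lock().unwrap().get(key).copied()
    }
}

fn main() {
    let cache = ShardedCache::new();
    cache.insert("user:1", 42);
    assert_eq!(cache.get("user:1"), Some(42));
    assert_eq!(cache.get("user:2"), None);
    println!("sharded cache works");
}
```

Crates such as dashmap package the same idea; the hand-rolled version above just makes the bucket-per-lock structure visible.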