Hasty Briefs

Read Locks Are Not Your Friends

a day ago
  • #Rust
  • #Performance
  • #Concurrency
  • RwLock was ~5× slower than Mutex for a read-heavy cache workload due to atomic contention and cache-line ping-pong.
  • The experiment was conducted on Apple Silicon M4 (10 cores, 16GB RAM) using Rust 1.92.0 and parking_lot::RwLock.
  • Even a .read() call performs a write at the hardware level: every reader must atomically increment and decrement the shared reader count, causing cache-line ping-pong.
  • Modern CPUs move data between cores in 64-byte chunks called cache lines; when multiple cores try to modify the same atomic counter, its cache line bounces between their caches, creating contention.
  • In extremely fast operations like cache lookups, threads spend more time fighting for ownership of the reader-count variable than performing the lookup.
  • An exclusive lock (a plain Mutex) is less noisy on the hardware bus: it is acquired with a single atomic exchange, avoiding the stampede of cores all modifying the same reader-count variable.
  • Beware of short critical sections: if the work inside a lock takes only a few nanoseconds, the RwLock's bookkeeping overhead can outweigh its concurrency benefits.
  • Profile the hardware using tools like perf or cargo-flamegraph to identify cache contention.
  • Consider sharding to split the cache into multiple buckets, reducing lock contention and increasing parallel operations.
  • Read locks are beneficial for larger read sections or when writes are rare, but for extremely small reads, Mutex may be better.
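The read-heavy contention pattern above can be reproduced with a minimal micro-benchmark. This is a sketch, not the article's harness: it uses std's `Mutex` and `RwLock` rather than `parking_lot` so it compiles without external crates, and the thread and iteration counts are arbitrary. Absolute numbers will vary by machine; the point is that both variants pay per-acquisition atomic traffic when the guarded lookup is only a few nanoseconds of work.

```rust
use std::sync::{Arc, Mutex, RwLock};
use std::thread;
use std::time::{Duration, Instant};

// Spawn `threads` workers that each perform `iters` tiny locked lookups,
// and return the total wall-clock time for the run.
fn bench<F>(name: &str, iters: u64, threads: usize, op: Arc<F>) -> Duration
where
    F: Fn() -> u64 + Send + Sync + 'static,
{
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let op = Arc::clone(&op);
            thread::spawn(move || {
                let mut sum = 0u64;
                for _ in 0..iters {
                    // The critical section is a single indexed read:
                    // far cheaper than the lock traffic around it.
                    sum = sum.wrapping_add(op());
                }
                sum
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let elapsed = start.elapsed();
    println!("{name}: {elapsed:?}");
    elapsed
}

fn main() {
    let table: Vec<u64> = (0..64).collect();
    let mutexed = Arc::new(Mutex::new(table.clone()));
    let rwlocked = Arc::new(RwLock::new(table));
    let (iters, threads) = (100_000, 8);

    bench("Mutex ", iters, threads, Arc::new(move || mutexed.lock().unwrap()[3]));
    bench("RwLock", iters, threads, Arc::new(move || rwlocked.read().unwrap()[3]));
}
```

Swapping in `parking_lot::{Mutex, RwLock}` (drop the `.unwrap()` calls) reproduces the article's setup more closely.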
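The cache-line point can be made concrete with `repr(align(64))`. This sketch shows the standard padding trick for keeping a hot counter on its own 64-byte line so it cannot false-share with a neighbor; the 64-byte figure matches the cache-line size described above but is architecture-dependent in general, and the struct name is illustrative.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::AtomicU64;

// Aligning the counter to 64 bytes guarantees it occupies a full cache
// line by itself, so updates to an adjacent counter cannot invalidate it.
#[repr(align(64))]
struct PaddedCounter {
    value: AtomicU64,
}

fn main() {
    // Padding inflates the 8-byte atomic to a full cache line.
    assert_eq!(size_of::<AtomicU64>(), 8);
    assert_eq!(size_of::<PaddedCounter>(), 64);
    assert_eq!(align_of::<PaddedCounter>(), 64);
    println!("padded counter occupies one cache line");
}
```

Note that padding helps with *false* sharing between distinct variables; an RwLock's single reader count is *true* sharing, which padding cannot fix, hence the article's move to a Mutex or to sharding.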
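The sharding suggestion can be sketched as follows. This is a hypothetical illustration, not the article's code: the key's hash picks one of several independently locked maps, so lookups for different keys usually take different locks (and touch different cache lines). `ShardedCache`, the shard count, and the key/value types are all assumptions made for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

const SHARD_COUNT: usize = 16;

// Each bucket is a separate Mutex-protected map, so contention is split
// roughly SHARD_COUNT ways instead of serializing on one lock.
struct ShardedCache {
    shards: Vec<Mutex<HashMap<String, u64>>>,
}

impl ShardedCache {
    fn new() -> Self {
        Self {
            shards: (0..SHARD_COUNT).map(|_| Mutex::new(HashMap::new())).collect(),
        }
    }

    // Hash the key and map it onto one of the shards.
    fn shard_for(&self, key: &str) -> &Mutex<HashMap<String, u64>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % SHARD_COUNT]
    }

    fn insert(&self, key: &str, value: u64) {
        self.shard_for(key).lock().unwrap().insert(key.to_owned(), value);
    }

    fn get(&self, key: &str) -> Option<u64> {
        self.shard_for(key).lock().unwrap().get(key).copied()
    }
}

fn main() {
    let cache = ShardedCache::new();
    cache.insert("user:1", 42);
    assert_eq!(cache.get("user:1"), Some(42));
    assert_eq!(cache.get("user:2"), None);
    println!("sharded cache works");
}
```

Crates such as dashmap package the same idea; the hand-rolled version above just makes the bucket-per-lock structure visible.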