On CPU Physics and CPU Cycles

6 days ago
  • #memory management
  • #CPU architecture
  • #performance optimization
  • Writing efficient code pushes the programmer toward a deeper understanding of the problem.
  • Signal propagation delay grows with physical distance, in part because of parasitic capacitances along the path.
  • CPU cores are pipelined and superscalar, with several ALUs per core: addition completes in 1 cycle, multiplication in 3-6 cycles, and division in up to 20 cycles.
  • L1 cache is split into L1 Data (L1D) and L1 Instruction (L1I) caches, with L1D reads taking about 3 cycles.
  • Branch mispredictions incur significant penalties (15-25 cycles), and modern CPUs use dynamic branch prediction to mitigate this.
  • The C++ [[likely]]/[[unlikely]] attributes are hints to the compiler about which branch is hot, but they often help less than expected because the CPU predicts branches dynamically and developers frequently misjudge branch probabilities.
  • TLBs (Translation Lookaside Buffers) handle virtual-to-physical address translations and are critical for performance, though often not problematic for application-level code.
  • Memory access latencies increase with distance from the CPU: L2 cache at 10-15 cycles, L3 at 30-70 cycles, main RAM at 200-300 cycles, and persistent storage (e.g., NVMe SSD) at tens of thousands to millions of cycles.
  • C++ storage types include the stack (fast, almost always in cache), static variables (reasonably well cached), the heap (likely to miss the cache unless accessed linearly), and thread-local storage.
  • Network latencies vary widely, from LAN (100,000-500,000 cycles) to global distances (up to hundreds of millions of cycles), with worst-case scenarios potentially infinite.
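
To put the cycle counts above in wall-clock terms, the following is a minimal sketch that converts them to nanoseconds. The 3 GHz clock used in the usage note is an assumption chosen for round numbers, and the names `cycles_to_ns`, `Latency`, `kLatencies` and `print_latency_table` are hypothetical helpers, not something from the original article.

```cpp
#include <cassert>
#include <cstdio>

// Convert a latency in CPU cycles to nanoseconds.
// At a clock of f GHz, one cycle lasts 1/f nanoseconds.
constexpr double cycles_to_ns(double cycles, double clock_ghz) {
    assert(clock_ghz > 0.0);
    return cycles / clock_ghz;
}

// One row of the latency table: a label and a cycle range.
struct Latency {
    const char* what;
    double min_cycles;
    double max_cycles;
};

// Cycle ranges copied from the summary bullets above.
constexpr Latency kLatencies[] = {
    {"L1D read", 3, 3},
    {"L2 cache", 10, 15},
    {"L3 cache", 30, 70},
    {"Main RAM", 200, 300},
    {"LAN", 100000, 500000},
};

// Print every row as a nanosecond range for the given (assumed) clock.
void print_latency_table(double clock_ghz) {
    for (const Latency& l : kLatencies) {
        std::printf("%-10s %12.1f - %12.1f ns\n", l.what,
                    cycles_to_ns(l.min_cycles, clock_ghz),
                    cycles_to_ns(l.max_cycles, clock_ghz));
    }
}
```

Calling `print_latency_table(3.0)` prints one line per level. At that assumed clock, a 300-cycle trip to main RAM costs about 100 ns, while the upper LAN figure of 500,000 cycles is roughly 167 microseconds.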
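
The [[likely]]/[[unlikely]] point above can be illustrated with a short C++20 sketch. The function `sum_valid` and its "negative values are rare errors" rule are hypothetical, invented only for this example. The attribute is a hint to the compiler and carries no guarantee about the generated code or about how the hardware predictor behaves.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: sums the non-negative entries and counts the
// negative ones, which this example assumes to be rare.
std::int64_t sum_valid(const std::vector<int>& values, int& rejected) {
    std::int64_t total = 0;
    rejected = 0;
    for (int v : values) {
        if (v < 0) [[unlikely]] {
            // Cold path: the hint lets the compiler favour the other branch
            // when it arranges the generated code.
            ++rejected;
        } else {
            total += v;
        }
    }
    return total;
}
```

If the assumption is wrong and negative values are actually common, the hint may pessimize the hot path, which is one reason to measure before and after adding it.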