On CPU Physics and CPU Cycles

6 days ago

Writing efficient code promotes a deeper understanding of the problem being solved.
Signal propagation delay grows with physical distance, partly because of parasitic capacitances, so the farther data sits from the core, the more cycles it takes to arrive.
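As a rough worked example of why distance matters, the sketch below computes the light-speed upper bound on how far a signal can travel in one clock cycle. The 3 GHz clock used in the usage note and the function names are assumptions for illustration, not figures from the article; real on-chip signals are slower than this bound.

```cpp
#include <cassert>

// Speed of light in vacuum, metres per second (exact by SI definition).
constexpr double kSpeedOfLight = 299'792'458.0;

// Upper bound on the distance any signal can cover in one clock cycle.
// clock_hz is an assumed clock frequency supplied by the caller.
constexpr double max_distance_per_cycle_m(double clock_hz) {
    return kSpeedOfLight / clock_hz;
}

// Minimum number of cycles for a round trip over distance_m,
// assuming the signal moves at the light-speed bound.
double min_round_trip_cycles(double distance_m, double clock_hz) {
    assert(distance_m >= 0.0 && clock_hz > 0.0);
    return 2.0 * distance_m / max_distance_per_cycle_m(clock_hz);
}
```

Calling max_distance_per_cycle_m(3e9) gives roughly 0.1 m, so at an assumed 3 GHz even an ideal signal covers only about 10 cm per cycle.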
CPU cores are pipelined and superscalar, with multiple ALUs; an addition typically takes 1 cycle, a multiplication 3-6 cycles, and a division up to 20 cycles.
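One way to see what superscalar execution can buy is to compare a single dependency chain with several independent ones. The sketch below is a minimal illustration under that assumption; the function names are hypothetical, and an optimizing compiler may already transform the first version into something like the second.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One accumulator: every addition depends on the result of the previous one.
std::uint64_t sum_single_chain(const std::vector<std::uint32_t>& v) {
    std::uint64_t total = 0;
    for (std::uint32_t x : v) total += x;
    return total;
}

// Four independent accumulators: a superscalar core may be able to
// execute several of these additions in the same cycle.
std::uint64_t sum_four_chains(const std::vector<std::uint32_t>& v) {
    std::uint64_t a = 0, b = 0, c = 0, d = 0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += v[i];
        b += v[i + 1];
        c += v[i + 2];
        d += v[i + 3];
    }
    for (; i < n; ++i) a += v[i];  // leftover elements
    assert(i == n);                // every element was visited
    return a + b + c + d;
}
```

Both functions return the same sum; any speed difference depends on the compiler and the CPU, so it is worth measuring rather than assuming.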
L1 cache is split into L1 Data (L1D) and L1 Instruction (L1I) caches, with L1D reads taking about 3 cycles.
Branch mispredictions incur significant penalties (15-25 cycles), and modern CPUs use dynamic branch prediction to mitigate this.
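To make the misprediction cost concrete, here is a hedged sketch of the same count written with a data-dependent branch and without one. The function names are hypothetical. On randomly ordered input the branch in the first version is hard to predict, while the second version has no branch to mispredict; a compiler may also remove the branch on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Branchy: the `if` outcome depends on the data, so random input
// gives the branch predictor little to work with.
std::size_t count_at_least_branchy(const std::vector<int>& data, int threshold) {
    std::size_t count = 0;
    for (int v : data) {
        if (v >= threshold) ++count;
    }
    assert(count <= data.size());
    return count;
}

// Branchless: the comparison result (0 or 1) is added directly.
std::size_t count_at_least_branchless(const std::vector<int>& data, int threshold) {
    std::size_t count = 0;
    for (int v : data) {
        count += static_cast<std::size_t>(v >= threshold);
    }
    assert(count <= data.size());
    return count;
}
```

The two functions always agree on the result, so either can be swapped in and timed on sorted versus shuffled input.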
The C++ [[likely]]/[[unlikely]] attributes can influence how branches are handled, but they help less than expected because the CPU predicts branches dynamically and developers often misjudge which path is actually more common.
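For reference, this is roughly what the attribute looks like in use (C++20). The function and the idea that negative readings are rare error markers are assumptions made up for this sketch; if that guess about the data is wrong, the hint may not help and could hurt.

```cpp
#include <cassert>
#include <cstddef>

// Sums readings, skipping negative values. This sketch assumes
// negative values are rare error markers (an illustrative assumption).
long long sum_valid_readings(const int* readings, std::size_t count) {
    assert(readings != nullptr || count == 0);
    long long sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (readings[i] < 0) [[unlikely]] {
            continue;  // rare path: skip the error marker
        }
        sum += readings[i];
    }
    return sum;
}
```

The attribute is only a hint to the compiler; the program's result is the same with or without it.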
TLBs (Translation Lookaside Buffers) handle virtual-to-physical address translations and are critical for performance, though often not problematic for application-level code.
Memory access latencies increase with distance from the CPU: L2 cache at 10-15 cycles, L3 at 30-70 cycles, main RAM at 200-300 cycles, and persistent storage (e.g., NVMe SSD) at tens of thousands to millions of cycles.
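Cycle counts are easier to compare once converted to time. The sketch below tabulates the cache and RAM ranges quoted above and converts cycles to nanoseconds; the struct, the names, and the clock frequency passed by the caller are assumptions for illustration.

```cpp
#include <cassert>

// A latency range in CPU cycles for one level of the memory hierarchy.
struct CycleRange {
    const char* level;
    double min_cycles;
    double max_cycles;
};

// Ranges as quoted in the summary above.
constexpr CycleRange kLatencies[] = {
    {"L1D", 3, 3},
    {"L2", 10, 15},
    {"L3", 30, 70},
    {"RAM", 200, 300},
};

// Converts a cycle count to nanoseconds for an assumed clock in GHz
// (1 GHz means 1 cycle per nanosecond).
double cycles_to_ns(double cycles, double clock_ghz) {
    assert(clock_ghz > 0.0);
    return cycles / clock_ghz;
}
```

For instance, with an assumed 3 GHz clock, cycles_to_ns(kLatencies[3].max_cycles, 3.0) puts a worst-case RAM access at about 100 ns, time in which the core could have issued hundreds of 1-cycle additions.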
C++ storage types include the stack (fast, usually cached), static variables (reasonably well cached), the heap (cache-friendly mainly under linear access), and thread-local storage.
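The four kinds of storage named above can be shown side by side in a few lines. This is a minimal sketch with hypothetical names; it only illustrates where each object lives and uses a contiguous std::vector so the heap data is traversed linearly.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

static int g_calls = 0;           // static storage: one instance for the program
thread_local int t_last_sum = 0;  // thread-local storage: one instance per thread

// Hypothetical demo: sums `count` heap-allocated ones and adds
// the first and last element of a small stack array.
int storage_demo(std::size_t count) {
    assert(count > 0);
    int on_stack[4] = {1, 2, 3, 4};      // stack: lives in this call's frame
    std::vector<int> on_heap(count, 1);  // heap: contiguous, scanned linearly
    ++g_calls;
    t_last_sum = std::accumulate(on_heap.begin(), on_heap.end(), 0);
    return on_stack[0] + on_stack[3] + t_last_sum;
}
```

Replacing the vector with a node-based container such as std::list would keep the result the same but scatter the heap accesses, which is the access pattern that tends to defeat caching.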
Network latencies vary widely, from LAN (100,000-500,000 cycles) to global distances (up to hundreds of millions of cycles), with worst-case scenarios potentially infinite.
