On CPU Physics and CPU Cycles
6 days ago
- #memory management
- #CPU architecture
- #performance optimization
- Writing efficient programs promotes a deeper understanding of the problem being solved.
- Electrical signals take longer to arrive the farther they have to travel, in part because of parasitic capacitances.
- CPU cores are pipelined and superscalar, with multiple ALUs; typical latencies are about 1 cycle for addition, 3-6 cycles for multiplication, and up to 20 cycles for division.
- L1 cache is split into L1 Data (L1D) and L1 Instruction (L1I) caches, with L1D reads taking about 3 cycles.
- Branch mispredictions incur significant penalties (15-25 cycles), and modern CPUs use dynamic branch prediction to mitigate this.
- [[likely]]/[[unlikely]] attributes can influence how a branch is handled, but they often help less than expected because the hardware predicts branches dynamically and developers frequently misjudge which path is actually hot.
- TLBs (Translation Lookaside Buffers) handle virtual-to-physical address translations and are critical for performance, though often not problematic for application-level code.
- Memory access latencies increase with distance from the CPU: L2 cache at 10-15 cycles, L3 at 30-70 cycles, main RAM at 200-300 cycles, and persistent storage (e.g., NVMe SSD) at tens of thousands to millions of cycles.
- C++ storage locations include the stack (fast, usually in cache), static variables (reasonably well cached), the heap (tends to be poorly cached unless accessed linearly), and thread-local storage.
- Network latencies vary widely, from a LAN (100,000-500,000 cycles) to global distances (up to hundreds of millions of cycles); the worst case is effectively unbounded, since a reply may never arrive.
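
The cycle counts in the list above are easier to compare once converted to wall-clock time. The sketch below does that arithmetic in C++. The 3 GHz clock is an assumption chosen only to keep the numbers round (real cores change frequency dynamically), and the names `cycles_to_ns`, `LatencyRange`, `kLatencies`, and `print_latency_table` are hypothetical helpers, not part of the original article.

```cpp
#include <cassert>
#include <cstdio>

// Assumption for illustration only: a fixed 3 GHz clock.
constexpr double kAssumedClockGHz = 3.0;

// At f GHz there are f cycles per nanosecond, so time_ns = cycles / f.
double cycles_to_ns(double cycles, double clock_ghz = kAssumedClockGHz) {
    assert(clock_ghz > 0.0 && "clock frequency must be positive");
    return cycles / clock_ghz;
}

// One row of the latency table: a label plus a cycle range.
struct LatencyRange {
    const char* what;
    double min_cycles;
    double max_cycles;
};

// Cycle ranges copied from the list above.
constexpr LatencyRange kLatencies[] = {
    {"L1D read",           3.0,      3.0},
    {"L2 cache",           10.0,     15.0},
    {"Branch mispredict",  15.0,     25.0},
    {"L3 cache",           30.0,     70.0},
    {"Main RAM",           200.0,    300.0},
    {"LAN round trip",     100000.0, 500000.0},
};

// Print each range in nanoseconds at the given clock frequency.
void print_latency_table(double clock_ghz = kAssumedClockGHz) {
    for (const LatencyRange& row : kLatencies) {
        std::printf("%-20s %12.1f - %12.1f ns\n", row.what,
                    cycles_to_ns(row.min_cycles, clock_ghz),
                    cycles_to_ns(row.max_cycles, clock_ghz));
    }
}
```

Under the 3 GHz assumption, a 300-cycle RAM access comes out at 100 ns, while a 3-cycle L1D read takes 1 ns. Calling `print_latency_table()` from `main` prints the whole table; passing a different `clock_ghz` rescales it.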
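
For readers who have not seen the [[likely]]/[[unlikely]] attributes mentioned above, here is a minimal sketch of the C++20 syntax. The function `sum_non_negative` and the premise that negative inputs are rare are assumptions made up for illustration. The attribute is a compile-time hint, so it does not directly control the hardware's dynamic predictor, and any benefit should be confirmed by measurement.

```cpp
#include <vector>

// Hypothetical helper: sums the values, skipping negative ones.
// Assumption for this sketch: negative inputs are rare, so that branch
// is marked [[unlikely]] (requires C++20).
long long sum_non_negative(const std::vector<int>& values) {
    long long total = 0;
    for (int v : values) {
        if (v < 0) [[unlikely]] {
            continue;  // rare path: ignore negative values
        }
        total += v;    // common path
    }
    return total;
}
```

The result is the same with or without the attribute; only the generated code may differ. If the assumption about which path is rare turns out to be wrong, the hint can hurt rather than help, which is the misestimation risk the list describes.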