Multi-Core by Default

https://www.rfleury.com/p/multi-core-by-default

#multi-core-programming
#parallel-computing
#performance-optimization

Multi-core programming should be the default approach, not a special case, to leverage modern hardware capabilities.
Single-core programming is already complex, and layering multi-core programming on top adds significantly more complexity.
Modern CPUs have multiple cores (8, 16, 32, 64), and ignoring multi-core programming leaves significant performance untapped.
The author initially avoided multi-core programming due to perceived complexity but later recognized its necessity for performance.
Multi-core programming introduces challenges such as synchronization, harder debugging, and control flow scattered across threads.
A 'parallel for' loop is a common technique to distribute work across cores, but it has flaws like overhead and complexity.
Job systems can mitigate some overhead but still introduce complexity in setup, debugging, and maintenance.
GPU shader programming is multi-core by default, offering high performance with minimal programmer overhead, unlike CPU programming.
The author proposes a 'multi-core by default' approach, where code is written assuming multiple cores, with narrow (single-core) sections as needed.
This approach simplifies debugging, maintains full call stacks, and avoids the complexity of traditional job systems.
Key concepts include per-thread lane information within a group (LaneIdx, LaneCount), a barrier across lanes (LaneSync), uniform work distribution (LaneRange), and broadcasting a value from one lane to all others (LaneSyncU64).
The summation example demonstrates how to distribute work, combine results, and handle inputs/outputs in a multi-core context.
Non-uniform work distributions can be managed with dynamic task assignment or algorithm redesign (e.g., radix sort instead of comparison sort).
Multi-core by default is a strict superset of single-core programming, as it can parameterize down to single-core execution.
This approach is particularly useful for game engines, where heterogeneous timelines (e.g., rendering, input) require careful synchronization.
The author acknowledges that not all problems fit this model but argues it simplifies many multi-core scenarios.
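
The lane concepts and the summation example above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the snake_case names (`lane_idx`, `lane_count`, `lane_sync`, `lane_range`, `lane_sync_value`) are hypothetical stand-ins for LaneIdx, LaneCount, LaneSync, LaneRange, and LaneSyncU64, and `LaneGroup`, `run_lanes`, and `parallel_sum` are helpers invented for this sketch. Python threads are used only to model the control flow; under the default CPython GIL they will not actually speed up the sum.

```python
import threading


class LaneGroup:
    """Shared state for one group of lanes (hypothetical helper for this sketch)."""

    def __init__(self, count):
        self.count = count
        self.barrier = threading.Barrier(count)
        self.slot = None  # scratch slot used when broadcasting a value


_tls = threading.local()  # per-thread lane context (index + group)


def lane_idx():
    return _tls.idx


def lane_count():
    return _tls.group.count


def lane_sync():
    """Block until every lane in the group has reached this point."""
    _tls.group.barrier.wait()


def lane_range(count):
    """Return this lane's half-open [first, end) slice of `count` items."""
    per_lane, leftover = divmod(count, lane_count())
    idx = lane_idx()
    # The first `leftover` lanes each take one extra item.
    first = idx * per_lane + min(idx, leftover)
    end = first + per_lane + (1 if idx < leftover else 0)
    return first, end


def lane_sync_value(value, src_lane):
    """Broadcast `value` from lane `src_lane` to every lane in the group."""
    group = _tls.group
    if lane_idx() == src_lane:
        group.slot = value
    lane_sync()  # the slot is now written
    result = group.slot
    lane_sync()  # every lane has read it, so the slot may be reused
    return result


def sum_entry_point(values, partials, out):
    """Runs on every lane: wide by default, narrow only where needed."""
    first, end = lane_range(len(values))
    partials[lane_idx()] = sum(values[first:end])
    lane_sync()
    # Narrow section: only lane 0 combines the per-lane partial sums.
    total = sum(partials) if lane_idx() == 0 else None
    out[lane_idx()] = lane_sync_value(total, 0)


def run_lanes(count, fn, *args):
    """Start `count` lanes that all execute the same function."""
    group = LaneGroup(count)

    def boot(idx):
        _tls.idx = idx
        _tls.group = group
        fn(*args)

    threads = [threading.Thread(target=boot, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def parallel_sum(values, lanes):
    """Return the total as seen by each lane (one entry per lane)."""
    partials = [0] * lanes
    out = [None] * lanes
    run_lanes(lanes, sum_entry_point, values, partials, out)
    return out
```

Calling `parallel_sum(list(range(1, 101)), 4)` runs the same entry point on four lanes, and `parallel_sum(values, 1)` runs the identical code on a single lane, which is the sense in which the approach parameterizes down to single-core execution.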