Fork Union: Beyond OpenMP in C++ and Rust?
9 days ago
- #parallel-computing
- #thread-pool
- #performance
- Fork Union is a minimal ~300-line C++ and Rust thread-pool library designed to perform within 20% of OpenMP on fork-join workloads.
- OpenMP, while powerful, has limitations in fine-grained parallelism, portability, and meta-programming, which motivated the creation of Fork Union.
- Common thread-pool libraries like Taskflow, Rayon, and Tokio incur performance overhead from locks, heap allocations, CAS stalls, and false sharing.
- Benchmarks show Fork Union outperforming most thread pools by 10x, though it still trails OpenMP by 20%.
- Key performance optimizations in Fork Union include avoiding mutexes, heap allocations, and CAS operations, and ensuring proper alignment to prevent false sharing.
- Fork Union's API is simple, focusing on fork-join parallelism with methods like `for_each_thread`, `for_each_static`, `for_each_slice`, and `for_each_dynamic`.
- Rust's lack of a stable allocator API for containers currently limits its use in HPC/BigData environments.
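
To make the optimizations listed above more concrete, here is a minimal C++ sketch of two of them: per-thread state aligned to its own cache line to avoid false sharing, and work distribution through a single atomic `fetch_add` rather than a mutex or a CAS retry loop. This is not Fork Union's implementation or API. The names `padded_sum` and `parallel_sum`, the 64-byte cache-line size, and the default chunk size are all assumptions made for illustration, and unlike a real thread pool the sketch spawns its threads on every call.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64 bytes is the cache-line size (common on x86-64, not universal).
constexpr std::size_t cache_line_size = 64;

// Each worker owns a full cache line, so a write by one thread does not
// invalidate the line that holds a neighbor's counter (no false sharing).
struct alignas(cache_line_size) padded_sum {
    std::uint64_t value = 0;
};
static_assert(sizeof(padded_sum) == cache_line_size, "one counter per cache line");

// Hypothetical helper: sums `data` using `threads` workers that claim
// fixed-size chunks dynamically from a shared atomic cursor.
std::uint64_t parallel_sum(std::vector<std::uint32_t> const &data,
                           std::size_t threads, std::size_t chunk = 1024) {
    assert(threads > 0 && chunk > 0);

    // One-time setup allocation; the hot loop below allocates nothing.
    std::vector<padded_sum> partial(threads);
    std::atomic<std::size_t> next{0};

    auto worker = [&](std::size_t thread_index) {
        std::uint64_t local = 0; // accumulate in a local, publish once at the end
        for (;;) {
            // fetch_add claims a chunk unconditionally; there is no
            // compare-and-swap retry loop and no lock to contend on.
            std::size_t begin = next.fetch_add(chunk, std::memory_order_relaxed);
            if (begin >= data.size()) break;
            std::size_t end = std::min(begin + chunk, data.size());
            for (std::size_t i = begin; i < end; ++i) local += data[i];
        }
        partial[thread_index].value = local;
    };

    // Fork: the calling thread acts as worker 0, the rest are spawned.
    std::vector<std::thread> spawned;
    spawned.reserve(threads - 1);
    for (std::size_t t = 1; t < threads; ++t) spawned.emplace_back(worker, t);
    worker(0);

    // Join: after join() returns, each worker's write to `partial` is visible.
    for (auto &th : spawned) th.join();

    std::uint64_t total = 0;
    for (auto const &p : partial) total += p.value;
    return total;
}
```

Calling `parallel_sum(data, 4)` splits the input across four workers, including the caller. Dynamic chunking of this kind suits workloads with uneven per-item cost; when every item costs the same, a static split of the index range avoids even the shared atomic cursor.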