Hasty Briefs (beta)

Fork Union: Beyond OpenMP in C++ and Rust?

9 days ago
  • #parallel-computing
  • #thread-pool
  • #performance
  • Fork Union is a minimal ~300-line C++ and Rust thread-pool library designed to perform within 20% of OpenMP on fork-join workloads.
  • OpenMP, while powerful, has limitations in fine-grained parallelism, portability, and meta-programming, which motivated the creation of Fork Union.
  • Common thread-pool libraries such as Taskflow, Rayon, and Tokio incur performance overheads from locks, heap allocations, compare-and-swap (CAS) stalls, and false sharing.
  • In benchmarks, Fork Union outperforms most other thread pools by 10x, though it still trails OpenMP by 20%.
  • Key performance optimizations in Fork Union include avoiding mutexes, heap allocations, and CAS operations, and aligning shared state to cache-line boundaries to prevent false sharing.
  • Fork Union's API is simple, focusing on fork-join parallelism with methods like `for_each_thread`, `for_each_static`, `for_each_slice`, and `for_each_dynamic`.
  • Rust's lack of a stable allocator API for its standard containers currently limits its use in HPC/BigData environments.
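The fork-join model described above can be built without mutexes, condition variables, or per-call heap allocations. One general way to do it, sketched below in C++, is to keep persistent worker threads that wait on an atomic generation counter: the caller increments the counter to fork, then waits for a countdown to reach zero to join. This is an illustrative assumption about the general technique, not Fork Union's actual implementation; `spin_pool` and `run_on_every_thread` are hypothetical names, and the waits use `std::this_thread::yield()` rather than a tuned busy-loop.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch (not Fork Union's code): a persistent fork-join pool
// with no mutexes, no condition variables and no per-call heap allocations.
class spin_pool {
  public:
    explicit spin_pool(std::size_t threads)
        : threads_(std::max<std::size_t>(threads, 1)) {  // clamp for NDEBUG builds
        assert(threads > 0 && "the calling thread counts as thread 0");
        for (std::size_t i = 1; i < threads_; ++i)
            workers_.emplace_back([this, i] { worker_loop(i); });
    }

    ~spin_pool() {
        stop_.store(true, std::memory_order_relaxed);
        generation_.fetch_add(1, std::memory_order_release);  // wake every worker
        for (auto &w : workers_) w.join();
    }

    // Calls task(thread_index) exactly once on each thread, then returns.
    template <typename Task>
    void run_on_every_thread(Task const &task) {
        context_ = &task;  // type-erased by hand: no std::function, no allocation
        trampoline_ = [](void const *ctx, std::size_t i) {
            (*static_cast<Task const *>(ctx))(i);
        };
        pending_.store(threads_ - 1, std::memory_order_relaxed);
        generation_.fetch_add(1, std::memory_order_release);  // fork
        task(0);                                              // caller is thread 0
        while (pending_.load(std::memory_order_acquire) != 0)  // join
            std::this_thread::yield();
    }

    std::size_t size() const { return threads_; }

  private:
    void worker_loop(std::size_t index) {
        std::size_t seen = 0;
        for (;;) {
            std::size_t current;
            // Wait for the caller to publish a new generation (a new fork).
            while ((current = generation_.load(std::memory_order_acquire)) == seen)
                std::this_thread::yield();
            seen = current;
            if (stop_.load(std::memory_order_relaxed)) return;
            trampoline_(context_, index);
            pending_.fetch_sub(1, std::memory_order_release);  // report completion
        }
    }

    std::size_t threads_;
    std::vector<std::thread> workers_;
    void const *context_ = nullptr;
    void (*trampoline_)(void const *, std::size_t) = nullptr;
    std::atomic<std::size_t> generation_{0};
    std::atomic<std::size_t> pending_{0};
    std::atomic<bool> stop_{false};
};
```

Because the task is passed by reference and dispatched through a plain function pointer, each call allocates nothing; the trade-off is that idle workers keep polling the counter instead of sleeping in the kernel.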
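A static schedule, which is what a name like `for_each_static` or `for_each_slice` suggests, splits the index range into one contiguous slice per thread up front, so no coordination is needed while the work runs. The C++ sketch below shows one common partitioning rule; `static_slice` and `parallel_for_static` are hypothetical names, and the sketch spawns fresh `std::thread`s on every call only to stay short, which is exactly the cost a real pool avoids by reusing its threads.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: the [begin, end) slice of `n` items owned by
// `thread_index` out of `threads`. The first `n % threads` threads get one
// extra item, so slice sizes differ by at most one.
std::pair<std::size_t, std::size_t> static_slice(std::size_t n, std::size_t threads,
                                                 std::size_t thread_index) {
    assert(threads > 0 && thread_index < threads);
    std::size_t const base = n / threads, extra = n % threads;
    std::size_t const begin =
        thread_index * base + (thread_index < extra ? thread_index : extra);
    std::size_t const size = base + (thread_index < extra ? 1 : 0);
    return {begin, begin + size};
}

// Fork: every thread walks its own slice. Join: wait for all of them.
template <typename Body>
void parallel_for_static(std::size_t n, std::size_t threads, Body body) {
    assert(threads > 0);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t)
        workers.emplace_back([=] {
            auto [begin, end] = static_slice(n, threads, t);
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    for (auto &w : workers) w.join();
}
```

For `n = 10` and three threads this rule yields the slices `[0, 4)`, `[4, 7)`, and `[7, 10)`. Static slicing is cheap when every item costs about the same, but one slow slice delays the whole join.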
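Dynamic scheduling, which a name like `for_each_dynamic` suggests, hands out work on demand so that threads that finish early pick up more items. It can be done without a compare-and-swap retry loop: a single atomic `fetch_add` both reads and advances a shared cursor, and unlike a CAS loop it never has to be retried in user code. The C++ code below is a minimal sketch under that assumption; `parallel_for_dynamic` is a hypothetical name, and `body` must be safe to call concurrently from several threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: workers claim the next index with one fetch_add
// (a single atomic read-modify-write, no compare-and-swap retry loop).
template <typename Body>
void parallel_for_dynamic(std::size_t n, std::size_t threads, Body body) {
    assert(threads > 0);
    std::atomic<std::size_t> next{0};  // shared cursor into [0, n)
    auto worker = [&] {
        for (;;) {
            std::size_t const i = next.fetch_add(1, std::memory_order_relaxed);
            if (i >= n) return;  // every index has already been claimed
            body(i);
        }
    };
    std::vector<std::thread> helpers;
    for (std::size_t t = 1; t < threads; ++t) helpers.emplace_back(worker);
    worker();  // the calling thread participates as well
    for (auto &h : helpers) h.join();
}
```

Each index is returned by `fetch_add` to exactly one thread, so every item runs once regardless of how unevenly the work is distributed; the price is one contended atomic operation per claimed item.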
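False sharing happens when threads write to different variables that live on the same cache line, so the line bounces between cores even though no data is logically shared. The C++ sketch below illustrates the alignment fix named in the summary by giving each thread's counter its own cache line. It assumes 64-byte lines (common on x86-64; some Arm designs use 128), and `padded_counter` and `count_evens` are hypothetical names, not part of Fork Union.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines (typical on x86-64; some Arm cores use 128).
constexpr std::size_t cache_line_bytes = 64;

// alignas pads each counter to a full cache line, so one thread's writes do
// not invalidate the line that holds another thread's counter.
struct alignas(cache_line_bytes) padded_counter {
    std::size_t value = 0;
};
static_assert(sizeof(padded_counter) == cache_line_bytes, "one counter per line");

// Hypothetical example: count even numbers in [0, n) with one slot per thread.
std::size_t count_evens(std::size_t n, std::size_t threads) {
    assert(threads > 0);
    std::vector<padded_counter> slots(threads);  // C++17 honours over-alignment here
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t)
        workers.emplace_back([&slots, n, threads, t] {
            for (std::size_t i = t; i < n; i += threads)  // strided split
                if (i % 2 == 0) ++slots[t].value;           // only thread t writes slot t
        });
    for (auto &w : workers) w.join();
    std::size_t total = 0;
    for (auto const &slot : slots) total += slot.value;  // safe: all threads joined
    return total;
}
```

Without `alignas`, several adjacent `std::size_t` counters would fit on one 64-byte line, and every increment by one thread would force the other cores to reload that line.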