Hasty Briefs (beta)

GPUPrefixSums – state-of-the-art GPU prefix sum algorithms

13 days ago
  • #GPU
  • #Parallel Computing
  • #Prefix Sum
  • GPUPrefixSums provides portable compute shaders for GPU prefix sums, including a novel 'Decoupled Fallback' technique.
  • The D3D12 implementation includes a survey of GPU prefix sum algorithms, all written to work regardless of wave size.
  • A CUDA implementation of GPUPrefixSums has been benchmarked against Nvidia's CUB library.
  • In Decoupled Fallback, a threadblock that exceeds its maximum spin count while waiting on a preceding tile's reduction may perform a fallback operation instead of continuing to wait, so the scan can still make progress.
  • The project ships several implementations: D3D12, CUDA, a Unity package, and a barebones version.
  • Requirements include Visual Studio 2019+, Windows SDK 10.0.20348.0+, and specific hardware for CUDA.
  • Prefix sums are fundamental in parallel computing, used in sorting, compression, and graph traversal.
  • The repository includes extensive documentation and references to related research.
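
To make the fallback idea concrete, here is a minimal sequential sketch in C++, not the project's shader code. An inclusive prefix sum maps 1, 2, 3, 4 to 1, 3, 6, 10. The input is split into tiles, and each tile publishes its reduction (sum) and then looks back over earlier tiles to find its starting offset. The names `TileState`, `reduce_tile`, `scan_with_fallback` and the `order` parameter are hypothetical and introduced only for illustration. The sketch assumes that a tile which finds a predecessor not ready has already exhausted its spin budget, since nothing else can run in a sequential simulation, and so computes that predecessor's reduction itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Status of a tile: nothing published, tile sum published, or full prefix published.
enum class Flag { NotReady, Reduction, Inclusive };

struct TileState {
    Flag flag = Flag::NotReady;
    std::uint32_t value = 0;  // tile sum or inclusive prefix, depending on flag
};

// Sum of the elements belonging to one tile of the input.
std::uint32_t reduce_tile(const std::vector<std::uint32_t>& in,
                          std::size_t tile, std::size_t tile_size) {
    const std::size_t begin = tile * tile_size;
    const std::size_t end = std::min(begin + tile_size, in.size());
    return std::accumulate(in.begin() + begin, in.begin() + end, std::uint32_t{0});
}

// Inclusive scan over tiles processed in the given (hypothetical) schedule `order`.
// `fallbacks`, if non-null, receives how many fallback reductions were performed.
std::vector<std::uint32_t> scan_with_fallback(const std::vector<std::uint32_t>& in,
                                              std::size_t tile_size,
                                              const std::vector<std::size_t>& order,
                                              std::size_t* fallbacks = nullptr) {
    assert(tile_size > 0);
    const std::size_t tiles = (in.size() + tile_size - 1) / tile_size;
    assert(order.size() == tiles);

    std::vector<TileState> state(tiles);
    std::vector<std::uint32_t> out(in.size());
    std::size_t fallback_count = 0;

    for (std::size_t tile : order) {
        // 1. Publish this tile's sum, unless an earlier fallback already did.
        const std::uint32_t local = reduce_tile(in, tile, tile_size);
        if (state[tile].flag == Flag::NotReady) {
            state[tile].flag = Flag::Reduction;
            state[tile].value = local;
        }

        // 2. Look back over preceding tiles, nearest first.
        std::uint32_t prefix = 0;
        for (std::size_t prev = tile; prev-- > 0;) {
            if (state[prev].flag == Flag::NotReady) {
                // A GPU block would spin here up to a maximum spin count.
                // Assumption: the budget is exhausted, so fall back and
                // compute the missing reduction directly from the input.
                // (A real GPU version would need an atomic publish; omitted.)
                state[prev].flag = Flag::Reduction;
                state[prev].value = reduce_tile(in, prev, tile_size);
                ++fallback_count;
            }
            prefix += state[prev].value;
            if (state[prev].flag == Flag::Inclusive) break;  // full prefix found
        }

        // 3. Publish the inclusive prefix so later tiles can stop here.
        state[tile].flag = Flag::Inclusive;
        state[tile].value = prefix + local;

        // 4. Write the scanned values for this tile.
        const std::size_t begin = tile * tile_size;
        const std::size_t end = std::min(begin + tile_size, in.size());
        std::uint32_t running = prefix;
        for (std::size_t i = begin; i < end; ++i) {
            running += in[i];
            out[i] = running;
        }
    }

    if (fallbacks != nullptr) *fallbacks = fallback_count;
    return out;
}
```

Calling `scan_with_fallback(in, 2, {3, 2, 1, 0})` processes the tiles in reverse, so the first scheduled tile has to fall back for every predecessor, while the in-order schedule `{0, 1, 2, 3}` never falls back. Both produce the same scan. Real GPU code runs tiles concurrently and spins before falling back; this sketch only shows the control flow.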