GPUPrefixSums – state-of-the-art GPU prefix sum algorithms
13 days ago
- #GPU
- #Parallel Computing
- #Prefix Sum
- GPUPrefixSums provides portable compute shaders for GPU prefix sums, including a novel 'Decoupled Fallback' technique.
- The D3D12 implementation includes a survey of GPU prefix sum algorithms, all written to be agnostic of wave size.
- GPUPrefixSums has been benchmarked against Nvidia's CUB library in CUDA.
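- Decoupled Fallback lets a threadblock that exceeds its spin count while waiting on a predecessor's reduction perform a fallback: it reduces the predecessor's tile itself instead of continuing to spin, so a slow or unscheduled predecessor cannot stall the scan.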
- The project provides several implementations: D3D12, CUDA, a Unity package, and a barebones version.
- Requirements include Visual Studio 2019+ and Windows SDK 10.0.20348.0+; the CUDA implementation additionally needs an Nvidia GPU.
- Prefix sums are a fundamental parallel primitive, used in algorithms for sorting, compression, and graph traversal.
- The repository includes extensive documentation and references to related research.
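The bullets above call prefix sums a building block for sorting and compression. The sketch below shows why with a plain serial illustration; it is not GPU code and is not taken from GPUPrefixSums.

It computes an exclusive prefix sum and uses it for stream compaction. Scanning a 0/1 "keep" flag array gives every kept element its own output index. The same flag, scan, scatter pattern appears in a radix sort pass. The names `exclusive_scan` and `compact` are illustrative only.

```python
from typing import Callable, List


def exclusive_scan(xs: List[int]) -> List[int]:
    """Exclusive prefix sum: out[i] = xs[0] + ... + xs[i-1], with out[0] = 0."""
    out = []
    running = 0
    for x in xs:
        out.append(running)
        running += x
    return out


def compact(values: List[int], keep: Callable[[int], bool]) -> List[int]:
    """Stream compaction: scanning the keep-flags yields each kept element's slot."""
    flags = [1 if keep(v) else 0 for v in values]
    offsets = exclusive_scan(flags)
    # Output size = last offset + last flag (0 for empty input).
    total = offsets[-1] + flags[-1] if values else 0
    result = [0] * total
    for v, f, o in zip(values, flags, offsets):
        if f:
            result[o] = v  # every kept element writes to a distinct index
    return result


print(exclusive_scan([3, 1, 4, 1, 5]))                 # [0, 3, 4, 8, 9]
print(compact([5, 0, 7, 0, 0, 2], lambda v: v != 0))   # [5, 7, 2]
```

On a GPU, the scatter loop is fully parallel because the scan has already assigned disjoint output indices. The scan itself is the part that needs a parallel algorithm.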
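The Decoupled Fallback bullet above can be illustrated with a toy model. The sketch below is a single-threaded Python model, not GPUPrefixSums' shader code. It makes these assumptions:

- Each tile carries a three-state descriptor (NOT_READY, REDUCTION, INCLUSIVE), as in chained scan with decoupled lookback.
- A hypothetical `schedule` of (tile, step) events stands in for the GPU scheduler.
- The function name and the `max_spins` parameter are illustrative only.

During lookback, a tile may reach a predecessor that has posted nothing. Once its spin count runs out, the waiting tile reduces that predecessor's input itself and keeps looking back.

```python
from typing import List, Tuple

# Tile descriptor states (simplified model of chained scan with decoupled lookback).
NOT_READY, REDUCTION, INCLUSIVE = 0, 1, 2


def scan_with_decoupled_fallback(data: List[int], tile: int,
                                 schedule: List[Tuple[int, str]],
                                 max_spins: int = 4) -> List[int]:
    """Inclusive prefix sum over tiles, executed in a hypothetical `schedule`.

    Each entry is (tile_index, "publish") or (tile_index, "finish").
    "publish" posts the tile's reduction; "finish" runs lookback and writes output.
    """
    n_tiles = (len(data) + tile - 1) // tile
    flag = [NOT_READY] * n_tiles
    value = [0] * n_tiles
    local = [0] * n_tiles
    out = [0] * len(data)

    def tile_reduce(t: int) -> int:
        return sum(data[t * tile:(t + 1) * tile])

    for t, step in schedule:
        if step == "publish":
            local[t] = tile_reduce(t)
            # Tile 0 has no predecessors, so its reduction is already inclusive.
            flag[t] = INCLUSIVE if t == 0 else REDUCTION
            value[t] = local[t]
            continue
        assert step == "finish" and flag[t] != NOT_READY, "finish before publish"
        prefix, j, spins = 0, t - 1, 0
        while j >= 0:
            if flag[j] == INCLUSIVE:
                prefix += value[j]  # full prefix found: lookback is done
                break
            if flag[j] == REDUCTION:
                prefix += value[j]  # partial aggregate: keep looking back
                j, spins = j - 1, 0
                continue
            # Predecessor is NOT_READY. In this serial model nothing changes
            # while we spin, so the spin limit is always reached.
            spins += 1
            if spins >= max_spins:
                # Fallback: reduce the blocking tile's input ourselves.
                prefix += tile_reduce(j)
                j, spins = j - 1, 0
        flag[t], value[t] = INCLUSIVE, prefix + local[t]
        running = prefix
        for i in range(t * tile, min((t + 1) * tile, len(data))):
            running += data[i]
            out[i] = running
    return out


data = list(range(1, 11))
# Adversarial order: tiles 3 and 2 finish before tiles 0 and 1 have published.
schedule = [(3, "publish"), (2, "publish"), (3, "finish"), (2, "finish"),
            (0, "publish"), (1, "publish"), (1, "finish"), (0, "finish")]
print(scan_with_decoupled_fallback(data, tile=3, schedule=schedule))
# [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
```

The fallback stays correct because a tile's reduction depends only on its input. Recomputing it gives the same value the stalled tile would eventually have posted. The price is redundant work, which is why the waiting tile spins for a while before falling back.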