Low-level Haskell: The cursed way to emulate inline assembly in Haskell/GHC, or
5 hours ago
- #Low-Level Programming
- #Haskell
- #FFI
- Unlike C/C++, Haskell (GHC) has no inline assembly or compiler intrinsics, but techniques exist for using CPU-specific instructions.
- Methods for returning the 128-bit product of a 64-bit multiplication include C FFI with pointers, unsafe/safe FFI, foreign import prim, and SIMD registers.
- foreign import prim allows low-overhead custom primops via assembly thunks, closely matching the performance of GHC's built-in timesWord2# primop.
- Benchmarks show timesWord2# is fastest (~4.0 ns), followed by foreign import prim (~4.5 ns); unsafe FFI using two calls is competitive (~5.8 ns), while safe FFI is slowest (>60 ns).
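- For short, non-blocking foreign calls, choosing unsafe FFI over safe FFI is crucial for performance: an unsafe call skips the bookkeeping a safe call performs so that GC and other Haskell threads can proceed while the foreign code runs.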
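
As a point of reference for the baseline in the benchmarks, here is a minimal sketch of calling GHC's built-in timesWord2# directly. The wrapper name mulFull is hypothetical (it does not come from the article), and the sketch assumes a 64-bit platform where Word is 64 bits wide.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import GHC.Exts (Word (W#), timesWord2#)

-- | Full 64x64 -> 128-bit product, returned as (high word, low word).
-- 'mulFull' is an illustrative name; assumes Word is 64 bits wide.
mulFull :: Word -> Word -> (Word, Word)
mulFull (W# a) (W# b) =
  case timesWord2# a b of
    -- timesWord2# returns the high word first, then the low word.
    (# hi, lo #) -> (W# hi, W# lo)

main :: IO ()
main = do
  print (mulFull 3 4)               -- (0,12)
  print (mulFull maxBound 2)        -- (1,18446744073709551614)
  print (mulFull maxBound maxBound) -- (18446744073709551614,1)
```

The unboxed tuple lets both halves come back in registers, which is the behaviour the FFI-based alternatives try to approximate.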
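
The difference between safe and unsafe imports is only a keyword in the declaration. The sketch below imports the same C function both ways; libc's sin is used purely as a stand-in for a short, non-blocking routine (the article's own multiplication code is not reproduced here), and the names sinUnsafe and sinSafe are hypothetical.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.Types (CDouble (..))

-- Unsafe import: compiled as a plain C call with minimal overhead.
-- Only appropriate when the callee is short and never calls back into Haskell.
foreign import ccall unsafe "math.h sin" sinUnsafe :: CDouble -> CDouble

-- Safe import: the runtime does extra work around the call so that
-- other Haskell threads and the GC can run while the C code executes.
foreign import ccall safe "math.h sin" sinSafe :: CDouble -> CDouble

main :: IO ()
main = do
  -- Both imports compute the same result; only the call overhead differs.
  print (sinUnsafe 0)
  print (sinSafe 0)
```

Timing the two variants in a tight loop (for example with a benchmarking library) is one way to observe the gap the benchmarks describe, though absolute numbers will depend on the machine and GHC version.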