Hasty Briefs

Low-level Haskell: The cursed way to emulate inline assembly in Haskell/GHC, or

5 hours ago
  • #Low-Level Programming
  • #Haskell
  • #FFI
  • Haskell (GHC) has no inline assembly and no C/C++-style intrinsics, but there are techniques for using CPU-specific instructions.
  • Several ways of returning the 128-bit product of a 64-bit multiplication are covered: C FFI with an output pointer, unsafe and safe FFI calls, foreign import prim, and SIMD registers.
  • foreign import prim allows low-overhead custom primops via assembly thunks, and its performance closely matches that of GHC's built-in timesWord2# primop.
  • Benchmarks show timesWord2# is fastest (~4.0ns), followed by foreign import prim (~4.5ns); unsafe FFI using two calls is competitive (~5.8ns), while safe FFI is slowest (>60ns).
  • Choosing unsafe FFI over safe FFI is crucial for performance in short, non-blocking foreign calls: a safe call pays for extra bookkeeping so that the runtime and the GC can keep working during the call, and that overhead dominates when the call itself takes only a few nanoseconds.
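
As a point of reference for the built-in primop mentioned above, here is a minimal sketch of calling timesWord2# from ordinary Haskell. The wrapper name mulFull is hypothetical (it is not taken from the original article), and the sketch assumes a 64-bit platform where Word is 64 bits wide.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import GHC.Exts (Word (W#), timesWord2#)

-- | Full product of two machine words as (high half, low half).
-- 'mulFull' is a hypothetical helper name used only for illustration.
-- Assumption: Word is 64 bits wide, so the result is a 128-bit product.
mulFull :: Word -> Word -> (Word, Word)
mulFull (W# a) (W# b) =
  case timesWord2# a b of
    -- timesWord2# returns an unboxed pair: high word first, then low word.
    (# hi, lo #) -> (W# hi, W# lo)

main :: IO ()
main = do
  print (mulFull 3 5)               -- small product: high half is 0
  print (mulFull maxBound maxBound) -- (2^64 - 1)^2 needs both halves
```

On a 64-bit platform the second call yields a high word of 2^64 - 2 and a low word of 1, since (2^64 - 1)^2 = 2^128 - 2^65 + 1.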
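
The unsafe/safe distinction in the last point is a one-word change in the import declaration. The sketch below is not the article's 128-bit multiply; it swaps in the C library's sin so that the example needs no extra C source file. The names sinUnsafe and sinSafe are hypothetical.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.Types (CDouble (..))

-- Unsafe import: a plain C call with minimal overhead. Appropriate only
-- for short calls that neither block nor call back into Haskell.
foreign import ccall unsafe "math.h sin"
  sinUnsafe :: CDouble -> CDouble

-- Safe import (the default when no keyword is given): the runtime does
-- extra bookkeeping around the call so other Haskell threads and the GC
-- can proceed while the foreign code runs.
foreign import ccall safe "math.h sin"
  sinSafe :: CDouble -> CDouble

main :: IO ()
main = do
  print (sinUnsafe 0)
  print (sinSafe 0)
```

Both imports bind the same C function and return the same value; only the calling convention around the call differs, which is what the benchmark gap between unsafe and safe FFI reflects.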