AVX2 is slower than SSE2-4.x under Windows ARM emulation

6 days ago
  • #Performance
  • #Windows ARM
  • #AVX2

  • AVX2 emulation on Windows ARM under Prism is slower than SSE2-4.x emulation.
  • In the benchmark, AVX2 code emulated on ARM runs at about two-thirds the speed of SSE2-4.x code emulated on the same machine.
  • On native Intel hardware the ordering is reversed: AVX2 is 2.7x faster than SSE2-4.x there, while under emulation on ARM it is the slower of the two.
  • Possible reasons for the slower emulation include the mismatch between 128-bit NEON registers and 256-bit AVX2 vectors, Prism's AVX2 emulation being new and not yet optimized, and a lack of optimization for double-precision operations.
  • For performance-critical apps, compiling natively for ARM is recommended over relying on x64 emulation.
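
The register-width point in the list above can be illustrated with a toy model. The sketch below is not how Prism actually translates code; it only assumes that an emulator limited to 128-bit NEON registers must process a 256-bit AVX2 vector of doubles in two passes. The names `NEON_BITS`, `DOUBLE_BITS`, and `emulated_add` are hypothetical and exist only for this illustration.

```python
NEON_BITS = 128    # width of one ARM NEON register
DOUBLE_BITS = 64   # width of one double-precision lane


def emulated_add(a, b, vector_bits):
    """Add two vectors of doubles using only 128-bit-wide steps.

    Returns the result and the number of 128-bit steps it took.
    Toy model only: a real emulator also pays for register
    pressure, shuffles, and translation overhead.
    """
    lanes = vector_bits // DOUBLE_BITS
    if len(a) != lanes or len(b) != lanes:
        raise ValueError("vector length does not match vector_bits")

    lanes_per_step = NEON_BITS // DOUBLE_BITS  # 2 doubles per NEON register
    result = []
    steps = 0
    for i in range(0, lanes, lanes_per_step):
        chunk_a = a[i:i + lanes_per_step]
        chunk_b = b[i:i + lanes_per_step]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        steps += 1
    return result, steps


# One SSE2-style 128-bit add: two doubles fit in a single NEON register.
sse_result, sse_steps = emulated_add([1.0, 2.0], [10.0, 20.0], 128)

# One AVX2-style 256-bit add: four doubles need two NEON-width steps.
avx_result, avx_steps = emulated_add(
    [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], 256
)

print(sse_steps, avx_steps)  # 1 2
```

In this model the 256-bit add costs twice as many steps as the 128-bit add for twice the data, so the width advantage that AVX2 has on native hardware disappears. The model does not by itself explain why emulated AVX2 ends up slower rather than merely equal; that would need the other factors listed above.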