AVX2 is slower than SSE2-4.x under Windows ARM emulation
6 days ago
- #Performance
- #Windows ARM
- #AVX2
- Under Prism, the x64 emulator in Windows on ARM, AVX2 code runs slower than SSE2-4.x code.
- In the benchmark, emulated AVX2 code runs at about 2/3 the speed of emulated SSE2-4.x code on ARM.
- On native Intel hardware the AVX2 code is 2.7x faster than the SSE2-4.x code, so emulation reverses the ranking.
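- Possible reasons for the slowdown: NEON registers are 128 bits wide, so each 256-bit AVX2 operation has to be emulated with more than one NEON operation; Prism's AVX2 support is new and may not be optimized yet; and the emulator may lack optimizations for double-precision operations.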
- For performance-critical apps, compiling natively for ARM is recommended over relying on x64 emulation.
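
The register-width argument above can be sketched with a little arithmetic. This is a simplified model, not a description of how Prism actually translates code: it assumes each source vector instruction is lowered to the minimum number of 128-bit NEON operations and ignores all translation overhead. The function names and the array size of 1024 doubles are hypothetical, chosen only for illustration.

```python
import math

# Register widths in bits. SSE and NEON registers are 128-bit; AVX2 is 256-bit.
SSE_BITS = 128
AVX2_BITS = 256
NEON_BITS = 128
DOUBLE_BITS = 64


def vector_ops(n_doubles: int, register_bits: int) -> int:
    """Vector instructions needed to process n_doubles once at this width."""
    lanes = register_bits // DOUBLE_BITS
    return math.ceil(n_doubles / lanes)


def emulated_neon_ops(n_doubles: int, source_bits: int) -> int:
    """Lower bound on NEON ops if every source op is split into 128-bit pieces.

    Assumption: ideal one-to-one lowering with no extra bookkeeping.
    """
    pieces_per_op = math.ceil(source_bits / NEON_BITS)
    return vector_ops(n_doubles, source_bits) * pieces_per_op


n = 1024  # hypothetical array length
native_sse = vector_ops(n, SSE_BITS)
native_avx2 = vector_ops(n, AVX2_BITS)
emu_sse = emulated_neon_ops(n, SSE_BITS)
emu_avx2 = emulated_neon_ops(n, AVX2_BITS)

print(f"native:   SSE {native_sse} ops, AVX2 {native_avx2} ops")
print(f"emulated: SSE {emu_sse} NEON ops, AVX2 {emu_avx2} NEON ops")
```

Under these assumptions AVX2 halves the instruction count on native hardware, but both code paths need the same number of NEON operations once emulated. The model only shows why the width advantage disappears; the measured result that AVX2 ends up slower would have to come from costs it does not capture, such as the other reasons listed above.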