Why is calling my asm function from Rust slower than calling it from C?
4 months ago
- #profiling
- #Rust
- #performance
- Identified a specific assembly function, cdef_filter4_pri_edged_8bpc_neon, that ran 30% slower when called from the Rust implementation than from the C baseline.
- Traced the slowdown to slower data loads in the Rust version, caused by extra data being stored on the stack.
- Found that the root cause was the compiler's inability to optimize a Rust abstraction across function pointers.
- Implemented a fix by making the WithOffset struct FFI-safe and restructuring how data is passed across the FFI boundary, reducing the performance gap to within 5% of the C version.
- Used profiling tools like samply and cargo asm to diagnose and verify the performance improvements.
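
As a rough illustration of the fix described above, the sketch below shows one way a small wrapper struct can be made FFI-safe and passed by value through an `extern "C"` function pointer. Only the name `WithOffset` comes from the post; its fields, the `SumFn` type, and the `sum_bytes` and `call_kernel` functions are hypothetical stand-ins chosen for this example, and `sum_bytes` is plain Rust standing in for an assembly routine.

```rust
/// Hypothetical stand-in for the post's `WithOffset`: a base value plus an
/// element offset. The field names are assumptions made for this sketch.
/// `#[repr(C)]` gives the struct a defined C layout, so it can be passed by
/// value across an `extern "C"` boundary.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct WithOffset<T> {
    pub data: T,
    pub offset: usize,
}

/// Hypothetical signature of a kernel that is reached through a function
/// pointer, the way an asm routine might be selected at runtime.
pub type SumFn = unsafe extern "C" fn(src: WithOffset<*const u8>, len: usize) -> u32;

/// Plain-Rust stand-in for the asm routine: sums `len` bytes starting at
/// `src.data + src.offset`.
///
/// # Safety
/// `src.data + src.offset` must point to at least `len` readable bytes.
pub unsafe extern "C" fn sum_bytes(src: WithOffset<*const u8>, len: usize) -> u32 {
    let mut total = 0u32;
    unsafe {
        let start = src.data.add(src.offset);
        for i in 0..len {
            total += u32::from(*start.add(i));
        }
    }
    total
}

/// Safe wrapper: checks bounds once, then hands the struct over by value.
pub fn call_kernel(f: SumFn, buf: &[u8], offset: usize, len: usize) -> u32 {
    assert!(offset <= buf.len() && len <= buf.len() - offset);
    let src = WithOffset { data: buf.as_ptr(), offset };
    // SAFETY: the range [offset, offset + len) was checked against `buf` above.
    unsafe { f(src, len) }
}
```

Calling `call_kernel(sum_bytes, &[1, 2, 3, 4, 5], 1, 3)` sums the three bytes starting at index 1. Because the struct holds only a pointer and a `usize`, a by-value pass like this can usually travel in registers on common 64-bit ABIs, though whether that happens in a given build is worth confirming with a disassembly tool such as cargo asm.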