5× faster fast_blur in image-rs
19 hours ago
- #image processing
- #performance optimization
- #Rust programming
- The author optimized the 'fast_blur' method in the Rust image crate, making it up to 5.9x faster for images with u8 pixels.
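- The blur algorithms discussed are Gaussian blur (O(k) per pixel for a kernel of width k once the 2-D kernel is separated into a horizontal and a vertical 1-D pass), box blur (O(1) per pixel using a sliding-window running sum), and Fast Almost-Gaussian Filtering (which approximates a Gaussian by applying several box blurs in succession, keeping the O(1) per-pixel cost).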
- The hot path in 'fast_blur' was dominated by floating-point conversions and operations (roundf, to_f32, min/max). For u8 pixels these were replaced with integer arithmetic on u32 accumulators, removing that overhead.
- A 'BlurAccumulator<T>' trait lets one implementation support multiple pixel types (u8 accumulates in u32, other types in f32). Because the accumulator type is chosen at compile time, there is no runtime branching on pixel type.
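- Integer division is comparatively expensive, but the divisor (the box window size) is fixed for a given pass. That makes it a fit for the reciprocal multiplication technique of Granlund & Montgomery (1994, "Division by Invariant Integers using Multiplication"), which replaces each division with a multiplication by a precomputed constant followed by a bit-shift.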
- Combining integer accumulators with reciprocal multiplication produced the 5.9x speedup: blurring a 1920x1080 RGBA image dropped from ~52 ms to ~8 ms, well inside the ~16.7 ms frame budget at 60 fps and therefore usable in real-time applications.
- The optimizations were merged into image-rs and will be available in the next release, with acknowledgments to contributors for arithmetic and design feedback.
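To make the box-blur and integer-accumulator points concrete, here is a minimal single-row sketch, not the crate's code. It keeps a running u32 sum over a window of width 2r + 1, so each output costs O(1), and it rounds with integer arithmetic instead of roundf. Clamping out-of-range indices to the nearest edge sample is an assumption of this sketch. The hypothetical `approx_gaussian_row` helper runs the box blur three times to approximate a Gaussian; a fuller version would derive the box widths from the target sigma, as Fast Almost-Gaussian Filtering does, whereas this sketch simply reuses one radius.

```rust
/// Box-blur one row of u8 samples with radius `r` (window width 2r + 1).
/// A running u32 sum makes each output O(1); out-of-range indices are
/// clamped to the nearest edge sample (an assumption of this sketch).
fn box_blur_row(src: &[u8], r: usize) -> Vec<u8> {
    let n = src.len();
    if n == 0 {
        return Vec::new();
    }
    let w = (2 * r + 1) as u32;
    let r = r as isize;
    // Read a sample as u32, clamping the index into 0..n.
    let at = |i: isize| src[i.clamp(0, n as isize - 1) as usize] as u32;

    // Sum of the window centred on index 0, i.e. virtual indices -r..=r.
    let mut sum: u32 = 0;
    for i in -r..=r {
        sum += at(i);
    }

    let mut out = Vec::with_capacity(n);
    for i in 0..n as isize {
        // Rounded average in pure integer arithmetic (w is odd, so no ties).
        out.push(((sum + w / 2) / w) as u8);
        // Slide right: add the entering sample first so the u32 never underflows.
        sum += at(i + r + 1);
        sum -= at(i - r);
    }
    out
}

/// Hypothetical helper: approximate a Gaussian with three box-blur passes.
fn approx_gaussian_row(src: &[u8], r: usize) -> Vec<u8> {
    let once = box_blur_row(src, r);
    let twice = box_blur_row(&once, r);
    box_blur_row(&twice, r)
}
```

For a whole RGBA image, the same pass would run along every row and then every column, separately for each channel. Three passes over a single bright pixel already spread it into a bell-like profile, which is the intuition behind the almost-Gaussian approach.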
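The per-type accumulator idea can be sketched with a trait that has an associated accumulator type. The trait name `SampleAccumulator` and its methods `to_acc` and `from_sum` are hypothetical; the post's actual 'BlurAccumulator<T>' signature is not reproduced here, and this sketch uses an associated type rather than a generic parameter. Because `box_blur_row` is generic, rustc monomorphizes a separate copy for u8 and for f32, so choosing between u32 and f32 arithmetic happens at compile time rather than through a runtime branch. Edge clamping is again an assumption.

```rust
use std::ops::{AddAssign, SubAssign};

/// Hypothetical trait: each sample type picks the type its window sum is kept in.
trait SampleAccumulator: Copy {
    type Acc: Copy + Default + AddAssign + SubAssign;
    /// Widen a sample into the accumulator type.
    fn to_acc(self) -> Self::Acc;
    /// Turn a window sum back into a sample by averaging over `w` samples.
    fn from_sum(sum: Self::Acc, w: u32) -> Self;
}

impl SampleAccumulator for u8 {
    type Acc = u32; // integer path: no float conversions
    fn to_acc(self) -> u32 {
        self as u32
    }
    fn from_sum(sum: u32, w: u32) -> u8 {
        ((sum + w / 2) / w) as u8 // rounded integer average
    }
}

impl SampleAccumulator for f32 {
    type Acc = f32;
    fn to_acc(self) -> f32 {
        self
    }
    fn from_sum(sum: f32, w: u32) -> f32 {
        sum / w as f32
    }
}

/// Generic sliding-window box blur over one row; edges clamp (sketch assumption).
fn box_blur_row<T: SampleAccumulator>(src: &[T], r: usize) -> Vec<T> {
    let n = src.len();
    if n == 0 {
        return Vec::new();
    }
    let w = (2 * r + 1) as u32;
    let r = r as isize;
    let at = |i: isize| src[i.clamp(0, n as isize - 1) as usize].to_acc();

    // Initial window centred on index 0.
    let mut sum: T::Acc = Default::default();
    for i in -r..=r {
        sum += at(i);
    }
    let mut out = Vec::with_capacity(n);
    for i in 0..n as isize {
        out.push(T::from_sum(sum, w));
        // Add the entering sample, then drop the leaving one.
        sum += at(i + r + 1);
        sum -= at(i - r);
    }
    out
}
```

The same `box_blur_row` call site serves both `&[u8]` and `&[f32]`; only the trait impl differs, which is where an integer-only fast path for u8 can live.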
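The reciprocal-multiplication step can be sketched as follows, in the spirit of Granlund & Montgomery rather than with the crate's exact constants. The `Reciprocal` type and `rounded_average` helper are hypothetical. With m = ceil(2^32 / d), the value floor(n * m / 2^32) equals floor(n / d) whenever n * (d - 1) < 2^32, and the constructor checks that bound for the largest numerator it will see. For a u8 box blur with window w, the rounded average (sum + w/2) / w has numerators up to 255 * w + w/2, far inside that bound for practical radii.

```rust
/// Hypothetical helper: divide by a fixed divisor `d` with one multiply and a shift.
/// With m = ceil(2^32 / d) and e = m*d - 2^32 (0 <= e < d),
/// n*m / 2^32 = n/d + n*e / (d * 2^32), so the floor is unchanged
/// whenever n*e < 2^32; requiring max_n * (d - 1) < 2^32 guarantees that.
#[derive(Clone, Copy, Debug)]
struct Reciprocal {
    m: u64,
}

impl Reciprocal {
    fn new(d: u32, max_n: u32) -> Self {
        assert!(d > 0, "divisor must be non-zero");
        assert!(
            (max_n as u64) * (d as u64 - 1) < (1u64 << 32),
            "numerator range too large for a 32-bit shift"
        );
        let m = ((1u64 << 32) + d as u64 - 1) / d as u64; // ceil(2^32 / d)
        Reciprocal { m }
    }

    /// Returns n / d (truncating) for any n <= max_n, without a division.
    #[inline]
    fn div(self, n: u32) -> u32 {
        // n < 2^32 and m <= 2^32, so the product fits in u64.
        ((n as u64 * self.m) >> 32) as u32
    }
}

/// Rounded box-blur average for a u8 window of width `w`, via the reciprocal.
fn rounded_average(sum: u32, w: u32, recip: Reciprocal) -> u8 {
    recip.div(sum + w / 2) as u8
}
```

In a blur pass, the `Reciprocal` would be built once per window width and then used per pixel in place of `/ w`, so the only per-pixel cost is a 64-bit multiply and a shift.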