5× faster fast_blur in image-rs

19 hours ago
  • #image processing
  • #performance optimization
  • #Rust programming
  • The author optimized the 'fast_blur' method in the Rust image crate, achieving up to 5.9x faster performance for images with u8 pixels.
  • Blur algorithms include Gaussian Blur (O(k) per pixel for kernel width k when the filter is separated into horizontal and vertical passes), Box Blur (O(1) per pixel using a sliding window), and Fast Almost-Gaussian Filtering (approximates a Gaussian with several successive box blurs, so the cost stays O(1) per pixel).
  • The hot path in 'fast_blur' was dominated by floating-point conversions and operations (roundf, to_f32, min/max). For u8 pixels these were replaced with integer arithmetic on u32 accumulators, which removes that overhead.
  • A 'BlurAccumulator<T>' trait was designed to support multiple pixel types generically (e.g., u8 accumulates in u32, other types in f32), so the accumulator is selected at compile time with no runtime branches.
  • To avoid expensive integer divisions, a reciprocal multiplication technique (based on Granlund & Montgomery, 1994) was implemented, replacing divisions with faster multiplications and bit-shifts.
  • Combining integer accumulators and reciprocal multiplication resulted in a 5.9x speedup, reducing processing time from ~52ms to ~8ms for a 1920x1080 RGBA image, enabling real-time applications.
  • The optimizations were merged into image-rs and will be available in the next release, with acknowledgments to contributors for arithmetic and design feedback.
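
The integer-accumulator and reciprocal-multiplication ideas above can be illustrated with a minimal sketch of a one-dimensional box blur over a row of u8 samples. This is not the code merged into image-rs: the names `Reciprocal`, `div_round`, and `box_blur_row` are hypothetical, clamp-to-edge border handling is an assumption, and the window size is capped at 4095 so that a single 32-bit shift gives exact results for every possible window sum.

```rust
/// Hypothetical helper (not part of image-rs): a precomputed reciprocal
/// for dividing window sums by a fixed window size `d`.
#[derive(Clone, Copy)]
struct Reciprocal {
    mul: u64,  // ceil(2^32 / d)
    half: u32, // d / 2, added before dividing to round to nearest
}

impl Reciprocal {
    fn new(d: u32) -> Self {
        // Assumption of this sketch: with d <= 4095 and sums <= 255 * d,
        // the multiply-and-shift below matches integer division exactly.
        assert!((1..=4095).contains(&d), "window size out of range");
        let d64 = u64::from(d);
        Self { mul: ((1u64 << 32) + d64 - 1) / d64, half: d / 2 }
    }

    /// Computes (sum + d/2) / d with one multiplication and one shift.
    fn div_round(self, sum: u32) -> u8 {
        let n = u64::from(sum) + u64::from(self.half);
        ((n * self.mul) >> 32) as u8
    }
}

/// Sliding-window box blur of one row, using a u32 accumulator.
/// Samples outside the row are taken from the nearest edge (assumed).
fn box_blur_row(src: &[u8], radius: usize) -> Vec<u8> {
    if src.is_empty() {
        return Vec::new();
    }
    assert!(radius <= 2047, "radius too large for this sketch");
    let recip = Reciprocal::new((2 * radius + 1) as u32);
    let last = (src.len() - 1) as isize;
    let at = |i: isize| -> u32 { u32::from(src[i.clamp(0, last) as usize]) };
    let r = radius as isize;

    // Sum of the window centred on index 0.
    let mut sum: u32 = (-r..=r).map(|k| at(k)).sum();
    let mut out = Vec::with_capacity(src.len());
    for i in 0..src.len() as isize {
        out.push(recip.div_round(sum));
        // Slide the window: add the sample entering, drop the one leaving.
        sum = sum + at(i + r + 1) - at(i - r);
    }
    out
}
```

Each output sample costs one addition, one subtraction, one multiplication, and one shift, regardless of the radius. Calling `box_blur_row` two or three times in succession on its own output gives the kind of repeated-box approximation to a Gaussian that the summary describes.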