Mark's Magic Multiply

2 days ago
  • #embedded-systems
  • #floating-point
  • #optimization
  • The post discusses efficient single-precision floating-point multiplication on embedded processors, focusing on custom RISC-V extensions.
  • It compares three implementations: a fully correct 16-cycle version using mul and mulh, a 33-cycle version built from 16-bit multiplies, and a 30-cycle version based on Mark Owen's trick, which needs only two 32-bit multiplies.
  • Mark Owen's method computes a 23x23-bit product with two multiplies, bounds the resulting error, and then corrects it cheaply, saving cycles without sacrificing accuracy.
  • The author explores hardware trade-offs (e.g., sequential vs. dedicated multipliers) and potential extensions to double-precision multiplication.
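
To make the baseline in the summary concrete, here is a minimal C sketch of how a full-width significand product feeds a single-precision multiply. It is not the author's code and it is not Mark Owen's trick. It assumes both inputs and the result are finite, normal numbers, so zero, subnormals, infinities, NaNs, overflow, and underflow are not handled. The function name fmul_normal_sketch is hypothetical, and the sketch works on 24-bit significands (23 stored bits plus the implicit leading one). On a 32-bit RISC-V core with the M extension, a compiler would typically lower the 64-bit product to a mul plus an unsigned high multiply, which is an assumption about code generation rather than something the summary states.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: multiply two finite, normal floats whose product is
 * also normal. Special values and exponent overflow/underflow are ignored. */
float fmul_normal_sketch(float x, float y)
{
    uint32_t a, b;
    memcpy(&a, &x, sizeof a);               /* reinterpret the bits safely */
    memcpy(&b, &y, sizeof b);

    uint32_t sign = (a ^ b) & 0x80000000u;  /* sign of the product */
    int32_t exp = (int32_t)((a >> 23) & 0xffu)
                + (int32_t)((b >> 23) & 0xffu) - 127;   /* biased exponent */

    /* 24-bit significands with the implicit leading one restored. */
    uint32_t ma = (a & 0x007fffffu) | 0x00800000u;
    uint32_t mb = (b & 0x007fffffu) | 0x00800000u;

    /* Exact 48-bit product, in the range [2^46, 2^48). */
    uint64_t p = (uint64_t)ma * mb;

    /* Normalize: if bit 47 is set the product is in [2, 4), so shift one
     * extra place and bump the exponent. */
    int shift = 23;
    if (p >> 47) {
        shift = 24;
        exp += 1;
    }

    uint32_t mant = (uint32_t)(p >> shift);            /* top 24 bits */
    uint64_t rem  = p & (((uint64_t)1 << shift) - 1);  /* discarded bits */
    uint64_t half = (uint64_t)1 << (shift - 1);

    /* Round to nearest, ties to even. */
    if (rem > half || (rem == half && (mant & 1u))) {
        mant += 1;
        if (mant >> 24) {       /* rounding carried out of the significand */
            mant >>= 1;
            exp += 1;
        }
    }

    uint32_t r = sign | ((uint32_t)exp << 23) | (mant & 0x007fffffu);
    float out;
    memcpy(&out, &r, sizeof out);
    return out;
}
```

For example, fmul_normal_sketch(1.5f, 2.5f) yields 3.75f. The cheaper variants in the summary differ in how they obtain the upper bits of the product; normalization and rounding still need either the discarded bits or a bound on them, which is why an approximate product has to be corrected before rounding.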