Mark's Magic Multiply

2 days ago

The post discusses efficient single-precision floating-point multiplication on embedded processors, focusing on custom RISC-V extensions.
It compares three implementations: a 16-cycle version that uses mul and mulh to form the exact product, a 33-cycle version built from 16-bit multiplies, and a 30-cycle version based on Mark Owen's trick, which needs only two 32-bit multiplies.
Mark Owen's method computes a 23x23-bit product with two multiplies: one gives an approximation of the product, the error of that approximation is bounded, and a cheap correction step recovers the exact result, so the cycle count drops without any loss of accuracy.
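
A minimal C sketch of how a 23x23-bit product can be recovered from two 32-bit low-half multiplies is given here as an illustration. It is not taken from the post: the 7-bit truncation, the prod46_t type and the mul23x23 name are assumptions chosen so that the error bound works out, and the post's actual instruction sequence may differ. One multiply yields the exact low 32 bits; the other multiplies the top 16 bits of each operand, which underestimates the product by less than 2^31 after scaling by 2^14. Because that error is below 2^32, a single comparison is enough to tell whether it carries into the upper word.

```c
#include <stdint.h>

/* 46-bit product of two 23-bit values, returned as two 32-bit halves. */
typedef struct {
    uint32_t hi; /* bits 32..45 of x*y */
    uint32_t lo; /* bits 0..31 of x*y  */
} prod46_t;

/* Hypothetical helper (not from the post): x and y must each be below 2^23. */
static prod46_t mul23x23(uint32_t x, uint32_t y)
{
    /* Multiply 1: exact low 32 bits of the product, as RISC-V mul returns. */
    uint32_t lo = x * y;

    /* Multiply 2: 16x16-bit product of the truncated operands, which fits
       in 32 bits. x*y = approx * 2^14 + err, with 0 <= err < 2^31. */
    uint32_t approx = (x >> 7) * (y >> 7);

    /* Low 32 bits of approx * 2^14 (the shift wraps modulo 2^32). */
    uint32_t approx_lo = approx << 14;

    /* Since err < 2^32, adding it to approx * 2^14 carries into bit 32
       exactly when the true low word has wrapped below approx_lo. */
    uint32_t carry = lo < approx_lo;

    prod46_t p = { (approx >> 18) + carry, lo };
    return p;
}
```

For example, mul23x23(0x7FFFFF, 0x7FFFFF) returns hi = 0x3FFF and lo = 0xFF000001, which matches the 64-bit product 0x3FFFFF000001. In a floating-point multiply the two inputs would be the 23-bit fraction fields, with the implicit leading ones handled by separate additions.
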
The author explores hardware trade-offs (e.g., sequential vs. dedicated multipliers) and potential extensions to double-precision multiplication.
