Mark's Magic Multiply
2 days ago
- #embedded-systems
- #floating-point
- #optimization
- The post discusses efficient single-precision floating-point multiplication on embedded processors, focusing on custom RISC-V extensions.
- It compares three implementations: a 16-cycle version using mul and mulh that gives a fully correct result, a 33-cycle version built from 16-bit multiplies, and Mark Owen's trick, which uses two 32-bit multiplies to reach 30 cycles.
- Mark Owen's method computes a 23x23-bit product via two multiplies, bounds the resulting error, and then corrects it cheaply, reducing the cycle count without sacrificing accuracy.
- The author explores hardware trade-offs (e.g., sequential vs. dedicated multipliers) and potential extensions to double-precision multiplication.
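
As a rough illustration of the mul/mulh baseline mentioned above, the sketch below multiplies two single-precision significands and rounds the result to nearest-even. It is not the post's code and it does not implement Mark Owen's trick. It assumes the usual IEEE 754 layout (23 stored fraction bits plus an implicit leading one, so 24-bit significands), and the helper names mul_lo, mul_hi and sig_mul are hypothetical. The two helpers stand in for the RISC-V mul and mulhu instructions, and the 64-bit arithmetic used to recombine and round is for clarity only; a real embedded routine would work on the two 32-bit halves directly.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for RISC-V `mul`: low 32 bits of the 32x32 product. */
static uint32_t mul_lo(uint32_t a, uint32_t b) {
    return a * b;
}

/* Stand-in for RISC-V `mulhu`: high 32 bits of the unsigned 32x32 product. */
static uint32_t mul_hi(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/*
 * Multiply two normalised 24-bit significands (implicit one at bit 23).
 * Returns the 24-bit product significand, rounded to nearest-even, and
 * stores in *exp_adjust how much the caller must add to the exponent sum.
 * Sign, exponent, subnormal, infinity and NaN handling are all omitted.
 */
static uint32_t sig_mul(uint32_t sa, uint32_t sb, int *exp_adjust) {
    assert((sa >> 23) == 1 && (sb >> 23) == 1); /* inputs must be normalised */

    uint32_t lo = mul_lo(sa, sb);          /* product bits 31..0  */
    uint32_t hi = mul_hi(sa, sb);          /* product bits 47..32 */
    uint64_t p = ((uint64_t)hi << 32) | lo;

    /* The product lies in [2^46, 2^48): renormalise by 23 or 24 bits. */
    int shift = (p >> 47) ? 24 : 23;
    *exp_adjust = shift - 23;

    uint64_t q = p >> shift;                           /* kept bits      */
    uint64_t rem = p & (((uint64_t)1 << shift) - 1);   /* discarded bits */
    uint64_t half = (uint64_t)1 << (shift - 1);

    /* Round to nearest, ties to even. */
    if (rem > half || (rem == half && (q & 1))) {
        q++;
    }
    /* Rounding may carry out of bit 23; renormalise once more. */
    if (q == ((uint64_t)1 << 24)) {
        q >>= 1;
        (*exp_adjust)++;
    }
    return (uint32_t)q;
}
```

For example, passing 0xC00000 for both inputs (1.5 times 1.5) yields 0x900000 with an exponent adjustment of 1, which is 1.125 times 2, or 2.25.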