Speculative Speculative Decoding (SSD)
- #Autoregressive Decoding
- #Machine Learning
- #Inference Acceleration
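- Introduces speculative speculative decoding (SSD), which parallelizes the drafting and verification steps that standard speculative decoding runs one after the other.
- While a verification is in progress, the draft model predicts likely verification outcomes and prepares the next speculation for each of them pre-emptively.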
- Presents Saguaro, an optimized SSD algorithm, achieving up to 2x speedup over speculative decoding baselines.
- Identifies three key challenges in SSD and proposes a principled solution to each.
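
The idea in the bullets above can be illustrated with a small toy simulation. This is a minimal sketch under stated assumptions, not the Saguaro algorithm: the models are deterministic arithmetic stand-ins, decoding is greedy, and all names (`target_next`, `draft_candidates`, `prespeculate`, `ssd_generate`) are hypothetical. The sketch also runs sequentially, so it only counts how often a pre-computed speculation could have been reused; in a real system the pre-speculation would run concurrently with verification.

```python
from typing import Dict, List, Tuple

VOCAB = 50  # toy vocabulary size (assumption for illustration)


def target_next(ctx: List[int]) -> int:
    """Toy stand-in for the slow target model's greedy next token."""
    return (31 * sum(ctx) + len(ctx)) % VOCAB


def draft_candidates(ctx: List[int]) -> List[int]:
    """Toy stand-in for the fast draft model: ranked guesses, best first.

    It peeks at the target only to make the toy's error pattern predictable.
    """
    t = target_next(ctx)
    if len(ctx) % 4 != 0:
        return [t, (t + 1) % VOCAB]                # top guess is right
    if len(ctx) % 8 == 0:
        return [(t + 3) % VOCAB, (t + 5) % VOCAB]  # both guesses are wrong
    return [(t + 1) % VOCAB, t]                    # only second guess is right


def draft(ctx: List[int], k: int) -> List[int]:
    """Propose k tokens autoregressively using the draft's top guess."""
    out: List[int] = []
    for _ in range(k):
        out.append(draft_candidates(ctx + out)[0])
    return out


def verify(ctx: List[int], proposal: List[int]) -> Tuple[int, int]:
    """Greedy verification: (number of accepted tokens, target's next token)."""
    for i, tok in enumerate(proposal):
        expected = target_next(ctx + proposal[:i])
        if tok != expected:
            return i, expected
    return len(proposal), target_next(ctx + proposal)


def prespeculate(ctx: List[int], proposal: List[int],
                 k: int) -> Dict[Tuple[int, int], List[int]]:
    """Guess verification outcomes and draft the next speculation for each."""
    cache: Dict[Tuple[int, int], List[int]] = {}
    for n in range(len(proposal) + 1):
        prefix = ctx + proposal[:n]
        cands = draft_candidates(prefix)
        # A rejection at position n means the target's token differs from the
        # draft's top guess there, so only the lower-ranked guesses can match.
        guesses = cands[1:] if n < len(proposal) else cands[:1]
        for bonus in guesses:
            cache[(n, bonus)] = draft(prefix + [bonus], k)
    return cache


def ssd_generate(prompt: List[int], n_new: int,
                 k: int = 4) -> Tuple[List[int], int, int]:
    """Return (generated tokens, cache hits, cache misses)."""
    ctx = list(prompt)
    proposal = draft(ctx, k)
    hits = misses = 0
    while len(ctx) - len(prompt) < n_new:
        cache = prespeculate(ctx, proposal, k)  # would overlap with verify
        n, bonus = verify(ctx, proposal)
        ctx = ctx + proposal[:n] + [bonus]
        if (n, bonus) in cache:
            proposal = cache[(n, bonus)]        # next speculation is ready
            hits += 1
        else:
            proposal = draft(ctx, k)            # fall back to fresh drafting
            misses += 1
    return ctx[len(prompt):len(prompt) + n_new], hits, misses


def autoregressive(prompt: List[int], n_new: int) -> List[int]:
    """Reference: plain greedy decoding with the target model only."""
    ctx = list(prompt)
    for _ in range(n_new):
        ctx.append(target_next(ctx))
    return ctx[len(prompt):]
```

Calling `ssd_generate([1, 2, 3], 20)` returns the same tokens as `autoregressive([1, 2, 3], 20)`, because every emitted token is either accepted by or produced by the target. The hit and miss counters show how often the predicted outcome set contained the actual verification outcome; on a hit the drafting cost for the next round is already paid.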