EAGLE 3.1: A Collaboration Between the EAGLE, vLLM, and TorchSpec Teams

3 hours ago

The EAGLE, vLLM, and TorchSpec teams have jointly introduced EAGLE 3.1, a new version of the EAGLE speculative decoding method.
It targets performance degradation, such as attention drift, that appears across deployment scenarios.
The key architectural changes are FC normalization and feeding post-norm hidden states into subsequent decoding steps.
EAGLE 3.1 shows better extrapolation, long-context robustness, resilience to variations, and stable acceptance lengths.
In long-context workloads, it achieves up to 2× the acceptance length of EAGLE 3.
TorchSpec provides efficient training support for EAGLE 3.1 and future speculative decoding algorithms.
An open-source EAGLE 3.1 draft model for Kimi K2.6 has been released, demonstrating end-to-end deployment with TorchSpec and vLLM.
vLLM integrates EAGLE 3.1 with features like FC normalization and backward compatibility with existing checkpoints.
Benchmarks show significant throughput gains, including 2.03× higher output throughput at low concurrency.
The collaboration highlights open-source efforts across algorithm research, system optimization, and training infrastructure.
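
For readers unfamiliar with the acceptance-length metric cited above, here is a minimal sketch of how such a figure might be computed and compared between two draft models. It assumes each list entry is the number of tokens emitted in one target-model verification step; definitions vary between reports (for example, whether the target model's bonus token is counted). The function name and the per-step counts are hypothetical and made up for illustration; they are not measurements from EAGLE 3 or EAGLE 3.1.

```python
def mean_acceptance_length(tokens_per_step):
    """Average number of tokens emitted per verification step.

    Assumption: each entry counts the tokens emitted in one step of the
    target model verifying the draft model's proposal.
    """
    if not tokens_per_step:
        raise ValueError("need at least one verification step")
    return sum(tokens_per_step) / len(tokens_per_step)


# Hypothetical per-step counts, invented for illustration only.
baseline_steps = [2, 1, 3, 2]
improved_steps = [4, 3, 5, 4]

baseline = mean_acceptance_length(baseline_steps)
improved = mean_acceptance_length(improved_steps)
ratio = improved / baseline

print(f"baseline: {baseline:.2f} tokens/step")
print(f"improved: {improved:.2f} tokens/step")
print(f"ratio: {ratio:.2f}x")
```

A longer acceptance length means fewer target-model forward passes per generated token, which is why it is usually reported alongside throughput.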
