EAGLE 3.1: Collaboration Between the EAGLE Team, vLLM Team, and TorchSpec Team
- #open-source
- #speculative-decoding
- #optimization
- The EAGLE, vLLM, and TorchSpec teams jointly introduce EAGLE 3.1, a new step forward for speculative decoding.
- EAGLE 3.1 addresses performance degradation across deployment scenarios, including degradation caused by attention drift.
- Key architectural changes include FC normalization and feeding post-norm hidden states into the next decoding step.
- Compared with EAGLE 3, EAGLE 3.1 extrapolates better, is more robust on long contexts and more resilient to variations, and keeps acceptance lengths stable.
- On long-context workloads, it achieves acceptance lengths up to 2× longer than EAGLE 3.
- TorchSpec provides efficient training support for EAGLE 3.1 and future speculative decoding algorithms.
- An open-source EAGLE 3.1 draft model for Kimi K2.6 has been released, demonstrating deployment with TorchSpec and vLLM.
- vLLM's EAGLE 3.1 integration supports FC normalization and remains backward compatible with existing checkpoints.
- Benchmarking shows significant throughput improvements, such as 2.03× higher output throughput at low concurrency.
- The collaboration highlights open-source efforts across algorithm research, system optimization, and training infrastructure.
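The summary names two architectural changes without describing them. What follows is a toy, pure-Python illustration under explicit assumptions, not the EAGLE 3.1 implementation. It assumes that "FC normalization" means applying RMSNorm to the output of the fully connected layer that fuses the concatenated low-, mid-, and high-level target hidden states (the EAGLE 3 fusion step). It then contrasts feeding the draft layer's pre-norm versus post-norm hidden state back into the next step. The helper names (`fuse`, `rollout`, `toy_layer`) are hypothetical, and the learned norm weight is omitted.

```python
import math
from typing import Callable, List

Vector = List[float]


def rms_norm(x: Vector, eps: float = 1e-6) -> Vector:
    """RMSNorm with unit scale (the learned weight is omitted for brevity)."""
    scale = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / scale for v in x]


def rms(x: Vector) -> float:
    """Root mean square of a vector, used to track hidden-state magnitude."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def fc(weight: List[Vector], x: Vector) -> Vector:
    """Matrix-vector product; weight has shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def fuse(low: Vector, mid: Vector, high: Vector,
         weight: List[Vector], fc_norm: bool = True) -> Vector:
    """EAGLE 3-style fusion: concatenate three target feature levels, then FC.

    ASSUMPTION: "FC normalization" is modeled here as RMSNorm on the FC output.
    """
    h = fc(weight, low + mid + high)
    return rms_norm(h) if fc_norm else h


def rollout(x0: Vector, layer: Callable[[Vector], Vector],
            steps: int, post_norm_feedback: bool) -> List[float]:
    """Run `steps` draft steps and record the RMS of each fed-back state."""
    x, history = x0, []
    for _ in range(steps):
        h = layer(x)          # pre-norm hidden state from the draft layer
        h_post = rms_norm(h)  # post-norm state (what an LM head would consume)
        x = h_post if post_norm_feedback else h
        history.append(rms(x))
    return history


def toy_layer(x: Vector) -> Vector:
    """Hypothetical stand-in draft layer that doubles its input."""
    return [2.0 * v for v in x]


if __name__ == "__main__":
    start = [1.0, -1.0, 1.0, -1.0]
    print(rollout(start, toy_layer, 4, post_norm_feedback=False))  # [2.0, 4.0, 8.0, 16.0]
    print(rollout(start, toy_layer, 4, post_norm_feedback=True))   # every value just under 1.0
```

With a layer that scales its input, feeding back the un-normalized state lets magnitudes compound from step to step, while the post-norm variant keeps every fed-back state at roughly unit RMS. This is only one intuition for why normalizing the fed-back state could aid stability; the post does not spell out this reasoning.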
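Acceptance length, the metric behind the long-context comparison, is commonly defined as the average number of tokens emitted per drafting-and-verification cycle. The minimal sketch below assumes that convention, counting the target model's own token in each cycle. It also assumes a single linear draft chain and greedy verification. EAGLE-family methods typically verify a tree of drafts, so this is a simplification. The function names and token IDs are hypothetical.

```python
from typing import List, Sequence, Tuple

Cycle = Tuple[Sequence[int], Sequence[int]]  # (draft tokens, target greedy tokens)


def accepted_prefix(draft: Sequence[int], target: Sequence[int]) -> int:
    """Count leading draft tokens that match the target model's greedy choices."""
    accepted = 0
    for d, t in zip(draft, target):
        if d != t:
            break  # the first mismatch ends acceptance for this cycle
        accepted += 1
    return accepted


def acceptance_length(cycles: List[Cycle]) -> float:
    """Mean tokens emitted per draft-then-verify cycle.

    Each cycle emits its accepted prefix plus one token from the target model
    (the correction at the first mismatch, or a bonus token if all drafts match).
    """
    if not cycles:
        raise ValueError("need at least one cycle")
    total = sum(accepted_prefix(d, t) + 1 for d, t in cycles)
    return total / len(cycles)


if __name__ == "__main__":
    # Made-up token IDs, purely illustrative.
    cycles = [
        ([5, 9, 2, 7], [5, 9, 2, 7]),  # all 4 accepted -> 5 tokens
        ([5, 9, 2, 7], [5, 3, 2, 7]),  # 1 accepted -> 2 tokens
        ([4, 4, 4, 4], [8, 4, 4, 4]),  # 0 accepted -> 1 token
    ]
    print(acceptance_length(cycles))  # 2.6666666666666665
```

Each cycle costs roughly one target-model forward pass. A longer acceptance length therefore means more tokens per pass, which is why the metric tends to track decoding speedup.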