EAGLE 3.1: Collaboration Between the EAGLE Team, vLLM Team, and TorchSpec Team

3 hours ago
  • #open-source
  • #speculative-decoding
  • #optimization

  • EAGLE 3.1 is introduced by the EAGLE team, vLLM team, and TorchSpec team, advancing speculative decoding.
  • It addresses performance degradation issues like attention drift in various deployment scenarios.
  • Key architectural improvements include FC normalization and feeding post-norm hidden states into the next decoding step.
  • EAGLE 3.1 shows better extrapolation, long-context robustness, resilience to variations, and stable acceptance lengths.
  • In long-context workloads, it achieves up to 2× longer acceptance length compared to EAGLE 3.
  • TorchSpec provides efficient training support for EAGLE 3.1 and future speculative decoding algorithms.
  • An open-source EAGLE 3.1 draft model for Kimi K2.6 has been released, demonstrating deployment with TorchSpec and vLLM.
  • vLLM integrates EAGLE 3.1 with features like FC normalization and backward compatibility with existing checkpoints.
  • Benchmarking shows significant throughput improvements, such as 2.03× higher output throughput at low concurrency.
  • The collaboration highlights open-source efforts across algorithm research, system optimization, and training infrastructure.
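
The acceptance-length figures above can be made concrete with a minimal sketch of greedy verification for a single chain of draft tokens. This is an illustration under stated assumptions, not the EAGLE or vLLM implementation: the function names `accepted_length` and `mean_acceptance_length` are hypothetical, drafting is assumed to be a simple chain rather than a tree, and acceptance length is assumed to count the one token the target model always contributes per step.

```python
from typing import Iterable, Sequence, Tuple


def accepted_length(draft_tokens: Sequence[int], target_tokens: Sequence[int]) -> int:
    """Tokens emitted by one greedy verification step (hypothetical helper).

    draft_tokens:  the k tokens proposed by the draft model.
    target_tokens: the k + 1 tokens the target model picks greedily at the
                   same positions; the last one is the bonus token that
                   follows a fully accepted draft.
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have exactly one more entry than draft_tokens")

    matched = 0
    for drafted, verified in zip(draft_tokens, target_tokens):
        if drafted != verified:
            break  # first mismatch: every later draft token is discarded
        matched += 1

    # The target model's own token at the stopping position is always emitted,
    # so each step yields at least one token.
    return matched + 1


def mean_acceptance_length(steps: Iterable[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average tokens emitted per target forward pass over several steps."""
    lengths = [accepted_length(draft, target) for draft, target in steps]
    if not lengths:
        raise ValueError("at least one verification step is required")
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    steps = [
        ([5, 7, 9], [5, 7, 2, 4]),  # two draft tokens match, then a mismatch
        ([1, 2], [9, 9, 9]),        # first draft token already wrong
    ]
    print(mean_acceptance_length(steps))
```

Under this convention the target model runs once per step, so the number of target forward passes per generated token is the reciprocal of the mean acceptance length. Doubling the acceptance length therefore roughly halves those passes, before accounting for the draft model's own cost.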