Hasty Briefs (beta)

Apple taught an LLM to generate tokens up to 5× faster on math and coding tasks

15 days ago
  • #AI
  • #LLM
  • #Apple
  • Apple's research introduces a 'multi-token prediction' (MTP) framework to speed up LLM responses while maintaining output quality.
  • Traditional LLMs generate text one token at a time (autoregressive decoding), which makes generation slow.
  • MTP lets the model predict several upcoming tokens at once by inserting special 'mask' tokens into the prompt.
  • The model speculates on those upcoming tokens and verifies them against standard autoregressive decoding, reverting to single-token generation when a guess fails.
  • Testing with Tulu3-8B showed speedups of 2–3× for general tasks and up to 5× for predictable domains like coding and math.
  • No degradation in generation quality was reported, thanks to 'gated LoRA adaptation.'
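The speculate-then-verify loop described above can be sketched in a few lines. This is a toy illustration, not Apple's implementation: `draft_fn` stands in for the MTP heads proposing `k` tokens, and `verify_fn` stands in for one step of standard autoregressive decoding; the verifier accepts the longest matching prefix of the guesses and falls back to its own token at the first mismatch, which is why quality is preserved.

```python
def speculate(draft_fn, verify_fn, context, k):
    """Propose k tokens with draft_fn; keep the prefix verify_fn agrees with.

    draft_fn(context, k) -> list of k speculative tokens (hypothetical MTP head)
    verify_fn(context)   -> the single token standard decoding would emit next
    """
    proposed = draft_fn(context, k)               # k speculative tokens at once
    accepted = []
    for tok in proposed:
        expected = verify_fn(context + accepted)  # one autoregressive step
        if tok == expected:
            accepted.append(tok)                  # guess verified: keep it
        else:
            accepted.append(expected)             # revert to the verified token
            break                                 # later guesses are now stale
    return accepted


# Toy demo over a fixed target sequence.
target = [1, 2, 3, 4, 5]

def verify_fn(ctx):
    return target[len(ctx)]                       # "ground truth" next token

def good_draft(ctx, k):
    return target[len(ctx):len(ctx) + k]          # all guesses correct

def bad_draft(ctx, k):
    seq = target[len(ctx):len(ctx) + k]
    seq[1] = 99                                   # second guess is wrong
    return seq

print(speculate(good_draft, verify_fn, [], 3))    # all 3 guesses accepted
print(speculate(bad_draft, verify_fn, [], 3))     # reverts at the bad guess
```

Because every emitted token is checked against (or supplied by) the standard decoder, the output matches ordinary one-token-at-a-time generation; the speedup comes from how many guesses survive verification, which is why predictable domains like code and math see the largest gains.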