Hasty Briefs (beta)

Is One Layer Enough? A Single Transformer Layer Matches Full-Parameter RL Train

7 hours ago
  • #transformer-architecture
  • #reinforcement-learning
  • #model-efficiency
  • RL gains during post-training of large language models are highly concentrated in a small subset of transformer layers.
  • Training just one transformer layer can recover most, and sometimes even surpass, the performance gains of full-parameter RL training.
  • High-contribution layers are consistently found in the middle of the transformer stack, while input and output layers contribute less.
  • This pattern is stable across multiple models, RL algorithms, and task domains like mathematical reasoning and code generation.
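To make the "train just one layer" idea above concrete, here is a minimal sketch of how one might pick out the parameters of a single transformer layer and leave everything else frozen. It is not the authors' code. The parameter naming scheme (`layers.<index>.…`), the helper names `layer_index` and `single_layer_mask`, and the 4-layer toy model are all assumptions made for illustration.

```python
import re

# Assumption: parameter names contain "layers.<index>." as in many
# decoder-only implementations; adjust the pattern for other naming schemes.
LAYER_PATTERN = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def layer_index(param_name):
    """Return the transformer layer index encoded in a parameter name, or None."""
    match = LAYER_PATTERN.search(param_name)
    return int(match.group(1)) if match else None


def single_layer_mask(param_names, target_layer):
    """Map each parameter name to True only if it belongs to target_layer."""
    return {name: layer_index(name) == target_layer for name in param_names}


# Hypothetical parameter names for a 4-layer toy model.
names = ["embed_tokens.weight"]
for i in range(4):
    names += [f"layers.{i}.attn.weight", f"layers.{i}.mlp.weight"]
names.append("lm_head.weight")

# Select one layer from the middle of the stack; everything else stays frozen.
mask = single_layer_mask(names, target_layer=2)
trainable = [name for name, keep in mask.items() if keep]
print(trainable)
```

In a PyTorch setup, a mask like this could be applied by looping over `model.named_parameters()` and setting `param.requires_grad = mask[name]`, so that the optimizer only updates the chosen layer. Which layer index to choose is left open here; the summary only says that high-contribution layers tend to sit in the middle of the stack.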