Is One Layer Enough? A Single Transformer Layer Matches Full-Parameter RL Training

a month ago

RL gains during post-training of large language models are highly concentrated in a small subset of transformer layers.
Training just one transformer layer can recover most of, and sometimes even surpass, the performance gains of full-parameter RL training.
The highest-contributing layers are consistently found in the middle of the transformer stack, while layers near the input and output contribute less.
This pattern holds across multiple models, RL algorithms, and task domains such as mathematical reasoning and code generation.
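The single-layer setup summarized above amounts to freezing every parameter except those belonging to one chosen transformer layer. The sketch below shows one way to build such a trainable mask from parameter names. It assumes a hypothetical Hugging Face-style naming scheme (`model.layers.<i>.…`), and the helpers `trainable_mask` and `middle_layer` are illustrative names, not the authors' code. Picking the central index is only a default guess motivated by the "middle layers" finding, not the paper's selection rule.

```python
import re

# Hypothetical naming scheme, assumed for illustration (Hugging Face-style decoders):
# transformer block parameters look like "model.layers.<index>.<submodule>...".
LAYER_PATTERN = re.compile(r"^model\.layers\.(\d+)\.")


def trainable_mask(param_names, target_layer):
    """Map each parameter name to True only if it belongs to `target_layer`.

    Everything else (embeddings, final norm, LM head, other layers) is frozen.
    """
    mask = {}
    for name in param_names:
        match = LAYER_PATTERN.match(name)
        # The trailing "\." in the pattern keeps layer 1 from matching layer 10, 11, ...
        mask[name] = match is not None and int(match.group(1)) == target_layer
    return mask


def middle_layer(num_layers):
    """Return the central layer index as a default candidate (assumption only)."""
    return num_layers // 2


if __name__ == "__main__":
    names = [
        "model.embed_tokens.weight",
        "model.layers.0.self_attn.q_proj.weight",
        "model.layers.1.mlp.up_proj.weight",
        "model.layers.10.mlp.up_proj.weight",
        "model.norm.weight",
        "lm_head.weight",
    ]
    mask = trainable_mask(names, target_layer=1)
    # Print the only parameter left trainable.
    for name, trainable in mask.items():
        if trainable:
            print(name)
```

With PyTorch, the mask could then be applied by iterating over `model.named_parameters()` and calling `param.requires_grad_(mask[name])`, so the optimizer in the RL loop only updates the selected layer.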
