µPC: Scaling Predictive Coding to 100 Layer Networks
- #deep learning
- #predictive coding
- #machine learning
- The paper introduces $μ$PC, a method for scaling predictive coding (PC) to deep networks with 100+ layers. PC trains a network by minimising a layer-wise prediction-error energy: activities are first relaxed by inference on the energy, then the weights are updated locally (a minimal sketch follows this list).
- It addresses the challenge of training deep PC networks (PCNs) with Depth-$μ$P, a depth-aware extension of the maximal-update parameterisation ($μ$P) that rescales residual branches and learning rates with depth (see the second sketch after the list).
- The study reveals pathologies in standard PCNs that hinder training at large depths and shows how $μ$PC mitigates some of these issues.
- $μ$PC enables stable training of deep residual networks (up to 128 layers) with competitive performance on classification tasks.
- The method allows zero-shot transfer of learning rates across different network widths and depths.
- The findings have implications for other local learning algorithms and potential extensions to convolutional and transformer architectures.
- Code for $μ$PC is made available as part of a JAX library for PCNs.
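For readers unfamiliar with the mechanics, here is a minimal sketch of PC training for a fully connected network in JAX, assuming the standard squared-error energy. All names (`energy`, `train_step`, `n_infer`, ...) are illustrative and not the paper's library API.

```python
from functools import partial

import jax
import jax.numpy as jnp

def init_params(key, sizes=(784, 128, 128, 10)):
    # Plain 1/sqrt(fan_in) Gaussian init; sizes are illustrative.
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, (m, n) in zip(keys, zip(sizes[:-1], sizes[1:]))]

def energy(hidden, params, x, y):
    # PC energy: half the squared prediction error at every layer, with the
    # input clamped to x and the output clamped to the label y.
    zs = [x, *hidden, y]
    return sum(0.5 * jnp.sum((z - (jnp.tanh(zp) @ W + b)) ** 2)
               for (W, b), zp, z in zip(params, zs[:-1], zs[1:]))

@partial(jax.jit, static_argnames="n_infer")
def train_step(params, x, y, n_infer=20, lr_z=0.1, lr_w=1e-3):
    # Initialise hidden activities with a feedforward pass.
    hidden, z = [], x
    for W, b in params[:-1]:
        z = jnp.tanh(z) @ W + b
        hidden.append(z)
    # Inference: gradient descent on the energy w.r.t. the activities only.
    for _ in range(n_infer):
        grads = jax.grad(energy)(hidden, params, x, y)
        hidden = [z - lr_z * g for z, g in zip(hidden, grads)]
    # Learning: a purely local weight update at the settled activities.
    wgrads = jax.grad(energy, argnums=1)(hidden, params, x, y)
    return [(W - lr_w * gW, b - lr_w * gb)
            for (W, b), (gW, gb) in zip(params, wgrads)]
```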
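And a minimal sketch of the Depth-$μ$P residual scaling that $μ$PC builds on, assuming the usual $1/\sqrt{L}$ branch multiplier. The full parameterisation in the paper also prescribes depth- and width-dependent learning-rate scalings, which this sketch omits.

```python
import jax
import jax.numpy as jnp

def resnet_forward(params, x):
    # Depth-µP-style residual net: each branch is scaled by 1/sqrt(L),
    # which keeps activations O(1) as the number of blocks L grows.
    scale = 1.0 / jnp.sqrt(len(params))
    for W in params:
        x = x + scale * jnp.tanh(x @ W)
    return x

depth, width = 128, 256
keys = jax.random.split(jax.random.PRNGKey(0), depth)
params = [jax.random.normal(k, (width, width)) / jnp.sqrt(width) for k in keys]
out = resnet_forward(params, jnp.ones(width))
```

Because this kind of parameterisation keeps update sizes roughly width- and depth-independent, a learning rate tuned on a small network can be reused on a much larger one, which is the zero-shot transfer the note mentions above.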