Language Models Contain Personality Subnetworks
- #Contrastive Pruning
- #Persona Subnetworks
- #Large Language Models
- LLMs contain persona-specialized subnetworks in their parameter space.
- Activation signatures for different personas can be identified using small calibration datasets.
- A masking strategy isolates lightweight persona subnetworks without external knowledge.
- Contrastive pruning sharpens the separation between opposing persona pairs (e.g., introvert vs. extrovert).
- The method is training-free and relies solely on the model's existing parameters.
- Subnetworks show stronger persona alignment than external-knowledge baselines.
- Findings suggest human-like behaviors are embedded in LLMs, enabling controllable personalization.
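The pipeline in the bullets above (score weights from small per-persona calibration batches, then mask contrastively) can be sketched as follows. This is a minimal NumPy illustration under assumed details, not the paper's implementation: it uses a Wanda-style magnitude-times-activation proxy for importance, and the function names, the calibration batches, and the sparsity level are all hypothetical.

```python
import numpy as np

def importance(W, X):
    """Proxy importance per weight: |w_ij| * mean |x_j| over calibration inputs.

    W: (out, in) weight matrix; X: (n_samples, in) calibration activations.
    This magnitude-times-activation score is an assumption, not the paper's metric.
    """
    return np.abs(W) * np.abs(X).mean(axis=0)

def contrastive_mask(W, X_pos, X_neg, sparsity=0.9):
    """Keep weights whose importance for the target persona most exceeds the
    opposing persona's, pruning down to the requested sparsity (training-free)."""
    score = importance(W, X_pos) - importance(W, X_neg)
    k = int(W.size * (1 - sparsity))            # number of weights to keep
    thresh = np.partition(score.ravel(), -k)[-k]  # k-th largest contrastive score
    return score >= thresh

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))                    # one toy layer
X_intro = rng.normal(size=(32, 16))             # hypothetical "introvert" calibration batch
X_extra = rng.normal(size=(32, 16))             # hypothetical "extrovert" calibration batch
mask = contrastive_mask(W, X_intro, X_extra, sparsity=0.9)
subnetwork = W * mask                           # lightweight persona subnetwork, no retraining
```

The contrastive difference of scores, rather than a single persona's score alone, is what pushes the two subnetworks apart: weights useful to both personas score near zero and are pruned first.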