Language Models Contain Personality Subnetworks
- #Contrastive Pruning
- #Persona Subnetworks
- #Large Language Models
- LLMs contain persona-specialized subnetworks in their parameter space.
- Activation signatures for different personas can be identified using small calibration datasets.
- A masking strategy isolates lightweight persona subnetworks without external knowledge.
- Contrastive pruning sharpens the separation between opposing persona pairs (e.g., introvert vs. extrovert).
- The method is training-free and relies solely on the model's existing parameters.
- Subnetworks show stronger persona alignment than external-knowledge baselines.
- Findings suggest human-like behaviors are embedded in LLMs, enabling controllable personalization.
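The pipeline in the bullets above (score weights from small per-persona calibration batches, then mask contrastively) can be sketched as follows. This is a minimal NumPy illustration under assumed details, not the paper's implementation: it uses a Wanda-style magnitude-times-activation proxy for importance, and the function names, the calibration batches, and the sparsity level are all hypothetical.

```python
import numpy as np

def importance(W, X):
    """Proxy importance per weight: |w_ij| * mean |x_j| over calibration inputs.

    W: (out, in) weight matrix; X: (n_samples, in) calibration activations.
    This magnitude-times-activation score is an assumption, not the paper's metric.
    """
    return np.abs(W) * np.abs(X).mean(axis=0)

def contrastive_mask(W, X_pos, X_neg, sparsity=0.9):
    """Keep weights whose importance for the target persona most exceeds the
    opposing persona's, pruning down to the requested sparsity (training-free)."""
    score = importance(W, X_pos) - importance(W, X_neg)
    k = int(W.size * (1 - sparsity))            # number of weights to keep
    thresh = np.partition(score.ravel(), -k)[-k]  # k-th largest contrastive score
    return score >= thresh

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))                    # one toy layer
X_intro = rng.normal(size=(32, 16))             # hypothetical "introvert" calibration batch
X_extra = rng.normal(size=(32, 16))             # hypothetical "extrovert" calibration batch
mask = contrastive_mask(W, X_intro, X_extra, sparsity=0.9)
subnetwork = W * mask                           # lightweight persona subnetwork, no retraining
```

The contrastive difference of scores, rather than a single persona's score alone, is what pushes the two subnetworks apart: weights useful to both personas score near zero and are pruned first.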