Do LLMs pass the mirror test?
4 days ago
- #Mirror Test
- #LLM Behavior
- #AI Self-Awareness
- Critiques adaptations of the mirror test for LLMs as flawed because they translate visual tests into text.
- Proposes a better analogy: modify an LLM's own textual output subtly and see if it notices the anomaly.
- Describes an experiment with Gemma 4 31B-IT where corrupted text (replacing 'g' with 'sg') was introduced.
- Gemma spontaneously detected the corruption in its thinking trace, shifting from first-person to third-person language.
- Gemma later adopted the corruption as part of its style, voluntarily generating 'sg' in subsequent outputs.
- Tests with GLM 5.2 showed it reproduced the corruption without explicitly noticing or commenting on it.
- Highlights the debate between deflationary mimicry and structural self-model explanations for such behaviors.
- Acknowledges the informal nature of the experiment and suggests rigorous future research is needed.