Do LLMs pass the mirror test?

4 days ago

Critiques adaptations of the mirror test for LLMs as flawed because they translate visual tests into text.
Proposes a better analogy: modify an LLM's own textual output subtly and see if it notices the anomaly.
Describes an experiment with Gemma 4 31B-IT in which the model's own text was corrupted by replacing 'g' with 'sg'.
Gemma spontaneously detected the corruption in its thinking trace, shifting from first-person to third-person language.
Gemma later adopted the corruption as part of its style, voluntarily generating 'sg' in subsequent outputs.
Tests with GLM 5.2 showed it reproduced the corruption without explicitly noticing or commenting on it.
Highlights the debate between deflationary mimicry and structural self-model explanations for such behaviors.
Acknowledges the informal nature of the experiment and suggests rigorous future research is needed.
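
The corruption step described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual code: the function names `corrupt` and `corruption_rate` are hypothetical, and the choice to leave uppercase 'G' untouched is an assumption, since the summary does not say how case was handled or where in the conversation the altered text was placed.

```python
def corrupt(text: str) -> str:
    """Replace every lowercase 'g' with 'sg'.

    Assumption: uppercase 'G' is left unchanged; the source does not specify.
    """
    return text.replace("g", "sg")


def corruption_rate(text: str) -> float:
    """Return the fraction of lowercase 'g' characters directly preceded by 's'.

    A rough proxy for how much a later output has adopted the 'sg' pattern.
    Ordinary English contains a few natural 'sg' pairs (e.g. "misguided"),
    so a small nonzero value does not by itself indicate adoption.
    """
    total = text.count("g")
    if total == 0:
        return 0.0
    return text.count("sg") / total


if __name__ == "__main__":
    original = "A growing language model is generating text."
    altered = corrupt(original)
    print(altered)
    print(corruption_rate(original), corruption_rate(altered))
```

In a setup like the one summarized, `corrupt` would be applied to a model's earlier reply before it is fed back as context, and `corruption_rate` could be computed on later replies to see whether the pattern persists. Whether the model comments on the anomaly still has to be judged by reading the output.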
