UCSD: Large Language Models Pass the Turing Test
2 days ago
- #Turing Test
- #Artificial Intelligence
- #Large Language Models
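- When prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time in a Turing test, significantly more often than interrogators picked the real human participant.
- LLaMa-3.1, given the same prompt, was judged to be the human 56% of the time, not significantly more or less often than the real humans it was compared against.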
- The baseline models ELIZA and GPT-4o performed significantly below chance, being judged human only 23% and 21% of the time, respectively.
- This study provides the first empirical evidence that an artificial system can pass a standard three-party Turing test.
- The results bear on debates about what kind of intelligence Large Language Models (LLMs) exhibit and on the social and economic impacts these systems are likely to have.
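
To make "above chance" and "below chance" concrete: in a three-party test the interrogator picks which of two witnesses is the human, so random guessing would select the AI 50% of the time. The sketch below checks each reported rate against that 50% baseline with an exact two-sided binomial test. The trial count (`HYPOTHETICAL_TRIALS = 100`) is an assumption chosen for illustration, since the summary does not state sample sizes, and the function names are hypothetical. It is not the study's own analysis.

```python
from math import comb

# Win rates as reported in the summary above.
WIN_RATES = {"GPT-4.5": 0.73, "LLaMa-3.1": 0.56, "ELIZA": 0.23, "GPT-4o": 0.21}

# ASSUMPTION: the summary gives no sample sizes; 100 trials per model is illustrative only.
HYPOTHETICAL_TRIALS = 100


def binomial_two_sided_p(wins: int, trials: int, chance: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely
    than the observed one under the null hypothesis of pure chance.
    """
    pmf = [
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[wins]
    # Small relative tolerance so ties are not lost to floating-point error.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


def classify(rate: float, trials: int = HYPOTHETICAL_TRIALS, alpha: float = 0.05) -> str:
    """Label a win rate as above, below, or indistinguishable from 50% chance."""
    wins = round(rate * trials)
    if binomial_two_sided_p(wins, trials) >= alpha:
        return "indistinguishable from chance"
    return "above chance" if wins / trials > 0.5 else "below chance"


if __name__ == "__main__":
    for model, rate in WIN_RATES.items():
        print(f"{model}: {rate:.0%} -> {classify(rate)}")
```

Under this assumed sample size, 73% lands above chance, 56% cannot be distinguished from chance, and 23% and 21% fall below it, which matches the pattern the bullets describe. With a different number of trials the borderline 56% case could be classified differently.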