GPT-5.5 hallucinates 3x more than MIT-licensed GLM-5.2

10 hours ago

Major AI labs are growing skeptical of scaling model size and training data endlessly.
Although larger models post high benchmark scores, intelligence appears to be plateauing, as shown by open-weight models closing in on proprietary ones.
Larger models such as DeepSeek V4 Pro and GPT-5.5 show high hallucination rates: they fail to admit uncertainty or to recognize logical flaws.
Increased model size can lead to worse real-world accuracy and truthfulness, despite superior benchmark performance.
The AI industry faces a trilemma: balancing raw capability, uncertainty calibration (hallucination rate), and computational efficiency.
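
The brief does not say how hallucination rate was measured. One common framing, assumed here purely for illustration, counts how often a model gives a wrong answer instead of declining when it does not get the question right. The sketch below uses hypothetical outcome labels (`correct`, `incorrect`, `abstained`) and made-up toy data for two unnamed models; it is not the benchmark's actual scoring code.

```python
from collections import Counter


def accuracy(outcomes):
    """Fraction of all responses graded 'correct'."""
    if not outcomes:
        return 0.0
    return Counter(outcomes)["correct"] / len(outcomes)


def hallucination_rate(outcomes):
    """Among responses that were NOT correct, the fraction that were
    wrong answers rather than abstentions (assumed definition)."""
    counts = Counter(outcomes)
    wrong = counts["incorrect"]
    not_correct = wrong + counts["abstained"]
    if not_correct == 0:
        return 0.0
    return wrong / not_correct


# Toy, invented data: model_a answers more questions correctly,
# but guesses when unsure; model_b abstains more often.
model_a = ["correct"] * 6 + ["incorrect"] * 3 + ["abstained"] * 1
model_b = ["correct"] * 5 + ["incorrect"] * 1 + ["abstained"] * 4

for name, outcomes in [("model_a", model_a), ("model_b", model_b)]:
    print(name, accuracy(outcomes), hallucination_rate(outcomes))
```

Under this assumed definition, a model can lead on accuracy and still have the higher hallucination rate, which is the tension between capability and calibration that the trilemma describes.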
