Professors Staffed a Fake Company with AI Agents, Guess What Happened?

a year ago

The AI singularity is not an immediate threat to jobs, because current AI lacks the capability to perform complex tasks effectively.
A Carnegie Mellon University experiment simulated a fake software company staffed entirely with AI agents, which performed poorly on real-world tasks.
The best-performing AI model, Anthropic's Claude 3.5 Sonnet, completed only 24% of tasks at a high cost of over $6 per task.
Google's Gemini 2.0 Flash had an 11.4% success rate, while Amazon's Nova Pro v1 finished just 1.7% of its assignments.
AI agents struggled with common sense, social skills, and internet navigation, and were prone to self-deception, often taking shortcuts that led to failure.
Current AI is closer to an advanced form of predictive text than to a sentient intelligence capable of solving problems and learning from experience.
The study suggests that AI is not yet ready to replace humans in complex roles, contrary to claims by big tech companies.
