Can LLMs do randomness?

a year ago

LLMs were tested for randomness in coin tosses and number generation.
All models showed a 'heads' bias in coin tosses, with the excess over an even 50% ranging from 8 to 49 percentage points.
Claude 3.7 Sonnet was the least biased in coin tosses (58% heads) and the only model without statistically significant bias.
OpenAI models showed stronger heads bias than Claude in coin tosses.
In number generation, most models had a strong odd number bias, with Claude 3.7 Sonnet showing the strongest (97% odd numbers).
GPT-4.5-preview was perfectly balanced in number generation (50/50 odd/even).
Claude 3.7 Sonnet thus showed no statistically significant bias in coin tosses but the strongest bias of any model in number generation.
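
To make the significance claims above concrete, here is a minimal sketch of an exact two-sided binomial test against a fair 50/50 split. The summary does not state how many samples were drawn per model or which test the original author used, so the trial count of 100 and the function name `two_sided_binomial_p` are assumptions chosen purely for illustration.

```python
from math import comb


def two_sided_binomial_p(successes: int, trials: int) -> float:
    """Exact two-sided p-value for a fair (p = 0.5) null hypothesis.

    Sums the probability of every outcome that is no more likely
    than the observed one. Under p = 0.5 each outcome k has
    probability comb(trials, k) / 2**trials, so comparing the
    integer binomial coefficients avoids floating-point ties.
    """
    observed = comb(trials, successes)
    extreme = sum(
        count
        for count in (comb(trials, k) for k in range(trials + 1))
        if count <= observed
    )
    return extreme / 2**trials


# ASSUMPTION: 100 trials per model; the summary gives percentages only.
ASSUMED_TRIALS = 100

p_coin_58 = two_sided_binomial_p(58, ASSUMED_TRIALS)  # 58% heads
p_odd_97 = two_sided_binomial_p(97, ASSUMED_TRIALS)   # 97% odd numbers
p_even_50 = two_sided_binomial_p(50, ASSUMED_TRIALS)  # 50/50 odd/even

for label, p in [("58% heads", p_coin_58),
                 ("97% odd", p_odd_97),
                 ("50/50 odd/even", p_even_50)]:
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: p = {p:.3g} ({verdict} at the 0.05 level)")
```

Under the assumed 100 trials, 58 heads gives a p-value of roughly 0.13, which is consistent with "no statistically significant bias", while 97 odd numbers is overwhelmingly significant. The verdict depends on the sample size: the same 58% rate over a much larger number of trials would cross the 0.05 threshold, so these numbers only illustrate the reasoning and do not reproduce the original analysis.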

Hasty Briefs (beta)