When is your birthday? The math behind hash collisions
7 hours ago
- #hash collisions
- #birthday paradox
- #probability
- The Birthday Paradox: With just 23 people, there is a 50% chance that at least two share a birthday, calculated via the inverse probability of all unique birthdays.
- Calculating no matches: For n people, P(no matches) = 365! / (365^n × (365-n)!). For n=23, this is about 0.4927, leading to ~50% chance of a match.
- Extended problem: For three out of 60 people sharing a birthday, initial calculations suggested a low probability (~0.0006), but this approach was flawed.
- Richard von Mises revised the perspective: Instead of focusing on a specific day, consider the expected number of days with at least s birthdays across all days.
- Expected value formula: E(x_s) = n × C(k, s) × (1/n)^s × (1 - 1/n)^{k-s}, where n=days, k=people, s=specific birthday count.
- Applying von Mises' formula: For n=365, k=60, s=3, E(x_3) ≈ 0.22, meaning in about 4-5 groups of 60, one will have a triple birthday match, making it less rare.
- Implications: This mathematics underpins hash collisions in hash tables (fields as days, hashes as people) and cybersecurity attacks like the Birthday Attack.
- Birthday Attack: Attackers generate random inputs to find hash collisions, requiring about √n attempts; for SHA-256 with 2^256 outputs, ~2^128 attempts are needed.