Turning Billions of Strings into Integers Every Second Without Collisions
5 hours ago
- #database-design
- #scalability
- #distributed-systems
- Building a Redis RESP3 Wire Compatible Key/Value Database on FoundationDB.
- Need to represent sets of keys for boolean operations (intersection, union, difference) efficiently.
- Transition from uint32 to uint64 for larger keyspace due to over 15 billion records in AT Proto Ecosystem.
- Challenge in interning URIs and User DIDs as uint64 with high throughput (15k to 150k new URIs per second).
- Initial approach using FoundationDB transactions for UID allocation faced contention issues.
- First attempt using xxHash for coordination-free hashing rejected due to high collision probability.
- Second approach: splitting uint64 into ~4.3 billion sequences, each with ~4.3 billion values, to reduce contention.
- Implementation allows minting 430 billion new UIDs per second with minimal contention.
- Inspiration from Roaring Bitmaps and load balancing systems for distributed, high-throughput string interning.
- Author's last day at Bluesky after scaling from 100,000 to 40,000,000 users, moving to new infrastructure projects.