Hasty Briefsbeta

Turning Billions of Strings into Integers Every Second Without Collisions

5 hours ago
  • #database-design
  • #scalability
  • #distributed-systems
  • Building a Redis RESP3 Wire Compatible Key/Value Database on FoundationDB.
  • Need to represent sets of keys for boolean operations (intersection, union, difference) efficiently.
  • Transition from uint32 to uint64 for larger keyspace due to over 15 billion records in AT Proto Ecosystem.
  • Challenge in interning URIs and User DIDs as uint64 with high throughput (15k to 150k new URIs per second).
  • Initial approach using FoundationDB transactions for UID allocation faced contention issues.
  • First attempt using xxHash for coordination-free hashing rejected due to high collision probability.
  • Second approach: splitting uint64 into ~4.3 billion sequences, each with ~4.3 billion values, to reduce contention.
  • Implementation allows minting 430 billion new UIDs per second with minimal contention.
  • Inspiration from Roaring Bitmaps and load balancing systems for distributed, high-throughput string interning.
  • Author's last day at Bluesky after scaling from 100,000 to 40,000,000 users, moving to new infrastructure projects.