Postgres Postmaster does not scale

3 months ago
  • #Performance
  • #Postgres
  • #Scalability
  • Recall.ai processes millions of meetings weekly, automating tasks like meeting notes and CRM updates.
  • Meetings tend to start on the hour or half-hour, so load on the media processing infrastructure arrives in synchronized spikes.
  • High load spikes from meeting starts require immediate compute capacity to avoid data loss.
  • Postgres's postmaster process, a single-threaded loop, became a bottleneck during high connection rates.
  • Delays of 10-15s in Postgres connection establishment were traced to postmaster CPU saturation.
  • Investigations revealed the postmaster's fork operations were expensive, especially under high churn.
  • Enabling huge pages in Linux reduced page table entry (PTE) overhead during fork, increasing connection throughput by 20%.
  • Parallel queries added further stress because the postmaster also launches their background workers, exacerbating delays.
  • Solutions included adding jitter to EC2 instance startups and reducing parallel query bursts.
  • The postmaster's single-threaded nature is a fundamental bottleneck in high-scale Postgres deployments.
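
The effect of startup jitter on a single-threaded accept loop can be sketched with a small simulation. This is a minimal model under stated assumptions, not the article's implementation: the function names `jittered_start_times` and `worst_connect_wait`, the per-connection cost of 5 ms, the client count of 3000, and the 60-second jitter window are all hypothetical values chosen for illustration. The loop is modeled as serving one connection at a time, the way a single-threaded process must.

```python
import random


def jittered_start_times(n_clients, window_s, rng):
    """Spread n_clients start times uniformly at random over [0, window_s)."""
    return sorted(rng.uniform(0.0, window_s) for _ in range(n_clients))


def worst_connect_wait(arrivals, service_s):
    """Simulate a single-threaded loop that handles one connection at a time.

    arrivals:  sorted arrival times in seconds.
    service_s: assumed time spent per connection (for example, one fork).
    Returns the longest time any client waits from arrival until its
    connection has been handled.
    """
    free_at = 0.0  # time at which the loop is next idle
    worst = 0.0
    for t in arrivals:
        start = max(t, free_at)       # wait if the loop is still busy
        free_at = start + service_s   # loop is busy until this connection is done
        worst = max(worst, free_at - t)
    return worst


if __name__ == "__main__":
    rng = random.Random(0)
    n_clients = 3000    # hypothetical number of simultaneous starts
    service_s = 0.005   # hypothetical per-connection cost in seconds

    burst = [0.0] * n_clients  # everyone connects at the same instant
    spread = jittered_start_times(n_clients, window_s=60.0, rng=rng)

    print("worst wait, synchronized burst:", worst_connect_wait(burst, service_s))
    print("worst wait, 60 s of jitter:    ", worst_connect_wait(spread, service_s))
```

In this model a synchronized burst makes the last client wait roughly `n_clients * service_s` seconds, because every connection queues behind the ones before it. Spreading the same clients over a window keeps the arrival rate below what the loop can serve, so the queue stays short. Real postmaster behavior also depends on fork cost, memory settings, and background worker launches, which this sketch does not model.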