Hasty Briefs (beta)

An Update on GitHub Availability

3 hours ago
  • #Incident Response
  • #Scalability
  • #GitHub Availability
  • GitHub apologizes for two recent incidents affecting availability and outlines ongoing reliability improvements.
  • The capacity target rose from 10X to 30X because of rapid growth in agentic workflows and monorepos.
  • Priorities, in order, are availability, then capacity, then new features, with a focus on reducing bottlenecks and isolating services.
  • Short-term fixes included migrating webhooks, redesigning caching, and leveraging Azure for more compute.
  • Long-term measures involve moving to a multi-cloud strategy and migrating performance-sensitive code to Go.
  • The April 23 incident involved a merge queue regression affecting squash merges, impacting 230 repositories and 2,092 pull requests.
  • The April 27 incident was a search subsystem overload, likely caused by a botnet attack, which disrupted the UI but caused no data loss.
  • GitHub is improving transparency via status updates, incident categorization, and customer reporting channels.
  • GitHub commits to improving availability, resilience, scalability, and communication for developers.