Railway GCP Account Suspension Incident Report

13 hours ago
  • #outage
  • #cloud-infrastructure
  • #incident-response
  • Railway experienced a platform-wide outage lasting approximately 8 hours on May 19-20, 2026, because Google Cloud incorrectly suspended its production account.
  • The suspension immediately disrupted GCP-hosted infrastructure, including the dashboard, API, control plane, and databases, causing 503 errors and login failures.
  • As cached network routes expired, the outage cascaded to workloads on Railway Metal and AWS, leading to widespread 404 errors and rendering all regions unreachable.
  • Recovery involved restoring GCP account access, persistent disks, compute instances, and networking, with services gradually restored over several hours.
  • During recovery, GitHub rate-limited Railway's OAuth and webhook integrations due to a surge in retry requests, temporarily blocking logins and builds.
  • Railway takes responsibility for architectural dependencies that allowed a single provider action to cause a full-platform outage and is implementing changes to prevent recurrence.
  • Planned improvements include removing the dependency on GCP for the network control plane, extending high-availability database shards across AWS and Metal, and removing Google Cloud from the data plane's hot path.
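
The retry surge that led GitHub to rate-limit the OAuth and webhook integrations is the kind of load that capped exponential backoff with random jitter is commonly used to soften. The report summary does not say how Railway's clients retried or what change was made there, so the following is only a minimal sketch of that general technique. The names `RateLimited`, `backoff_ceiling`, and `call_with_backoff` are hypothetical and do not come from Railway or GitHub.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised when an upstream service asks callers to slow down."""


def backoff_ceiling(attempt, base=1.0, cap=60.0):
    """Upper bound in seconds for the wait before retry number `attempt` (0-based)."""
    # Double the bound on every attempt, but never exceed `cap`.
    return min(cap, base * (2 ** attempt))


def call_with_backoff(operation, max_attempts=5, base=1.0, cap=60.0,
                      sleep=time.sleep, rng=random):
    """Call `operation`, retrying on RateLimited with capped, jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimited:
            # Give up after the final attempt and surface the error to the caller.
            if attempt == max_attempts - 1:
                raise
            # "Full jitter": wait a random time in [0, ceiling] so that many
            # clients recovering at once do not retry in lockstep.
            sleep(rng.uniform(0.0, backoff_ceiling(attempt, base, cap)))
```

Passing `sleep` and `rng` as parameters is a design choice for testability: a caller can substitute a recording function and a seeded `random.Random` to check the delays without actually waiting.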