Matrix: Post-mortem of the September 2 outage

6 months ago
  • #database-outage
  • #postgresql
  • #disaster-recovery
  • The Matrix.org homeserver suffered a roughly 24-hour outage after a database failed during routine maintenance.
  • Attempts to restore the primary database led to the loss of the secondary as well, requiring a lengthy restore from 51 TB S3 backups.
  • No data was lost, but the outage lasted from 2025-09-02 17:45 UTC to 2025-09-03 18:00 UTC.
  • The incident highlighted issues with database server naming conventions, backup strategies, and recovery processes.
  • Lessons learned include the need for better safeguards, improved tooling, and clearer communication with the community during outages.
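
As a quick check on the outage window quoted above, the duration can be computed directly from the two UTC timestamps. This is a minimal sketch using only Python's standard library; the variable names are illustrative and not taken from the post-mortem.

```python
from datetime import datetime, timezone

# Start and end of the outage, as given in the summary (both UTC).
start = datetime(2025, 9, 2, 17, 45, tzinfo=timezone.utc)
end = datetime(2025, 9, 3, 18, 0, tzinfo=timezone.utc)

# Subtracting two aware datetimes yields a timedelta.
outage = end - start

# Split the total duration into whole hours and leftover minutes.
hours, remainder = divmod(int(outage.total_seconds()), 3600)
minutes = remainder // 60

print(f"Outage duration: {hours}h {minutes}m")  # Outage duration: 24h 15m
```

The result, 24 hours and 15 minutes, is consistent with the "24-hour outage" description.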