How when AWS was down, we were not

5 days ago

AWS us-east-1 experienced a major outage on October 20th, in which a DNS failure affecting DynamoDB caused widespread service disruptions.
Authress maintains high reliability (99.999% SLA) despite AWS outages by implementing multi-region failover and dynamic DNS routing.
Critical AWS services like CloudFront, Certificate Manager, and IAM have their control planes in us-east-1, so a failure in that one region can affect workloads running in other regions.
Authress uses Route 53 health checks and failover routing to switch regions during outages, ensuring minimal downtime.
Validation tests and anomaly detection (e.g., Authorization Ratio) help identify and mitigate issues before customers are affected.
Incremental rollouts and customer deployment buckets reduce the impact of bugs by limiting exposure during deployments.
Authress employs rate limiting and AWS WAF with managed IP reputation lists to prevent resource exhaustion and block malicious traffic.
Customer support is integrated directly with engineering to quickly address incidents and reduce resolution time.
Infrastructure as Code (IaC) challenges arise when deploying slightly different architectures across primary, backup, and edge regions.
Despite robust measures, achieving a 5-nines SLA requires continuous improvement and vigilance against new failure modes.
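
To make the 5-nines target concrete, the arithmetic below converts an availability percentage into an allowed-downtime budget. This is a minimal sketch: the function name `downtime_budget_seconds` and the 365-day and 30-day periods are illustrative assumptions, not details from the article.

```python
def downtime_budget_seconds(availability_percent: float, period_days: float) -> float:
    """Return the seconds of downtime allowed by an availability target over a period."""
    period_seconds = period_days * 24 * 60 * 60
    return period_seconds * (1 - availability_percent / 100)


# Assumed periods for illustration: a 365-day year and a 30-day month.
yearly = downtime_budget_seconds(99.999, 365)
monthly = downtime_budget_seconds(99.999, 30)

print(f"{yearly / 60:.2f} minutes per year")       # about 5.26 minutes
print(f"{monthly:.1f} seconds per 30-day month")   # about 25.9 seconds
```

With a budget of roughly 26 seconds per month, a failover that waits on manual intervention would consume the whole budget on its own, which is why the routing switch has to be automatic.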
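
The anomaly detection point can be illustrated with a small sketch. The summary does not define the Authorization Ratio, so the code assumes it is the share of authorization requests that were allowed in a time window, and it flags a window whose ratio drifts far from the recent baseline. The `Window` type, the z-score rule, and the thresholds are all hypothetical choices for illustration, not Authress's actual implementation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Window:
    """Counts of authorization decisions in one time window (assumed shape)."""
    allowed: int
    denied: int

    @property
    def total(self) -> int:
        return self.allowed + self.denied

    @property
    def ratio(self) -> float:
        # Assumed definition: allowed decisions divided by all decisions.
        return self.allowed / self.total if self.total else 0.0


def is_anomalous(history: list[Window], current: Window,
                 z_threshold: float = 3.0, min_std: float = 0.01) -> bool:
    """Flag the current window if its ratio is far from the historical baseline."""
    if not history or current.total == 0:
        # Nothing to compare against, or no traffic to evaluate.
        return False
    ratios = [w.ratio for w in history]
    baseline = mean(ratios)
    # Floor the spread so a very stable history does not trigger on tiny noise.
    spread = max(pstdev(ratios), min_std)
    return abs(current.ratio - baseline) / spread > z_threshold


history = [Window(900, 100), Window(910, 90), Window(895, 105)]
print(is_anomalous(history, Window(905, 95)))   # False: close to the baseline
print(is_anomalous(history, Window(500, 500)))  # True: ratio dropped sharply
```

A ratio-based signal like this is useful because it can catch a deployment that returns well-formed but wrong answers, a failure that error-rate alarms alone would miss.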
