Hasty Briefs (beta)

  • #Reliability
  • #Auth
  • #AWS
  • AWS us-east-1 experienced a major outage on October 20th, impacting DynamoDB DNS and causing widespread service disruptions.
  • Authress maintains a 99.999% SLA through AWS outages by using multi-region failover and dynamic DNS routing.
  • Critical AWS services like CloudFront, Certificate Manager, and IAM have control planes in us-east-1, so a failure in that one region can affect workloads running elsewhere.
  • Authress uses Route 53 health checks and failover routing to switch regions during outages, ensuring minimal downtime.
  • Validation tests and anomaly detection (e.g., Authorization Ratio) help identify and mitigate issues before customers are affected.
  • Incremental rollouts and customer deployment buckets reduce the impact of bugs by limiting exposure during deployments.
  • Authress employs rate limiting and AWS WAF with managed IP reputation lists to prevent resource exhaustion and block malicious traffic.
  • Customer support is integrated directly with engineering to quickly address incidents and reduce resolution time.
  • Infrastructure as Code (IaC) challenges arise when deploying slightly different architectures across primary, backup, and edge regions.
  • Even with these measures, achieving a five-nines (99.999%) SLA requires continuous improvement and vigilance against new failure modes.
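
To put the 99.999% figure in perspective, the allowed downtime can be worked out directly from the availability target. The sketch below is illustrative arithmetic only and is not taken from Authress; the function name `downtime_budget_seconds` and the choice of a 365-day year and a 30-day month are assumptions made for the example.

```python
def downtime_budget_seconds(availability_percent: float, period_days: float) -> float:
    """Return the maximum downtime, in seconds, that still meets the target.

    availability_percent: target availability, e.g. 99.999 for five nines.
    period_days: length of the measurement window in days (assumed here).
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    period_seconds = period_days * 24 * 60 * 60
    # The budget is the fraction of the window that may be unavailable.
    return period_seconds * (1 - availability_percent / 100)


# Five nines over an assumed 365-day year and an assumed 30-day month.
yearly = downtime_budget_seconds(99.999, 365)
monthly = downtime_budget_seconds(99.999, 30)
print(f"per year:  {yearly:.2f} s ({yearly / 60:.2f} min)")
print(f"per month: {monthly:.2f} s")
```

Under these assumptions the budget comes to roughly 5.26 minutes per year, or about 26 seconds per 30-day month, which is why a regional outage lasting hours can only be absorbed if failover happens automatically.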