Hasty Briefs

Production tests: a guidebook for better systems and more sleep

a year ago
  • #Production Testing
  • #DevOps
  • #Software Reliability
  • Customers expect full site functionality at all times, necessitating near-perfect uptime.
  • Production tests (synthetics) offer immediate failure notifications in production environments.
  • Setting up production tests is quick (within one sprint) and offers high ROI.
  • Atlassian's use of 'pollinators' showcases production tests' value in early problem detection.
  • Production tests are automated, frequent (e.g., every minute), and can emulate user actions via headless browsers or API calls.
  • Tests should be simple and fast (≤30 seconds), and should report failures through alerting channels such as Slack or a paging service.
  • They enhance reliability by providing immediate regression warnings, acting as canaries pre-deployment.
  • Design considerations include keeping tests basic to avoid false alarms and ensuring they don't overly impact system resources.
  • Good test examples include login verification and simple CRUD operations; bad examples are overly complex or timing-sensitive checks.
  • Production tests differ from health checks but may overlap; they should not cause false alarms or be overly simplistic.
  • Tests improve observability, especially in low-traffic regions, but may add noise or costs.
  • Fake data and test accounts need careful management so that accounts do not expire and test data does not accumulate in storage.
  • Implementing a 'three strikes' rule for alerts reduces false alarms while maintaining oversight.
  • Pros include real-world testing, quality control, troubleshooting aid, and safer deployments.
  • Cons involve setup challenges, potential flakiness, resource costs, and maintenance efforts.
  • Observability tools complement production tests by monitoring real traffic for issues like latency or failures.
  • Both production tests and observability are recommended for comprehensive monitoring.
  • Regular review and adjustment of production tests ensure continued value as systems evolve.
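
The time budget and the 'three strikes' rule from the points above can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation. It assumes "three strikes" means three consecutive failed runs and that a run slower than 30 seconds counts as a failure. The names `run_check`, `CheckResult`, and `ThreeStrikes` are hypothetical, and `alert` stands in for whatever Slack or paging hook is in use.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    ok: bool
    seconds: float
    detail: str = ""


def run_check(check: Callable[[], None], budget_seconds: float = 30.0) -> CheckResult:
    """Run one production test; exceptions and slow runs both count as failures."""
    start = time.monotonic()
    try:
        check()
    except Exception as exc:  # any error raised by the check is a failed run
        return CheckResult(False, time.monotonic() - start, repr(exc))
    elapsed = time.monotonic() - start
    if elapsed > budget_seconds:
        return CheckResult(False, elapsed, "exceeded time budget")
    return CheckResult(True, elapsed)


class ThreeStrikes:
    """Alert only after `strikes` consecutive failures (assumed meaning of the rule)."""

    def __init__(self, alert: Callable[[str], None], strikes: int = 3) -> None:
        self.alert = alert  # hypothetical hook, e.g. a Slack or pager client
        self.strikes = strikes
        self.consecutive_failures = 0

    def record(self, name: str, result: CheckResult) -> bool:
        """Record one run; return True if this run triggered an alert."""
        if result.ok:
            self.consecutive_failures = 0  # a single pass resets the count
            return False
        self.consecutive_failures += 1
        # Fire once, on exactly the Nth failure, so an ongoing outage
        # does not send a new alert on every subsequent run.
        if self.consecutive_failures == self.strikes:
            self.alert(f"{name} failed {self.strikes} times in a row: {result.detail}")
            return True
        return False
```

A scheduler would call `gate.record("login", run_check(login_check))` on each run, for example once a minute, where `login_check` is a function that raises when the login flow fails. Alerting on exactly the third failure rather than on every failure after it is a design choice made here to limit noise; a real setup might also send a recovery message on the first pass after an alert.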