The Netflix Simian Army (2011)
- #cloud-computing
- #netflix
- #system-reliability
- Netflix's Simian Army is a suite of tools that tests and improves the reliability of cloud systems by deliberately inducing failures and detecting abnormal conditions.
- Chaos Monkey randomly disables production instances to ensure the system can handle failures without customer impact.
- Other tools in the suite:
  - Latency Monkey: injects artificial delays to simulate service degradation.
  - Conformity Monkey: finds instances that do not adhere to best practices.
  - Doctor Monkey: detects unhealthy instances.
  - Janitor Monkey: cleans up unused resources.
  - Security Monkey: finds security violations and vulnerabilities.
  - 10-18 Monkey (short for l10n-i18n): detects localization and internationalization problems.
  - Chaos Gorilla: simulates the outage of an entire availability zone.
- The Simian Army helps Netflix build automatic recovery mechanisms and ensures high availability by constantly testing system resilience.
- Netflix encourages contributions and ideas for new simians to further enhance cloud reliability.
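
The Chaos Monkey idea summarized above can be illustrated with a toy simulation. This is a minimal sketch, not Netflix's implementation: `InstanceGroup`, `heal`, and `unleash_chaos` are hypothetical names invented for this example, the fleet is an in-memory list rather than real cloud instances, and `heal` stands in for whatever automatic recovery mechanism (such as an auto-scaling group) replaces lost capacity.

```python
import random
from dataclasses import dataclass, field


@dataclass
class InstanceGroup:
    """Hypothetical stand-in for a group of interchangeable instances."""
    name: str
    desired_size: int
    instances: list = field(default_factory=list)
    _counter: int = 0

    def heal(self) -> None:
        """Launch replacements until the group is back at its desired size."""
        while len(self.instances) < self.desired_size:
            self._counter += 1
            self.instances.append(f"{self.name}-{self._counter}")


def unleash_chaos(groups, probability, rng):
    """Terminate at most one randomly chosen instance per group.

    Each non-empty group is hit with the given probability.
    Returns the list of terminated instance IDs.
    """
    terminated = []
    for group in groups:
        if group.instances and rng.random() < probability:
            victim = rng.choice(group.instances)
            group.instances.remove(victim)
            terminated.append(victim)
    return terminated


if __name__ == "__main__":
    rng = random.Random(2011)  # seeded so a run can be repeated
    groups = [InstanceGroup("api", 3), InstanceGroup("web", 2)]
    for g in groups:
        g.heal()  # initial launch up to desired size

    killed = unleash_chaos(groups, probability=1.0, rng=rng)
    print("terminated:", killed)

    for g in groups:
        g.heal()  # automatic recovery replaces the lost instance
        print(g.name, g.instances)
```

Passing the random number generator in as a parameter keeps the failure injection repeatable in tests, while the healing step shows the property the failure is meant to exercise: the group returns to its desired size without manual intervention.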