Hypergrowth isn’t always easy
5 days ago
- #System Architecture
- #Uptime
- #Tailscale
- Tailscale's uptime has been shakier recently, with the incidents documented on its public status page.
- The company emphasizes transparency and continuous improvement in system reliability.
- Tailscale's architecture evolved from a single 'coordination server' to a sharded 'coordination service' for better scalability.
- The system is designed so that existing connections (data plane) remain functional even if the control plane is down.
- Control plane issues affect actions like adding/removing nodes or changing ACLs but not existing connections.
- Tailscale is working on improvements like caching network maps between runs and enhancing the coordination service's resilience.
- Future plans include better multi-tailnet sharing and ongoing software maturity through testing and quality gates.
- The team acknowledges recent outages and commits to reducing their frequency and impact.
- Users are encouraged to report outages and consider joining Tailscale to help improve the system.
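
The idea of caching network maps between runs can be illustrated with a minimal sketch. This is not Tailscale's implementation: `NetmapCache`, `ControlPlaneDown`, the JSON file layout, and the `fetch` callback are all hypothetical names invented for this example. It only shows the general pattern: a client tries the control plane first, stores the result on disk, and falls back to the last stored map when the control plane is unreachable.

```python
import json
import tempfile
from pathlib import Path


class ControlPlaneDown(Exception):
    """Hypothetical error raised when the control plane cannot be reached."""


class NetmapCache:
    """Keeps the last known network map on disk (illustrative only)."""

    def __init__(self, path, fetch):
        self.path = Path(path)
        self.fetch = fetch  # callable returning a JSON-serializable netmap

    def load(self):
        """Return (netmap, source), where source is 'fresh' or 'cached'."""
        try:
            netmap = self.fetch()
        except ControlPlaneDown:
            # Control plane is down: reuse the last stored map if one exists.
            if self.path.exists():
                return json.loads(self.path.read_text()), "cached"
            raise
        # Control plane answered: persist the map for the next run.
        self.path.write_text(json.dumps(netmap))
        return netmap, "fresh"


def demo():
    """Simulate one successful fetch followed by a control plane outage."""
    state = {"up": True}

    def fetch():
        if not state["up"]:
            raise ControlPlaneDown()
        return {"peers": ["node-a", "node-b"]}

    with tempfile.TemporaryDirectory() as tmp:
        cache = NetmapCache(Path(tmp) / "netmap.json", fetch)
        first = cache.load()
        state["up"] = False  # simulated outage
        second = cache.load()
    return first, second
```

Calling `demo()` returns two results: the first comes from the simulated control plane, the second from the on-disk copy. A real client would also need to decide how long a cached map stays trustworthy, which this sketch leaves out.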