A Case Study in Rewriting a Critical Service in Rust
9 days ago
- #Performance Optimization
- #Rust vs Go
- #Cost Savings
- A critical payment service at TikTok, written in Go, became CPU-bound due to high traffic, leading to scalability issues and high operational costs.
- Only the most CPU-intensive API endpoints were rewritten in Rust; the rest of the service stayed in Go, so the rewrite applied Rust's performance and memory efficiency where it mattered most.
- The Rust implementation was checked for correctness in shadow mode, where its output showed 100% data consistency with the original Go service.
- In stress tests, the Rust service handled twice the traffic of the Go service, with lower latency and significantly lower CPU and memory usage.
- The performance improvements are projected to save nearly $300,000 per year by cutting the required compute by more than 400 vCPUs.
- The project highlighted the importance of using the right tool for the job, with Go remaining ideal for most services and Rust for CPU-bound bottlenecks.
- Key metrics for the Rust service compared to the Go service: 33.6% lower CPU usage, 72% lower memory usage, and 76% lower p99 latency.
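
The shadow-mode check mentioned above can be illustrated with a minimal sketch. The summary does not describe how the comparison was actually built, so everything here is an assumption made for illustration: the names `shadow_compare` and `ShadowReport` are hypothetical, and the two handlers are modeled as plain closures rather than real Go and Rust services. The idea shown is only the general pattern: each request goes to both implementations, the caller gets the existing implementation's response, and any difference is recorded.

```rust
/// Result of running the same requests through both implementations.
/// (Hypothetical type, not taken from the original article.)
#[derive(Debug, Default, PartialEq)]
pub struct ShadowReport {
    /// Number of requests that were compared.
    pub total: usize,
    /// Indices of requests whose two responses differed.
    pub mismatches: Vec<usize>,
}

impl ShadowReport {
    /// Percentage of requests for which both responses were identical.
    /// An empty run is reported as 100% by convention.
    pub fn consistency_pct(&self) -> f64 {
        if self.total == 0 {
            return 100.0;
        }
        let matched = self.total - self.mismatches.len();
        100.0 * matched as f64 / self.total as f64
    }
}

/// Runs every request through `primary` (the existing implementation) and
/// `shadow` (the candidate rewrite) and records where the responses differ.
pub fn shadow_compare<Req, Resp, P, S>(requests: &[Req], primary: P, shadow: S) -> ShadowReport
where
    Resp: PartialEq,
    P: Fn(&Req) -> Resp,
    S: Fn(&Req) -> Resp,
{
    let mut report = ShadowReport::default();
    for (i, req) in requests.iter().enumerate() {
        report.total += 1;
        let served = primary(req); // in production, this is what the caller would receive
        let candidate = shadow(req); // computed only for comparison, then discarded
        if served != candidate {
            report.mismatches.push(i);
        }
    }
    report
}
```

Calling `shadow_compare(&requests, old_handler, new_handler)` and checking that `mismatches` is empty corresponds to the 100% consistency figure in the summary. A real deployment would presumably also need to keep the shadow path free of side effects (for a payment service, no double writes), which this sketch does not model.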