Decreasing GitLab repo backup times from 48 hours to 41 minutes
a year ago
- #Performance
- #Backup
- #Git
- Repository backups are critical for disaster recovery but become challenging as repositories grow.
- GitLab's Rails repository took 48 hours to back up due to a 15-year-old Git function with O(N²) complexity.
- The issue was fixed with an algorithmic change that removed the quadratic behavior and cut backup times dramatically.
- Challenges with large repository backups include time-prohibitive processes, resource intensity, and increased failure risk.
- The root cause was identified as the `object_array_remove_duplicates()` function in Git, which scaled poorly as the number of references grew.
- The fix replaced nested loops with a map data structure, improving performance by 6x in benchmarks.
- Backup times for GitLab's largest repository dropped from 48 hours to 41 minutes.
- Benefits include transformed backup strategies, enhanced business continuity, and reduced operational overhead.
- The fix was contributed upstream to Git, benefiting the broader Git community.
- GitLab 18.0 includes these improvements, requiring no further configuration.
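To illustrate why swapping nested loops for a map matters, here is a minimal Python sketch of the two deduplication strategies. It is not Git's C code; the function names and the sample ref list are hypothetical. The first version checks each item against every item kept so far, which is O(N²). The second keeps a hash set of items already seen, so each membership check is O(1) on average and the whole pass is O(N) on average.

```python
from typing import Hashable, List, Set, TypeVar

T = TypeVar("T", bound=Hashable)


def remove_duplicates_quadratic(items: List[T]) -> List[T]:
    """Keep the first occurrence of each item using nested loops: O(N^2)."""
    result: List[T] = []
    for item in items:
        # Inner loop: compare against every item already kept.
        duplicate = False
        for kept in result:
            if kept == item:
                duplicate = True
                break
        if not duplicate:
            result.append(item)
    return result


def remove_duplicates_with_set(items: List[T]) -> List[T]:
    """Keep the first occurrence of each item using a hash set: O(N) on average."""
    seen: Set[T] = set()
    result: List[T] = []
    for item in items:
        # Average O(1) membership test replaces the inner loop.
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Hypothetical ref names, for illustration only.
refs = ["refs/heads/main", "refs/tags/v1.0", "refs/heads/main",
        "refs/heads/dev", "refs/tags/v1.0"]
print(remove_duplicates_with_set(refs))
# prints ['refs/heads/main', 'refs/tags/v1.0', 'refs/heads/dev']
```

Both functions return the same order-preserving result; only the cost of detecting a duplicate changes, which is why the fix could be made without altering the function's behavior.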
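As a rough worked check on the numbers above: with nested loops over N distinct entries, the i-th entry is compared against the i entries already kept, so the total comparison count grows quadratically. The value N = 100,000 below is a hypothetical input size, not a figure from GitLab's repository. The second line is simply the ratio of the two reported backup times, which is a separate measurement from the 6x benchmark result.

$$
\sum_{i=0}^{N-1} i = \frac{N(N-1)}{2}, \qquad N = 100{,}000 \;\Rightarrow\; 4{,}999{,}950{,}000 \approx 5 \times 10^{9} \text{ comparisons}
$$

$$
\frac{48\ \text{h} \times 60\ \text{min/h}}{41\ \text{min}} = \frac{2880}{41} \approx 70
$$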