Decreasing GitLab repo backup times from 48 hours to 41 minutes

a year ago
  • #Performance
  • #Backup
  • #Git
  • Repository backups are critical for disaster recovery but become challenging as repositories grow.
  • GitLab's Rails repository took 48 hours to back up due to a 15-year-old Git function with O(N²) complexity.
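  • The issue was fixed with an algorithmic change that removed the quadratic behavior, cutting backup times dramatically (not "exponentially": the cost went from quadratic to roughly linear in the number of items processed).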
  • Challenges with large repository backups include prohibitively long run times, heavy resource use, and a higher risk of failure.
  • The root cause was identified as the `object_array_remove_duplicates()` function in Git, which had poor scalability.
  • The fix replaced nested loops with a map data structure, improving performance by 6x in benchmarks.
  • Backup times for GitLab's largest repository dropped from 48 hours to 41 minutes.
  • Faster backups make more frequent backup schedules practical, strengthen business continuity, and reduce operational overhead.
  • The fix was contributed upstream to Git, benefiting the broader Git community.
  • GitLab 18.0 includes these improvements, requiring no further configuration.
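
The difference between the nested-loop approach and the map-based fix can be illustrated with a small sketch. This is not Git's code (Git is written in C, and its actual implementation differs); the function names `dedupe_quadratic` and `dedupe_with_set` and the sample ref names are hypothetical, chosen only to show why swapping a repeated linear scan for a hash lookup changes the complexity from O(N²) to roughly O(N).

```python
from typing import Iterable, List


def dedupe_quadratic(names: Iterable[str]) -> List[str]:
    """Keep first occurrences using nested loops: O(N^2) comparisons."""
    kept: List[str] = []
    for name in names:
        # Inner loop: scan everything kept so far for every input item.
        if not any(name == seen for seen in kept):
            kept.append(name)
    return kept


def dedupe_with_set(names: Iterable[str]) -> List[str]:
    """Keep first occurrences using a hash set: O(N) expected time."""
    seen = set()
    kept: List[str] = []
    for name in names:
        # Hash lookup replaces the inner loop.
        if name not in seen:
            seen.add(name)
            kept.append(name)
    return kept


# Hypothetical ref names; both functions return the same result.
refs = ["refs/heads/main", "refs/tags/v1", "refs/heads/main", "refs/tags/v1"]
assert dedupe_quadratic(refs) == dedupe_with_set(refs)
```

Both versions preserve the order of first occurrences, so the output is identical; only the amount of work differs. With N items the nested-loop version performs on the order of N² comparisons in the worst case, which is why the cost became prohibitive as the number of items grew.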