Corrosion

6 months ago
  • #rust
  • #flyio
  • #distributed-systems
  • Fly.io transforms Docker containers into Fly Machines, micro-VMs running globally on their own hardware.
  • State synchronization, which keeps the routing tables on edge proxies accurate, is the hardest part of their platform.
  • A major outage occurred on September 1, 2024, due to a Rust concurrency bug causing a system-wide deadlock.
  • Distributed systems amplify bugs, as seen with Corrosion, their state distribution system.
  • Fly.io's orchestration model differs from mainstream systems by making individual servers the source of truth.
  • Corrosion is a Rust-based system that uses a gossip protocol to distribute global routing state without distributed consensus.
  • Corrosion uses SQLite with CRDT extensions (cr-sqlite) for conflict-free updates and efficient state propagation.
  • Past issues with Corrosion include schema changes that caused global reconciliation meltdowns, as well as certificate expirations.
  • Improvements include watchdog mechanisms, extensive testing, and regionalization to reduce blast radius.
  • Corrosion avoids traditional distributed consensus models, presenting as a simple, highly distributed SQLite database.
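
To make the "conflict-free updates" idea concrete, here is a minimal sketch of a last-write-wins merge, one common way to let replicas converge without consensus. It is an illustration only and not cr-sqlite's actual algorithm or API: the `Cell` type, the `merge_cell` and `merge_state` functions, the version counter, and the node-ID tie-breaker are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Cell:
    """One replicated column value (hypothetical model, not cr-sqlite's schema)."""
    value: str
    version: int   # logical clock, assumed to be bumped on every local write
    node_id: str   # assumed tie-breaker when two writes share a version


# (row key, column name) -> latest known cell
State = Dict[Tuple[str, str], Cell]


def merge_cell(a: Cell, b: Cell) -> Cell:
    # Deterministic winner: highest version, then node_id, then value.
    # Every replica applies the same rule, so all of them pick the same cell.
    return max(a, b, key=lambda c: (c.version, c.node_id, c.value))


def merge_state(local: State, remote: State) -> State:
    # Merge a gossiped remote state into a copy of the local state.
    merged = dict(local)
    for key, cell in remote.items():
        merged[key] = merge_cell(merged[key], cell) if key in merged else cell
    return merged


if __name__ == "__main__":
    node_a = {("machine-1", "addr"): Cell("10.0.0.1", 1, "node-a")}
    node_b = {
        ("machine-1", "addr"): Cell("10.0.0.2", 2, "node-b"),
        ("machine-2", "addr"): Cell("10.0.0.9", 1, "node-b"),
    }
    # Merging in either order yields the same state.
    print(merge_state(node_a, node_b) == merge_state(node_b, node_a))
```

Because the merge is commutative and idempotent, replicas can exchange updates in any order, and receive them more than once, while still ending up with identical state. That property is what allows a gossip-based system to skip a consensus round.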