Why We're Building Stategraph: Terraform State as a Distributed Systems Problem
4 hours ago
- #Distributed Systems
- #Terraform
- #Infrastructure as Code
- Terraform state management is fundamentally a distributed systems problem, not a file storage problem.
- Terraform currently guards a single JSON state file with one global mutex, so operations on the same state serialize even when they touch unrelated resources, causing lock contention and scaling issues.
- Splitting state redistributes the problem without solving it, and adds the complexity of managing cross-state dependencies.
- Infrastructure state is inherently a directed acyclic graph (DAG) of resources with dependencies.
- Stategraph treats state as a graph, enabling subgraph isolation, precise locking, and incremental refresh.
- With graph-based state, operations on disjoint subgraphs can run in parallel, which reduces contention.
- Stategraph applies distributed systems techniques such as multi-version concurrency control (MVCC), fine-grained locking, and transaction isolation.
- Refresh operations in graph-based state are scoped to affected subgraphs, reducing unnecessary work.
- Stategraph is implemented as a PostgreSQL schema that normalizes state into resources, dependencies, and transactions.
- Adoption requires no changes to Terraform configurations; Stategraph reads existing tfstate files and constructs the graph automatically.
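
To make the subgraph idea concrete, here is a minimal Python sketch of how a dependency graph could be built from a tfstate-shaped document and used to decide whether two operations may run in parallel. It is an illustration under stated assumptions, not Stategraph's implementation: the sample `STATE` is a hypothetical, heavily trimmed stand-in for a version 4 state file, the function names (`build_graph`, `affected_subgraph`, `can_run_in_parallel`) are invented here, and treating "the target plus everything it depends on plus everything that depends on it" as the lock set is one plausible policy among several.

```python
# Hypothetical, trimmed sample shaped like a version 4 tfstate file.
# Real state files carry many more fields (attributes, provider, lineage, ...).
STATE = {
    "version": 4,
    "resources": [
        {"mode": "managed", "type": "aws_vpc", "name": "main",
         "instances": [{"dependencies": []}]},
        {"mode": "managed", "type": "aws_subnet", "name": "app",
         "instances": [{"dependencies": ["aws_vpc.main"]}]},
        {"mode": "managed", "type": "aws_instance", "name": "web",
         "instances": [{"dependencies": ["aws_subnet.app"]}]},
        {"mode": "managed", "type": "aws_s3_bucket", "name": "logs",
         "instances": [{"dependencies": []}]},
        {"mode": "managed", "type": "aws_s3_bucket_policy", "name": "logs",
         "instances": [{"dependencies": ["aws_s3_bucket.logs"]}]},
    ],
}


def address(resource):
    """Build a resource address such as 'aws_vpc.main' or 'data.aws_ami.x'."""
    parts = []
    if resource.get("module"):
        parts.append(resource["module"])
    if resource.get("mode") == "data":
        parts.append("data")
    parts += [resource["type"], resource["name"]]
    return ".".join(parts)


def build_graph(state):
    """Map each resource address to the set of addresses it depends on."""
    graph = {}
    for resource in state["resources"]:
        deps = graph.setdefault(address(resource), set())
        for instance in resource.get("instances", []):
            deps.update(instance.get("dependencies", []))
    return graph


def reverse(graph):
    """Flip every edge so each node maps to the nodes that depend on it."""
    reversed_graph = {node: set() for node in graph}
    for node, deps in graph.items():
        for dep in deps:
            reversed_graph.setdefault(dep, set()).add(node)
    return reversed_graph


def reachable(edges, start):
    """Return every node reachable from start, including start itself."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return seen


def affected_subgraph(graph, target):
    """Assumed lock set: target, its transitive dependencies, and its dependents."""
    return reachable(graph, target) | reachable(reverse(graph), target)


def can_run_in_parallel(graph, target_a, target_b):
    """Two operations are independent when their lock sets share no resource."""
    return affected_subgraph(graph, target_a).isdisjoint(
        affected_subgraph(graph, target_b)
    )


graph = build_graph(STATE)
print(can_run_in_parallel(graph, "aws_instance.web", "aws_s3_bucket_policy.logs"))  # True
print(can_run_in_parallel(graph, "aws_instance.web", "aws_subnet.app"))  # False
```

Under a single global lock, both pairs above would serialize. With per-subgraph lock sets, only the second pair contends, because the instance and the subnet share the VPC chain while the bucket policy lives in a separate component of the graph.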