Can LLMs model real-world systems in TLA+?
- #Formal Methods
- #LLM Evaluation
- #System Modeling
- LLMs can produce syntactically correct TLA+ specifications, but they often fail to model a specific system's implementation accurately, instead falling back on generic textbook examples.
- SysMoBench is an automated benchmark that evaluates LLM-generated TLA+ specs in four phases (syntax checking, runtime execution, trace conformance, and invariant verification), revealing the gap between generic and system-specific modeling.
- Common failure patterns include specs that allow states the real system can never reach, or omit states it does reach, typically caused by mismatched data structures and atomic actions that fuse several real steps into one; these errors motivate explicit conformance checks.
- Transition Validation in SysMoBench compares each spec transition against real system execution traces, yielding per-action diagnostics that pinpoint where spec and implementation diverge.
- Evaluation results show LLMs score well on syntax but struggle with conformance and invariant checks, especially for complex distributed systems, indicating difficulty abstracting real-world logic into formal models.
- Open challenges include limited trace coverage, information lost during state abstraction, and generalizability across systems, with ongoing efforts to improve automation and human-model collaboration.
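The staged evaluation described above can be sketched as a simple pipeline: each phase only runs if the previous one passes, since there is no point model-checking a spec that will not even parse. This is an illustrative sketch, not SysMoBench's implementation; all function and field names here are hypothetical, and toy predicates stand in for the real TLA+ tooling.

```python
"""Hypothetical sketch of a four-phase spec evaluation pipeline
(syntax -> runtime -> conformance -> invariants). The real benchmark
drives a TLA+ toolchain and system traces; these checkers are stubs."""
from dataclasses import dataclass
from typing import Callable

@dataclass
class PhaseResult:
    phase: str
    passed: bool
    detail: str = ""

def evaluate(spec: str,
             phases: list[tuple[str, Callable[[str], tuple[bool, str]]]]
             ) -> list[PhaseResult]:
    """Run phases in order; stop at the first failure."""
    results = []
    for name, check in phases:
        ok, detail = check(spec)
        results.append(PhaseResult(name, ok, detail))
        if not ok:
            break  # later phases are meaningless if this one failed
    return results

# Toy stand-ins for the real checkers:
def syntax_ok(spec):      return ("VARIABLES" in spec, "parsed")
def runtime_ok(spec):     return (True, "model checker ran without error")
def conformance_ok(spec): return ("Append" in spec, "trace actions matched")
def invariants_ok(spec):  return (True, "invariants hold on explored states")

phases = [("syntax", syntax_ok), ("runtime", runtime_ok),
          ("conformance", conformance_ok), ("invariants", invariants_ok)]

report = evaluate("VARIABLES log\nAppend == TRUE", phases)
```

The early-exit structure mirrors the article's observation that syntax is the easy phase: a spec can clear the first two gates and still fail conformance.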
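The per-action diagnostics idea behind Transition Validation can be illustrated in miniature: replay a recorded system trace step by step against the spec's transition relation and report the first action whose effect disagrees. This is a toy sketch under loose assumptions — actions are modeled as plain Python functions rather than TLA+ actions, and all names are invented — but it shows the fused-atomic-action failure mode the article describes.

```python
"""Toy transition validation: replay (action, pre_state, post_state)
trace steps against a spec's actions and report the first mismatch.
Actions here are Python functions standing in for TLA+ actions."""

def validate_trace(spec_actions, trace):
    """Return (True, None), or (False, diagnostic) at the first step
    where the spec's action disagrees with the observed transition."""
    for i, (action, pre, post) in enumerate(trace):
        allowed = spec_actions.get(action)
        if allowed is None:
            return False, f"step {i}: action {action!r} missing from spec"
        if allowed(pre) != post:
            return False, (f"step {i}: spec's {action!r} from {pre} "
                           f"yields {allowed(pre)}, system reached {post}")
    return True, None

# A spec with a fused atomic action: its Append also advances
# `committed`, while the real system commits in a separate step.
spec = {
    "Append": lambda s: {"log": s["log"] + 1, "committed": s["committed"] + 1},
    "Commit": lambda s: {"log": s["log"], "committed": s["committed"] + 1},
}
trace = [
    ("Append", {"log": 0, "committed": 0}, {"log": 1, "committed": 0}),
    ("Commit", {"log": 1, "committed": 0}, {"log": 1, "committed": 1}),
]
ok, why = validate_trace(spec, trace)
# ok is False: the fused Append is flagged at the exact step it diverges.
```

Because the check is per step, the diagnostic names the specific action and state where spec and system diverge, rather than only reporting that model checking failed somewhere.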