Hasty Briefsbeta

Bilingual

Can LLMs model real-world systems in TLA+?

11 hours ago
  • #Formal Methods
  • #LLM Evaluation
  • #System Modeling
  • LLMs can produce syntactically correct TLA+ specifications but often fail to model specific system implementations accurately, instead reciting textbook examples.
  • SysMoBench is an automated benchmark that evaluates LLM-generated TLA+ specs through four phases: syntax, runtime, conformance, and invariance, revealing gaps between generic and system-specific modeling.
  • Common failure patterns include specs entering unreachable states or missing reachable states due to mismatched data structures and fused atomic actions, highlighting the need for conformance checks.
  • Transition Validation in SysMoBench provides per-action diagnostics by comparing spec transitions against real system execution traces, pinpointing misalignments.
  • Evaluation results show LLMs perform well on syntax but struggle with conformance and invariants, especially for complex distributed systems, indicating challenges in abstracting real-world logic.
  • Open challenges include trace coverage limitations, state abstraction losses, and generalizability across systems, with ongoing efforts to improve automation and collaboration.