Hasty Briefs

Show HN: TheorIA – An Open Curated Physics Dataset (Equations, Explanations, JSON)

a year ago
  • #open-data
  • #machine-learning
  • #theoretical-physics
  • TheorIA is a curated, open, high-quality dataset of theoretical physics equations and derivations.
  • It addresses the lack of structured datasets for training machine learning models in theoretical physics.
  • Entries include equations, derivations, and explanations in a structured format with AsciiMath annotations.
  • Each entry is written and reviewed by people with a physics background and includes contributor metadata.
  • The dataset is organized with one entry per file under the 'entries/' folder for easy collaboration.
  • arXiv-style categories are used for filtering, and a 'manifest.json' file tracks versions and updates.
  • The dataset is licensed under CC-BY 4.0, encouraging use, remixing, and teaching.
  • Contributions are welcome via GitHub, with JSON files validated against a schema before merging.
  • The dataset can be used as individual JSON files or merged into a single file for training pipelines.
  • Users are encouraged to cite the dataset and engage via GitHub for issues or collaboration.
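The summary says contributed JSON files are validated against a schema before merging, but it does not reproduce that schema. As a rough illustration only, the sketch below checks an entry for required fields and their JSON types using the standard library. The field names in `REQUIRED_FIELDS` (`id`, `equations`, `derivation`, `explanation`, `contributors`) and the function `validate_entry` are hypothetical; a real pipeline would more likely run a full JSON Schema validator against the project's own schema file.

```python
# Hypothetical required fields and their expected JSON types.
# The real TheorIA schema is not reproduced here and may differ.
REQUIRED_FIELDS = {
    "id": str,
    "equations": list,
    "derivation": list,
    "explanation": str,
    "contributors": list,
}


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry passes."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in entry:
            errors.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected):
            # Report the expected and actual Python type names.
            errors.append(
                f"{field}: expected {expected.__name__}, "
                f"got {type(entry[field]).__name__}"
            )
    return errors
```

Returning every problem at once, rather than stopping at the first, lets a contributor fix a rejected entry in a single pass.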
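For the option of merging the one-entry-per-file layout into a single file for training pipelines, a minimal sketch might read every file under `entries/` and write them out as one JSON array. It assumes each `*.json` file holds exactly one entry object; `merge_entries` and the output path are illustrative names, not part of the project.

```python
import json
from pathlib import Path


def merge_entries(entries_dir: str, output_path: str) -> int:
    """Merge one-entry-per-file JSON documents into a single JSON array.

    Assumes every *.json file in entries_dir holds exactly one entry object.
    Returns the number of entries written.
    """
    entries = []
    # Sort paths so repeated merges produce files in the same order.
    for path in sorted(Path(entries_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            entries.append(json.load(f))
    # ensure_ascii=False keeps non-ASCII symbols readable in the output.
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, ensure_ascii=False, indent=2)
    return len(entries)
```

Sorting the paths keeps the merged file's order stable across runs, which makes diffs between dataset versions easier to read.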
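Filtering by arXiv-style category could then operate on loaded entries. This sketch assumes a hypothetical `categories` field holding codes such as `hep-th` or `physics.class-ph`; letting a top-level archive like `physics` match its subcategories is a design choice of the example, not documented TheorIA behavior.

```python
from typing import Iterable


def filter_by_category(entries: Iterable[dict], category: str) -> list[dict]:
    """Keep entries tagged with the given arXiv-style category.

    Uses a hypothetical "categories" field (the real schema may differ).
    An archive name such as "physics" also matches "physics.class-ph",
    but "math" does not match "math-ph" because a "." separator is required.
    """
    selected = []
    for entry in entries:
        for cat in entry.get("categories", []):
            if cat == category or cat.startswith(category + "."):
                selected.append(entry)
                break  # avoid adding the same entry twice
    return selected
```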