Show HN: TheorIA – An Open Curated Physics Dataset (Equations, Explanations, JSON)
a year ago
- #open-data
- #machine-learning
- #theoretical-physics
- TheorIA is a curated, open, high-quality dataset of theoretical physics equations and derivations.
- It addresses the lack of structured datasets for training machine learning models in theoretical physics.
- Entries include equations, derivations, and explanations in a structured format with AsciiMath annotations.
- Each entry is crafted and reviewed by individuals with a physics background, with contributor metadata included.
- The dataset is organized with one entry per file under the 'entries/' folder for easy collaboration.
- ArXiv-style categories are used for filtering, and a 'manifest.json' tracks versions and updates.
- The dataset is licensed under CC BY 4.0, which permits use, remixing, and teaching as long as attribution is given.
- Contributions are welcome via GitHub, with JSON files validated against a schema before merging.
- The dataset can be used as individual JSON files or merged into a single file for training pipelines.
- Users are encouraged to cite the dataset and engage via GitHub for issues or collaboration.
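
The "one entry per file, merged into a single file" workflow mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's own tooling: the function name `merge_entries` and the output filename are hypothetical, and the only layout assumed is what the summary states, namely one JSON file per entry under `entries/`. No field names inside the entries are assumed.

```python
import json
from pathlib import Path


def merge_entries(entries_dir: str, output_path: str) -> list:
    """Load every *.json file in entries_dir and write them out as one JSON array.

    Hypothetical helper for illustration; the real repository may ship its own
    merge script with different behavior.
    """
    merged = []
    # Sort by filename so the merged output is reproducible across runs.
    for path in sorted(Path(entries_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            merged.append(json.load(fh))

    # ensure_ascii=False keeps any non-ASCII symbols in explanations readable.
    with open(output_path, "w", encoding="utf-8") as fh:
        json.dump(merged, fh, ensure_ascii=False, indent=2)
    return merged
```

Calling `merge_entries("entries", "theoria_merged.json")` from the repository root would, under these assumptions, produce a single array that a training pipeline can read in one pass, while the per-file layout stays the source of truth for review and collaboration.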