Collecting All Causal Knowledge
8 days ago
- #causal knowledge
- #AI research
- #data extraction
- CauseNet aims to create a comprehensive causal knowledge base by collecting and validating causal relations from various web sources.
- It includes over 11 million causal relations with an estimated precision of 83%, forming a large-scale, open-domain causality graph.
- Three versions of the dataset are available: CauseNet-Full, CauseNet-Precision (a higher-precision subset), and CauseNet-Sample (a small sample for initial exploration).
- The data model consists of causal concepts connected by causal relations, each with detailed provenance data indicating the source and extraction method.
- Data sources include sentences from ClueWeb12 and Wikipedia, as well as Wikipedia lists and infoboxes, each recorded with source-specific metadata.
- CauseNet can be loaded into Neo4j for graph-based analysis and supports applications like causal reasoning, computational argumentation, and multi-hop question answering.
- The project includes concept spotting datasets for training sequence taggers to identify causal concepts in text.
- The work is documented in a CIKM 2020 paper; the data is licensed under Creative Commons Attribution 4.0 International, and the code under the MIT license.
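
The data model and the multi-hop use case described above can be illustrated with a minimal Python sketch. It assumes that each relation is stored as one JSON object per line, with a cause concept, an effect concept, and a list of provenance sources. The field names (`causal_relation`, `cause`, `effect`, `concept`, `sources`, `type`), the source type labels, and the sample records are assumptions made for illustration, not a confirmed schema. The helper names `build_graph` and `causal_chain` are likewise hypothetical. The sketch builds an in-memory adjacency map and uses breadth-first search to find a shortest chain of causal relations between two concepts.

```python
import json
from collections import defaultdict, deque

# Hypothetical sample records. The field names and source type labels are an
# assumed layout for illustration; check the dataset documentation for the real schema.
SAMPLE_JSONL = """\
{"causal_relation": {"cause": {"concept": "smoking"}, "effect": {"concept": "cancer"}}, "sources": [{"type": "wikipedia_sentence"}]}
{"causal_relation": {"cause": {"concept": "cancer"}, "effect": {"concept": "death"}}, "sources": [{"type": "clueweb12_sentence"}, {"type": "wikipedia_infobox"}]}
{"causal_relation": {"cause": {"concept": "rain"}, "effect": {"concept": "flooding"}}, "sources": [{"type": "wikipedia_list"}]}
"""


def build_graph(lines):
    """Map each cause to its effects, keeping the source types as provenance."""
    graph = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        relation = record["causal_relation"]
        cause = relation["cause"]["concept"]
        effect = relation["effect"]["concept"]
        source_types = [source["type"] for source in record.get("sources", [])]
        graph[cause].setdefault(effect, []).extend(source_types)
    return graph


def causal_chain(graph, start, goal):
    """Return a shortest cause-to-effect path via breadth-first search, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for successor in graph.get(node, {}):
            if successor not in seen:
                seen.add(successor)
                queue.append(path + [successor])
    return None


graph = build_graph(SAMPLE_JSONL.splitlines())
print(causal_chain(graph, "smoking", "death"))  # ['smoking', 'cancer', 'death']
print(graph["cancer"]["death"])  # ['clueweb12_sentence', 'wikipedia_infobox']
```

An in-memory dictionary like this is only practical for a small sample. For the full dataset of over 11 million relations, a graph database such as Neo4j, which the summary mentions, is likely the better fit.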