Collecting All Causal Knowledge

CauseNet aims to create a comprehensive causal knowledge base by collecting and validating causal relations from various web sources.
It includes over 11 million causal relations with an estimated precision of 83%, forming a large-scale, open-domain causality graph.
Three versions of the dataset are available: CauseNet-Full (the complete dataset), CauseNet-Precision (a higher-precision subset), and CauseNet-Sample (a small sample for initial exploration).
The data model consists of causal concepts connected by causal relations. Each relation carries detailed provenance data that records the source it was found in and the method used to extract it.
The data sources include sentences from the ClueWeb12 web crawl and sentences, lists, and infoboxes from Wikipedia; each source type records its own metadata.
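To make the concept/relation/provenance structure concrete, here is a minimal in-memory sketch. The record layout (`causal_relation`, `cause`, `effect`, `concept`, `sources`, `type`, `payload`) and the source-type labels such as `wikipedia_sentence` are assumptions for illustration. They are not guaranteed to match CauseNet's official schema. The example record is toy data, not an actual CauseNet entry.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical model of one CauseNet entry; field names are assumptions.
@dataclass(frozen=True)
class Concept:
    name: str  # a causal concept, e.g. "smoking"


@dataclass
class Source:
    type: str  # assumed source-type label, e.g. "wikipedia_sentence"
    payload: dict[str, Any] = field(default_factory=dict)  # source-specific metadata


@dataclass
class CausalRelation:
    cause: Concept
    effect: Concept
    sources: list[Source] = field(default_factory=list)  # provenance: one entry per supporting source

    def source_types(self) -> set[str]:
        """Return the distinct kinds of evidence supporting this relation."""
        return {s.type for s in self.sources}


def relation_from_dict(record: dict[str, Any]) -> CausalRelation:
    """Parse one record shaped like the assumed layout used below."""
    rel = record["causal_relation"]
    return CausalRelation(
        cause=Concept(rel["cause"]["concept"]),
        effect=Concept(rel["effect"]["concept"]),
        sources=[Source(s["type"], s.get("payload", {})) for s in record.get("sources", [])],
    )


# Toy record (hypothetical data): one relation supported by two sources.
record = {
    "causal_relation": {"cause": {"concept": "smoking"}, "effect": {"concept": "lung cancer"}},
    "sources": [
        {"type": "wikipedia_sentence", "payload": {"sentence": "Smoking can cause lung cancer."}},
        {"type": "clueweb12_sentence", "payload": {"sentence": "Heavy smoking causes lung cancer."}},
    ],
}
relation = relation_from_dict(record)
print(relation.cause.name, "->", relation.effect.name, sorted(relation.source_types()))
```

Keeping provenance as a list of typed sources lets a relation be supported by several kinds of evidence at once. Filtering on `source_types()` is one plausible way to trade coverage for confidence.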
CauseNet can be loaded into Neo4j for graph-based analysis and supports applications like causal reasoning, computational argumentation, and multi-hop question answering.
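Multi-hop causal reasoning amounts to finding a chain of cause-to-effect edges between two concepts. In Neo4j this would typically be expressed as a variable-length path query. The sketch below uses a plain in-memory breadth-first search instead, so it runs without a database. The function names and the toy edges are hypothetical and are not drawn from CauseNet.

```python
from collections import deque


def build_graph(pairs):
    """Build an adjacency list mapping each cause concept to its effect concepts."""
    graph = {}
    for cause, effect in pairs:
        graph.setdefault(cause, set()).add(effect)
    return graph


def causal_chain(graph, start, goal, max_hops=3):
    """Return the shortest cause->effect path from start to goal using at most
    max_hops edges, or None if no such path exists (breadth-first search)."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 == max_hops:  # hop budget exhausted on this path
            continue
        for nxt in sorted(graph.get(path[-1], ())):  # sorted for deterministic output
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Toy edges (hypothetical), not actual CauseNet relations.
graph = build_graph([
    ("smoking", "lung cancer"),
    ("lung cancer", "death"),
    ("stress", "insomnia"),
    ("insomnia", "fatigue"),
])
print(causal_chain(graph, "smoking", "death"))
```

Because BFS explores paths in order of length, the first path that reaches `goal` uses the fewest hops. The `max_hops` bound keeps the search tractable on a graph as large as CauseNet.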
The project includes concept spotting datasets for training sequence taggers to identify causal concepts in text.
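Sequence taggers usually learn from token-level labels. The BIO scheme (Begin, Inside, Outside) is a common way to encode concept spans for this. The sketch below converts concept spans to BIO tags and back. The brief does not describe the actual annotation format of CauseNet's concept spotting datasets, so the BIO encoding, the `B-CONCEPT`/`I-CONCEPT` labels, and the example sentence are assumptions for illustration.

```python
def bio_tags(tokens, concept_spans):
    """Label tokens with assumed B-/I-/O tags; spans are (start, end) token
    indices with end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end in concept_spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"invalid span {(start, end)}")
        tags[start] = "B-CONCEPT"
        for i in range(start + 1, end):
            tags[i] = "I-CONCEPT"
    return tags


def spans_from_tags(tags):
    """Decode B-/I-/O tags back into (start, end) spans, e.g. from tagger output."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B-CONCEPT" or (tag == "I-CONCEPT" and start is None):
            if start is not None:  # a new concept begins; close the previous one
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:  # concept runs to the end of the sentence
        spans.append((start, len(tags)))
    return spans


# Hypothetical training sentence with two causal concepts.
tokens = "Heavy smoking can cause lung cancer .".split()
tags = bio_tags(tokens, [(1, 2), (4, 6)])
print(list(zip(tokens, tags)))
```

Decoding treats a stray `I-CONCEPT` without a preceding `B-CONCEPT` as the start of a new concept, which is a lenient choice that suits noisy tagger output.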
The work is documented in a CIKM 2020 paper. The data is licensed under Creative Commons Attribution 4.0 International, and the code under the MIT license.