Show HN: Using DSPy to enrich a dataset of the Nobel laureate network

9 months ago

Building a knowledge graph from multiple data sources makes it hard to disambiguate entities that look alike but may or may not refer to the same thing.
A two-part workflow is introduced for entity disambiguation: vector embeddings surface candidate matches, and an "LLM-as-a-judge" decides which candidates are the same entity.
DSPy, a declarative framework for building compound AI pipelines, is showcased for its ability to program LLMs without writing prompts by hand.
The workflow is applied to merge a dataset of Nobel laureates and their mentorship relationships with richer data from the Nobel Prize API.
Kuzu's vector index and similarity search find similar entities, and LLM-based disambiguation then decides which of them to merge across the datasets.
The merged data enables answering complex questions about Nobel laureates, their mentors, affiliations, and more.
With DSPy, the developer declares intent in code instead of writing prompts manually.
The methodology is described as cost-effective and scalable, and could be applied to domains well beyond the Nobel laureate use case.
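The two-part disambiguation workflow summarized above can be sketched in plain Python under some loud assumptions. This is not Kuzu's API or the post's DSPy program: a brute-force cosine-similarity scan stands in for Kuzu's vector index, short hand-written vectors stand in for real embeddings, and `judge` is a hypothetical callback marking where the LLM-as-a-judge call would go. The names `top_k`, `disambiguate`, and `min_score` are illustrative only.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Vector, index: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Stage 1: the k entity IDs most similar to `query`.

    Linear scan over every vector; a real vector index would replace this.
    """
    scored = [(eid, cosine(query, vec)) for eid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def disambiguate(
    query_id: str,
    query_vec: Vector,
    index: Dict[str, Vector],
    judge: Callable[[str, str], bool],
    k: int = 3,
    min_score: float = 0.8,
) -> List[str]:
    """Stage 2: keep candidates that clear `min_score` AND that `judge` accepts.

    `judge` is a hypothetical stand-in for an LLM-as-a-judge call. Because of
    short-circuit evaluation it is never called for low-similarity candidates.
    """
    return [
        eid
        for eid, score in top_k(query_vec, index, k)
        if score >= min_score and judge(query_id, eid)
    ]


# Hypothetical toy index: entity ID -> embedding (not real data).
index = {"api:1": [1.0, 0.0, 0.0], "api:2": [0.9, 0.1, 0.0], "api:3": [0.0, 1.0, 0.0]}
query = [1.0, 0.05, 0.0]

# Rule-based placeholder for the LLM: pretend it rejects the look-alike "api:2".
print(disambiguate("mentorship:42", query, index, judge=lambda q, c: c == "api:1"))
```

In this sketch the judge is consulted at most `k` times per entity, and only for candidates above the similarity threshold; that bound is one plausible reason such a pipeline stays affordable, offered as an assumption rather than a figure from the post.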
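Once matches have been decided, merging the mentorship dataset with the Nobel Prize API data can be as simple as filling gaps in one record from its matched counterpart. The sketch below assumes hypothetical field names (`name`, `mentors`, `affiliations`, `prize`) and a hypothetical `enrich` helper with placeholder records; the post's actual schema and merge rules may differ.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def enrich(base: Record, extra: Record) -> Record:
    """Return a copy of `base` enriched with fields from its matched `extra` record.

    Non-empty values already in `base` win; list-valued fields are unioned,
    keeping first-seen order. Neither input is mutated.
    """
    merged: Record = dict(base)
    for key, value in extra.items():
        current = merged.get(key)
        if current is None or current == "" or current == []:
            # Field missing or empty in the base record: take it from `extra`.
            merged[key] = value
        elif isinstance(current, list) and isinstance(value, list):
            # Union list fields without duplicates, base items first.
            merged[key] = current + [v for v in value if v not in current]
    return merged


def merge_datasets(matches: List[Tuple[Record, Record]]) -> List[Record]:
    """Enrich every (mentorship record, API record) pair produced by the matching step."""
    return [enrich(base, extra) for base, extra in matches]


# Hypothetical toy records with placeholder names, not real laureate data.
mentorship_row: Record = {"name": "Laureate A", "mentors": ["Laureate B"], "affiliations": ["Univ X"]}
api_row: Record = {"name": "Laureate A.", "prize": "Physics", "affiliations": ["Univ X", "Lab Y"]}
print(merge_datasets([(mentorship_row, api_row)]))
```

Letting the base record win on conflicts (here, keeping "Laureate A" over "Laureate A.") is one design choice among several; a pipeline could just as well prefer the API's canonical values.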
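With the datasets merged, questions about mentors become graph traversals. The post runs these in Kuzu; the plain-Python sketch below instead uses a hypothetical in-memory mapping from each person to their mentors, with placeholder names rather than real laureates, to show the shape of two such questions: every mentorship lineage leading back from a person, and which of a person's mentors are themselves laureates.

```python
from typing import Dict, List

# Hypothetical toy data (placeholder names): person -> list of that person's mentors.
MENTORS: Dict[str, List[str]] = {
    "Laureate A": ["Laureate B"],
    "Laureate B": ["Laureate C", "Scientist D"],
    "Laureate C": [],
    "Scientist D": [],
}
# Hypothetical enrichment from the merge step: laureate -> prize category.
PRIZE_CATEGORY: Dict[str, str] = {
    "Laureate A": "Physics",
    "Laureate B": "Physics",
    "Laureate C": "Chemistry",
}


def lineages(person: str, mentors: Dict[str, List[str]]) -> List[List[str]]:
    """Return every mentor chain starting at `person` (depth-first, skips cycles)."""
    chains: List[List[str]] = []

    def walk(path: List[str]) -> None:
        nexts = [m for m in mentors.get(path[-1], []) if m not in path]
        if not nexts:
            chains.append(path)  # no further mentors: the chain ends here
            return
        for m in nexts:
            walk(path + [m])

    walk([person])
    return chains


def laureate_mentors(person: str, mentors: Dict[str, List[str]], prizes: Dict[str, str]) -> List[str]:
    """Mentors of `person` who also appear in the prize data."""
    return [m for m in mentors.get(person, []) if m in prizes]


print(lineages("Laureate A", MENTORS))
print(laureate_mentors("Laureate B", MENTORS, PRIZE_CATEGORY))
```

Kuzu expresses queries like these in Cypher over node and relationship tables; the Python version only illustrates the traversal, not the post's schema or queries.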
