- Combining data from multiple sources to construct knowledge graphs presents challenges in disambiguating similar-looking entities.
- A two-part workflow involving vector embeddings and an 'LLM-as-a-judge' is introduced for entity disambiguation.
- DSPy, a declarative framework for building compound AI pipelines, is showcased for its ability to program LLMs without manual prompting.
- The workflow is applied to merge datasets of Nobel laureates and their mentorship relationships with enriched data from the Nobel Prize API.
- Vector indexing and search in Kuzu are used to retrieve candidate matches for each entity; an LLM then judges which candidates refer to the same entity so that the datasets can be merged.
- The merged data enables answering complex questions about Nobel laureates, their mentors, affiliations, and more.
- With DSPy, the developer declares the intended task in code rather than writing and tuning prompts by hand.
- The methodology is cost-effective and scalable, with potential applications in various domains beyond the presented use case.
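
The two-part workflow summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the post's implementation: the `Entity` class, the toy three-dimensional embeddings, and the `top_k`, `toy_judge`, and `disambiguate` helpers are all hypothetical. A brute-force cosine search stands in for Kuzu's vector index, and a simple name-matching rule stands in for the LLM judge that would be declared in DSPy.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import sqrt
from typing import Callable


@dataclass(frozen=True)
class Entity:
    name: str
    embedding: tuple[float, ...]  # toy vectors; real ones would come from an embedding model


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: Entity, candidates: list[Entity], k: int = 2) -> list[Entity]:
    """Step 1: brute-force nearest neighbours (a stand-in for a vector index)."""
    ranked = sorted(
        candidates,
        key=lambda c: cosine(query.embedding, c.embedding),
        reverse=True,
    )
    return ranked[:k]


def toy_judge(a: Entity, b: Entity) -> bool:
    """Step 2 stand-in for an LLM judge: same surname and same first initial."""
    a_parts, b_parts = a.name.lower().split(), b.name.lower().split()
    return a_parts[-1] == b_parts[-1] and a_parts[0][0] == b_parts[0][0]


def disambiguate(
    source: list[Entity],
    target: list[Entity],
    judge: Callable[[Entity, Entity], bool],
    k: int = 2,
) -> dict[str, str]:
    """Map each source entity to the first retrieved candidate the judge accepts."""
    matches: dict[str, str] = {}
    for entity in source:
        for candidate in top_k(entity, target, k):
            if judge(entity, candidate):
                matches[entity.name] = candidate.name
                break
    return matches


# Hypothetical records from two datasets that name the same people differently.
source = [
    Entity("M. Curie", (0.90, 0.10, 0.00)),
    Entity("P. Curie", (0.85, 0.20, 0.00)),
]
target = [
    Entity("Marie Curie", (0.92, 0.08, 0.00)),
    Entity("Pierre Curie", (0.80, 0.25, 0.00)),
    Entity("Albert Einstein", (0.00, 0.10, 0.95)),
]

matches = disambiguate(source, target, toy_judge, k=2)
print(matches)
```

The split keeps the expensive step small: similarity search narrows each entity to a handful of candidates, so the judge only sees a few pairs per entity instead of every possible pairing. In a real pipeline the `judge` argument would be replaced by a call to an LLM-backed module.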