Hasty Briefs

Bilingual

Show HN: Using DSPy to enrich a dataset of the Nobel laureate network

9 months ago
  • #data-enrichment
  • #LLM
  • #knowledge-graphs
  • Combining data from multiple sources into a knowledge graph requires disambiguating entities that look alike but may or may not be the same.
  • The post introduces a two-part disambiguation workflow: vector embeddings shortlist candidate matches, and an "LLM-as-a-judge" decides which candidates refer to the same entity.
  • DSPy, a declarative framework for building compound AI pipelines, is showcased for its ability to program LLMs without manual prompting.
  • The workflow is applied to enrich a dataset of Nobel laureates and their mentorship relationships by merging it with data from the Nobel Prize API.
  • Vector indexing and similarity search in Kuzu, an embedded graph database, find candidate matches; LLM-based disambiguation then decides which records to merge.
  • The merged data enables answering complex questions about Nobel laureates, their mentors, affiliations, and more.
  • With DSPy, the developer declares the task's intent in code instead of writing prompts by hand.
  • The authors describe the method as cost-effective and scalable, and applicable to domains beyond this use case.
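The two-stage workflow summarized above (similarity search to shortlist candidates, then a judge to confirm a match) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the post's implementation: character-trigram count vectors stand in for a real embedding model, a brute-force scan stands in for Kuzu's vector index, and `judge` is a hypothetical rule (matching birth years) standing in for the DSPy-programmed LLM call. All names (`Person`, `embed`, `top_k`, `judge`, `resolve`) and the 0.5 threshold are illustrative.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    name: str
    birth_year: Optional[int]


def embed(text: str) -> Counter:
    """Stand-in for a real embedding model (assumption): character trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: Person, pool: list[Person], k: int = 3) -> list[tuple[float, Person]]:
    """Stage 1: rank records by name similarity and keep the best k.
    A brute-force scan here; a vector index would do this step at scale."""
    q = embed(query.name)
    scored = [(cosine(q, embed(p.name)), p) for p in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def judge(a: Person, b: Person) -> bool:
    """Stage 2: hypothetical placeholder for the LLM-as-a-judge call.
    Accepts a pair only when both records carry the same birth year."""
    return a.birth_year is not None and a.birth_year == b.birth_year


def resolve(query: Person, pool: list[Person], threshold: float = 0.5) -> Optional[Person]:
    """Return the first shortlisted candidate that clears the similarity
    threshold and that the judge accepts, or None if none qualifies."""
    for score, candidate in top_k(query, pool):
        if score >= threshold and judge(query, candidate):
            return candidate
    return None


# Illustrative records standing in for entities from the reference source.
laureates = [
    Person("Marie Curie", 1867),
    Person("Pierre Curie", 1859),
    Person("Irène Joliot-Curie", 1897),
]

print(resolve(Person("Curie", 1859), laureates))
# prints: Person(name='Pierre Curie', birth_year=1859)
```

Here the surname alone scores above the threshold for both Marie and Pierre Curie, so similarity search only narrows the field and the judge makes the final call. Splitting the work this way means the more expensive judge sees a short list of candidate pairs rather than every pair across both datasets.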