All of human cooking compressed into 2 megabytes

a month ago

Epicure is a family of three sibling skip-gram ingredient embeddings trained from scratch on a multilingual recipe corpus.
The dataset aggregates 4.14 million recipes from 11 sources across seven languages, normalized to 1,790 canonical ingredient entries using an LLM-augmented pipeline.
Three Metapath2Vec variants are developed: Cooc (co-occurrence graph only), Chem (typed compound metapaths only), and Core (a controlled blend of the two).
The embeddings are seeded using a 203,508-edge ingredient-ingredient NPMI graph and an 80,019-edge typed FlavorDB ingredient-compound graph with 2,247 compound nodes across 15 categories.
Each model represents a distinct point on the spectrum between chemistry and recipe context.
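
To make the NPMI graph concrete, here is a minimal sketch of how ingredient-ingredient edge weights could be derived from recipes using normalized pointwise mutual information, NPMI(a, b) = PMI(a, b) / -log p(a, b). The summary does not say how Epicure counts co-occurrence or which edges it keeps, so the function name `npmi_edges`, the recipe-level counting, and the `min_npmi` cutoff are all illustrative assumptions rather than the project's actual pipeline.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_edges(recipes, min_npmi=0.0):
    """Return {(a, b): npmi} for ingredient pairs that co-occur in recipes.

    Hypothetical helper: probabilities are estimated per recipe, and pairs
    scoring at or below `min_npmi` are dropped. Both choices are assumptions.
    """
    n = len(recipes)
    ingredient_counts = Counter()
    pair_counts = Counter()
    for recipe in recipes:
        # Deduplicate and sort so each unordered pair has one canonical key.
        items = sorted(set(recipe))
        ingredient_counts.update(items)
        pair_counts.update(combinations(items, 2))

    edges = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = ingredient_counts[a] / n
        p_b = ingredient_counts[b] / n
        if p_ab == 1.0:
            # The pair is in every recipe, so -log p(a, b) is 0.
            # Use the limiting value of NPMI instead of dividing by zero.
            score = 1.0
        else:
            pmi = math.log(p_ab / (p_a * p_b))
            score = pmi / -math.log(p_ab)
        if score > min_npmi:
            edges[(a, b)] = score
    return edges
```

NPMI ranges from -1 (never together) to 1 (always together), with 0 meaning the two ingredients co-occur about as often as chance predicts, so a positive cutoff keeps only pairings that recipes favor.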

Hasty Briefs (beta)