Merriam-Webster and Unstructured Data Processing

9 days ago


Merriam-Webster's dictionary creation process begins with editors collecting and curating unstructured data through 'reading and marking'.
Editors structure the data by defining or revising words manually, a labor-intensive but high-value step.
Ancillary features like etymology and pronunciations add further value to the dictionary.
Successful data projects follow a pattern: collect unstructured data, structure it, and offer subsidiary datasets.
Google Search and cryptic crossword datasets are given as examples of the same process.
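
The three-step pattern summarized above (collect, structure, offer subsidiary datasets) can be sketched as a small pipeline. This is only an illustration under stated assumptions, not a description of Merriam-Webster's actual tooling: the names `Citation`, `Entry`, `collect`, `structure`, and `subsidiary` are hypothetical, the sample passages are invented, and the "editor" is reduced to a hand-supplied dictionary of definitions.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Citation:
    """One raw, unstructured observation: a marked word and the passage it appeared in."""
    word: str
    passage: str


@dataclass
class Entry:
    """A structured record. The definition is written by a human editor (assumed input)."""
    word: str
    definition: str
    citations: list[str] = field(default_factory=list)


def collect(passages, marked_words):
    """Step 1: gather citations for words an editor has marked as interesting."""
    citations = []
    for passage in passages:
        # Crude tokenization: split on whitespace, strip common punctuation, lowercase.
        tokens = {t.strip(".,;:!?\"'").lower() for t in passage.split()}
        for word in marked_words:
            if word in tokens:
                citations.append(Citation(word, passage))
    return citations


def structure(citations, definitions):
    """Step 2: group citations by word and attach the manually written definition."""
    grouped = defaultdict(list)
    for c in citations:
        grouped[c.word].append(c.passage)
    return [
        Entry(word, definitions[word], passages)
        for word, passages in grouped.items()
        if word in definitions  # words without an editorial definition are skipped
    ]


def subsidiary(entries):
    """Step 3: derive an ancillary dataset, here the citation count per word."""
    return {e.word: len(e.citations) for e in entries}


# Invented sample data, for illustration only.
passages = [
    "The doomscroll never ends.",
    "I doomscroll at night; it is a bad habit.",
    "A habit worth breaking.",
]
marked = ["doomscroll", "habit"]
definitions = {
    "doomscroll": "to spend excessive time reading negative news online",
    "habit": "a settled tendency or usual manner of behavior",
}

citations = collect(passages, marked)
entries = structure(citations, definitions)
counts = subsidiary(entries)
```

The manual step stays explicit in this sketch: `structure` cannot produce an entry unless a definition has been supplied, which mirrors the summary's point that the labor-intensive structuring step is where most of the value is added.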
