Merriam-Webster and Unstructured Data Processing
9 days ago
- #data processing
- #dictionary
- #unstructured data
- Merriam-Webster's dictionary creation process involves collecting and curating unstructured data through 'reading and marking' by editors.
- Editors then structure the data by manually defining or revising words, a labor-intensive but high-value step.
- Ancillary features like etymology and pronunciations add further value to the dictionary.
- Successful data projects follow a three-step pattern: collect unstructured data, structure it, and offer subsidiary datasets derived from it.
- Google Search and cryptic crossword datasets are given as examples that follow the same process.
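
As a rough illustration, the collect, structure, and derive pattern from the summary could be sketched in Python as below. This is only a toy under stated assumptions: the `Citation` and `Entry` shapes, the function names, and the sample data are all hypothetical and do not describe Merriam-Webster's actual tooling. The definition is passed in by hand because, in the process summarized above, that step is done manually by editors.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Citation:
    """One 'marked' passage: a word seen in context (hypothetical shape)."""
    word: str
    context: str
    year: int


@dataclass
class Entry:
    """A structured record built from the raw citations for one word."""
    word: str
    definition: str  # supplied by a human editor, not computed
    citations: list = field(default_factory=list)


def collect(citations):
    """Step 1: group raw citations by lowercased word."""
    by_word = defaultdict(list)
    for c in citations:
        by_word[c.word.lower()].append(c)
    return dict(by_word)


def structure(by_word, definitions):
    """Step 2: keep only words an editor has defined and build entries."""
    return {
        word: Entry(word, definitions[word], cites)
        for word, cites in by_word.items()
        if word in definitions
    }


def first_known_use(entries):
    """Step 3: derive a subsidiary dataset (earliest citation year per word)."""
    return {word: min(c.year for c in e.citations) for word, e in entries.items()}


# Made-up sample data, for illustration only.
raw = [
    Citation("Blog", "She posted it on her blog.", 1999),
    Citation("blog", "A blog about data.", 2003),
    Citation("vlog", "He started a vlog.", 2005),
]
entries = structure(collect(raw), {"blog": "a regularly updated website"})
earliest = first_known_use(entries)
```

The point of the sketch is that the subsidiary dataset (`earliest`) falls out almost for free once the structured entries exist, while the expensive part is the human-written `definitions` mapping.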