Cell Mates: Extracting Useful Information from Tables for LLMs
a year ago
- #LLMs
- #Data Processing
- #Tabular Data
- LLMs currently struggle to encode the knowledge held in tabular data, such as survey data, beyond what already appears in published statistical summaries.
- The main challenge is finding a useful representation for tabular data; representing each row as a sentence misses much of the table's knowledge.
- Mechanical distillation techniques are proposed, including creating univariate, bivariate, and multivariate summaries based on table structure.
- The approach involves understanding the data's collection and structure, learning what questions can be asked, and creating mechanical summaries and plots.
- This pipeline could feed retrieval-augmented generation (RAG) systems and supplement 'world data'; scientific data repositories such as Harvard Dataverse are suggested as starting points.
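
To make the idea of mechanical distillation concrete, here is a minimal sketch in Python that turns a table into univariate and bivariate text summaries. It is an illustration under stated assumptions, not the post's actual pipeline: the toy `ROWS` table, the function names (`univariate_summary`, `bivariate_summary`, `distill`), and the choice of statistics (counts, mean, median, Pearson correlation, group means) are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, median

# Hypothetical toy survey table: each dict is one respondent (row).
ROWS = [
    {"age": 25, "income": 30, "region": "north"},
    {"age": 35, "income": 50, "region": "south"},
    {"age": 45, "income": 70, "region": "north"},
    {"age": 55, "income": 90, "region": "south"},
]


def column(rows, name):
    """Return all values of one column, in row order."""
    return [row[name] for row in rows]


def is_numeric(values):
    """Treat a column as numeric only if every value is an int or float."""
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)


def univariate_summary(rows, name):
    """One sentence describing a single column."""
    values = column(rows, name)
    if is_numeric(values):
        return (f"{name}: n={len(values)}, min={min(values)}, max={max(values)}, "
                f"mean={mean(values):.1f}, median={median(values):.1f}")
    parts = ", ".join(f"{v} ({c})" for v, c in Counter(values).most_common())
    return f"{name}: n={len(values)}, categories: {parts}"


def pearson(xs, ys):
    """Pearson correlation, or None if either column is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / (var_x * var_y) ** 0.5


def bivariate_summary(rows, a, b):
    """One sentence describing how two columns relate."""
    xs, ys = column(rows, a), column(rows, b)
    if is_numeric(xs) and is_numeric(ys):
        r = pearson(xs, ys)
        if r is None:
            return f"{a} vs {b}: Pearson r is undefined"
        return f"{a} vs {b}: Pearson r={r:.2f}"
    if is_numeric(xs) != is_numeric(ys):
        # One numeric and one categorical column: report group means.
        num, cat = (a, b) if is_numeric(xs) else (b, a)
        groups = {}
        for row in rows:
            groups.setdefault(row[cat], []).append(row[num])
        parts = ", ".join(f"{g}={mean(v):.1f}" for g, v in sorted(groups.items()))
        return f"mean {num} by {cat}: {parts}"
    # Two categorical columns: report cross-tabulated counts.
    counts = Counter(zip(xs, ys)).most_common()
    parts = ", ".join(f"{x}/{y} ({c})" for (x, y), c in counts)
    return f"{a} x {b} counts: {parts}"


def distill(rows):
    """All univariate summaries, then one summary per pair of columns."""
    names = list(rows[0])
    lines = [univariate_summary(rows, n) for n in names]
    lines += [bivariate_summary(rows, a, b) for a, b in combinations(names, 2)]
    return lines


if __name__ == "__main__":
    for line in distill(ROWS):
        print(line)
```

Each returned line is a short, self-contained statement that could be stored as a retrieval chunk. The number of pairwise summaries grows quadratically with the number of columns, so a real table would likely need some selection step, which this sketch leaves out.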