TabFM: A zero-shot foundation model for tabular data
2 days ago
- #Tabular Data
- #Zero-Shot Learning
- #Foundation Model
- Google introduces TabFM, a foundation model for zero-shot classification and regression on tabular data, with no manual training or feature engineering required.
- TabFM uses in-context learning (ICL) to process entire datasets as prompts, learning relationships at inference time without weight updates.
- The model employs a hybrid architecture combining strengths from TabPFN and TabICL to handle tabular data's two-dimensional, orderless structure.
- Because real-world tabular datasets are scarce, TabFM is pre-trained on hundreds of millions of synthetic datasets generated with structural causal models (SCMs).
- On the TabArena benchmark, TabFM outperforms traditional supervised methods such as XGBoost as well as specialized models. It comes in two configurations (1B and 3.1B parameters).
- TabFM will be integrated into Google BigQuery, allowing users to make predictions via a simple SQL command without ML expertise.
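
To make the in-context learning point concrete, here is a minimal sketch of the general idea. It is not TabFM: the real model is a large pre-trained network, while this stand-in uses a softmax-weighted vote over the labeled rows. The function names `icl_predict_proba` and `icl_predict` and the `temperature` parameter are hypothetical, chosen only for illustration. What the sketch does share with the description above is that the labeled table is passed in as context at prediction time, nothing is trained or updated, and the order of the context rows does not affect the result.

```python
import math
import random


def icl_predict_proba(context_rows, context_labels, query_row, temperature=1.0):
    """Toy in-context predictor (a stand-in, NOT TabFM): class probabilities
    for one query row, computed from the labeled context rows at call time."""
    # Attention-like score: negative squared distance to each context row.
    scores = [
        -sum((q - c) ** 2 for q, c in zip(query_row, row)) / temperature
        for row in context_rows
    ]
    # Numerically stable softmax over the context rows.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    # Each context row votes for its own label with its softmax weight.
    proba = {label: 0.0 for label in sorted(set(context_labels))}
    for w, label in zip(weights, context_labels):
        proba[label] += w / total
    return proba


def icl_predict(context_rows, context_labels, query_row):
    """Return the most probable label for one query row."""
    proba = icl_predict_proba(context_rows, context_labels, query_row)
    return max(proba, key=proba.get)


rng = random.Random(0)
# A tiny labeled table: two well-separated clusters with labels 0 and 1.
context_rows = [[rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)] for _ in range(20)]
context_rows += [[rng.gauss(2, 0.5), rng.gauss(2, 0.5)] for _ in range(20)]
context_labels = [0] * 20 + [1] * 20

queries = [[-2.0, -2.0], [2.0, 2.0]]
predictions = [icl_predict(context_rows, context_labels, q) for q in queries]

# Row order carries no meaning in a table, so shuffling the context
# should leave the predictions unchanged.
order = list(range(len(context_rows)))
rng.shuffle(order)
shuffled_rows = [context_rows[i] for i in order]
shuffled_labels = [context_labels[i] for i in order]
shuffled_predictions = [
    icl_predict(shuffled_rows, shuffled_labels, q) for q in queries
]

print(predictions, shuffled_predictions)
```

The design choice worth noting is that there is no `fit` step at all: swapping in a different table only means passing different context rows, which is the property that makes zero-shot use possible.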
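
The synthetic pre-training data mentioned above can be pictured with a small sketch of sampling one table from a random structural causal model. The summary does not describe TabFM's actual data generator, so everything specific here is an assumption made for illustration: the function name `sample_scm_dataset`, the random graph with edge probability `edge_prob`, the `tanh` mechanisms, the Gaussian noise, and the choice of the last node as the prediction target.

```python
import math
import random


def sample_scm_dataset(n_rows=100, n_nodes=5, edge_prob=0.5, seed=0):
    """Sample one synthetic table from a random structural causal model.

    Illustrative assumptions only: random DAG, tanh mechanisms, Gaussian
    noise. This is not the generator used to pre-train TabFM.
    """
    rng = random.Random(seed)
    # Node j may only depend on nodes i < j, which guarantees an acyclic graph.
    parents = {
        j: [i for i in range(j) if rng.random() < edge_prob]
        for j in range(n_nodes)
    }
    # One random weight per causal edge (parent i -> child j).
    weights = {(i, j): rng.gauss(0, 1) for j in parents for i in parents[j]}

    rows = []
    for _ in range(n_rows):
        values = []
        for j in range(n_nodes):
            if not parents[j]:
                # Root node: pure exogenous noise.
                values.append(rng.gauss(0, 1))
            else:
                # Structural equation: nonlinearity of weighted parents plus noise.
                signal = sum(weights[(i, j)] * values[i] for i in parents[j])
                values.append(math.tanh(signal) + rng.gauss(0, 0.1))
        rows.append(values)

    # Use the last node as the target and the remaining nodes as features.
    X = [row[:-1] for row in rows]
    y = [row[-1] for row in rows]
    return X, y, parents


X, y, parents = sample_scm_dataset(n_rows=200, n_nodes=6, seed=42)
print(len(X), len(X[0]), len(y))  # 200 5 200
```

Calling the function with different seeds yields tables with different causal graphs and mechanisms, which is the sense in which a single generator can produce a very large number of distinct datasets.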