Hasty Briefs

Vintage Large Language Models

6 days ago
  • #vintage-LLMs
  • #AI-forecasting
  • #historical-data
  • Vintage large language models (LLMs) are trained on past data up to a specific date, such as 2019, 1900, or even 200 AD.
  • Challenges include assembling enough training data and preventing contamination by post-cutoff information (see the filtering sketch after this list).
  • Multimodal data like images can be included if they represent things people could see or experience at the time.
  • Applications include testing LLMs at forecasting and scientific invention, by backtesting their predictions against now-known outcomes or having them reinvent ideas that appeared after their cutoff (see the backtesting sketch below).
  • Humanistic motivations include simulating conversations with historical figures and exploring counterfactual intellectual histories.
  • Epistemic AI uses LLMs to improve the accuracy of beliefs and models, and requires gold-standard examples for training, which vintage models can supply because post-cutoff outcomes are already known.
  • Data requirements are significant: vast historical datasets free of future leakage are needed, and training costs can be high.
  • Synthetic data techniques can help bridge gaps in the historical record by generating variations of existing documents (a toy example follows this list).
  • Chronological training with forking can reduce costs: train one model in date order, checkpoint it at each cutoff, and branch later vintages from the relevant checkpoint instead of training each from scratch (see the final sketch below).
  • Additional ideas include outsourcing some functions to current LLMs and compartmentalizing a model's knowledge with date annotations.
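
To make the cutoff idea in the first two bullets concrete, here is a minimal sketch of date-filtering a training corpus. The `Document` shape, `filter_vintage`, and the example texts are hypothetical stand-ins; a real pipeline would need far more careful date attribution than a single `created` field.

```python
# Minimal sketch: keep only documents written strictly before a cutoff date.
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Iterator, Optional

@dataclass
class Document:
    text: str
    created: Optional[date]  # best-known creation date; None if unknown

def filter_vintage(docs: Iterable[Document], cutoff: date) -> Iterator[Document]:
    """Yield documents dated strictly before the cutoff.

    Documents with unknown dates are dropped: any of them could leak
    post-cutoff information into the training set.
    """
    for doc in docs:
        if doc.created is not None and doc.created < cutoff:
            yield doc

# Example: building a corpus for a "1900 model".
corpus_1900 = list(filter_vintage(
    [Document("On the Origin of Species ...", date(1859, 11, 24)),
     Document("Relativity: The Special and General Theory ...", date(1916, 1, 1))],
    cutoff=date(1900, 1, 1),
))
assert len(corpus_1900) == 1  # the 1916 text is excluded as contamination
```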
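For the backtesting application, a sketch of scoring a vintage model's probabilistic forecasts against outcomes that are now known. `StubVintageModel` and its `probability` method are invented placeholders for whatever interface a real vintage LLM would expose, and the Brier score is one standard choice of scoring rule, not necessarily the article's.

```python
# Minimal sketch: backtest a 2019-cutoff model on questions resolved later.
class StubVintageModel:
    """Stand-in for a real vintage LLM's forecasting interface."""
    def probability(self, question: str) -> float:
        return 0.6  # canned P(yes); a real model would be prompted here

def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between probabilities and 0/1 outcomes.
    Lower is better; an uninformed 50% forecaster scores 0.25."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Questions that resolved only after the 2019 cutoff, with known outcomes.
questions = [
    ("Will a COVID-19 vaccine be widely available by the end of 2021?", 1),
    ("Will the Tokyo Olympics be held in 2020?", 0),
]
model = StubVintageModel()
preds = [model.probability(q) for q, _ in questions]
print(brier_score(preds, [o for _, o in questions]))
```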
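For the synthetic-data bullet, a toy example of generating variations of a scarce document using only material already in it, so no post-cutoff knowledge creeps in. A real pipeline would more plausibly use an LLM prompted to paraphrase; random deletion and reordering here are merely illustrative.

```python
# Toy sketch: synthetic variants of a document via deletion and reordering.
import random

def sentence_variations(text: str, n: int, seed: int = 0) -> list[str]:
    """Produce n variants by randomly dropping and reordering sentences."""
    rng = random.Random(seed)
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    variants = []
    for _ in range(n):
        kept = [s for s in sentences if rng.random() > 0.2]  # random deletion
        rng.shuffle(kept)                                     # random reorder
        variants.append(" ".join(kept))
    return variants

for v in sentence_variations(
        "It is a truth universally acknowledged. "
        "A single man in possession of a good fortune must be in want of a wife.",
        n=3):
    print(v)
```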
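Finally, for chronological training with forking, a PyTorch sketch under heavily simplified assumptions: data pre-bucketed by era, a toy model with a placeholder objective, and one checkpoint per cutoff from which later vintages can branch. The point is only the cost structure, since all pre-cutoff compute is paid once and shared.

```python
# Minimal sketch: train in date order, checkpoint at each cutoff, fork later.
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # tiny stand-in for a real language model
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

# One batch of (synthetic) data per era, keyed by cutoff year.
eras = {1800: torch.randn(64, 8), 1900: torch.randn(64, 8), 2000: torch.randn(64, 8)}

for cutoff in sorted(eras):  # train strictly in date order
    batch = eras[cutoff]
    loss = nn.functional.mse_loss(model(batch), batch)  # placeholder objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    torch.save(model.state_dict(), f"vintage_{cutoff}.pt")  # fork point

# Forking: a "1900 model" resumes from the 1900 checkpoint, so the shared
# pre-1900 compute is never repeated for later vintages.
fork = nn.Linear(8, 8)
fork.load_state_dict(torch.load("vintage_1900.pt"))
```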