
The First LLM

a day ago
  • #LLM
  • #transformer
  • #AI-history
  • The article reflects on the birth and evolution of Large Language Models (LLMs), questioning who created the first LLM and defining what constitutes an LLM.
  • In 2016, the author was in computer science classes discussing Chomsky’s hierarchy and the Turing test, unaware of the impending LLM revolution.
  • By 2018, Jeremy Howard had published ULMFiT, considered by some to be the first LLM; it was followed by Alec Radford’s GPT-1, which popularized the transformer architecture.
  • The author defines an LLM as a self-supervised, next-word-predicting language model that can be easily adapted to various text-based tasks without architectural changes (a minimal sketch of this recipe follows the list).
  • Early models like CoVe and ELMo used supervised learning and task-specific architectures, making them distinct from LLMs under this definition.
  • ULMFiT, an LSTM-based model, is argued to be the first true LLM because of its self-supervised training and adaptability, even though GPT-1’s transformer architecture scaled better.
  • The article highlights the multi-polar origins of LLMs, emphasizing contributions beyond OpenAI, including Australian work and potential future advances from China.
  • The term 'LLM' may persist even as models evolve into multimodal systems, much as 'GPU' has stuck even though GPUs now handle far more than graphics.
  • The first LLM was likely ULMFiT, pre-trained on Wikipedia and fine-tuned on IMDb reviews, with GPT-1’s transformer paving the way for future scaling.
  • The author concludes by anticipating many more LLM advancements and urging continued attention to the field’s rapid developments.
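
To make the article's definition concrete, here is a minimal sketch of the two-stage recipe the summary describes: self-supervised next-word pre-training, then adaptation to a downstream task without touching the backbone. It assumes PyTorch; the vocabulary size, hyperparameters, random batch, and the two-class head standing in for IMDb sentiment are illustrative placeholders, not details from ULMFiT or GPT-1.

```python
# A toy sketch, assuming PyTorch; all sizes and data are placeholders.
import torch
import torch.nn as nn

VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 1000, 64, 128

class LSTMLanguageModel(nn.Module):
    """LSTM backbone with a next-word prediction head."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.lm_head = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, tokens):
        hidden, _ = self.lstm(self.embed(tokens))
        return self.lm_head(hidden)  # (batch, seq_len, vocab) logits

model = LSTMLanguageModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stage 1: self-supervised pre-training. The "labels" are just the
# input shifted by one token, so no human annotation is needed.
tokens = torch.randint(0, VOCAB_SIZE, (8, 32))   # stand-in for a Wikipedia batch
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict the next word
optimizer.zero_grad()
logits = model(inputs)
loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
loss.backward()
optimizer.step()

# Stage 2: adapt to a downstream task by swapping only the head.
# The LSTM backbone is reused unchanged, matching the "no architectural
# changes" criterion in the definition above.
classifier_head = nn.Linear(HIDDEN_DIM, 2)         # e.g. positive/negative review
hidden, _ = model.lstm(model.embed(inputs))
sentiment_logits = classifier_head(hidden[:, -1])  # classify from last hidden state
```

The two details that carry the argument are the shift-by-one trick in stage 1, where the training signal comes from the text itself, and the fact that stage 2 reuses the same LSTM unchanged: that adaptability is the criterion the author uses to exclude CoVe- and ELMo-style task-specific pipelines.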