The First LLM
- #LLM
- #transformer
- #AI-history
- The article reflects on the birth and evolution of Large Language Models (LLMs), questioning who created the first LLM and defining what constitutes an LLM.
- In 2016, the author was in computer science classes discussing the Chomsky hierarchy and the Turing test, unaware of the impending LLM revolution.
- In 2018, Jeremy Howard published ULMFiT, which some consider the first LLM; it was followed shortly after by Alec Radford’s GPT-1, which popularized the transformer architecture for this kind of model.
- The author defines an LLM as a self-supervised, next-word-predicting language model that can be easily adapted to a wide range of text-based tasks without architectural changes (see the next-word-prediction sketch after this list).
- Earlier models such as CoVe and ELMo relied on supervised training or task-specific architectures, which sets them apart from LLMs as defined.
- ULMFiT, an LSTM-based model, is argued to be the first true LLM because of its self-supervised training and easy adaptability, even though GPT-1’s transformer architecture ultimately scaled better.
- The article highlights the multi-polar origins of LLMs, emphasizing contributions beyond OpenAI, including Howard’s work in Australia and potential future advances from China.
- The term 'LLM' may well persist even as models become multimodal, much as 'GPU' has stuck around even though the hardware now does far more than graphics.
- The first LLM was therefore likely ULMFiT, pre-trained on Wikipedia and fine-tuned on IMDb reviews (see the ULMFiT recipe sketch after this list), with GPT-1’s transformer paving the way for the scaling that followed.
- The author concludes by anticipating many more LLMs to come and urging continued attention to the field’s rapid development.
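
To make the definition above concrete, here is a minimal next-word-prediction sketch in PyTorch. It is not code from the article; the `TinyLM` model, the random token tensor, and all sizes are illustrative assumptions. The point it shows is that the training labels are just the input tokens shifted by one position, so the model supervises itself from raw text.

```python
# Minimal sketch of self-supervised next-word prediction (illustrative, not from the article).
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 64, 128  # illustrative sizes

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)  # predicts the next token

    def forward(self, tokens):
        out, _ = self.lstm(self.embed(tokens))
        return self.head(out)  # (batch, seq, vocab) logits

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (8, 32))   # stand-in for a batch of real text
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # targets = inputs shifted by one position

logits = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
opt.step()
```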
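The ULMFiT recipe summarized above can be sketched in the same style, assuming a single shared encoder and one optimization step standing in for each full training stage. This illustrates the three-stage idea (general-domain language modelling, target-domain language modelling, then classifier fine-tuning); it is not the fastai implementation or the paper’s code, and the `Encoder`, head modules, and corpus stand-ins are hypothetical.

```python
# Sketch of the ULMFiT-style recipe (illustrative):
# 1. pre-train a language model on a general corpus (e.g. Wikipedia),
# 2. fine-tune the same language model on the target corpus (e.g. IMDb text),
# 3. swap the next-word head for a classifier head and fine-tune on labels.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim, num_classes = 1000, 64, 128, 2

class Encoder(nn.Module):
    """Shared LSTM encoder reused across all three stages."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, tokens):
        out, _ = self.lstm(self.embed(tokens))
        return out  # (batch, seq, hidden)

encoder = Encoder()
lm_head = nn.Linear(hidden_dim, vocab_size)    # next-word prediction head
clf_head = nn.Linear(hidden_dim, num_classes)  # sentiment classification head

def lm_loss(tokens):
    # Stages 1 and 2 share this objective; only the corpus changes.
    logits = lm_head(encoder(tokens[:, :-1]))
    return nn.functional.cross_entropy(
        logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))

def clf_loss(tokens, labels):
    # Stage 3: keep the pre-trained encoder, swap the head, train on labels.
    final_state = encoder(tokens)[:, -1]  # last hidden state as a document summary
    return nn.functional.cross_entropy(clf_head(final_state), labels)

# Illustrative stand-ins for the three corpora.
general_text = torch.randint(0, vocab_size, (8, 32))
domain_text = torch.randint(0, vocab_size, (8, 32))
review_text = torch.randint(0, vocab_size, (8, 32))
review_labels = torch.randint(0, num_classes, (8,))

opt = torch.optim.Adam(
    [*encoder.parameters(), *lm_head.parameters(), *clf_head.parameters()], lr=1e-3)

def train_step(loss):
    opt.zero_grad()
    loss.backward()
    opt.step()

train_step(lm_loss(general_text))            # stage 1: general-domain pre-training
train_step(lm_loss(domain_text))             # stage 2: target-domain LM fine-tuning
train_step(clf_loss(review_text, review_labels))  # stage 3: classifier fine-tuning
```

The same encoder weights flow through all three stages; only the head and the data change, which is the sense in which such a model adapts to new tasks "without architectural changes".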