The First LLM
- #LLM
- #transformer
- #AI-history
- The article reflects on the birth and evolution of Large Language Models (LLMs), questioning who created the first LLM and defining what constitutes an LLM.
- In 2016, the author was in computer science classes discussing the Chomsky hierarchy and the Turing test, unaware of the impending LLM revolution.
- In 2018, Jeremy Howard published ULMFiT, which some consider the first LLM; it was followed shortly after by Alec Radford’s GPT-1, which popularized the transformer architecture for this kind of model.
- The author defines an LLM as a self-supervised, next-word-predicting language model that can be easily adapted to a wide range of text-based tasks without architectural changes (see the next-word-prediction sketch after this list).
- Earlier models such as CoVe and ELMo relied on supervised training or task-specific architectures, which sets them apart from LLMs as defined.
- ULMFiT, an LSTM-based model, is argued to be the first true LLM because of its self-supervised training and easy adaptability, even though GPT-1’s transformer architecture ultimately scaled better.
- The article highlights the multi-polar origins of LLMs, emphasizing contributions beyond OpenAI, including Howard’s work in Australia and potential future advances from China.
- The term 'LLM' may well persist even as models become multimodal, much as 'GPU' has stuck around even though the hardware now does far more than graphics.
- The first LLM was therefore likely ULMFiT, pre-trained on Wikipedia and fine-tuned on IMDb reviews (see the ULMFiT recipe sketch after this list), with GPT-1’s transformer paving the way for the scaling that followed.
- The author concludes by anticipating many more LLMs to come and urging continued attention to the field’s rapid development.
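
To make the definition above concrete, here is a minimal next-word-prediction sketch in PyTorch. It is not code from the article; the `TinyLM` model, the random token tensor, and all sizes are illustrative assumptions. The point it shows is that the training labels are just the input tokens shifted by one position, so the model supervises itself from raw text.

```python
# Minimal sketch of self-supervised next-word prediction (illustrative, not from the article).
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 64, 128  # illustrative sizes

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)  # predicts the next token

    def forward(self, tokens):
        out, _ = self.lstm(self.embed(tokens))
        return self.head(out)  # (batch, seq, vocab) logits

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (8, 32))   # stand-in for a batch of real text
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # targets = inputs shifted by one position

logits = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
opt.step()
```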
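The ULMFiT recipe summarized above can be sketched in the same style, assuming a single shared encoder and one optimization step standing in for each full training stage. This illustrates the three-stage idea (general-domain language modelling, target-domain language modelling, then classifier fine-tuning); it is not the fastai implementation or the paper’s code, and the `Encoder`, head modules, and corpus stand-ins are hypothetical.

```python
# Sketch of the ULMFiT-style recipe (illustrative):
# 1. pre-train a language model on a general corpus (e.g. Wikipedia),
# 2. fine-tune the same language model on the target corpus (e.g. IMDb text),
# 3. swap the next-word head for a classifier head and fine-tune on labels.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim, num_classes = 1000, 64, 128, 2

class Encoder(nn.Module):
    """Shared LSTM encoder reused across all three stages."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, tokens):
        out, _ = self.lstm(self.embed(tokens))
        return out  # (batch, seq, hidden)

encoder = Encoder()
lm_head = nn.Linear(hidden_dim, vocab_size)    # next-word prediction head
clf_head = nn.Linear(hidden_dim, num_classes)  # sentiment classification head

def lm_loss(tokens):
    # Stages 1 and 2 share this objective; only the corpus changes.
    logits = lm_head(encoder(tokens[:, :-1]))
    return nn.functional.cross_entropy(
        logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))

def clf_loss(tokens, labels):
    # Stage 3: keep the pre-trained encoder, swap the head, train on labels.
    final_state = encoder(tokens)[:, -1]  # last hidden state as a document summary
    return nn.functional.cross_entropy(clf_head(final_state), labels)

# Illustrative stand-ins for the three corpora.
general_text = torch.randint(0, vocab_size, (8, 32))
domain_text = torch.randint(0, vocab_size, (8, 32))
review_text = torch.randint(0, vocab_size, (8, 32))
review_labels = torch.randint(0, num_classes, (8,))

opt = torch.optim.Adam(
    [*encoder.parameters(), *lm_head.parameters(), *clf_head.parameters()], lr=1e-3)

def train_step(loss):
    opt.zero_grad()
    loss.backward()
    opt.step()

train_step(lm_loss(general_text))            # stage 1: general-domain pre-training
train_step(lm_loss(domain_text))             # stage 2: target-domain LM fine-tuning
train_step(clf_loss(review_text, review_labels))  # stage 3: classifier fine-tuning
```

The same encoder weights flow through all three stages; only the head and the data change, which is the sense in which such a model adapts to new tasks "without architectural changes".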