What is the History of Large Language Models?

Large language models didn't appear overnight. They emerged from decades of NLP research, with breakthroughs in 2017 (transformers), 2018 (BERT/GPT), 2020 (GPT-3), and 2022 (ChatGPT) each pushing capabilities dramatically forward. Understanding this arc helps make sense of where LLMs are heading next.

The path to modern LLMs runs through decades of gradual progress punctuated by a few decisive breakthroughs:

2013: Word2Vec introduced dense word embeddings, showing that neural networks could capture semantic relationships between words.
2014: Sequence-to-sequence models enabled neural machine translation.
2017: Google's 'Attention Is All You Need' paper introduced the transformer architecture, replacing recurrent networks with attention mechanisms that allowed training to run in parallel and scaled far better.
2018: BERT (Google) and GPT-1 (OpenAI) pioneered the pretrain-then-fine-tune paradigm.
2019: GPT-2 demonstrated that scaling up transformers produced qualitatively new capabilities. OpenAI initially withheld the full model, citing misuse concerns.
2020: GPT-3 showed that few-shot learning emerges at sufficient scale: the model could perform new tasks just from examples in its prompt.
2022: ChatGPT launched in late November and reached an estimated 100 million users within about two months, introducing the public to LLMs. RLHF (reinforcement learning from human feedback) made models dramatically more helpful.
2023: GPT-4 and Claude demonstrated strong performance on professional exams and complex reasoning tasks.
2024–2026: Reasoning models (o1, o3, Claude 4, Gemini 2.5) marked a shift from purely scaling parameters to scaling inference-time thinking. Agentic capabilities, multimodality, and long-context processing moved into production.

Each phase built on the previous one, and many of them surprised researchers when they arrived.
