Before 2020, building smarter AI felt like trial and error. Then Kaplan et al. at OpenAI published 'Scaling Laws for Neural Language Models' — and changed everything.
**The finding**: AI model performance follows a power-law relationship with three variables:
1. **Compute** (number of FLOPs used for training)
2. **Parameters** (number of weights in the model)
3. **Data** (amount of training text)
Scale any one of these up (as long as the others aren't the bottleneck), and performance improves predictably, following a smooth power-law curve. The relationship holds across many orders of magnitude.
This meant that, given a compute budget, you could predict with surprising accuracy how good a model would be — its loss on held-out text — before training it. Engineering began to rival research as the primary driver of progress.
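That predictability can be sketched numerically. Below is a minimal illustration using the paper's published fit for loss versus (non-embedding) parameter count; the constants `N_c ≈ 8.8e13` and `α_N ≈ 0.076` are approximate empirical fits from Kaplan et al., not exact values:

```python
def kaplan_loss(n_params: float, n_c: float = 8.8e13, alpha_n: float = 0.076) -> float:
    """Approximate cross-entropy test loss as a function of parameter count,
    assuming data and compute are not the bottleneck (Kaplan et al., 2020)."""
    return (n_c / n_params) ** alpha_n

# Doubling parameters shrinks loss by a constant factor of 2**-0.076 (~0.95),
# i.e. roughly 5% lower loss per doubling -- the hallmark of a power law.
for n in [1e8, 1e9, 1e10, 1e11]:
    print(f"{n:.0e} params -> predicted loss {kaplan_loss(n):.3f}")
```

The key property is that each doubling of scale buys the same *fractional* improvement, which is what makes extrapolation to untrained model sizes possible.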
**The Chinchilla correction (2022):** DeepMind (Hoffmann et al.) showed that OpenAI's original scaling laws under-emphasized data. For optimal performance at a given compute budget:
- Parameters and data should scale equally — both grow roughly with the square root of compute
- You need roughly 20 tokens of training data per parameter
- GPT-3 (175B params) was undertrained — at the same compute budget, a smaller model trained on more data would have performed better
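The allocation rule above can be turned into a back-of-the-envelope calculator. This is a sketch, assuming the standard approximation that training costs about 6 FLOPs per parameter per token (C ≈ 6·N·D) together with the ~20 tokens-per-parameter rule; both constants are rough empirical fits, not exact:

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a training compute budget into parameters (N) and tokens (D)
    using C ~ 6*N*D and D ~ tokens_per_param * N (Hoffmann et al., 2022)."""
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's own budget (~5.76e23 FLOPs) recovers roughly its actual
# configuration: about 70B parameters trained on about 1.4T tokens.
n, d = chinchilla_optimal(5.76e23)
print(f"~{n / 1e9:.0f}B params, ~{d / 1e12:.1f}T tokens")
```

Note that both N and D come out proportional to the square root of compute, which is exactly the "scale them equally" prescription.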
**Implications that shocked the industry:**
- GPT-4's final loss was predicted from much smaller training runs before its training completed
- Bigger isn't always better — efficient data/compute ratio matters
- Pre-training compute is the primary moat; data quality matters as much as quantity
**The wall**: Some researchers argue we're approaching data limits — most of the high-quality public text on the internet has already been consumed for training. Synthetic data (AI-generated training data) is the next frontier.
**Key takeaway:** AI performance scales predictably with compute, data, and parameters. This single insight turned AI research into an engineering problem.