Before 2020, building smarter AI felt like trial and error. Then Kaplan et al. at OpenAI published 'Scaling Laws for Neural Language Models' — and changed everything.
**The finding**: AI model performance follows a power-law relationship with three variables:
1. **Compute** (number of FLOPs used for training)
2. **Parameters** (number of weights in the model)
3. **Data** (amount of training text)
Scale any one of these up (as long as the others aren't the bottleneck), and performance improves predictably, following a smooth power-law curve. The relationship holds across many orders of magnitude.
This meant that, given a compute budget, you could predict with surprising accuracy how good a model would be — its loss on held-out text — before training it. Engineering began to rival research as the primary driver of progress.
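That predictability can be sketched numerically. Below is a minimal illustration using the paper's published fit for loss versus (non-embedding) parameter count; the constants `N_c ≈ 8.8e13` and `α_N ≈ 0.076` are approximate empirical fits from Kaplan et al., not exact values:

```python
def kaplan_loss(n_params: float, n_c: float = 8.8e13, alpha_n: float = 0.076) -> float:
    """Approximate cross-entropy test loss as a function of parameter count,
    assuming data and compute are not the bottleneck (Kaplan et al., 2020)."""
    return (n_c / n_params) ** alpha_n

# Doubling parameters shrinks loss by a constant factor of 2**-0.076 (~0.95),
# i.e. roughly 5% lower loss per doubling -- the hallmark of a power law.
for n in [1e8, 1e9, 1e10, 1e11]:
    print(f"{n:.0e} params -> predicted loss {kaplan_loss(n):.3f}")
```

The key property is that each doubling of scale buys the same *fractional* improvement, which is what makes extrapolation to untrained model sizes possible.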
**The Chinchilla correction (2022):** DeepMind (Hoffmann et al.) showed that OpenAI's original scaling laws under-emphasized data. For optimal performance at a given compute budget:
- Parameters and data should scale equally — both grow roughly with the square root of compute
- You need roughly 20 tokens of training data per parameter
- GPT-3 (175B params) was undertrained — at the same compute budget, a smaller model trained on more data would have performed better
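The allocation rule above can be turned into a back-of-the-envelope calculator. This is a sketch, assuming the standard approximation that training costs about 6 FLOPs per parameter per token (C ≈ 6·N·D) together with the ~20 tokens-per-parameter rule; both constants are rough empirical fits, not exact:

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a training compute budget into parameters (N) and tokens (D)
    using C ~ 6*N*D and D ~ tokens_per_param * N (Hoffmann et al., 2022)."""
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's own budget (~5.76e23 FLOPs) recovers roughly its actual
# configuration: about 70B parameters trained on about 1.4T tokens.
n, d = chinchilla_optimal(5.76e23)
print(f"~{n / 1e9:.0f}B params, ~{d / 1e12:.1f}T tokens")
```

Note that both N and D come out proportional to the square root of compute, which is exactly the "scale them equally" prescription.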
**Implications that shocked the industry:**
- GPT-4's final loss was predicted from much smaller training runs before its training completed
- Bigger isn't always better — efficient data/compute ratio matters
- Pre-training compute is the primary moat; data quality matters as much as quantity
**The wall**: Some researchers argue we're approaching data limits — most of the high-quality public text on the internet has already been consumed for training. Synthetic data (AI-generated training data) is the next frontier.
**Key takeaway:** AI performance scales predictably with compute, data, and parameters. This single insight turned AI research into an engineering problem.