Gradient descent is the core optimization algorithm in machine learning: nearly every model, from linear regression to deep neural networks, is trained with some variant of it.
**The intuition:** Imagine standing on a hilly landscape in thick fog, trying to reach the lowest valley. You can't see anything; you can only feel the slope under your feet. Gradient descent says: take a small step in whichever direction slopes downhill most steeply, then repeat.
**In ML terms:**
1. Make a prediction with the current weights
2. Calculate the error (loss)
3. Compute the gradient: the direction of steepest *increase* in the loss
4. Update the weights a small step in the *opposite* direction, scaled by the learning rate
5. Repeat, typically for thousands to millions of updates (a minimal sketch of this loop follows the list)
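Here's what that loop looks like in code: a minimal NumPy sketch fitting a one-variable linear model `y ≈ w·x + b` by plain (full-batch) gradient descent. The toy data, learning rate, and step count are made up for illustration.

```python
import numpy as np

# Toy data for illustration: y = 3x + 1, plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 1.0 + 0.1 * rng.normal(size=100)

w, b = 0.0, 0.0   # current weights
lr = 0.1          # learning rate (step size)

for step in range(500):
    y_pred = w * x + b               # 1. predict with current weights
    error = y_pred - y
    loss = np.mean(error ** 2)       # 2. loss (mean squared error)
    grad_w = 2 * np.mean(error * x)  # 3. gradient of the loss w.r.t. w
    grad_b = 2 * np.mean(error)      #    ...and w.r.t. b
    w -= lr * grad_w                 # 4. step *against* the gradient
    b -= lr * grad_b                 # 5. repeat

print(w, b)  # converges to roughly 3.0 and 1.0
```

Here the gradients are computed by hand from the squared-error formula; in a real framework, automatic differentiation handles step 3 for you.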
**Variants:** SGD (stochastic gradient descent: update from a single random example at a time), mini-batch gradient descent (update from a small random batch, the usual compromise between full-batch and stochastic), and Adam (adds momentum and per-parameter adaptive learning rates; the most common choice in practice).
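To make the Adam variant concrete, here is a sketch of a single Adam update step, using the default hyperparameters from the original paper (Kingma & Ba, 2015). The function name and signature are mine for illustration, not from any particular library.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are running moment estimates (start them
    at zero), and t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2      # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction for early steps,
    v_hat = v / (1 - beta2 ** t)                 # when m and v are still near zero
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return w, m, v
```

The running averages `m` and `v` give each parameter its own effective step size, which is why Adam tends to tolerate a wider range of learning rates than plain SGD.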