Gradient descent is the core optimization algorithm in machine learning: nearly every model, from linear regression to deep neural networks, is trained with some variant of it.
**The intuition:** Imagine standing on a hilly landscape in thick fog, trying to reach the lowest valley. You can't see anything; you can only feel the slope under your feet. Gradient descent says: take a small step in whichever direction slopes downhill most steeply, then repeat.
**In ML terms:**
1. Make a prediction with the current weights
2. Calculate the error (loss)
3. Compute the gradient: the direction of steepest *increase* in the loss
4. Update the weights a small step in the *opposite* direction, scaled by the learning rate
5. Repeat, typically for thousands to millions of updates (a minimal sketch of this loop follows the list)
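Here's what that loop looks like in code: a minimal NumPy sketch fitting a one-variable linear model `y ≈ w·x + b` by plain (full-batch) gradient descent. The toy data, learning rate, and step count are made up for illustration.

```python
import numpy as np

# Toy data for illustration: y = 3x + 1, plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 1.0 + 0.1 * rng.normal(size=100)

w, b = 0.0, 0.0   # current weights
lr = 0.1          # learning rate (step size)

for step in range(500):
    y_pred = w * x + b               # 1. predict with current weights
    error = y_pred - y
    loss = np.mean(error ** 2)       # 2. loss (mean squared error)
    grad_w = 2 * np.mean(error * x)  # 3. gradient of the loss w.r.t. w
    grad_b = 2 * np.mean(error)      #    ...and w.r.t. b
    w -= lr * grad_w                 # 4. step *against* the gradient
    b -= lr * grad_b                 # 5. repeat

print(w, b)  # converges to roughly 3.0 and 1.0
```

Here the gradients are computed by hand from the squared-error formula; in a real framework, automatic differentiation handles step 3 for you.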
**Variants:** SGD (stochastic gradient descent: update from a single random example at a time), mini-batch gradient descent (update from a small random batch, the usual compromise between full-batch and stochastic), and Adam (adds momentum and per-parameter adaptive learning rates; the most common choice in practice).
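To make the Adam variant concrete, here is a sketch of a single Adam update step, using the default hyperparameters from the original paper (Kingma & Ba, 2015). The function name and signature are mine for illustration, not from any particular library.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are running moment estimates (start them
    at zero), and t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2      # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction for early steps,
    v_hat = v / (1 - beta2 ** t)                 # when m and v are still near zero
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return w, m, v
```

The running averages `m` and `v` give each parameter its own effective step size, which is why Adam tends to tolerate a wider range of learning rates than plain SGD.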