Essay
Gradients and small steps
A tiny note on optimization—why the boring update rule still runs the world.
math · ml · intuition
Most of what we call “learning” in machine learning is still, at heart, a loop: look at a signal, nudge parameters, repeat. The update everyone learns first is gradient descent:
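In symbols (writing η for the step size and L for the loss, the standard notation, since the post leaves the formula implicit), the plain update is:

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)
```

Each step moves the parameters a small distance against the local gradient of the loss.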
The step size is where intuition lives. Too large and you overshoot; too small and you burn compute without going anywhere interesting.
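To make the overshoot-versus-crawl tradeoff concrete, here is a minimal sketch on a one-dimensional quadratic. The function name and the toy loss f(x) = x² are my own illustration, not from the post:

```python
def gradient_descent(step_size, x0=5.0, steps=20):
    """Minimize f(x) = x**2 (gradient 2*x) with plain gradient descent."""
    x = x0
    for _ in range(steps):
        x = x - step_size * 2 * x  # theta <- theta - eta * grad
    return x

# Well-chosen step: walks quickly toward the minimum at 0.
good = gradient_descent(0.1)
# Too large: every step overshoots the minimum and the iterates diverge.
big = gradient_descent(1.1)
# Too small: after the same number of steps, barely any progress.
tiny = gradient_descent(0.001)
```

For this loss the iterate is multiplied by (1 − 2η) each step, so η = 0.1 shrinks it steadily, η = 1.1 flips and grows it, and η = 0.001 leaves it nearly where it started: the same compute, three very different outcomes.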
Why this still matters
Even fancy optimizers (momentum, Adam, Lion, …) are playing the same game: turn messy, high-dimensional curvature into something you can walk through without falling over. The human part is choosing the loss so that “downhill” actually means “better for people.”
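As one example of "the same game," classical momentum (the heavy-ball method) keeps a running velocity of past gradients and steps along it instead of the raw gradient. A minimal sketch, in my own notation rather than anything from the post:

```python
def momentum_step(x, v, grad, step_size=0.1, beta=0.9):
    """One heavy-ball momentum update: velocity accumulates past gradients."""
    v = beta * v + grad(x)   # running, decaying sum of gradients
    x = x - step_size * v    # step along the velocity, not the raw gradient
    return x, v

# On the toy loss f(x) = x**2 (gradient 2*x), the velocity smooths the
# walk toward the minimum at 0, damping back-and-forth oscillation.
x, v = 5.0, 0.0
for _ in range(100):
    x, v = momentum_step(x, v, lambda t: 2 * t)
```

Adam and friends elaborate on this same shape, rescaling each coordinate by running gradient statistics, but the core loop is still: accumulate local slope information, take a small step.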
If you remember one thing: the gradient is a local promise, not a global guarantee. Pair it with good problem framing, evaluation, and skepticism.
Thanks for reading.
— Shane