
Gradients and small steps

A tiny note on optimization—why the boring update rule still runs the world.

By Shane · 1 min read

math · ml · intuition

Most of what we call “learning” in machine learning is still, at heart, a loop: look at a signal, nudge parameters, repeat. The update everyone learns first is gradient descent:

$$w_{t+1} = w_t - \eta \nabla L(w_t)$$

The step size $\eta$ is where intuition lives. Too large and you overshoot; too small and you burn compute without going anywhere interesting.
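A minimal sketch of that loop, with an illustrative quadratic loss and a learning rate I picked arbitrarily (neither comes from anything specific above):

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.1, steps=100):
    """Plain gradient descent: repeat w <- w - eta * grad(w)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

# Toy example: minimize L(w) = ||w||^2, whose gradient is 2w.
w_star = gradient_descent(lambda w: 2 * w, w0=[3.0, -4.0], eta=0.1, steps=100)
print(w_star)  # ends up close to [0, 0]
```

Try cranking `eta` up toward 1.0 on that same toy loss and the iterates start to oscillate or diverge, which is the overshooting failure described above.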

Why this still matters

Even fancy optimizers (momentum, Adam, Lion, …) are playing the same game: turn messy, high-dimensional curvature into something you can walk through without falling over. The human part is choosing the loss $L$ so that “downhill” actually means “better for people.”
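For contrast, here is a hedged sketch of the simplest of those variants, heavy-ball momentum: it accumulates a velocity instead of stepping along the raw gradient. The decay factor of 0.9 is a common convention, not something claimed above.

```python
import numpy as np

def momentum_descent(grad, w0, eta=0.1, beta=0.9, steps=100):
    """Heavy-ball momentum: keep a running velocity, then step along it."""
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v - eta * grad(w)  # blend old direction with new gradient
        w = w + v
    return w
```

Structurally it is still the same loop: look at a signal, nudge parameters, repeat.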

If you remember one thing: the gradient is a local promise, not a global guarantee. Pair it with good problem framing, evaluation, and skepticism.

Thanks for reading.

Shane
