Faster Optimization with Counterintuitively Long Steps

A new study by Benjamin Grimmer at Johns Hopkins University demonstrates that the classic gradient descent optimization algorithm can be provably accelerated by periodically taking very long steps. This runs counter to the standard intuition, and to the standard analysis, that constant, conservative steps are the right choice.

Gradient descent is a fundamental algorithm for minimizing objective functions in areas like machine learning. The typical textbook analysis derives its convergence guarantees from a fixed, conservative step size applied at every iteration. Grimmer, however, has developed new analysis techniques showing that faster convergence is possible by occasionally taking much longer steps, even if those steps temporarily increase the objective value.
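
To ground the discussion, here is a minimal sketch of the textbook scheme: gradient descent with a constant step size of 1/L on a simple smooth convex quadratic. The objective, the problem data, and the iteration count are illustrative choices, not taken from the paper.

```python
import numpy as np

# Minimal textbook gradient descent on a smooth convex quadratic
# f(x) = 0.5 * x^T A x, whose gradient is A x. The constant step
# size 1/L (L = largest eigenvalue of A = Lipschitz constant of the
# gradient) is the standard choice analyzed in most textbooks.
A = np.diag([1.0, 10.0, 100.0])      # illustrative problem, not from the paper
L = np.max(np.linalg.eigvalsh(A))    # smoothness constant

def f(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

x = np.ones(3)
for k in range(100):
    x = x - (1.0 / L) * grad(x)      # fixed small step at every iteration

print(f"objective after 100 fixed steps: {f(x):.3e}")
```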

The new theory permits individual steps so long that, on their own, they may fail to decrease the objective, yet they still speed up overall convergence. Grimmer shows that convergence rates can be systematically improved by repeatedly applying carefully chosen patterns of varied step sizes, with the longest step in a pattern potentially hundreds of times larger than the typical textbook step.
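
The idea can be illustrated with a toy experiment. The cycle of step sizes below is a hypothetical placeholder, not one of the certified patterns from the paper: a repeating pattern that mixes moderate steps with one much longer step is compared against the constant 1/L step on the same convex quadratic.

```python
import numpy as np

# Sketch only (not the paper's certified patterns): gradient descent on a
# convex quadratic using a repeating cycle of step sizes that includes one
# "long" step of 6/L, versus the textbook constant step 1/L. The pattern
# values below are hypothetical placeholders chosen to illustrate the idea.
A = np.diag([0.1, 1.0, 10.0])
L = 10.0                                   # Lipschitz constant of the gradient
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

def run(step_pattern, iters=200):
    x = np.ones(3)
    rose = False                           # did the objective ever go up?
    for k in range(iters):
        f_before = f(x)
        x = x - (step_pattern[k % len(step_pattern)] / L) * grad(x)
        rose |= f(x) > f_before
    return f(x), rose

f_const, _ = run([1.0])                    # constant textbook step 1/L
f_long, rose = run([1.5, 1.5, 1.5, 6.0])   # cycle with an occasional long step

print(f"constant 1/L steps : f = {f_const:.2e}")
print(f"long-step pattern  : f = {f_long:.2e}  (objective rose at some step: {rose})")
```

On this toy problem the pattern ends at a noticeably smaller objective after the same number of iterations, even though the long steps cause the objective to rise temporarily, matching the non-monotone behavior described above.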

The key innovation enabling this analysis is measuring the total progress made over multiple iterations rather than requiring descent at every individual step, a departure from most existing convergence proofs for optimization algorithms. Using semidefinite programming, Grimmer generates computer-verifiable certificates that prove the faster convergence of these long-step patterns.
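
For contrast, the classical analysis is built on the per-step descent lemma below, a standard fact for L-smooth functions stated here for reference rather than quoted from the paper; the long-step certificates replace this per-step requirement with a bound on the progress accumulated over an entire cycle of steps.

```latex
% Textbook per-step descent lemma for an L-smooth convex f with step 1/L:
\[
  f(x_{k+1}) \;\le\; f(x_k) \;-\; \frac{1}{2L}\,\bigl\|\nabla f(x_k)\bigr\|^2 ,
\]
% which forces monotone decrease and, summed over N iterations, yields the
% classical rate
\[
  f(x_N) - f(x^\ast) \;\le\; \frac{L\,\|x_0 - x^\ast\|^2}{2N}.
\]
% The long-step analysis drops the first inequality as a per-step requirement
% and instead certifies, via semidefinite programming, a decrease in the
% objective gap over each full cycle of varied steps.
```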

The discovery of these “long step” gradient descent variants challenges common intuitions about the need for monotone objective decreases and could pave the way for improved optimization algorithms. Potential speedups in deep learning training and other applications relying on gradient-based optimization may follow. An exciting direction for future work is identifying longer step patterns and other ways to fully capture gradient descent’s acceleration potential.

In summary, this study provides a promising first look at how periodically taking counterintuitively long steps can speed up foundational optimization algorithms. The analysis techniques introduced open many possibilities for rethinking convergence guarantees and designing provably faster iterative methods.
