
Mathematics-Online lexicon:

Steepest Descent


The method of steepest descent is used to minimize multivariate functions $ f(x_1,\ldots,x_n)$. In one step $ x\to y$ of the iteration, first, the negative gradient

$\displaystyle d = -\operatorname{grad}f(x)$

is computed, which is the best local search direction. Then, the next approximation $ y$ is determined by minimizing $ f$ in the direction $ d$:

$\displaystyle f(y) = \min_{t\ge 0} f(x+td)\,.$


As shown in the figure, the search direction is orthogonal to the level set through $ x$ and is tangent to the level set of a smaller function value at $ y$.
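One step of the method can be sketched in code. The following is a minimal illustration, assuming the hypothetical example function $f(x_1,x_2) = x_1^2 + 4x_2^2$; the one-dimensional minimization over $t \ge 0$ is performed here by simple ternary search on a bounded interval:

```python
def f(x):
    # Hypothetical example function f(x1, x2) = x1^2 + 4*x2^2.
    return x[0] ** 2 + 4 * x[1] ** 2

def grad_f(x):
    # Its gradient: (2*x1, 8*x2).
    return [2 * x[0], 8 * x[1]]

def line_min(phi, t_hi=10.0, iters=100):
    # Ternary search for the minimizer of a unimodal phi on [0, t_hi].
    lo, hi = 0.0, t_hi
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if phi(m1) < phi(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

def step(x):
    d = [-g for g in grad_f(x)]                      # d = -grad f(x)
    t = line_min(lambda s: f([xi + s * di for xi, di in zip(x, d)]))
    return [xi + t * di for xi, di in zip(x, d)]     # y = x + t*d

x = [2.0, 1.0]
y = step(x)
print(f(x), f(y))   # the function value decreases: f(y) < f(x)
```

At the exact line minimum, $d$ is tangent to the level set through $y$, i.e. $d \cdot \operatorname{grad}f(y) \approx 0$, which matches the geometric picture described above.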

The convergence of the sequence $ x_0,x_1,\ldots$, generated by the method of steepest descent, can be shown under fairly general assumptions. It is sufficient that $ f$ is bounded from below and $ \operatorname{grad}f$ is Lipschitz continuous in a neighborhood $ U$ of the set $ \{x: f(x) \leq f(x_0)\}$, i.e.,

$\displaystyle \Vert
\operatorname{grad}f(x) -
\operatorname{grad}f(\tilde x)
\Vert \le L \Vert x-\tilde x\Vert,\quad
x,\tilde x\in U\,.$
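For a gradient that is linear, such as that of the hypothetical quadratic $f(x_1,x_2) = x_1^2 + 4x_2^2$, the condition even holds globally with $L = 8$ (the largest eigenvalue of the Hessian $\operatorname{diag}(2,8)$). A random-sampling sketch illustrates the bound:

```python
import random

def grad_f(x):
    # Gradient of the assumed example f(x1, x2) = x1^2 + 4*x2^2.
    return [2 * x[0], 8 * x[1]]

def norm(v):
    return sum(vi * vi for vi in v) ** 0.5

L = 8.0  # largest eigenvalue of the Hessian diag(2, 8)
random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(2)]
    xt = [random.uniform(-5, 5) for _ in range(2)]
    lhs = norm([a - b for a, b in zip(grad_f(x), grad_f(xt))])
    rhs = L * norm([a - b for a, b in zip(x, xt)])
    assert lhs <= rhs + 1e-12  # Lipschitz bound holds on every sample
print("Lipschitz bound verified on all samples")
```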

These hypotheses imply that

$\displaystyle \sum_{\ell=0}^\infty
\Vert\operatorname{grad}f(x_\ell)\Vert^2 < \infty\,.$

In particular, $ \Vert\operatorname{grad}f(x_\ell)\Vert$ tends to zero, and it follows that each accumulation point of the sequence $ x_0,x_1,\ldots$ is a critical point of $ f$. While convergence to a local minimum is the typical case, exceptional cases, such as convergence to a saddle point, cannot be excluded.
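Both statements can be observed numerically. A sketch of the full iteration for the assumed quadratic example $f(x_1,x_2) = x_1^2 + 4x_2^2$ shows that the partial sums of $\Vert\operatorname{grad}f(x_\ell)\Vert^2$ stay bounded and the gradient norm tends to zero:

```python
def f(x):
    # Assumed example f(x1, x2) = x1^2 + 4*x2^2.
    return x[0] ** 2 + 4 * x[1] ** 2

def grad_f(x):
    return [2 * x[0], 8 * x[1]]

def line_min(phi, t_hi=10.0, iters=100):
    # Ternary search on [0, t_hi] for a unimodal phi.
    lo, hi = 0.0, t_hi
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if phi(m1) < phi(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

x = [2.0, 1.0]
total = 0.0                       # partial sum of ||grad f(x_l)||^2
for _ in range(50):
    g = grad_f(x)
    total += sum(gi * gi for gi in g)
    d = [-gi for gi in g]
    t = line_min(lambda s: f([xi + s * di for xi, di in zip(x, d)]))
    x = [xi + t * di for xi, di in zip(x, d)]

grad_norm = sum(gi * gi for gi in grad_f(x)) ** 0.5
print(total, grad_norm)           # sum stays bounded, gradient norm -> 0
```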

To ensure convergence of the algorithm, it is not necessary to find the exact one-dimensional minimum in the direction of $ -\operatorname{grad}f(x)$. It is sufficient to use a non-optimal descent direction $ d$ and to obtain merely a reduction of the current function value $ f(x)$ which is proportional to $ \Vert\operatorname{grad}f(x)\Vert^2$.
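A standard way to obtain such a proportional reduction without an exact line search is backtracking with the Armijo sufficient-decrease condition $f(x+td) \le f(x) - c\,t\,\Vert\operatorname{grad}f(x)\Vert^2$ for $d = -\operatorname{grad}f(x)$. A minimal sketch, with hypothetical parameter choices $c = 10^{-4}$ and $\beta = 1/2$, again for the assumed quadratic example:

```python
def f(x):
    # Assumed example f(x1, x2) = x1^2 + 4*x2^2.
    return x[0] ** 2 + 4 * x[1] ** 2

def grad_f(x):
    return [2 * x[0], 8 * x[1]]

def backtracking_step(x, c=1e-4, beta=0.5, t0=1.0):
    # Shrink t until the decrease is at least c * t * ||grad f(x)||^2,
    # the sufficient-decrease (Armijo) condition used in convergence proofs.
    g = grad_f(x)
    gg = sum(gi * gi for gi in g)  # ||grad f(x)||^2
    t = t0
    while f([xi - t * gi for xi, gi in zip(x, g)]) > f(x) - c * t * gg:
        t *= beta
    return [xi - t * gi for xi, gi in zip(x, g)]

x = [2.0, 1.0]
y = backtracking_step(x)
print(f(x), f(y))   # f(y) < f(x) with a guaranteed margin
```

The shrink factor $\beta$ guarantees termination of the inner loop, since for small enough $t$ the condition is always satisfied for a descent direction.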


  automatically generated 6/14/2016