Newton's Method
The quadratic approximation method for finding a minimum of a function of
one variable generated a sequence of second-degree Lagrange polynomials, and
used them to approximate where the minimum is located. It was implicitly
assumed that near the minimum, the shape of the quadratics approximated the
shape of the objective function y = f(x). The resulting sequence of minima of
the quadratics produced a sequence converging to the minimum of the objective
function y = f(x). Newton's search method extends this process to functions of
n independent variables, z = f(x_1, x_2, ..., x_n). Starting at an initial
point P_0, a sequence of second-degree polynomials in n variables can be
constructed recursively. If the objective function is well-behaved and the
initial point is near the actual minimum, then the sequence of minima of the
quadratics will converge to the minimum of the objective function. The process
will use both the first- and second-order partial derivatives of the objective
function. Recall that the gradient method used only the first partial
derivatives. It is to be expected that Newton's method will be more efficient
than the gradient method.
Background
Now we turn to the minimization of a function z = f(x_1, x_2, ..., x_n) of n
variables, where the partial derivatives of f are accessible. The Newton
search method will turn out to have a familiar form. For illustration purposes
we emphasize the two-dimensional case, when z = f(x, y). The extension to n
dimensions is analogous.
Definition (Gradient). Assume that z = f(x, y) is a function of two
variables, (x, y), and has the partial derivatives f_x(x, y) and f_y(x, y).
The gradient of f, denoted by \nabla f(x, y), is the vector function
\nabla f(x, y) = ( f_x(x, y), f_y(x, y) ).
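As a concrete illustration, the following minimal Python sketch evaluates the gradient of a sample function; the function f(x, y) = x^2 + xy + y^2 - 3x and all names here are our own choices, not taken from the text.

    import numpy as np

    def grad_f(x, y):
        # Gradient of the hypothetical sample f(x, y) = x**2 + x*y + y**2 - 3*x,
        # returned as the row vector (f_x, f_y).
        return np.array([2.0*x + y - 3.0, x + 2.0*y])

    print(grad_f(1.0, 2.0))   # [1. 5.]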
Definition (Jacobian Matrix). Assume that f_1(x, y) and f_2(x, y) are
functions of two variables, (x, y); their Jacobian matrix J(x, y) is
J(x, y) = \begin{pmatrix} \frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} \end{pmatrix}.
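A small sketch of this definition, using the two hypothetical functions f_1 = x^2 y and f_2 = x + y^3 chosen only for illustration:

    import numpy as np

    def jacobian(x, y):
        # Row i holds the partial derivatives of f_i with respect to x and y,
        # here for the hypothetical pair f1 = x**2 * y and f2 = x + y**3.
        return np.array([[2.0*x*y, x**2],
                         [1.0,     3.0*y**2]])

    print(jacobian(1.0, 2.0))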
Definition (Hessian Matrix). Assume that z = f(x, y) is a function of two
variables, (x, y), and has partial derivatives up to order two. The Hessian
matrix H f(x, y) is defined as follows:
H f(x, y) = \begin{pmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{yx}(x, y) & f_{yy}(x, y) \end{pmatrix}.
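Continuing the hypothetical sample f(x, y) = x^2 + xy + y^2 - 3x from above, its Hessian is constant and can be coded directly:

    import numpy as np

    def hess_f(x, y):
        # Hessian of f(x, y) = x**2 + x*y + y**2 - 3*x:
        # f_xx = 2, f_xy = f_yx = 1, f_yy = 2 (constant, since f is quadratic).
        return np.array([[2.0, 1.0],
                         [1.0, 2.0]])

    print(hess_f(1.0, 2.0))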
Lemma 1. For z = f(x, y) the Hessian matrix H f(x, y) is the Jacobian matrix
for the two functions f_x(x, y) and f_y(x, y), i.e.
H f(x, y) = \begin{pmatrix} \frac{\partial}{\partial x} f_x(x, y) & \frac{\partial}{\partial y} f_x(x, y) \\ \frac{\partial}{\partial x} f_y(x, y) & \frac{\partial}{\partial y} f_y(x, y) \end{pmatrix}.
Lemma 2. If the second-order partial derivatives of f(x, y) are continuous,
then the Hessian matrix H f(x, y) is symmetric.
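Both lemmas can be spot-checked numerically: central differences of the gradient give a numerical Jacobian of (f_x, f_y), which should match the Hessian and be symmetric. A minimal sketch with the hypothetical sample function:

    import numpy as np

    def grad_f(x, y):
        # Gradient of the hypothetical sample f(x, y) = x**2 + x*y + y**2 - 3*x.
        return np.array([2.0*x + y - 3.0, x + 2.0*y])

    def numerical_hessian(x, y, h=1e-5):
        # Central-difference Jacobian of (f_x, f_y); by Lemma 1 this is H f(x, y).
        col_x = (grad_f(x + h, y) - grad_f(x - h, y)) / (2.0*h)
        col_y = (grad_f(x, y + h) - grad_f(x, y - h)) / (2.0*h)
        return np.column_stack([col_x, col_y])

    H = numerical_hessian(1.0, 2.0)
    print(np.allclose(H, H.T))   # True: the Hessian is symmetric (Lemma 2)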
Taylor Polynomial for f(x,y)
Assume that z = f(x, y) is a function of two variables, (x, y), and has
partial derivatives up to order two. There are two ways to write the quadratic
approximation to f(x,y), based on series and matrices, respectively. Assume
that the point of expansion is P_0 = (p_1, p_2) and use the notation
h_1 = x - p_1 and h_2 = y - p_2; then f(x, y) = f(p_1 + h_1, p_2 + h_2).
The Taylor polynomial using series notation is
Q(x, y) = f(p_1, p_2) + f_x(p_1, p_2) h_1 + f_y(p_1, p_2) h_2 + \frac{1}{2} \left( f_{xx}(p_1, p_2) h_1^2 + 2 f_{xy}(p_1, p_2) h_1 h_2 + f_{yy}(p_1, p_2) h_2^2 \right).
The Taylor polynomial using vector and matrix notation is
Q(x, y) = f(p_1, p_2) + ( f_x(p_1, p_2), f_y(p_1, p_2) ) \begin{pmatrix} h_1 \\ h_2 \end{pmatrix} + \frac{1}{2} ( h_1, h_2 ) \, H f(p_1, p_2) \begin{pmatrix} h_1 \\ h_2 \end{pmatrix}.
The latter can be written in the form
Q(X) = f(P_0) + \nabla f(P_0) (X - P_0)^T + \frac{1}{2} (X - P_0) \, H f(P_0) \, (X - P_0)^T,
where the points X = (x, y) and P_0 = (p_1, p_2) are treated as row vectors.
Using the vector notations h = X - P_0 and h^T = (X - P_0)^T, it looks like
Q(X) = f(P_0) + \nabla f(P_0) \, h^T + \frac{1}{2} \, h \, H f(P_0) \, h^T.
The above formula is also the expansion of f(P_0 + h) centered at the point
P_0 with h = X - P_0.
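A minimal numerical sketch of this approximation, again with the hypothetical sample function (since that f is itself quadratic, Q reproduces it exactly; for a general f the two values would only agree near P_0):

    import numpy as np

    f      = lambda X: X[0]**2 + X[0]*X[1] + X[1]**2 - 3.0*X[0]
    grad_f = lambda X: np.array([2.0*X[0] + X[1] - 3.0, X[0] + 2.0*X[1]])
    hess_f = lambda X: np.array([[2.0, 1.0], [1.0, 2.0]])

    def Q(X, P0):
        # Quadratic Taylor polynomial Q(X) = f(P0) + grad*h^T + (1/2) h*H*h^T.
        h = np.asarray(X) - np.asarray(P0)
        return f(P0) + grad_f(P0) @ h + 0.5 * h @ hess_f(P0) @ h

    P0 = np.array([1.0, 1.0])
    X  = np.array([1.1, 0.9])
    print(f(X), Q(X, P0))   # both -0.29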
Newton's Method for Finding a Minimum
Now we turn to the minimization of a function f(X) of n variables, where
X = (x_1, x_2, ..., x_n) and the partial derivatives of f(X) are accessible.
Assume that the first and second partial derivatives of f(X) exist and are
continuous in a region containing the point P_0, and that there is a minimum
at the point P. The quadratic polynomial approximation to f(X) is
Q(X) = f(P_0) + \nabla f(P_0) (X - P_0)^T + \frac{1}{2} (X - P_0) \, H f(P_0) \, (X - P_0)^T.
A minimum of Q(X) occurs where \nabla Q(X) = 0.
Using the notation h = X - P_0 and the symmetry of H f(P_0), we write
Q(X) = f(P_0) + \nabla f(P_0) \, h^T + \frac{1}{2} \, h \, H f(P_0) \, h^T.
The gradient \nabla Q(X) is given by the calculation
\nabla Q(X) = \nabla f(P_0) + h \, H f(P_0).
Thus the expression for \nabla Q(X) can be written as
\nabla Q(X) = \nabla f(P_0) + (X - P_0) \, H f(P_0).
If X is close to the point P (where a minimum of f occurs), then H f(P_0) is
invertible and the equation \nabla Q(X) = 0 can be solved for X, and we have
X = P_0 - \nabla f(P_0) \, (H f(P_0))^{-1}.
This value of X can be used as the next approximation to P,
P_1 = P_0 - \nabla f(P_0) \, (H f(P_0))^{-1},
and is the first step in Newton's method for finding a minimum.
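A minimal sketch of this single step in Python, keeping the row-vector convention and reusing the hypothetical sample function from above:

    import numpy as np

    grad_f = lambda X: np.array([2.0*X[0] + X[1] - 3.0, X[0] + 2.0*X[1]])
    hess_f = lambda X: np.array([[2.0, 1.0], [1.0, 2.0]])

    P0 = np.array([0.0, 0.0])
    # P1 = P0 - grad f(P0) (H f(P0))^{-1}, with the gradient as a row vector.
    P1 = P0 - grad_f(P0) @ np.linalg.inv(hess_f(P0))
    print(P1)   # [ 2. -1.], the exact minimizer, since the sample f is quadratic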
Lemma. If column vectors are preferred over row vectors, then P_1 is given by
the computation
P_1^T = P_0^T - (H f(P_0))^{-1} \, \nabla f(P_0)^T.
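The same step in column-vector form (transposing the snippet above); the Hessian of the sample f is symmetric, so the two forms produce the same point:

    import numpy as np

    grad_f = lambda X: np.array([2.0*X[0] + X[1] - 3.0, X[0] + 2.0*X[1]])
    H      = np.array([[2.0, 1.0], [1.0, 2.0]])   # Hessian of the sample f

    P0 = np.array([0.0, 0.0])
    # Column-vector form: P1^T = P0^T - (H f(P0))^{-1} grad f(P0)^T.
    P1 = P0 - np.linalg.inv(H) @ grad_f(P0)
    print(P1)   # [ 2. -1.]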
Remark.
This is equivalent to a Newton-Raphson root-finding problem: given the vector
function G(X) = \nabla f(X), find the root of the equation G(X) = 0. For this
problem the Newton-Raphson formula is known to be
P_1 = P_0 - G(P_0) \, (J G(P_0))^{-1},
where our previous work with the Newton-Raphson method used the column vectors
X^T and G(X)^T. Here we use G(X) = \nabla f(X), and Lemma 1 gives the
relationship
J G(X) = H f(X).
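A brief sketch of the equivalence: feeding G(X) = \nabla f(X) and its Jacobian (the Hessian) to a generic Newton-Raphson step reproduces the minimization step above (hypothetical sample function as before):

    import numpy as np

    G  = lambda X: np.array([2.0*X[0] + X[1] - 3.0, X[0] + 2.0*X[1]])  # grad f
    JG = lambda X: np.array([[2.0, 1.0], [1.0, 2.0]])                  # = H f

    def newton_raphson_step(G, JG, P):
        # One Newton-Raphson step toward a root of G(X) = 0 (column-vector form).
        return P - np.linalg.solve(JG(P), G(P))

    print(newton_raphson_step(G, JG, np.array([0.0, 0.0])))   # [ 2. -1.]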
Outline of the Newton Method for Finding a Minimum
Start with the approximation P_0 to the minimum point P. Set k = 0.
(i) Evaluate the gradient vector \nabla f(P_k) and the Hessian matrix H f(P_k).
(ii) Compute the next point P_{k+1} = P_k - \nabla f(P_k) \, (H f(P_k))^{-1}.
(iii) Perform the termination test for minimization. Set k = k + 1.
Repeat the process. A sketch of this loop in code is given below.
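A minimal sketch of steps (i)-(iii), assuming the caller supplies callables grad and hess for the derivatives; the termination test used here (the size of the change in the point) is our own choice, since the text does not fix one:

    import numpy as np

    def newton_minimize(grad, hess, P0, tol=1e-8, max_iter=50):
        P = np.asarray(P0, dtype=float)
        for k in range(max_iter):
            g = grad(P)                         # (i) gradient vector
            H = hess(P)                         # (i) Hessian matrix
            P_next = P - g @ np.linalg.inv(H)   # (ii) next point
            if np.linalg.norm(P_next - P) < tol:    # (iii) termination test
                return P_next
            P = P_next
        return P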
In equation (ii) the inverse of the Hessian matrix is used to solve for
P_{k+1}. It would be better to solve the system of linear equations
represented by equation (ii) with a linear system solver rather than a matrix
inversion. The reader should realize that the inverse is primarily a
theoretical tool, and the computation and use of inverses is inherently
inefficient.
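In NumPy terms this is the difference between np.linalg.inv and np.linalg.solve; a sketch of the preferred form of the step:

    import numpy as np

    def newton_step(grad, hess, P):
        # Solve H f(P) d^T = grad f(P)^T for the correction d instead of
        # forming the inverse explicitly; then P_next = P - d.
        d = np.linalg.solve(hess(P), grad(P))
        return P - d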
Algorithm (Newton's Method for Finding a Minimum). To numerically approximate
a local minimum of f(X), where f is a continuous function of n real variables
and X = (x_1, x_2, ..., x_n), start with one point P_0 and use the Newton
method to search for a minimum.
Program (Newton's Method for Finding a Minimum). To numerically approximate a
local minimum of f(X), where f is a continuous function of n real variables
and X = (x_1, x_2, ..., x_n), start with one point P_0 and use the Newton
method to search for a minimum. For illustration purposes we propose the
following subroutine for n = 2 variables.
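The original subroutine is not reproduced here; the following Python sketch is a stand-in with the same intent for n = 2, with the gradient and Hessian supplied analytically by the caller (all names and tolerances are our own):

    import numpy as np

    def newton2(f, grad, hess, p0, delta=1e-7, epsilon=1e-7, max_iter=50):
        """Newton search for a minimum of f(x, y) starting from p0.

        delta   : tolerance on the change in the point
        epsilon : tolerance on the change in the function value
        """
        p = np.asarray(p0, dtype=float)
        for k in range(max_iter):
            p_new = p - np.linalg.solve(hess(p), grad(p))
            err_p = np.linalg.norm(p_new - p)
            err_f = abs(f(p_new) - f(p))
            p = p_new
            if err_p < delta or err_f < epsilon:
                break
        return p, f(p), k

    # Example run on the hypothetical sample f(x, y) = x**2 + x*y + y**2 - 3*x.
    f    = lambda X: X[0]**2 + X[0]*X[1] + X[1]**2 - 3.0*X[0]
    grad = lambda X: np.array([2.0*X[0] + X[1] - 3.0, X[0] + 2.0*X[1]])
    hess = lambda X: np.array([[2.0, 1.0], [1.0, 2.0]])
    print(newton2(f, grad, hess, [0.0, 0.0]))   # minimum near (2, -1), f = -3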