Homework 5
Homework Submission Workflow
When you submit your work, follow the instructions on the submission workflow page
(https://www.cs.duke.edu/courses/fall18/compsci371d/homework/workflow.html) scrupulously for full credit.
Important: Failure to do any of the following will result in lost points:
Submit one PDF file and one notebook per group
Enter all the group members in your PDF Gradescope submission, not just in your documents. Look
at this Piazza Post (https://piazza.com/class/jl2rbsllc6c25v?cid=45) if you don't know how to do this
Match each answer (not question!) with the appropriate page in Gradescope
Avoid large blank spaces in your PDF file
Part 1: Numerical Differentiation
Gradient and Jacobian
The gradient of a function $f : \mathbb{R}^n \to \mathbb{R}$ is typically viewed as a column vector in the literature:

$$\nabla f(\mathbf{z}) = \begin{bmatrix} \frac{\partial f}{\partial z_1} \\ \vdots \\ \frac{\partial f}{\partial z_n} \end{bmatrix} .$$

A more natural (and not uncommon) view of the gradient is as a row vector, because then the gradient is merely
a special case (for $m = 1$) of the Jacobian matrix of a function $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$, defined as the following $m \times n$
matrix:

$$J_{\mathbf{f}}(\mathbf{z}) = \frac{\partial \mathbf{f}}{\partial \mathbf{z}} = \begin{bmatrix} \frac{\partial f_1}{\partial z_1} & \cdots & \frac{\partial f_1}{\partial z_n} \\ \vdots & & \vdots \\ \frac{\partial f_m}{\partial z_1} & \cdots & \frac{\partial f_m}{\partial z_n} \end{bmatrix} .$$

The Jacobian matrix can therefore be viewed as the column vector of the gradients of the $m$ components $f_1, \ldots, f_m$ of
the function $\mathbf{f}$, one gradient per row.
The distinction between gradient as a row vector and gradient as a column vector is relevant in mathematics,
but is of course moot if vectors are represented as numpy arrays in Python, because these arrays do not
distinguish between row vectors and column vectors.
Problem 1.1
The transformation from polar coordinates $\mathbf{z} = (r, \theta)$ to Cartesian coordinates $(x, y)$ on the plane can be written
as follows:

$$x = r \cos\theta \;, \qquad y = r \sin\theta \;.$$
Write a formula for the Jacobian of this transformation.
Problem 1.2
Write functions with headers
def P2C(z):

and

def JacobianP2C(z):
that compute the transformation above and its Jacobian matrix. Inputs and outputs should be numpy arrays.
Show your code. You will test it in a later problem.
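For reference, here is a minimal sketch of one possible implementation. It assumes that the input z is a numpy array holding $(r, \theta)$ in that order, and that numpy has been imported as np; it is a sketch, not the only acceptable solution.

import numpy as np

def P2C(z):
    r, theta = z[0], z[1]
    return np.array([r * np.cos(theta), r * np.sin(theta)])

def JacobianP2C(z):
    r, theta = z[0], z[1]
    # each row is the gradient of one Cartesian coordinate with respect to (r, theta)
    return np.array([[np.cos(theta), -r * np.sin(theta)],
                     [np.sin(theta),  r * np.cos(theta)]])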
Problem 1.3
Numerical Approximation of the Derivative
In homework 4, we explored the concept of gradient and Hessian in a case in which the function $f$
was known analytically. This is the situation students are most familiar with, because that's
what calculus courses emphasize.

In practice, however, $f$ is often known either (i) through a piece of code in some programming language, or, even
more opaquely, (ii) as a black box: A program that can be called at will with an input $\mathbf{z}$ and produces some
output $f(\mathbf{z})$. We will explore options for scenario (i) in a later assignment. In this part of this assignment, we take
the black box route: We know nothing about $f$ except that it is differentiable (or else we cannot differentiate it!).

Of course, this situation is general and subsumes scenario (i) by just forgoing any analysis of the code. However,
we will see later that having access to the code opens other, interesting avenues.
By definition of (partial) derivative, if $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$, we can write

$$\frac{\partial \mathbf{f}}{\partial z_j}(\mathbf{z}) = \lim_{\delta \to 0} \frac{\mathbf{f}(\mathbf{z} + \delta \mathbf{e}_j) - \mathbf{f}(\mathbf{z})}{\delta}$$

where $\mathbf{e}_j$ is the $j$-th column of the $n \times n$ identity matrix. That is, $\mathbf{e}_j$ is a vector of zeros except for a one in
position $j$.

We can approximate the limit above by picking a small value of $\delta$ and disposing of the limit:

$$\frac{\partial \mathbf{f}}{\partial z_j}(\mathbf{z}) \approx \frac{\mathbf{f}(\mathbf{z} + \delta \mathbf{e}_j) - \mathbf{f}(\mathbf{z})}{\delta} \;.$$

For reasons that would take too long to get into, a much better numerical approximation is provided by the so-called
central difference

$$\frac{\partial \mathbf{f}}{\partial z_j}(\mathbf{z}) \approx \frac{\mathbf{f}(\mathbf{z} + \delta \mathbf{e}_j) - \mathbf{f}(\mathbf{z} - \delta \mathbf{e}_j)}{2\delta} \;.$$

There is a whole literature on how to choose $\delta$ appropriately: Too large, and we are too far from the limit. Too
small, and numerical errors in the computation prevail. We take a pragmatic approach in this introductory
exploration and use $\delta = 10^{-5}$ most of the time. More on this aspect later.
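As a small illustration of this tradeoff (not part of the assignment), the snippet below compares the forward and central difference approximations of the derivative of $\sin$ at $x = 1$, whose exact value is $\cos 1$; the function and the point are chosen arbitrarily for this example.

import numpy as np

x, delta = 1.0, 1e-5
exact = np.cos(x)
forward = (np.sin(x + delta) - np.sin(x)) / delta
central = (np.sin(x + delta) - np.sin(x - delta)) / (2 * delta)
print(abs(forward - exact))   # error on the order of delta (about 4e-6 here)
print(abs(central - exact))   # error on the order of delta squared (about 1e-11 here)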
Write a Python function with header
def Jacobian(f, z, delta=1e-5):
that takes a function f from $\mathbb{R}^n$ to $\mathbb{R}^m$, a numpy vector z with $n$ entries, and an optional value for $\delta$, and returns
a numpy array with the Jacobian of f at z, using the central difference formula given above.
Show your code. You will test it in the next problem.
Programming Notes
Your function Jacobian must work for any function f from $\mathbb{R}^n$ to $\mathbb{R}^m$, for any $n$ and $m$. The function f
takes a numpy vector with $n$ entries as input and produces a numpy vector with $m$ entries. The function
Jacobian outputs a numpy array of shape $(m, n)$ (not $(n, m)$!).
For later use, it is important that Jacobian return a numpy array in all cases (even when the result is a scalar).
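The following is a minimal sketch of one way to write Jacobian with the central difference formula, under the conventions above (input z with $n$ entries, output of shape $(m, n)$); it is a sketch under these assumptions, not the only correct solution.

import numpy as np

def Jacobian(f, z, delta=1e-5):
    z = np.asarray(z, dtype=float)
    fz = np.atleast_1d(f(z))              # m entries
    n, m = z.size, fz.size
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0                        # j-th column of the identity matrix
        J[:, j] = (np.atleast_1d(f(z + delta * e)) -
                   np.atleast_1d(f(z - delta * e))) / (2 * delta)
    return J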
Problem 1.4
Show the result of running the tests below. This will happen automatically once your functions Jacobian, P2C,
and JacobianP2C are defined correctly (and you run the cell below).
These tests use the default value for $\delta$.
In [1]: import numpy as np

def compare(a, f, b, delta=1e-5):
    def a2s(a):
        def n2s(x):
            return '{:g}'.format(round(x, 4))
        try:
            return '[' + '; '.join([', '.join([n2s(y) for y in row]) for row in a]) + ']'
        except TypeError:
            try:
                return '[' + ', '.join([n2s(y) for y in a]) + ']'
            except TypeError:
                return '[]' if a.size == 0 else n2s(a)
    aName, fName, bName = a.__name__, f.__name__, b.__name__
    msgBase = '{:s}({:s}, {{:s}}) = {{:s}}\n{:s}({{:s}}) = {{:s}}'
    msg = msgBase.format(aName, fName, bName)
    zs = np.array([[0, 0], [1, 0], [2, 1], [2, 2]])
    for z in zs:
        print(msg.format(a2s(z), a2s(a(f, z, delta)), a2s(z), a2s(b(z))), end='\n\n')

try:
    compare(Jacobian, P2C, JacobianP2C)
except NameError:
    pass
Problem 1.5
The Hessian is the Jacobian of the gradient, in the following sense. If the gradient of $f : \mathbb{R}^n \to \mathbb{R}$ is viewed as the row vector

$$\nabla f(\mathbf{z}) = \left[ \frac{\partial f}{\partial z_1}, \ldots, \frac{\partial f}{\partial z_n} \right] ,$$

the Hessian

$$H_f(\mathbf{z}) = \begin{bmatrix} \frac{\partial^2 f}{\partial z_1^2} & \cdots & \frac{\partial^2 f}{\partial z_1 \partial z_n} \\ \vdots & & \vdots \\ \frac{\partial^2 f}{\partial z_n \partial z_1} & \cdots & \frac{\partial^2 f}{\partial z_n^2} \end{bmatrix}$$

can be written as follows:

$$H_f(\mathbf{z}) = J_{\nabla f}(\mathbf{z}) \;.$$
Use the fact that the Hessian is the Jacobian of the gradient to write a Python function with header
def Hessian(f, x, delta=1e-5):
that uses your gradient function to compute the Hessian of f at x. Show your code.
Programming Notes
Your function must work for a function $f : \mathbb{R}^n \to \mathbb{R}$ for any $n$, not just for $n = 2$. However, the codomain of the gradient $\nabla f$
has dimension $n$ now.

You will get full credit if your code is correct and uses Jacobian to fullest advantage.

Make sure that the value of delta is propagated to all calls to Jacobian.
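A minimal sketch along these lines is shown below, assuming the Jacobian function from Problem 1.3: the gradient of f is obtained numerically as the single row of its Jacobian, and the Hessian is the Jacobian of that gradient. Note how delta is passed on to every call to Jacobian.

def Hessian(f, x, delta=1e-5):
    # numerical gradient of f: the single row of its Jacobian, as a vector with n entries
    gradient = lambda z: Jacobian(f, z, delta)[0]
    # the Hessian is the Jacobian of the gradient
    return Jacobian(gradient, x, delta)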
Problem 1.6
Show the result of running the tests below. This will happen automatically once your function Hessian is
defined correctly (and you run the cell below).
The tests use a function f which is the same as in homework 4, and is given below. They also use a function
gradientF, given below, which computes the gradient of f through the analytic formula (as you did in
homework 4). These tests use the default value for $\delta$.
In [2]: shift, scale = [2, 1], 10

def f(z):
    d = z - shift
    return np.array(np.inner(z, z) / scale + np.exp(-np.inner(d, d)))

def gradientF(z):
    d = z - shift
    return 2 * (z / scale - d * np.exp(-np.inner(d, d)))

def HessianF(z):
    I = np.eye(2)
    d = z - shift
    return 2 * (I / scale + (2 * np.outer(d, d) - I) * np.exp(-np.inner(d, d)))

try:
    compare(Jacobian, f, gradientF)
    compare(Hessian, f, HessianF)
except NameError:
    pass
Problem 1.7
The default value for $\delta$ happens to work for the examples above. This is because all functions involved are
"tame," in the sense that their values have comparable magnitudes. Even then, however, the choice of $\delta$ is not
inconsequential.
Write one clear and concise sentence to describe which results are good and which are not in the tests below.
Remark
There are of course better methods for choosing $\delta$ than just trying some value. In particular, the choice needs to adapt to the range of values encountered during the computations.
However, the experiment in this problem should at least serve as a warning that numerical differentiation can be tricky to get right. A later assignment will explore a different technique for computing derivatives, called
algorithmic differentiation or automatic differentiation. Variants of this technique are used in many packages
that implement deep learning methods. Stay tuned.
In [3]: try:
    delta = 1e-9
    compare(Jacobian, f, gradientF, delta)
    compare(Hessian, f, HessianF, delta)
except NameError:
    pass
Part 2: Steepest Descent
In the steepest descent algorithm for the minimization of a function $f : \mathbb{R}^n \to \mathbb{R}$, the new value $\mathbf{z}_{k+1}$ of $\mathbf{z}$ is
found from the current value $\mathbf{z}_k$ by a technique called line search.

Specifically, given the search direction

$$\mathbf{p}_k = -\nabla f(\mathbf{z}_k) \;,$$

line search defines the function $h(\alpha)$ as

$$h(\alpha) = f(\mathbf{z}_k + \alpha \mathbf{p}_k)$$

and finds a local minimum $\hat{\alpha}$ of $h(\alpha)$ for $\alpha > 0$. The line search function returns the corresponding point

$$\mathbf{z}_{k+1} = \mathbf{z}_k + \hat{\alpha} \, \mathbf{p}_k \;.$$

The class notes show an iterative bracketing technique to perform line search. A first stage finds an initial
bracketing triple $(a, b, c)$, and a second stage shrinks the triple.
In this part, you will implement and test the steepest descent algorithm. This implies that you are not allowed to
use any existing function that implements steepest descent, either exactly or substantively.
However, you are allowed to use the function scipy.optimize.minimize_scalar, which implements the
shrinkage part of line search, as explained below.
Problem 2.1
Using the imports and definition in the cell below, write a function with header
def lineSearch(f, g, z0):
that performs line search on the function f, whose gradient is computed by the function g, starting at point z0.
If the starting point $\mathbf{z}_0$ is in $\mathbb{R}^n$, then the functions $f$ and $g$ have the following signatures:

$$f : \mathbb{R}^n \to \mathbb{R} \;, \qquad g : \mathbb{R}^n \to \mathbb{R}^n \;.$$

Show your code, and the result of running the function with the function $f$ and value z0 defined below. Defining
the corresponding gradient $g$ is your task.
In [4]: from scipy import optimize as opt
import numpy as np
import math
import matplotlib.pyplot as plt
small = math.sqrt(np.finfo(float).eps)
f, z0 = lambda z: -np.sin(z), 0
Programming Notes
If the magnitude of the gradient of $f$ at $\mathbf{z}_0$ is smaller than small (defined above), then lineSearch should just
return z0 without further work. Otherwise, lineSearch should return the new value of $\mathbf{z}$ it found (not $\hat{\alpha}$).

The class notes state that the third element $c$ of the initial bracketing triple can be found by starting at some
small value and then increasing $c$ until $h(c)$ is greater than $h$ evaluated at the previous value of $c$. In doing so,
start at $c =$ small and increase $c$ every time by a multiplicative factor 1.2:

c *= 1.2

A factor of 2 would work as well, but 1.2 is a bit safer.

Allow for a maximum value for $c$, and report failure if that value is exceeded. This should not
happen in this assignment.

It will be convenient to use a two-item tuple for the initialization of the keyword parameter bracket for the
function scipy.optimize.minimize_scalar. If bracket is the pair $(0, c)$, then the function will call
another function (scipy.optimize.bracket) that finds a bracketing triple $(a, b, c)$, as defined in the class
notes. So your function lineSearch first finds $c$ (as described above), and then calls
scipy.optimize.minimize_scalar to compute the result of line search. Read the scipy manual to
understand how to use the result from scipy.optimize.minimize_scalar.
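The sketch below is one way to put these notes together, assuming the imports and definitions in the cell above; the cap cMax on $c$ is an assumed value, since the assignment does not specify one.

def lineSearch(f, g, z0):
    z0 = np.asarray(z0, dtype=float)
    gradient = np.asarray(g(z0), dtype=float)
    if np.linalg.norm(gradient) < small:     # (near-)stationary point: nothing to do
        return z0
    p = -gradient                             # steepest-descent search direction
    h = lambda alpha: f(z0 + alpha * p)       # f restricted to the search line
    # first stage: find the third element c of the initial bracketing triple
    c, cMax = small, 1.e10                    # cMax is an assumption
    while h(1.2 * c) <= h(c):
        c *= 1.2
        if c > cMax:
            raise RuntimeError('lineSearch failed to bracket a minimum')
    c *= 1.2
    # second stage: let scipy.optimize.minimize_scalar shrink the bracket (0, c)
    result = opt.minimize_scalar(h, bracket=(0, c))
    return z0 + result.x * p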
Problem 2.2
Write a function with header
def steepest(f, g, z0, maxK=10000, delta=small, remember=False):
that uses your lineSearch function to implement steepest descent with line search.
Show your code and the value of $\hat{\mathbf{z}}$ that results from minimizing the provided function Rosenbrock with
steepest. Start the minimization at $\mathbf{z}_0$ (defined as z0 below), and use the default values for the keyword
arguments.
Programming Notes
The gradient of the function Rosenbrock is also provided as function RosenGrad below. The function
Rosenbrock has a true minimum at $\mathbf{z}^* = (1, 1)$ (defined as zStar below).
Minimization stops either when the norm of the difference between consecutive values of $\mathbf{z}$ is less than
delta or when the number of iterations equals or exceeds maxK.
If the argument remember is False, the function steepest returns the value $\hat{\mathbf{z}}$ it found. If remember is
True, the function returns a pair (z, history). The first item of the pair is still $\hat{\mathbf{z}}$, and the second item is a
numpy array that records the values of $\mathbf{z}$ traversed by every step of steepest descent, including $\mathbf{z}_0$.
Here, $K$ is the number of steps, so history has $K + 1$ rows and ends with $\hat{\mathbf{z}}$.
One step is defined as one call to lineSearch. This implies that history is not to record the intermediate
steps taken within the call to lineSearch.
Be patient, as the search may take a while.
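Below is a minimal sketch of steepest descent built on lineSearch, under the conventions above; details such as the exact layout of history are assumptions.

def steepest(f, g, z0, maxK=10000, delta=small, remember=False):
    z = np.asarray(z0, dtype=float)
    history = [z.copy()]                      # one row per step, starting with z0
    for k in range(maxK):
        zNew = lineSearch(f, g, z)            # one step = one call to lineSearch
        history.append(zNew.copy())
        done = np.linalg.norm(zNew - z) < delta
        z = zNew
        if done:                              # consecutive values close enough
            break
    return (z, np.array(history)) if remember else z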
In [5]: def Rosenbrock(z):
    return 100 * (z[1] - z[0] ** 2) ** 2 + (1 - z[0]) ** 2

def RosenGrad(z):
    return np.array([400 * (z[0] ** 3 - z[0] * z[1]) + 2 * z[0] - 2,
                     200 * (z[1] - z[0] ** 2)])

z0, zStar = np.array([-1.2, 1]), np.array([1, 1])
Problem 2.3
Now run steepest as follows (the try/except is there so that the notebook will still run if steepest is
undefined).
In [6]: try:
    [zHat, history] = steepest(Rosenbrock, RosenGrad, z0, maxK=1000, remember=True)
except NameError:
    pass
Make a contour plot of the Rosenbrock function using 10 levels drawn as thin gray lines (use colors='grey',
linewidths=1 in your call to matplotlib.pyplot.contour to this effect) over a rectangle of the $(z_1, z_2)$ plane that
contains $\mathbf{z}_0$, $\mathbf{z}^*$, and the entire path recorded in history. To make the contour plot, sample this rectangle of values with 100 samples in each dimension.

Superimpose a plot of the path recorded in history on the contour plot. Also draw a blue dot at $\mathbf{z}_0$ and a red
dot at $\mathbf{z}^*$, the true minimum.
Show code and plot.
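As a sketch of what this code might look like (the plotting rectangle $z_1 \in [-2, 2]$, $z_2 \in [-1, 3]$ is an assumption chosen to contain $\mathbf{z}_0$, $\mathbf{z}^*$, and the recorded path):

z1 = np.linspace(-2, 2, 100)
z2 = np.linspace(-1, 3, 100)
Z1, Z2 = np.meshgrid(z1, z2)
R = Rosenbrock(np.array([Z1, Z2]))            # Rosenbrock works elementwise on the grid
plt.figure()
plt.contour(Z1, Z2, R, 10, colors='grey', linewidths=1)
plt.plot(history[:, 0], history[:, 1])        # path of steepest descent
plt.plot(z0[0], z0[1], 'bo')                  # starting point
plt.plot(zStar[0], zStar[1], 'ro')            # true minimum
plt.xlabel('z1')
plt.ylabel('z2')
plt.show()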
Problem 2.4
Convergence slows down as $\mathbf{z}^*$ is approached, and even after 1000 iterations there is still a way to go. To see
this slowdown more clearly, plot a vector distance with the Euclidean distances between each of the points in
history (obtained in the previous problem) and $\mathbf{z}^*$. Label the plot axes meaningfully.
Show code and plot.
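A minimal sketch, assuming history and zStar from the previous problem:

distance = np.linalg.norm(history - zStar, axis=1)   # Euclidean distance of each point from zStar
plt.figure()
plt.plot(distance)
plt.xlabel('steepest descent step')
plt.ylabel('distance from the true minimum')
plt.show()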