Math 157讲解、辅导Mathematical Software、辅导R编程、R留学生讲解

- 首页 >> Algorithm 算法

A12-08-18

December 4, 2018

0.1 Math 157: Intro to Mathematical Software

0.2 UC San Diego, fall 2018

0.3 Final project, part 1: due December 8, 2018

The final project consists of two parts, each of equal value. Both parts will be submitted in this

folder, just like a homework assignment. However, the final project is not equivalent to a homework

assignment from the point of view of course grading: it is a separate contribution, and

cannot be dropped. The final project will be collected at 8 pm on Saturday, December 8th.

Part 1 is a problem set in the style of the preceding homework assignments, but with material

covering the whole course. It consists of 6 problems, each of equal value. As usual, please enter

all answers within this notebook except as specified, and cite all courses and collaborators.

Throughout this problem set, use the SageMath (stable) kernel unless otherwise specified.

0.3.1 Problem 1: Determinants

Grading criteria: correctness of code and explanations.

1a. Implement a function called my_det that calculates the determinant of a matrix recursively

using expansion by cofactors (i.e., the way you learned to compute them by hand in Math

18/Math 20F. This is the first method described here). The Matrix methods delete_columns and

delete_rows may come in handy. You can assume the input is a square matrix.

In [0]: def my_det(matrix):

1b. Check that your function agrees with Sage’s built-in function on a random 5 by 5 matrix of

integers.

In [0]:

1c. Do some timings for random n × n integer matrices for n = 5, . . . , 10 comparing your

function and Sage’s built-in function. Make a single plot containing both sets of results.

In [0]:

1d. Explain the results of 1c in terms of the asymptotic complexity of the two methods.

In [0]:

1

0.3.2 Problem 2: Graphs and number theory

Grading criteria: correctness of code.

2a. Read the definition of a Paley graph, then write a function that, given a prime p congruent

to 1 modulo 4, constructs the Paley graph associated to that prime. (There also exist Paley graphs

associated to prime powers, but your function need not construct those.) Note: Sage has a built-in

function to do this: graphs.PaleyGraph(p). Your function should not use this one, but it gives a

good way to check your work. (Hint: You may want to reread the section on quadratic residues

in the lecture on 11-07-18.)

In [0]:

2b. Using your answer to 2a for p = 13 and the Paley construction, construct (but do not print)

a 28 × 28 matrix which achieves the Hadamard determinant bound.

In [0]:

2c. Check that your answer to 2b actually does achieve the Hadamard bound.

In [0]:

0.3.3 Problem 3: Distances between cities

Grading criteria: correctness of code.

3a. Retrieve the CSV version of the Zip Code Database, upload it into your project, and import

it into a DataFrame.

In [0]:

3b. Import the Excel file "state_capitals.xlsx" (sourced from Wikipedia, found in this directory)

into a DataFrame.

In [0]:

3c. Construct a new DataFrame in which each row consists of: - the name of a state, - the

two letter postal abbreviation of the state, - the name of the capital city, - a list of the zip codes

associated to that city, - the latitude and longitude associated to the first zip code in the list.

Note: For some reason, there is one capital city missing from the Zip Code Data. You should

manually add this missing data to your new DataFrame. You only have to add one zip code in for

this city.

In [0]:

0.3.4 Problem 4: Minimum spanning trees

Grading criteria: correctness of code.

William Stein announces the formation of CoCalc Fiber, a new broadband network which will

connect the capital cities of the 48 continental United States (excluding Alaska and Hawaii) with a

series of underground cables joining certain pairs of cities. Naturally, William wants to minimize

the cost of building this network; he assumes that the cost of building a network link between

some pairs of cities is proportional to the distance between those two cities.

2

4a. Define a function that implements the haversine formula: given two latitude-longitude

pairs, compute the distance between those two points on Earth. Use kilometers instead of miles,

because the US was supposed to convert to the metric system in 1975. Check your formula by

computing the distance between Boston and San Diego, as reported by the Great Circle Mapper

web site.

In [0]:

4b. Construct a complete graph on 48 vertices in which each vertex is labeled by the name of a

state, and each edge is labeled by the distance between the capital cities.

In [0]:

4c. Construct the minimal spanning tree corresponding to the optimal CoCalc Fiber network.

(Hint: You may want to reread the lecture on 10-26-18.)

In [0]:

0.3.5 Problem 5: Image compression with singular value decomposition

Run the following code to get a NumPy array corresponding to a grayscale picture of a raccoon.

In [1]: import numpy as np

from scipy import misc

import matplotlib.pyplot as plt

f = misc.face(gray=True)

plt.imshow(f, cmap=plt.cm.gray)

Out[1]: <matplotlib.image.AxesImage object at 0x7fbabcd30710>

Out[1]:

3

Run the following code to get the singular value decomposition of the matrix corresponding

to this picture.

In [2]: U, Sigma, V = np.linalg.svd(f)

print("Shape of U:" + str(U.shape))

print("Shape of Sigma:" + str(Sigma.shape))

print("Shape of V:" + str(V.shape))

Shape of U:(768, 768)

Shape of Sigma:(768,)

Shape of V:(1024, 1024)

5a. Write a Python function that takes as input a positive integer k and the arrays U, Σ, and

V. It should make a 768 × k matrix Uk

from the first k columns of U, a k × k diagonal matrix Σk

with the first k entries of Σ along the diagonal, and a k × 1024 matrix Vk

from the first k rows of V.

It should then multiply these matrices to get an approximation UkΣkVk of the original matrix and

plot the corresponding grayscale picture. Note that this product will give a 768 × 1024 matrix, so

it will be the correct size.

In [0]:

5b. Run your function for some different values of k. What (approximately) is the smallest

value for which the picture is recognizably a raccoon? What is the smallest value for which the

picture is indistinguishable from the original?

In [0]:

5c. As a function of k, what is the total combined size (number of entries) of the parts of U, Σ,

and V that are used to reconstruct the original picture? What is the compression ratio for the two

values you found in 5b? (I.e., what percent of the original data was used in the reconstruction?)

0.3.6 Problem 6: Machine learning with irises

Grading criteria: correctness and relevance of code.

6a. The Wikipedia article on the iris dataset (previously seen on homework 7) asserts:

The use of this data set in cluster analysis however is not common, since the data set

only contains two clusters with rather obvious separation.

Demonstrate this by performing a clustering computation and showing that it fails to separate

all the species in the dataset. (You can change the kernel to do this.)

In [0]:

6b. Use the scikit-learn SVM classifier to classify species. Use a random sample of 80% of

the initial data for training and the other 20% for testing, and report the accuracy rate of your

predictions.

In [0]:


站长地图