Math 157留学生辅导、讲解Python编程、辅导Python、讲解Software
- 首页 >> Python编程A12-08-18
December 4, 2018
0.1 Math 157: Intro to Mathematical Software
0.2 UC San Diego, fall 2018
0.3 Final project, part 1: due December 8, 2018
The final project consists of two parts, each of equal value. Both parts will be submitted in this
folder, just like a homework assignment. However, the final project is not equivalent to a homework
assignment from the point of view of course grading: it is a separate contribution, and
cannot be dropped. The final project will be collected at 8 pm on Saturday, December 8th.
Part 1 is a problem set in the style of the preceding homework assignments, but with material
covering the whole course. It consists of 6 problems, each of equal value. As usual, please enter
all answers within this notebook except as specified, and cite all courses and collaborators.
Throughout this problem set, use the SageMath (stable) kernel unless otherwise specified.
0.3.1 Problem 1: Determinants
Grading criteria: correctness of code and explanations.
1a. Implement a function called my_det that calculates the determinant of a matrix recursively
using expansion by cofactors (i.e., the way you learned to compute them by hand in Math
18/Math 20F. This is the first method described here). The Matrix methods delete_columns and
delete_rows may come in handy. You can assume the input is a square matrix.
In [0]: def my_det(matrix):
1b. Check that your function agrees with Sage’s built-in function on a random 5 by 5 matrix of
integers.
In [0]:
1c. Do some timings for random n × n integer matrices for n = 5, . . . , 10 comparing your
function and Sage’s built-in function. Make a single plot containing both sets of results.
In [0]:
1d. Explain the results of 1c in terms of the asymptotic complexity of the two methods.
In [0]:
1
0.3.2 Problem 2: Graphs and number theory
Grading criteria: correctness of code.
2a. Read the definition of a Paley graph, then write a function that, given a prime p congruent
to 1 modulo 4, constructs the Paley graph associated to that prime. (There also exist Paley graphs
associated to prime powers, but your function need not construct those.) Note: Sage has a built-in
function to do this: graphs.PaleyGraph(p). Your function should not use this one, but it gives a
good way to check your work. (Hint: You may want to reread the section on quadratic residues
in the lecture on 11-07-18.)
In [0]:
2b. Using your answer to 2a for p = 13 and the Paley construction, construct (but do not print)
a 28 × 28 matrix which achieves the Hadamard determinant bound.
In [0]:
2c. Check that your answer to 2b actually does achieve the Hadamard bound.
In [0]:
0.3.3 Problem 3: Distances between cities
Grading criteria: correctness of code.
3a. Retrieve the CSV version of the Zip Code Database, upload it into your project, and import
it into a DataFrame.
In [0]:
3b. Import the Excel file "state_capitals.xlsx" (sourced from Wikipedia, found in this directory)
into a DataFrame.
In [0]:
3c. Construct a new DataFrame in which each row consists of: - the name of a state, - the
two letter postal abbreviation of the state, - the name of the capital city, - a list of the zip codes
associated to that city, - the latitude and longitude associated to the first zip code in the list.
Note: For some reason, there is one capital city missing from the Zip Code Data. You should
manually add this missing data to your new DataFrame. You only have to add one zip code in for
this city.
In [0]:
0.3.4 Problem 4: Minimum spanning trees
Grading criteria: correctness of code.
William Stein announces the formation of CoCalc Fiber, a new broadband network which will
connect the capital cities of the 48 continental United States (excluding Alaska and Hawaii) with a
series of underground cables joining certain pairs of cities. Naturally, William wants to minimize
the cost of building this network; he assumes that the cost of building a network link between
some pairs of cities is proportional to the distance between those two cities.
2
4a. Define a function that implements the haversine formula: given two latitude-longitude
pairs, compute the distance between those two points on Earth. Use kilometers instead of miles,
because the US was supposed to convert to the metric system in 1975. Check your formula by
computing the distance between Boston and San Diego, as reported by the Great Circle Mapper
web site.
In [0]:
4b. Construct a complete graph on 48 vertices in which each vertex is labeled by the name of a
state, and each edge is labeled by the distance between the capital cities.
In [0]:
4c. Construct the minimal spanning tree corresponding to the optimal CoCalc Fiber network.
(Hint: You may want to reread the lecture on 10-26-18.)
In [0]:
0.3.5 Problem 5: Image compression with singular value decomposition
Run the following code to get a NumPy array corresponding to a grayscale picture of a raccoon.
In [1]: import numpy as np
from scipy import misc
import matplotlib.pyplot as plt
f = misc.face(gray=True)
plt.imshow(f, cmap=plt.cm.gray)
Out[1]: <matplotlib.image.AxesImage object at 0x7fbabcd30710>
Out[1]:
3
Run the following code to get the singular value decomposition of the matrix corresponding
to this picture.
In [2]: U, Sigma, V = np.linalg.svd(f)
print("Shape of U:" + str(U.shape))
print("Shape of Sigma:" + str(Sigma.shape))
print("Shape of V:" + str(V.shape))
Shape of U:(768, 768)
Shape of Sigma:(768,)
Shape of V:(1024, 1024)
5a. Write a Python function that takes as input a positive integer k and the arrays U, Σ, and
V. It should make a 768 × k matrix Uk
from the first k columns of U, a k × k diagonal matrix Σk
with the first k entries of Σ along the diagonal, and a k × 1024 matrix Vk
from the first k rows of V.
It should then multiply these matrices to get an approximation UkΣkVk of the original matrix and
plot the corresponding grayscale picture. Note that this product will give a 768 × 1024 matrix, so
it will be the correct size.
In [0]:
5b. Run your function for some different values of k. What (approximately) is the smallest
value for which the picture is recognizably a raccoon? What is the smallest value for which the
picture is indistinguishable from the original?
In [0]:
5c. As a function of k, what is the total combined size (number of entries) of the parts of U, Σ,
and V that are used to reconstruct the original picture? What is the compression ratio for the two
values you found in 5b? (I.e., what percent of the original data was used in the reconstruction?)
0.3.6 Problem 6: Machine learning with irises
Grading criteria: correctness and relevance of code.
6a. The Wikipedia article on the iris dataset (previously seen on homework 7) asserts:
The use of this data set in cluster analysis however is not common, since the data set
only contains two clusters with rather obvious separation.
Demonstrate this by performing a clustering computation and showing that it fails to separate
all the species in the dataset. (You can change the kernel to do this.)
In [0]:
6b. Use the scikit-learn SVM classifier to classify species. Use a random sample of 80% of
the initial data for training and the other 20% for testing, and report the accuracy rate of your
predictions.
In [0]: