Math 160讲解、辅导MATLAB程序设计、讲解MATLAB、CANVAS留学生辅导

- 首页 >> Web
Math 160 (De Loera) Final Project
Due date: June 13th
INSTRUCTIONS
This is the final project of the course (35 points total). No collaboration is
allowed. You are supposed to developed everything on your own. You can ask
me in class or in office hours. If you use a source outside the notes of class you
have to cite it properly.
Solutions need to be submitted via CANVAS using the uploading online capabilities.
Submit files that contains all code (with comments), and all pdf latex docs.
Do not forget to submit the data you used (specially if you reformatted things)!
Make sure we can run the code from a simple call. E.g., all data files should be
included. We will not download files for you or retype or copy paste your code.
If it does not run straight out of the box, you will receive zero points!!. Be organized
and use the notation appropriately. Show your work on every problem.
Correct answers with no support work will not receive full credit.
1. How to influence the opinion of people when on a budget. (11 points)
Apparently it is in fashion to use social networks, like facebook, to influence voters in elections and
other awful things (for example https://www.nytimes.com/2019/05/17/technology/facebook-ai-schroepfer.
html?action=click&module=Editors%20Picks&pgtype=Homepage)
In this problem, you will consider a simplified model of influence in social networks: the social
network is represented by an undirected graph G = (V, E) and for a set of nodes S V , we denote
by N(S) the set of their neighbors:
N(S) = {v ∈ V | u ∈ S, (u, v) ∈ E}.
The influence I(S) of a set of nodes is measured by I(S) = |N(S)|.
Imagine you work now for facebook and you are given the dataset available at http://www.math.
ucdavis.edu/~deloera/TEACHING/MATH160/facebookgraph.txt. This dataset is in fact a subgraph
of the real Facebook social graph. Each line in the file contains the id of two users, indicating
that these two users are friends on Facebook.
a. As the designer of a marketing campaign to influence the opinion of voters, your goal is to find
a subset S V of at most K nodes whose influence is maximal. Write a mathematical model
to solve this problem. Can you solve the problem for the data you were given using SCIP? If
not, can you write a practical method to give an approximation to the optimum? Extra bonus
if you can prove a guarantee of approximation.
b. Using any of your models/methods from part (a) write a computer function which, given the
social network described in the dataset and a budget K ∈ N
+, returns an approximately
optimal set of nodes S for the influence function I(S). The function should return both the
users to influence and the value (total amount of influence) obtained.c. Plot the influence I(S) obtained by your function as a function of the budget K. E.g., what
happens when K = 1 and K = |V |?
d. In the definition of I(S) a node with largest degree is one of the most influential persons, i.e.,
he/she is a VIP. Can you think of a way to adapt the pagerank algorithm to identify VIP
nodes in the graph? Can you invent of a third additional way to define who are the VIPs in
this graph? What else could you use to measure influence?
e. A vertex-cover of a graph G is a set S of vertices of G such that each edge of G is incident with
at least one vertex of S. The vertex cover number τ (G) is the size of a minimum vertex cover
in G. A dominating set for a graph is a subset D of vertices such that every vertex not in D is
adjacent to at least one member of D. The domination number γ(G) is the smallest dominating
set. Are there any relationships among the influence function I(S), τ (G), and γ(G)? Explain.
Formulate a discrete models that given a graph find a vertex cover with smallest number of
vertices and a smallest dominating set. Explain the reasoning on your variables and constraints.
f. Show that if M is a matching and S is some vertex cover |S| ≥ |M|. In particular show that
max{|M| : M is a matching of G} ≤ min{|S| : S is a vertex-cover of G}.
HINT: How many vertices do you need to cover just the edges of M?
2. Clustering and pairing senators by their political ideas (12 points)
The zip-file https://www.math.ucdavis.edu/~deloera/TEACHING/MATH160/2-congress-datafiles.
zip contains two csv files with (real!) voting data. One file, I call F114, has information of how all
USA senators (in the 114th congress) voted in 15 different bills. 1 means voted YES, 0 means voted
NO, 0.5 means ABSTAINED. The other file, I call FYemen, contains votes for the 116 congress
on single issue (the legality of the US involvement on the ongoing war in Yemen). This will be used
later.
We always assume senators of the same party vote similarly, but let us analyze the votes using
mathematics and uncover some information!
Clustering is an operation in data science where we divide data up into a fixed number of clusters
while trying to ensure that the items in each cluster are as similar as possible (e.g., if we cluster by
political party we presumably get similar voting records, similar opinions). K-means is a clustering
algorithm that allows you to cluster into K clusters based on distances. It is implemented in
MATLAB already.
a. Use K-means on F114, with K = 2 to get a clustering in two groups. Obviously, do not use
the labels of the real affiliations I gave you! Choose one to be called democrats, the other to
be the republicans (do not look at the original file). Make your best mathematical guess of the
party affiliation.
You have obtained a classification of senators into two classes. Does your classification coincide
exactly with the real party affiliation? Or are there surprises? To measure this compute the
false-positive rate and false-negative rate of your classification:
false-positive rate := number of senators incorrectly labeled as democrats
number of democratic senators
false-negative rate := number of senators incorrectly labeled as republicans
number of republican senatorsb. What if you cluster with K = 3? That would after all mean there are three factions in the
senate (moderates, conservatives, liberals?).
c. Next write MATLAB code to do clustering using the Eigenvalue Method I explained in class.
d. Clustering can also be model as an integer optimization problem. Write such a model and code
it in SCIP. Run it in the senator’s data.
3. More matching problems from social sciences (12 points).
E.g., how do people find a spouse? How do students get matched to medical schools? How do
workers get assigned to jobs? How do organ donors get match to patients in need? We saw in class
one can use matchings in graphs. Here we discuss this topic more and explore a new model for
agreement between pairs of people. We will use it for the senator’s data.
We say matching M is said to be maximal if there is no matching M0 with M ? M0
. A maximal
matching of the largest cardinality is a maximum matching. The size of a maximum matching is
denoted by OPTG. In the following we will explore several approaches to find a maximum matching
in a graph:
a. Connecting with the previous problem. Let M be a matching and let us denote by V (M) the
set of endpoints of edges in M:
V (M) := {u |e ∈ M, u ∈ e}
Show that when M is a maximal matching of G, V (M) is a vertex cover of G.vLet M be a
maximal matching of G, show that |V (M)| ≤ 2OPTG.
b. Next consider the greedy algorithm for finding a maximum matching: Start with an arbitrary
edge as the very first matching. Find another edge that does not have a vertex in common
with the current matching. If one exists add it to the current matching. Repeat until no more
edges can be added. What is the running time of the greedy algorithm on a graph with n
nodes and m edges?
Give an example where the greedy algorithm fails to find a maximum matching. But show
that the greedy method always yields a solution that has at least half as many edges as a true
maximum matching.
c. Propose an integer-optimization model, one you will implement in SCIP (i.e., now based on
linear equations, inequalities), to construct a maximum matching of any graph G. What if now
each edge i, j has a weight, i.e., each edge in a matching is not counted 1 time, but with wi,j
value. Describe algorithms used to find the maximum weight matching (we saw one in class).
d. Imagine now: Use the data of US senators to construct a graph. The nodes represent US
senators, edges appear with weights where they agree or disagree on 15 possible topics. Thus
weights are now integer numbers between -15 and 15. The weights are calculated by adding -1
or 1 when the person i and person j disagree or agree on a topic. For example if they agree in
only 4 of the topics the weight of arc is 7. Calculate the maximum weight matching on the
senator’s graph.
e. Suppose now you are planning for a debate tournament of USA senators. A debate is only
interesting if the pair of opponents disagree in most issues (negative weights of edges) What is
the largest number of debate matches you can set up? Do you think Hall’s marriage theorem
we saw in class relates to this e. In the max-cut problem One wants to find a subset S of the nodes such that the number of
edges between S and the complementary subset is as large as possible. What is the max-cut of
the senator’s graph (no weights!)? Now, in the weighted Max-Cut each edge has a real number,
its weight, and the objective is to maximize the total weight of the edges between S and its
complement. What is the max cut now using the senator’s graph with weights?
f. BONUS 5 extra points: This is a harder open-ended challenge. Do NOT attempt this
problem unless you have finished the regular problems. Attempt at your own risk!!
Using both senator data files, in particular FYemen, to do the following: create a new graph
for the 116 congress. The graph keeps those senators that were members in both sessions of
congress and adds new names (e.g., Kamala Harris was not a senator in 114 congress) replacing
those (e.g., John McCain) who left the senate. So, the new graph has still the same number
of nodes and arcs, but we do not know the disagreement weights of some of the senators (e.g.
Kamala Harris vs Ted Cruz)!!
Your final task is to create a mathematical model to predict the most likely disagreement
weights (i.e., missing weights on new graph). Try to give good reasons for your method and
support it with mathematical arguments. Calculate the missing weights and use the data
available to you to make a convincing case you are right.

站长地图