辅导960:588 Data Mining、讲解R程序语言、辅导R设计、讲解Data Mining

- 首页 >> Algorithm 算法

Fall 2018, 960:588 Data Mining Homework 2.

DUE: Friday Oct 26th at midnight

INSTRUCTIONS: Submit all files on Canvas. For the free response parts, type your solution into

an electronic document or take a picture of your solution and make sure that all hand-writing is

legible. For the code parts, submit the completed files on Canvas. Email and physical submissions

will NOT be accepted.

For code that imports other files such as csv files or R files, make sure that all the

imported files are in the same directory as the code.

NOTE: It is required that you use R in this homework.

NOTE: Your code must run without error. Points may be deducted if there is any error that

causes the code to crash.

NOTE: You may work in groups with other students but you must write down the names of

all the members of your group.

Problem 1: Optimization for Logistic Lasso with Proximal

Gradient Descent

Download the files hw2_problem1.R and complete the code by filling in all parts labeled “FILL

IN”. You may find it helpful to review Section 3.2 of the lecture notes along with Proposition 4.2.2.

After completing the code, run the code to compare the solution computed by your code against

the R package glmnet. Report the estimation error as well as the deviation of your solution from

the glmnet solution.

Problem 2: Predicting voting patterns with Kernel SVM

Download the files hw2_problem2.R, votes.csv. Our dataset votes consists of over 3000 counties

in the United States along with their socio-economic and demographic information and voting

records in the 2016 US election; each row corresponds to a single county. Our response variable

will be prefer trump, which is 0 or 1 indicating whether the percentage of people who voted for

Trump in that county is greater than that who voted for Clinton in the 2016 US presidential election.

1

Part a (code)

The function kernelSVM is an incomplete implementation of kernel SVM.

Complete the code in hw2_problem2.R by filling in all the parts with the label FILL IN. Most

of the “FILL IN”’s involve making prediction with kernel SVM.

Part b (free response)

We now use kernel SVM to predict the prefer trump variable from a variety of socio-economic and

demographic features.

Run with the code with degree set to 1, then 2, 3, 4, and 5. Report the predictive error as well

as the number of support vectors in each of the cases.

How does the predictive error change with the degree of the polynomial kernel Explain why.

How does the number of support vectors change with the degree of the polynomial kernel

Explain why.

Problem 3: Predicting Shaquille O’Neal’s free throw outcomes

Part a (code)

Download the files hw2_problem3.R, shaq.csv. Our dataset shaq consists of the 1632 free throws

that Shaquille O’Neal attempted in the regular NBA seasons from 2006 to 2011. Each row correspond

to one free throw attempt along with some other information such as:

shot made: 0/1 indicating whether the free throw was successful.

home game: 0/1 indicating whether the game is an away game or a home game.

first shot: 0/1 indicating whether the free throw is the first of the two throws.

cur score: score of Shaq’s team at the time of the free throw.

opp score: score of the opposing team at the time of the free throw.

score ratio: cur score/(opp score + 1).

cur time: number of seconds into the game at the time of the free throw.

period: the quarter in which the free throw is made.

In the part of the code labeled Part (a), perform some basic exploratory analysis following the

instructions listed. Use the R command chisq.test for the χ

2 association test.

2

Part b (code and free response)

We will predict shot made using the other features with logistic ridge regression. We will use the

package glmnet for this problem. Use the documentation

https://cran.r-project.org/web/packages/glmnet/glmnet.pdf

to complete the code by filling in all parts labeled “FILL IN”. Report the ridge and baseline errors.

Why do we perform the training/testing multiple times choosing a possibly different random

subset of the data as the test set each time?

Part c (free response)

Choose the statement below which you agree with the most. Justify your answer in a sentence or

two.

1. The features are strongly predictive of Shaquille O’Neal’s free throws.

2. The features are weakly predictive of Shaquille O’Neal’s free throws.

3. There is no evidence that the features are at all predictive of Shaquille O’Neal’s free throws.

Part d (free response)

Give a two or three sentences answer to each of the following questions.

1. Each row of the original dataset also contains the final score of the game in which the free

throw was made. Why is it that we cannot use the final score of the game as a feature in

making the prediction?

2. Why is it that, ideally, when holding out samples for the test set, we should choose a set of

games and hold out ALL samples from those games?

Problem 4: KKT and Lagrangian Duality

Let A ∈ R

K×p be a K-by-p matrix and b ∈ R

K be a vector. Consider the following constrained

optimization problem:

min

x∈Rp

s.t. Ax ≤ b.

Part a (free response)

Introduce K dual variables α = (α1, . . . , αK) and write down the Lagrangian.

3

Part b (free response)

For any fixed α, take the gradient of the Lagrangian with respect to x ∈ R

p

. Show that if every

coordinate of the gradient is 0, then we have that x = ?A>α. (Hint: ?x{kxk

2

2} = 2x and

?x{b

>x} = b).

Part c (free response)

Show that the dual optimization is

max

α∈RK

>AA>α + α

>b

s.t. αk ≥ 0 for all k = 1, . . . , K.

Part d (free response)

Use the dual optimization to argue that if K = p, A is the identity matrix, and b satisfies bi ≤ 0

for all i = 1, . . . , K, then the primal solution x

satisfies x= b.

Problem 5: Deriving the Intercept for dual SVM

In lecture notes 4, we claim that we can compute the intercept bb from the dual solution αb ∈ R

n by

finding any i

∈ {1, . . . , n} such that 0 < αbi

< C and then setting

bb = Yi X>

i wb = Yi

Xn

j=1

αbjYjX>

j Xi .

In the code, for numerical stability, we find all i

such that 0 < αbi

< C and take the average

of the value Yi X>

i

wb. In this question, you will justify this formula.

Part a (free response)

Let w, b bb, ξb be the primal solution and αb be the dual solution. Recall that additional dual variables

αe arise as intermediate quantities in the derivation of dual SVM.

Use complementary slackness, which is condition (1) in the KKT Theorem (Theorem 4.4.1),

and to show that if 0 < αbi

, then 1ξbi

Yi

(X>

i

wb + bb) = 0.

Part b (free response)

Show that if αbi

< C, then αei

> 0. Use complementary slackness again to show that it must be

then that ξbi = 0.

4

Part c (free response)

Show then that for i

∈ {1, . . . , n} such that 0 < αbi

< C, we have that 1 ? Yi (X> wb + bb) = 0,

which implies that bb = Yi X>iwb.

[OPTIONAL extra credit 2 points] Part d (proof)

Prove that there must exists at least 1 point i

∈ {1, . . . , n} such that 0 < αbi

< C.


站长地图