
CIS5200: Machine Learning Fall 2023
Homework 1
Release Date: September 20, 2023 Due Date: October 4, 2023
• You will submit your solution for the written part of HW1 as a single PDF file via Gradescope.
The deadline is 11:59 PM ET. Contact TAs on Ed if you face any issues uploading your
homeworks.
• Collaboration is permitted and encouraged for this homework, though each student must
understand, write, and hand in their own submission. In particular, it is acceptable for
students to discuss problems with each other; it is not acceptable for students to look at
another student’s written answers when writing their own. It is also not acceptable to publicly
post your (partial) solution on Ed, but you are encouraged to ask public questions on Ed. If
you choose to collaborate, you must indicate on each homework with whom you collaborated.
Please refer to the notes and videos posted on the website if you need to recall the material discussed
in the lectures.
1 Written Questions (46 points)
Problem 1: Margin Perceptron (15 points)
Recall the Perceptron algorithm we saw in the lecture. The Perceptron algorithm terminates
once it classifies all points correctly. It does not guarantee that the hyperplane it finds
has large margin (γ), despite our assumption that the true hyperplane w∗ has margin γ, where
γ = min_{i∈{1,...,m}} yi (w∗⊤xi).
In this problem, we will consider the following simple modification to the Perceptron algorithm:
Algorithm 1: Margin Perceptron
Initialize w1 = 0 ∈ R^d
for t = 1, 2, . . . do
    if ∃i ∈ {1, . . . , m} s.t. yi ≠ sign(wt⊤xi) or |wt⊤xi| ≤ 1 then update wt+1 = wt + yi xi
    else output wt
end
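To make the update rule concrete, here is a minimal NumPy sketch of the loop above (purely illustrative and not part of the graded submission; the data matrix X, label vector y, and the iteration cap max_iters are placeholder assumptions):

import numpy as np

def margin_perceptron(X, y, max_iters=10000):
    """Sketch of Algorithm 1. X has shape (m, d); y has entries in {-1, +1}."""
    m, d = X.shape
    w = np.zeros(d)                                   # w1 = 0
    for _ in range(max_iters):
        scores = X @ w                                # wt^T xi for every i
        # A point violates the condition if it is misclassified
        # or lies inside the band |wt^T xi| <= 1.
        violated = (y != np.sign(scores)) | (np.abs(scores) <= 1)
        if not violated.any():
            return w                                  # output wt
        i = int(np.argmax(violated))                  # pick any violating index
        w = w + y[i] * X[i]                           # wt+1 = wt + yi xi
    return w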
We will show that Margin Perceptron stops after 3/γ² steps and returns a hyperplane w such that
min_{i∈{1,...,m}} |w⊤xi| / ∥w∥2 ≥ γ/3.
Note that the margin is the distance of the closest point to the hyperplane, and since w does not
necessarily have unit norm, this quantity is given by min_{i∈{1,...,m}} |w⊤xi| / ∥w∥2.
As in the lecture, we will assume that ∥xi∥2 ≤ 1 for all i ∈ {1, . . . , m} and ∥w∗∥2 = 1.
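Since the margin quantity above appears throughout this problem, here is a one-line NumPy helper for it (illustrative only; it assumes a data matrix X of shape (m, d) whose rows are the points xi):

import numpy as np

def margin(w, X):
    """Compute min_i |w^T x_i| / ||w||_2, where the rows of X are the points x_i."""
    return np.min(np.abs(X @ w)) / np.linalg.norm(w)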

1.1 (2 points) Show that after every round t, we have
w∗⊤wt+1 ≥ w∗⊤wt + γ.
1.2 (4 points) Show that after every round t, we have
∥wt+1∥2² ≤ ∥wt∥2² + 3.
1.3 (3 points) Using the above two parts, show that after T rounds,
γT ≤ ∥wT+1∥2 ≤ √(3T).
Hint: Use the Cauchy–Schwarz inequality: a⊤b ≤ ∥a∥∥b∥.
1.4 (1 point) Use 1.3 to conclude that T ≤ 3/γ².
1.5 (4 points) Show that the output hyperplane w satisfies
min_{i} |w⊤xi| / ∥w∥2 ≥ γ/3.
Hint: You will need to use the results in 1.2 and 1.3 plus the stopping condition of the algorithm.
1.6 (1 point) Why is it desirable to learn a predictor that has margin?
Problem 2: Bayes Optimal Classifier (15 points)
Let η(x) denote the conditional probability of the label being 1 given a point x under the distribution
D. That is
η(x) = Pr[y = 1|x].
Recall that the true risk, under the 0/1 loss, for any classifier f is
R(f) = E_{x,y}[1[f(x) ≠ y]].
The Bayes optimal classifier w.r.t. D is the classifier f∗ that achieves the minimum risk among all
possible classifiers. In this problem, we will work out what the Bayes optimal classifier is.
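As a quick illustration of the risk definition above (not required for the written solutions), the sketch below estimates R(f) by Monte Carlo sampling from a toy distribution; the particular choice of distribution, sample size, and classifier f are arbitrary placeholders:

import numpy as np

rng = np.random.default_rng(0)

def sample_xy(n):
    """Draw n labeled points from a toy distribution D over (x, y)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    eta = (x + 1.0) / 2.0                      # a toy choice of eta(x) = Pr[y = 1 | x]
    y = np.where(rng.uniform(size=n) < eta, 1, -1)
    return x, y

def estimate_risk(f, n=100000):
    """Monte Carlo estimate of R(f) = E_{x,y}[1[f(x) != y]]."""
    x, y = sample_xy(n)
    return np.mean(f(x) != y)

f = lambda x: np.where(x > 0.3, 1, -1)         # an arbitrary classifier, not the optimal one
print(estimate_risk(f))                        # approximate 0/1 risk of f under D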
2.1 (3 points) Show that
R(f) = E_x[η(x) 1[f(x) = −1] + (1 − η(x)) 1[f(x) = 1]].
Hint: Use the fact that E_{x,y}[·] = E_x E_{y|x}[·].
2.2 (3 points) Use the above to show that the minimum risk possible is
R(f∗) = min_f R(f) = E_x[min(η(x), 1 − η(x))].
Hint: For a fixed x, think about what the minimum loss is using 2.1.
2.3 (2 points) Show that the Bayes optimal classifier that achieves the above loss is
f∗(x) = 1 if η(x) ≥ 1/2, and f∗(x) = −1 otherwise.
2.4 (1 point) Derive the Bayes optimal classifier under the logistic model
η(x) = 1 / (1 + exp(−w⊤x)).
2.5 (6 points) Now suppose we modify the loss function from 0/1 to the following cost-based loss
function
ℓc(ŷ, y) = c       if y = 1, ŷ = −1,
           1 − c   if y = −1, ŷ = 1,
           0       if y = ŷ.
Here the loss penalizes a false negative with cost c and a false positive with cost 1 − c, so the two
types of mistakes are penalized differently.¹
Note that the true risk under this loss is
Rc(f) = E_{x,y}[ℓc(f(x), y)].
Find the Bayes optimal classifier in this setting.
Hint: Follow the same ideas you used to solve 2.1-2.3 using ℓc instead of 0/1 loss.
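For intuition about how ℓc weights the two error types (again purely illustrative, not part of the graded answer), here is a small Python sketch of the cost-sensitive loss together with an empirical estimate of Rc on a labeled sample; the arrays y_hat and y below are hypothetical placeholders:

import numpy as np

def cost_loss(y_hat, y, c=0.9):
    """l_c(y_hat, y): cost c for false negatives, 1 - c for false positives, 0 otherwise."""
    y_hat, y = np.asarray(y_hat), np.asarray(y)
    fn = (y == 1) & (y_hat == -1)          # false negatives
    fp = (y == -1) & (y_hat == 1)          # false positives
    return c * fn + (1 - c) * fp           # elementwise losses

# Empirical version of Rc(f) on a labeled sample, with y_hat = f(x) for each point.
y_hat = np.array([1, -1, -1, 1])           # hypothetical predictions
y     = np.array([1,  1, -1, -1])          # hypothetical true labels
print(cost_loss(y_hat, y).mean())          # (0 + c + 0 + (1 - c)) / 4 = 0.25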
2 Programming Questions (16 points)
Use the link here to access the Google Colaboratory (Colab) file for this homework. Be sure to
make a copy by going to "File" and then "Save a copy in Drive". This assignment uses the PennGrader
system for students to receive immediate feedback. As noted in the notebook, please be sure to
change the student ID from the default '99999999' to your 8-digit PennID.
Instructions for how to submit the programming component of HW 1 to Gradescope are included
in the Colab notebook. You may find this PyTorch reference helpful; if you get stuck, review the
relevant PyTorch documentation and functions.
¹ Let us see why this is a useful loss function. Consider the case of medical diagnosis: a high false
negative rate means that we are predicting that patients do not have the disease when they actually do.
Such a prediction could lead to a patient not getting the care they need. In such a setting, you would
want c to be closer to 1.

