CSE 404: Introduction to Machine Learning (Spring 2020)
Homework #6
Due 3/12/2020
Note: LFD refers to the textbook “Learning from Data”.
1. (5 points) Exercise 3.6 (page 92) in LFD.
2. (5 points) Exercise 3.7 (page 92) in LFD.
3. (15 points) Handwritten Digits Data: You should download the two data files with handwritten
digits data: training data (ZipDigits.train) and test data (ZipDigits.test). Each row is a
data example. The first entry is the digit, and the next 256 are grayscale values between -1
and 1. The 256 pixels correspond to a 16 × 16 image. For this problem, we will only use the
1 and 5 digits, so remove the other digits from your training and test examples.
(a) (5 points) Familiarize yourself with the data by giving a plot of two of the digit images.
(b) (5 points) Develop two features to measure properties of the image that would be useful
in distinguishing between 1 and 5. You may use symmetry and average intensity (as
discussed in class).
(c) (5 points) As in the text, give a 2-D scatter plot of your features: for each data example,
plot the two features with a red × if it is a 5 and a blue ◦ if it is a 1.
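The data preparation and feature steps in Problem 3 can be sketched as follows. This is only one possible approach, not a reference solution. The function names are hypothetical, the pixels are assumed to be stored row by row, and symmetry is taken here to mean the negative mean absolute difference between an image and its left-right mirror; other definitions are equally reasonable. The labels follow the +1 (digit 1) / -1 (digit 5) convention stated in Problem 4.

```python
def parse_digits(lines, keep=(1, 5)):
    """Parse rows of 'digit p1 ... p256' and keep only the requested digits."""
    examples = []
    for line in lines:
        values = [float(v) for v in line.split()]
        if len(values) != 257:
            continue  # skip blank or malformed rows
        digit = int(values[0])
        if digit in keep:
            pixels = values[1:]
            # Assumption: the 256 pixels are stored in row-major order.
            image = [pixels[r * 16:(r + 1) * 16] for r in range(16)]
            examples.append((digit, image))
    return examples


def average_intensity(image):
    """Mean grayscale value over all 256 pixels."""
    return sum(sum(row) for row in image) / 256.0


def symmetry(image):
    """Negative mean absolute difference between the image and its
    left-right mirror; 0 means perfectly symmetric."""
    diff = sum(abs(row[c] - row[15 - c]) for row in image for c in range(16))
    return -diff / 256.0


def extract_features(examples):
    """Return (X, y): two features per example, label +1 for a 1, -1 for a 5."""
    X = [[average_intensity(img), symmetry(img)] for _, img in examples]
    y = [1 if digit == 1 else -1 for digit, _ in examples]
    return X, y
```

Since parse_digits accepts any iterable of text lines, an open file handle for ZipDigits.train or ZipDigits.test can be passed to it directly. The resulting feature pairs can then be handed to any plotting library for the scatter plot in (c).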
4. (40 points) Classifying Handwritten Digits: 1 vs. 5. Implement logistic regression for
classification using gradient descent to find the best separator that you can using the training data
only. Use your 2 features from the above question (3b) as the input. The output is +1 if the
example is a 1 and -1 if the example is a 5.
(a) (10 points) Give separate plots of the training and test data, together with the separators.
(b) (10 points) Compute E_in on your training data and E_test, the test error on the test data.
(c) (10 points) Now repeat (b) using a 3rd order polynomial transform.
(d) (10 points) Would you use the linear model with or without the 3rd order polynomial
transform? Explain.
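A minimal sketch of the pieces Problem 4 asks for is given below, under stated assumptions. It uses batch gradient descent on the cross-entropy error from LFD, E_in(w) = (1/N) * sum_n ln(1 + exp(-y_n w.z_n)). The learning rate, the iteration count, the zero initialization, and all function names are illustrative choices, not values prescribed by the assignment. The error reported by classification_error is the fraction of misclassified examples, which is one possible reading of E_in and E_test in (b).

```python
import math


def linear(x1, x2):
    """Feature vector for the linear model (constant term included)."""
    return [1.0, x1, x2]


def poly3(x1, x2):
    """3rd order polynomial transform of two features (10 terms)."""
    return [1.0, x1, x2,
            x1 * x1, x1 * x2, x2 * x2,
            x1 ** 3, x1 * x1 * x2, x1 * x2 * x2, x2 ** 3]


def dot(w, z):
    return sum(wi * zi for wi, zi in zip(w, z))


def inv_one_plus_exp(s):
    """Compute 1 / (1 + exp(s)) without overflow for large |s|."""
    if s >= 0:
        e = math.exp(-s)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(s))


def train_logistic(Z, y, eta=0.1, iterations=2000):
    """Batch gradient descent on the cross-entropy error.
    eta and iterations are arbitrary illustrative defaults."""
    n = len(Z)
    w = [0.0] * len(Z[0])
    for _ in range(iterations):
        g = [0.0] * len(w)
        for z, yn in zip(Z, y):
            # Contribution of one example: -(1/N) * y_n z_n / (1 + exp(y_n w.z_n))
            c = -yn * inv_one_plus_exp(yn * dot(w, z)) / n
            for j, zj in enumerate(z):
                g[j] += c * zj
        w = [wj - eta * gj for wj, gj in zip(w, g)]
    return w


def classification_error(w, Z, y):
    """Fraction of examples whose predicted sign differs from the label."""
    wrong = sum(1 for z, yn in zip(Z, y)
                if (1 if dot(w, z) >= 0 else -1) != yn)
    return wrong / len(Z)
```

To compare the two models in (c) and (d), the same training routine can be run once on linear(x1, x2) and once on poly3(x1, x2) for every example, with the resulting weights evaluated on both the training and the test features.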