Homework 1b: Linear Regression, Part 2
EE425X - Machine Learning: A Signal Processing Perspective
Homework 1 focused on learning the parameter $\theta$ for linear regression. In this homework we will first see
how to use the learnt parameter to predict the output for a given query input. We will also study the
bias-variance tradeoff and how to choose the model dimension when only limited training data is available. This
homework relies heavily on the code from the previous homework.
Generate Data Code: Generate $m + m_{\text{test}}$ data points satisfying
$$y = \theta^T x + e$$
with $\theta$ being ONE fixed length-$n$ vector for all of them. Use $n = 100$, $\theta = [100, -99, 98, -97, \dots, 1]^T$, $\sigma_e^2 = 0.01\|\theta\|_2^2$, $e \sim \mathcal{N}(0, \sigma_e^2)$, $x \sim \mathcal{N}(0, I)$, and assume mutual independence of the different inputs and noise values ($e$).
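The data model above can be sketched in NumPy as follows. This is a minimal illustration, not the course's reference code: the function names, the seed, and the value of `m_test` (which the assignment does not specify) are assumptions. Note also that strictly alternating signs make the last entry of $\theta$ equal to $-1$ when $n = 100$, whereas the assignment writes the last entry as 1; the sketch follows the alternating pattern.

```python
import numpy as np

def make_theta(n=100):
    """Build theta = [n, -(n-1), n-2, ...] with alternating signs (assumed pattern)."""
    magnitudes = np.arange(n, 0, -1, dtype=float)  # n, n-1, ..., 1
    signs = (-1.0) ** np.arange(n)                 # +1, -1, +1, ...
    return magnitudes * signs

def generate_data(theta, num_points, noise_ratio=0.01, rng=None):
    """Draw x ~ N(0, I), e ~ N(0, noise_ratio * ||theta||^2), and return (X, y)."""
    rng = np.random.default_rng() if rng is None else rng
    n = theta.shape[0]
    sigma_e = np.sqrt(noise_ratio * np.sum(theta ** 2))
    X = rng.standard_normal((num_points, n))       # one input vector per row
    e = sigma_e * rng.standard_normal(num_points)  # independent noise values
    y = X @ theta + e
    return X, y

theta = make_theta(100)
rng = np.random.default_rng(0)  # fixed seed, only for repeatability
m, m_test = 120, 1000           # m_test = 1000 is an assumed value
X, y = generate_data(theta, m + m_test, noise_ratio=0.01, rng=rng)
X_train, y_train = X[:m], y[:m]
X_test, y_test = X[m:], y[m:]
```

Generating all $m + m_{\text{test}}$ points in one call and then splitting keeps the training and test sets on the same $\theta$ and the same noise level.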
1. Use code from Homework 1 (using any one approach is okay) to learn $\theta$. Vary $m$ and show a plot of the
estimation error in $\theta$,
$$\|\theta - \hat{\theta}\|_2^2 / \|\theta\|_2^2,$$
and a second plot of the “Monte Carlo estimate” of the prediction error on the test data (test data MSE),
$$\text{Normalized-Test-MSE} := \mathbb{E}[(y_{\text{test}} - \hat{y})^2] / \mathbb{E}[y_{\text{test}}^2], \quad \text{with } \hat{y} := \hat{\theta}^T x_{\text{test}}.$$
Monte Carlo estimate means: compute $(y_{\text{test}} - \hat{y})^2$ for $m_{\text{test}}$ different input-output pairs and then average
the result.
(a) Vary $m$: use $m = 80$, $m = 100$, $m = 120$, $m = 400$. If your code is unable to return an estimate of $\theta$,
you can report the errors as $\infty$ (and for the plot just use a large value, say 100000, in place of $\infty$).
(b) Repeat this experiment with $\sigma_e^2 = 0.1\|\theta\|_2^2$.
Thus this part will produce four plots.
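One possible way to organise the experiment in part 1 is sketched below. It assumes the Homework 1 approach was the closed-form least-squares solution via the normal equations; any other approach from Homework 1 could be substituted. The helper names, the seed, `m_test = 1000`, and the rank check used to decide that no estimate can be returned are all assumptions of this sketch. Plotting is left out; the returned tuples can be passed to any plotting library.

```python
import numpy as np

def fit_normal_equations(X, y):
    """Solve (X^T X) theta = X^T y; return None when X^T X is not invertible."""
    n = X.shape[1]
    if np.linalg.matrix_rank(X) < n:  # always the case when m < n
        return None
    return np.linalg.solve(X.T @ X, X.T @ y)

def estimation_error(theta, theta_hat):
    """Normalized squared error ||theta - theta_hat||^2 / ||theta||^2."""
    return np.sum((theta - theta_hat) ** 2) / np.sum(theta ** 2)

def normalized_test_mse(theta_hat, X_test, y_test):
    """Monte Carlo estimate: average over the test pairs, then normalize."""
    y_hat = X_test @ theta_hat
    return np.mean((y_test - y_hat) ** 2) / np.mean(y_test ** 2)

def run_experiment(noise_ratio, m_values=(80, 100, 120, 400),
                   m_test=1000, n=100, seed=0, plot_inf=1e5):
    """Return a list of (m, estimation error, normalized test MSE) tuples."""
    rng = np.random.default_rng(seed)
    theta = np.arange(n, 0, -1) * (-1.0) ** np.arange(n)
    sigma_e = np.sqrt(noise_ratio) * np.linalg.norm(theta)
    results = []
    for m in m_values:
        X = rng.standard_normal((m + m_test, n))
        y = X @ theta + sigma_e * rng.standard_normal(m + m_test)
        theta_hat = fit_normal_equations(X[:m], y[:m])
        if theta_hat is None:
            # No estimate available: report the large stand-in value for infinity.
            results.append((m, plot_inf, plot_inf))
        else:
            results.append((m,
                            estimation_error(theta, theta_hat),
                            normalized_test_mse(theta_hat, X[m:], y[m:])))
    return results

for noise_ratio in (0.01, 0.1):
    for m, est_err, test_mse in run_experiment(noise_ratio):
        print(f"noise_ratio={noise_ratio}, m={m}: "
              f"est_err={est_err:.4g}, test_mse={test_mse:.4g}")
```

Running the loop for both noise levels gives the data for the four plots: two error curves for each value of $\sigma_e^2$.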
2. In this second part, suppose you have only $m = 80$ training data points satisfying $y = \theta^T x + e$, with
$n = 100$. Notice $n$ is the same as in the first part. I had a typo earlier which has now been fixed.
What you will have concluded from part 1 is that you cannot learn $\theta$ correctly in this case because $m$ is
even smaller than $n$.
Let us assume you do not have the option to increase $m$. What can you do? All you can do is reduce $n$
to a value $n_{\text{small}} \le m$. Experiment with different values of $n_{\text{small}}$ to come up with the best one. Do this
experiment for two values of $\sigma_e^2$: $\sigma_e^2 = 0.01\|\theta\|_2^2$ and $\sigma_e^2 = 0.1\|\theta\|_2^2$.
How to decide which entries of $x$ to throw away? For now, just throw away the last $n - n_{\text{small}}$ entries.
So for $n_{\text{small}} = 1$, let $x_{\text{small}}$ be just the first entry, and so on. For $n_{\text{small}} = 30$, for example, $x_{\text{small}}$
will be the first 30 entries of $x$. There are many better ways, which we will learn about later in the
course.
Start with $n_{\text{small}} = 1$ and keep increasing its value, each time computing Normalized-Test-MSE by
first learning a value of $\theta$ (using $m = 80$, of course). Obtain a plot. Use the plot and what you learn in
class to decide what value of $n_{\text{small}}$ is best.
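The sweep over the reduced dimension described in part 2 could look like the sketch below. It keeps the first `n_small` columns of the inputs, fits least squares with `np.linalg.lstsq` on the $m = 80$ training points, and evaluates the normalized test MSE using the same truncated test inputs. The function name, the seed, and `m_test = 1000` are assumptions; the choice of `lstsq` stands in for whichever Homework 1 solver is used.

```python
import numpy as np

def sweep_n_small(noise_ratio, m=80, m_test=1000, n=100, seed=0):
    """Return the normalized test MSE for n_small = 1, ..., m (index 0 is n_small = 1)."""
    rng = np.random.default_rng(seed)
    theta = np.arange(n, 0, -1) * (-1.0) ** np.arange(n)
    sigma_e = np.sqrt(noise_ratio) * np.linalg.norm(theta)
    X = rng.standard_normal((m + m_test, n))
    y = X @ theta + sigma_e * rng.standard_normal(m + m_test)
    X_train, y_train = X[:m], y[:m]
    X_test, y_test = X[m:], y[m:]

    mse = np.empty(m)
    for n_small in range(1, m + 1):
        # Keep only the first n_small entries of every input vector.
        X_small = X_train[:, :n_small]
        theta_small, *_ = np.linalg.lstsq(X_small, y_train, rcond=None)
        # Predict with the truncated test inputs; y_test still comes from the full model.
        y_hat = X_test[:, :n_small] @ theta_small
        mse[n_small - 1] = np.mean((y_test - y_hat) ** 2) / np.mean(y_test ** 2)
    return mse

for noise_ratio in (0.01, 0.1):
    mse = sweep_n_small(noise_ratio)
    best = int(np.argmin(mse)) + 1
    print(f"noise_ratio={noise_ratio}: smallest test MSE at n_small={best}")
```

A single random draw gives a noisy curve, so the location of the minimum can shift between seeds; averaging the curve over several independent draws is one way to make the choice of $n_{\text{small}}$ more stable.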
3. Interpret your results based on the Bias-Variance tradeoff discussion. See Section 11 of Summary-Notes
and what will be taught in the next few classes.
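As one possible aid for the interpretation, the sketch below evaluates a textbook expression for the expected test error of least squares with Gaussian inputs. This expression is not taken from the assignment or from Summary-Notes; it is an assumption of this sketch that it matches the course's treatment. With $x \sim \mathcal{N}(0, I)$, keeping the first $p = n_{\text{small}}$ entries turns the discarded part of $\theta^T x$ into extra Gaussian noise with variance $\sum_{i > p} \theta_i^2$ (the bias term), and for $m > p + 1$ the estimation of $p$ coefficients from $m$ points inflates the error by the factor $(m - 1)/(m - p - 1)$ (the variance term). The function name is hypothetical.

```python
import numpy as np

def predicted_normalized_mse(theta, noise_ratio, m):
    """Expected normalized test MSE for p = 1, ..., m - 2 kept entries.

    Assumes x ~ N(0, I) and Gaussian noise, so that
    E[test error] = (sigma_e^2 + sum_{i > p} theta_i^2) * (m - 1) / (m - p - 1).
    """
    energy = np.cumsum(theta ** 2)      # energy[k] = sum of theta_i^2 over the first k + 1 entries
    total = energy[-1]
    sigma_e2 = noise_ratio * total
    p = np.arange(1, m - 1)             # the formula requires m > p + 1
    tail = total - energy[p - 1]        # energy of the discarded entries (bias)
    inflation = (m - 1) / (m - p - 1)   # grows as p approaches m (variance)
    curve = (sigma_e2 + tail) * inflation / (total + sigma_e2)
    return p, curve

n, m = 100, 80
theta = np.arange(n, 0, -1) * (-1.0) ** np.arange(n)
for noise_ratio in (0.01, 0.1):
    p, curve = predicted_normalized_mse(theta, noise_ratio, m)
    print(f"noise_ratio={noise_ratio}: predicted best n_small={int(p[np.argmin(curve)])}")
```

The first factor shrinks as more entries are kept while the second factor grows, so the curve has an interior minimum. A larger $\sigma_e^2$ weights the variance term more heavily and moves the minimum toward a smaller $n_{\text{small}}$.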