MATH 185 – Take-Home Exam 2
Due Sunday, June 9th, by 11:59 PM
AGREEMENT
By taking this exam, you agree not to discuss the exam with anyone, starting now,
neither with a classmate nor with anyone else, neither in person nor through other means,
including electronic ones. Please do not post questions on Piazza. Unless otherwise specified,
it is acceptable to copy-paste from the lecture or homework solution code.

Problem 1. (Bootstrap tests for goodness-of-fit) We saw in lecture that when it comes to
goodness-of-fit (GOF) testing, it is quite “natural” to obtain a p-value by permutation. It is also
possible, however, to use the bootstrap for that purpose. Consider the two-sample situation for
simplicity, although this generalizes to any number of samples. Thus assume a situation where we
observe X1, ..., Xm iid from F and (independently) Y1, ..., Yn iid from G, where F and G are two
distributions on the real line. We want to test F = G versus F ≠ G. We may want to use a statistic
T = T(X1, ..., Xm, Y1, ..., Yn) for that purpose, and the question is how to obtain a p-value for T
via a bootstrap. The idea is, as usual, to estimate the “best” null distribution and bootstrap from
that distribution. A natural approach to estimate the null distribution is to simply combine the
two samples into one and estimate the corresponding distribution by its empirical distribution. We
thus bootstrap from the empirical distribution of the combined sample.
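
The resampling scheme described above can be sketched as follows. This is a minimal illustration in Python, not the course's reference solution: the function name boot_gof_pvalue, the seed argument, and the add-one correction in the returned p-value are assumptions introduced here. Both bootstrap samples are drawn with replacement from the pooled data, which is what distinguishes this from a permutation test.

```python
import random
from statistics import mean


def boot_gof_pvalue(x, y, B=2000, seed=None):
    """Bootstrap GOF p-value for T = |mean(x) - mean(y)| (illustrative sketch)."""
    rng = random.Random(seed)
    m, n = len(x), len(y)
    pooled = list(x) + list(y)      # combined sample: estimate of the null distribution
    t_obs = abs(mean(x) - mean(y))  # observed statistic

    exceed = 0
    for _ in range(B):
        # Draw both samples WITH replacement from the pooled empirical distribution.
        xb = rng.choices(pooled, k=m)
        yb = rng.choices(pooled, k=n)
        if abs(mean(xb) - mean(yb)) >= t_obs:
            exceed += 1

    # Assumption: add-one correction, so the Monte Carlo p-value is never exactly 0.
    return (exceed + 1) / (B + 1)
```

A call such as boot_gof_pvalue(x, y, B=2000, seed=1) returns a value in (0, 1]; small values are evidence against F = G.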
A. Write a function bootGOFdiff(x, y, B = 2000) that takes in two samples as vectors x and y,
and a number of replicates B (Monte Carlo samples from the estimated null distribution),
and returns the bootstrap GOF p-value for the difference in means T = |X̄ − Ȳ|.
B. Apply your function to the FIFA dataset to compare the wages of players ≤ 29 years old with
those of older players (≥ 30 years old).

Problem 2. (Local Absolute Linear Regression) Local linear regression is a popular
smoother. However, because it is based on squared errors, it is not robust to outliers. To make
it more robust, one option is to use absolute errors instead.
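
One common way to formalize this idea is written out below. It is offered as an assumption about the intended estimator, not as the course's definition: the kernel K, the centering at the target point x_0, and the notation \hat f are introduced here for illustration. At each x_0, a line is fitted by minimizing kernel-weighted absolute residuals, and the fitted value is the intercept.

```latex
\bigl(\hat a(x_0), \hat b(x_0)\bigr) \in \arg\min_{a,\, b} \sum_{i=1}^{n}
  K\!\left(\frac{x_i - x_0}{h}\right) \bigl|\, y_i - a - b\,(x_i - x_0) \,\bigr|,
\qquad
\hat f(x_0) = \hat a(x_0).
```

Replacing the absolute value by a square recovers ordinary local linear regression, so the only change is the loss function.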
A. Write a function localAbsLinearRegression(x, y, h, xnew = x) that takes in paired vectors x
(predictor) and y (response) and a bandwidth h, and computes the local absolute linear
regression fit (use any kernel you like). The fit is evaluated at the points in the vector xnew
(equal to x by default).
B. Apply your function to the Boeing stock closing prices from 1/01/2018 to 6/01/2019 — see
the BA.csv file, which was downloaded from here (some dates are missing for some unknown
reason). Plot the data and overlay the fitted curve for a few choices of bandwidth (identified
in a legend).
C. Choose the bandwidth by 10-fold cross-validation.
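
The cross-validation step in part C can be sketched generically as below. This is a Python illustration under stated assumptions: all function names are hypothetical, the held-out loss is taken to be the mean absolute error (a natural but assumed choice for a robust fit), and box_kernel_mean is only a stand-in smoother so the sketch runs on its own. It is not the local absolute linear regression itself. The same folds are reused for every candidate bandwidth so the errors are comparable.

```python
import random


def kfold_indices(n, k=10, seed=0):
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_error(x, y, h, fit_predict, k=10, seed=0):
    """Mean absolute held-out error of a smoother at bandwidth h."""
    total, count = 0.0, 0
    for fold in kfold_indices(len(x), k, seed):
        held = set(fold)
        x_tr = [x[i] for i in range(len(x)) if i not in held]
        y_tr = [y[i] for i in range(len(x)) if i not in held]
        preds = fit_predict(x_tr, y_tr, h, [x[i] for i in fold])
        for i, p in zip(fold, preds):
            total += abs(y[i] - p)
            count += 1
    return total / count


def choose_bandwidth(x, y, grid, fit_predict, k=10, seed=0):
    """Return the bandwidth in grid with the smallest cross-validated error."""
    return min(grid, key=lambda h: cv_error(x, y, h, fit_predict, k, seed))


def box_kernel_mean(x_tr, y_tr, h, x_new):
    """Placeholder smoother: average of training responses within distance h."""
    out = []
    for x0 in x_new:
        near = [yi for xi, yi in zip(x_tr, y_tr) if abs(xi - x0) <= h]
        # Fall back to the global mean if the window is empty.
        out.append(sum(near) / len(near) if near else sum(y_tr) / len(y_tr))
    return out
```

Any smoother with the signature fit_predict(x_train, y_train, h, x_new) can be passed in place of the placeholder, for example choose_bandwidth(x, y, [5, 10, 20], my_smoother), where my_smoother is whatever fitting function is being tuned.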