STA 463 Exam #2 Spring 2020
Take-home part, 20 points
Instructions:
• These are all full-work problems, and point values are noted next to each part. In order to
receive credit for a problem, your solution must show sufficient details so that the grader can
determine how you obtained your answer. No work = no credit.
• Carry all computations to at least two decimal places. Only round the final answer. Do not
round during intermediate steps.
• The exam is open book/notes, but you must work on it independently. You cannot discuss
the exam questions with anyone, and you cannot share the exam questions in any format with
anyone. Violations could result in a report of academic dishonesty and all the consequences
that would follow.
• For graduate students, I will multiply your points in Q3 by 0.6 so that the total remains 20
points.
Q1. (4 points) A study was conducted to determine if there was an association between
the size (weight, in grams) of twenty-seven mice and four predictors: their sex (male or female,
“x1M”; x1M = 0 if the mouse is female, x1M = 1 if male) and three measures related to the size of
their features (occipital-incisor length, “x2”; orbital width, “x3”; skull height, “x4”; all measured
in millimeters). Below is some regression output for a model using the response variable and the
four predictors. Residual plots are also included.
----------------------------------------------------------------------------------------------------------------------
Call:
lm(formula = y ~ x1 + x2 + x3 + x4)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -45.2811    24.0279  -1.885  0.07278 .
x1M           0.9048     0.7112   1.272  0.21659
x2            2.7080     0.8033   3.371  0.00275 **
x3           -0.5901     1.7788  -0.332  0.74323
x4            0.7591     1.6671   0.455  0.65332
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.735 on 22 degrees of freedom
Multiple R-squared: 0.4121, Adjusted R-squared: 0.3053
F-statistic: 3.856 on 4 and 22 DF, p-value: 0.01603
------------------------------------------------------------------------------------------------------------------------
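For reference, a minimal sketch of the kind of call that produces the output above, assuming the data sit in a data frame named mice with columns y, x1, x2, x3, and x4 (the data frame name is an assumption; the data themselves are not included here):

# x1 is assumed to be a factor with levels "F" and "M"; R then labels the
# male dummy variable x1M in the coefficient table, as shown above.
fit <- lm(y ~ x1 + x2 + x3 + x4, data = mice)
summary(fit)   # coefficient table, residual standard error, R-squared, F statistic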
(a). (2 points.) Please explain the regression coefficient associated with x1M, 0.9048, in
the context of this problem.
(b). (2 points.) The R2 and adjusted R2 are not very close to each other. Why?
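As a quick arithmetic check (not part of the exam), the adjusted R-squared reported above can be recovered from the multiple R-squared, the sample size, and the number of estimated coefficients:

R2 <- 0.4121                       # multiple R-squared from the output above
n  <- 27                           # number of mice
p  <- 5                            # intercept plus four predictors; n - p = 22 residual df
1 - (1 - R2) * (n - 1) / (n - p)   # about 0.305, matching the reported 0.3053 up to rounding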
Q2. (6 points) Consider a dataset with a response and four predictors. The design matrix
for the first-order additive model is calculated. (X'X)^(-1), part of the hat matrix, and the
vector of estimated parameters are given below.
----------------------------------------------------------------------------------------------------------------------
> solve(t(X)%*%X)
                              x1            x2            x3            x4
    1.259327e-01 -4.058861e-05 -7.821644e-04 -9.386430e-03 -1.048498e-02
x1 -4.058861e-05  3.749686e-08 -4.554027e-08 -5.376013e-06 -2.788214e-06
x2 -7.821644e-04 -4.554027e-08  6.111386e-05  7.457683e-05 -2.964554e-04
x3 -9.386430e-03 -5.376013e-06  7.457683e-05  5.374121e-03 -2.488392e-03
x4 -1.048498e-02 -2.788214e-06 -2.964554e-04 -2.488392e-03  4.158201e-02
> H=X%*%solve(t(X)%*%X)%*%t(X)
> H[1:6,1:6]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.07032303 0.03062687 0.04481705 0.06149829 0.03062320 0.03062320
[2,] 0.03062687 0.05189401 0.04550879 0.05210952 0.04872989 0.04872989
[3,] 0.04481705 0.04550879 0.04620762 0.05625619 0.04354044 0.04354044
[4,] 0.06149829 0.05210952 0.05625619 0.07156503 0.04969182 0.04969182
[5,] 0.03062320 0.04872989 0.04354044 0.04969182 0.04613626 0.04613626
[6,] 0.03062320 0.04872989 0.04354044 0.04969182 0.04613626 0.04613626
> B=solve(t(X)%*%X)%*%t(X)%*%y
> B
[,1]
30.8202702
x1 0.5846615
x2 -3.2686964
x3 22.3888892
x4 42.4382015
------------------------------------------------------------------------------------------------------------------------
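For reference, a minimal sketch of how the objects printed above arise, assuming X is the design matrix (an unlabeled intercept column followed by x1-x4) and y is the response vector; neither is provided with the exam:

XtXinv <- solve(t(X) %*% X)      # the 5 x 5 matrix printed first (intercept row/column unlabeled)
H      <- X %*% XtXinv %*% t(X)  # hat matrix; only its 6 x 6 upper-left corner is shown
B      <- XtXinv %*% t(X) %*% y  # least-squares estimates (intercept, then x1-x4)
# Standard results linking these objects to parts (a)-(c):
#   estimated Cov(B)         = MSE * XtXinv
#   estimated Cov(residuals) = MSE * (I - H),  where I = diag(nrow(X))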
(a). (2 points.) If MSE = 39,880, what is the estimated variance of B4 (the parameter
estimate associated with x4)?
(b). (2 points.) If the appropriate t-multiplier is 1.98, what is the 95% confidence interval
for β4?
(c). (2 points.) What is the estimate of the covariance between the first and second
residuals?
Q3. (10 points for undergrad; 6 points for grad) Consider a dataset consisting of a
random sample of livestock sales at Wapello Livestock Sales in Wapello, Iowa over several months
in 1999-2000. Suppose the response variable Y is the selling price of the cow in dollars, and that
the predictor variables are the age of the animal in years (X1), the weight of the animal in hundreds
of pounds (X2), and whether or not it is an Angus cow (X3 is 1 if Angus, 0 otherwise).
(a). (4 points.) Fit a multiple regression model with all three predictors (first order terms
only, no higher order terms, no interactions). Please interpret the estimated regression coefficient
associated with the weight predictor and find a 95% confidence interval for this parameter.
(b). (3 points.) Consider the model you fitted in (a). Please test H0 : β1 = β2 = β3 = 0
vs. HA : not all three β’s are equal to 0 using the ANOVA F test.
(c). (3 points.) Please fit a second multiple regression model with the three predictors, as
well as interaction terms between X3 and the other two predictors, i.e., X1 ∗ X3 and X2 ∗ X3.
What is the fitted model for non-Angus cows?
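A minimal sketch of how the Q3 models might be fit in R, assuming the sale records sit in a data frame called cattle with columns price, age, weight, and angus (these names are assumptions; use whatever the distributed dataset actually provides):

# (a) first-order additive model; the weight row of confint() gives the 95% CI
fit1 <- lm(price ~ age + weight + angus, data = cattle)
summary(fit1)                  # estimates; the overall F test for (b) is on the last line
confint(fit1, level = 0.95)

# (c) add the interactions of angus with the other two predictors
fit2 <- lm(price ~ age + weight + angus + age:angus + weight:angus, data = cattle)
summary(fit2)                  # setting angus = 0 gives the fitted model for non-Angus cows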
Q4. (4 points, grad only) Suppose that the normal error simple linear regression model
is applicable, except that the error variance is not constant. In particular, the larger the fitted
value, the larger the error variance. In this case, does β1 > 0 still imply that there is a positive,
linear relationship between X and Y? Explain briefly.
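For illustration only, a small simulation of the situation Q4 describes, where the spread of the errors grows with the fitted value (all numbers here are made up):

set.seed(463)
x   <- runif(200, 1, 10)
y   <- 2 + 3 * x + rnorm(200, sd = 0.5 * (2 + 3 * x))  # error SD grows with the mean response
fit <- lm(y ~ x)
plot(fitted(fit), resid(fit))  # funnel-shaped spread: larger fitted values, larger error variance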
