Department of Statistics
STATS 763: Advanced Regression Methodology
Midterm Test
Thursday 14 September, 16:30-18:00
Notes:
• This Midterm Test consists of 3 questions on 3 pages and is marked out of 50.
• Its result will count for 20% of your grade (or 0% if you do better on the Final Exam).
• This is a restricted open book test. You are allowed to consult an A4 sheet of paper covered on both sides with any information.
• You are not allowed any calculator, phone, digital watch or earpiece.
• Identify the supplied booklet with your name and student number and write your answers in it, clearly labelled.
Quantiles that may be useful: $\chi^2_{1;0.95} = 3.84$, $\chi^2_{2;0.95} = 5.99$, $\chi^2_{3;0.95} = 7.81$, $\chi^2_{4;0.95} = 9.49$, $z_{0.975} = 1.96$ (standard normal quantile).
1. (20 marks total)
Data frame ed contains data on the length of stay (in hours) of 2050 adult (≥ 15 years old) patients in the Emergency Departments of 127 Québec hospitals (in New Zealand: Accidents & Emergencies, or A&E), collected over a period in 2002. We have the ID of the hospital, as well as the age of each patient and some comorbid (concurrent disease) conditions: respiratory condition, cardiac condition and mental condition, each encoded as Yes or No. We are interested in the effect of age on length of stay, and in whether this effect depends on the comorbid conditions.
We fit a GLM with a Gamma family and a log link to the length of stay, adjusting for the Hospital ID and putting age in interaction with each comorbid condition. The call and selected output are shown below.
> mod1 <- glm(`Length of stay` ~age*(Respiratory+Cardiac+Mental)+`Hospital ID`
+ ,family=Gamma(link=log)
+ ,data=ed)
# A tibble: 8 × 5
  term               estimate std.error statistic  p.value
  <chr>                 <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)         0.0401    0.533      0.0752 9.4e-1
2 age                 0.012     0.00182    6.59   5.85e-11
3 RespiratoryYes      0.417     0.21       1.99   4.69e-2
4 CardiacYes         -0.326     0.165     -1.97   4.84e-2
5 MentalYes           0.748     0.26       2.88   4.07e-3
6 age:RespiratoryYes -0.00303   0.00354   -0.854  3.93e-1
7 age:CardiacYes      0.00411   0.00295    1.39   1.64e-1
8 age:MentalYes      -0.0102    0.00528   -1.93   5.44e-2
a) For patients with respiratory and cardiac comorbid conditions but no mental comorbid condition, how is the average length of stay affected for each increase of 10 years of age, according to mod1? Write an expression as an answer but do not evaluate it. [5 marks]
b) What is the estimated expected length of stay in the reference hospital for a 60-year-old patient with cardiac and mental conditions but no respiratory condition, according to mod1? Write an expression for this quantity but do not evaluate it. [5 marks]
c) For a patient without a comorbid condition, find an approximate 95% confidence interval for the effect on the length of stay of an increase of 5 years in age. Write expressions for the bounds of this interval but do not evaluate them. [5 marks]
d) A model similar to mod1 is fitted, but without the interaction terms.
> mod0 <- glm(`Length of stay` ~age+Respiratory+Cardiac+Mental+`Hospital ID`
+ ,family=Gamma(link=log)
+ ,data=ed)
The deviance and estimated dispersion parameter of mod1 are $D_1 = 1458.55$ and $\hat{\phi} = 1.13$, respectively; the deviance of mod0 is $D_0 = 1466.16$. Test at the 5% level whether the age and comorbidity interaction parameters are all 0.
Be clear about the numerical value of the test statistic, its distribution under the null, and whatever value to which you are comparing it. [5 marks]
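The arithmetic behind parts a) to d) can be sketched in R from the printed output alone. This is only one possible reading of the questions, not an official solution. It assumes that "No" is the reference level of each comorbidity, that the reference hospital is absorbed into the intercept, and that the deviance difference in d) is scaled by the dispersion estimate of mod1 and compared with a chi-squared quantile. The object names (b, se_age, ratio_a and so on) are illustrative and do not come from the test paper.

```r
# Coefficients copied from the printed mod1 output (names are illustrative).
b <- c(intercept = 0.0401, age = 0.012,
       resp = 0.417, card = -0.326, ment = 0.748,
       age_resp = -0.00303, age_card = 0.00411, age_ment = -0.0102)
se_age <- 0.00182  # standard error of the age coefficient

# (a) Multiplicative change in mean length of stay per +10 years of age,
#     respiratory = Yes, cardiac = Yes, mental = No (log link, so exponentiate).
ratio_a <- unname(exp(10 * (b["age"] + b["age_resp"] + b["age_card"])))

# (b) Expected stay (hours) in the reference hospital at age 60,
#     cardiac = Yes, mental = Yes, respiratory = No.
eta_b <- b["intercept"] + 60 * b["age"] + b["card"] + b["ment"] +
  60 * (b["age_card"] + b["age_ment"])
mean_b <- unname(exp(eta_b))

# (c) Approximate 95% Wald interval for the multiplicative effect of +5 years
#     of age with no comorbid condition, using z = 1.96 as given in the test.
ci_c <- unname(exp(5 * (b["age"] + c(-1, 1) * 1.96 * se_age)))

# (d) Scaled deviance difference, compared with a chi-squared quantile on
#     3 degrees of freedom (three interaction parameters set to 0).
D0 <- 1466.16
D1 <- 1458.55
phi_hat <- 1.13
stat_d <- (D0 - D1) / phi_hat        # about 6.73
crit_d <- qchisq(0.95, df = 3)       # about 7.81
reject_d <- stat_d > crit_d          # FALSE under these assumptions

print(c(ratio_a = ratio_a, mean_b = mean_b))
print(ci_c)
print(c(statistic = stat_d, critical = crit_d))
```

Under these assumptions the statistic falls below the critical value, so the interaction terms would not be declared significant at the 5% level. An F-based comparison is an alternative when the dispersion is estimated, but the quantiles supplied with the test suggest the chi-squared version.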
2. (12 marks total) We fit a GLM by solving the score equation $\sum_{i=1}^{n} x_i^T w_i (Y_i - \mu_i) = 0$, for $x_i$ a $1 \times p$ covariate vector, $\mu_i = g^{-1}(x_i \beta)$ and $0$ the $1 \times p$ vector of all zeros, $i = 1, \ldots, n$.
Provide an expression for $w_i$ as a function of $\mu_i$ in the following situations:
a) A Normal model for $Y_i$ with a logarithmic link $g(\mu_i) = \log \mu_i$, $\mu_i > 0$. [4 marks]
b) A quasi-binomial model for $Y_i$ with an identity link $g(\mu_i) = \mu_i$, $\mu_i > 0$. [4 marks]
c) A quasi-likelihood model with variance function $V(\mu_i) = \mu_i^2 (1 - \mu_i)^2$ and a logit link $g(\mu_i) = \log[\mu_i / (1 - \mu_i)]$, $\mu_i \in (0, 1)$. [4 marks]
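As a reminder of where the weights in the score equation above come from, the standard quasi-score can be matched term by term against it. The following is a sketch in the notation of the question; the dispersion parameter $\phi$ is an added symbol, and it is treated as a constant that drops out of the equation.

```latex
% Quasi-score for mean \mu_i = g^{-1}(x_i \beta) and variance \phi V(\mu_i):
\begin{align*}
U(\beta) &= \sum_{i=1}^{n}
  \left( \frac{\partial \mu_i}{\partial \beta} \right)^{T}
  \frac{Y_i - \mu_i}{\phi \, V(\mu_i)},
&
\frac{\partial \mu_i}{\partial \beta} &= \frac{x_i}{g'(\mu_i)} .
\end{align*}
% Matching with \sum_i x_i^T w_i (Y_i - \mu_i) = 0, up to the constant 1/\phi:
\begin{equation*}
w_i = \frac{1}{g'(\mu_i) \, V(\mu_i)} .
\end{equation*}
```

Each of the three situations then only requires substituting the stated link derivative $g'(\mu_i)$ and variance function $V(\mu_i)$.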
3. (18 marks total) Answer the following questions:
a) Let fac be a factor with two levels and y be some outcome of interest.
We fit model1 <- lm(y ~ fac) and model2 <- lm(y ~ fac - 1) (so model2 does not have an intercept). Write two equations relating the coefficients of model1 with those of model2. [4 marks]
b) True or False: Odds are always closer to 1 than their corresponding probability. [2 marks]
c) How is the dispersion parameter usually estimated in a generalised linear model? [2 marks]
d) The “cheese” in the sandwich variance estimator is the covariance of what object? [2 marks]
e) In R, what argument can we pass to function glm() if we suspect that the variance of the outcome is proportional, but not necessarily equal, to its mean? [2 marks]
f) How can we estimate the variance of linear coefficients $\hat{\beta}$ if we set $g(\mathbb{E}[Y]) = X\beta$, but we are not confident about the variance function? [3 marks]
g) A population comprises r people with a certain disease (cases, Yi = 1, i = 1, . . . , r) and 9r people without the disease (controls, Yi = 0, i = r + 1, . . . , 10r). Data are collected on all cases, and on a random sample of r controls. Binary covariate xi , i = 1, . . . , 10r is collected, and we assume no confounding is present.
Describe a method to estimate without bias the log-relative risk of Yi = 1 when xi = 1 vs xi = 0, with a reliable standard error. You can either write the necessary R command or commands, or describe the method in sufficient detail that the corresponding R command or commands could be fully specified. [3 marks]
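For part a) of this question, the relationship between the two parameterisations can be checked numerically on simulated data. This is a minimal sketch, not part of the test: the data, the seed, the level names "A" and "B" and the objects b1 and b2 are all made up for illustration.

```r
# Simulated two-level factor and outcome (illustrative values only).
set.seed(763)
fac <- factor(rep(c("A", "B"), each = 20))
y <- rnorm(40, mean = ifelse(fac == "A", 2, 5))

model1 <- lm(y ~ fac)      # coefficients: (Intercept), facB
model2 <- lm(y ~ fac - 1)  # coefficients: facA, facB (one mean per level)

b1 <- unname(coef(model1))
b2 <- unname(coef(model2))

# The intercept of model1 is the mean of the reference level in model2,
# and intercept + contrast in model1 is the mean of the second level.
print(all.equal(b1[1], b2[1]))
print(all.equal(b1[1] + b1[2], b2[2]))
```

Both comparisons hold because the two models span the same column space and therefore produce identical fitted values; only the labelling of the coefficients differs.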