STATS 763 - 2022 - Final examination
Available from 15 June 2022 at 17:00 NZST (5:00PM)
Due by 15 June 2022 at 19:30 NZST (7:30PM)
General instructions
• This examination consists of these instructions and 3 questions on 8 pages. Attempt all questions. The exam is marked out of 100.
• Inspera requires you to upload a single file containing your answers to all three questions. The file size cannot exceed 1 GB.
• The duration of the examination is two hours, between 17:00 (5:00PM) and 19:00 (7:00PM) on 15 June 2022, New Zealand Standard Time.
• This assessment was designed to be completed within 2 hours by a prepared student. However, you have 2 hours and 30 minutes in which to complete and submit it.
• Submissions will be open until 19:30 NZST (7:30PM) to allow for scanning and uploading. It is your responsibility to ensure your assessment is successfully submitted on time.
• All reasonable answer file formats will be accepted, including clearly scanned or photographed handwritten responses, PDF documents, Word or similar LibreOffice documents, markdown files, etc.
• This is an on-line open-book exam. You are allowed any resource to answer the questions except consulting another person (see Academic Honesty Declaration). Piazza will be unavailable during the examination except to address private queries to the instructors.
• Computing final numerical answers is not required. It is sufficient, for full marks, to produce a correct computable solution.
Question 1: [Total: 30 marks]
Data were collected from 696 randomly sampled women who gave birth over a 4 1/2 month period in a New Zealand hospital in 2011. The following table describes the data.
The outcome of interest is the difference between the date of birth and the expected date of delivery (‘DOB - EDD‘), measured in weeks. A moderate positive difference (late birth) is generally not an issue; a negative difference (early birth) larger in magnitude than 3 weeks identifies the baby as premature.
The exposure of interest is the number of previous pregnancies without a live birth (‘n stillbirths‘), defined as Gravidity - Parity - 1. (The “-1” discounts the current pregnancy.) This number includes stillbirths and voluntary interruptions of pregnancy.
Adjustments for ethnicity (binary variables eth_EurOther, eth_Maori, eth_Pasifika and eth_Asian), age group (AgeGrp), and presence of a husband or partner (HusbPart) are considered sufficient to account for confounding in this observational study.
a) [6 marks] We fit two linear least-squares models, A and B.
Using the partial output supplied below, test the null hypothesis $H_0: \beta_{\text{Gravidity}} = -\beta_{\text{Parity}}$ versus $H_1$: not $H_0$.
Justify your answer briefly.
Model A:
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        0.05010    0.38813   0.129  0.89734
Gravidity         -0.30532    0.09951  -3.068  0.00224 **
Parity             0.30649    0.12137   2.525  0.01178 *
[snip - you don’t need the missing output to answer the question]
(Dispersion parameter for gaussian family taken to be 5.159174)
Null deviance: 3639.1 on 695 degrees of freedom
Residual deviance: 3523.7161 on 683 degrees of freedom
Model B:
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)       -0.25347    0.35700  -0.710  0.47796
‘n stillbirths‘   -0.30509    0.09877  -3.089  0.00209 **
[snip]
(Dispersion parameter for gaussian family taken to be 5.151635)
Null deviance: 3639.1 on 695 degrees of freedom
Residual deviance: 3523.7181 on 684 degrees of freedom
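Because ‘n stillbirths‘ = Gravidity - Parity - 1, imposing $H_0$ turns $\beta_{\text{Gravidity}}\,\text{Gravidity} + \beta_{\text{Parity}}\,\text{Parity}$ into $\beta_{\text{Gravidity}}(\text{‘n stillbirths‘} + 1)$, with the constant absorbed by the intercept. Model B is therefore Model A under the constraint, assuming the snipped adjustment terms are the same in both models. A minimal sketch of the resulting nested-model F test, using only the deviances and dispersion printed above; the p-value uses a chi-square(1) approximation, which is close for 683 denominator df (an exact F(1, 683) tail would need a library such as scipy).

```python
import math

# Residual deviances, df and dispersion copied from the Model A / Model B output.
DEV_A, DF_A, DISP_A = 3523.7161, 683, 5.159174   # unconstrained model
DEV_B, DF_B = 3523.7181, 684                      # constrained model (H0)

def f_test_one_constraint(dev_small, df_small, dev_big, df_big, disp_big):
    """Return (F, approximate p) for nested Gaussian models differing by one df.

    The p-value treats F as chi-square(1), ignoring the large denominator df.
    """
    q = df_small - df_big
    assert q == 1, "this helper handles a single constraint only"
    f_stat = (dev_small - dev_big) / q / disp_big
    p_approx = math.erfc(math.sqrt(f_stat / 2.0))   # P(chi2_1 > F)
    return f_stat, p_approx

f_stat, p = f_test_one_constraint(DEV_B, DF_B, DEV_A, DF_A, DISP_A)
print(f"F = {f_stat:.6f} on (1, {DF_A}) df, approximate p = {p:.3f}")
```

With these numbers F is about 0.0004 and p is about 0.98, so the data give no evidence against $H_0$; the two estimates are almost exactly opposite (-0.30532 and 0.30649).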
b) [6 marks] According to the fitted model below, how large would the number of previous pregnancies with no live birth (the ‘n stillbirths‘ covariate) need to be for ‘DOB - EDD‘ to be more negative than -3 weeks on average, if the expectant mother is Asian with no other ethnicity, has no husband/partner and is over 40? It is sufficient to set up the equation without solving it.
Model B (again)
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)       -0.25347    0.35700  -0.710  0.47796
‘n stillbirths‘   -0.30509    0.09877  -3.089  0.00209 **
eth_Maori         -0.11255    0.28660  -0.393  0.69465
eth_Pasifika      -0.32473    0.31575  -1.028  0.30411
eth_Asian         -0.60485    0.36708  -1.648  0.09987 .
eth_EurOther       0.15025    0.30503   0.493  0.62248
AgeGrp<20         -0.17129    0.32625  -0.525  0.59974
AgeGrp20-24        0.07679    0.25737   0.298  0.76552
AgeGrp30-34       -0.05538    0.25319  -0.219  0.82692
AgeGrp35-39       -0.08779    0.28121  -0.312  0.75499
AgeGrp40+         -0.98586    0.61525  -1.602  0.10953
HusbPartNo        -0.24467    0.22864  -1.070  0.28495
(Dispersion parameter for gaussian family taken to be 5.151635)
Null deviance: 3639.1 on 695 degrees of freedom
Residual deviance: 3523.7 on 684 degrees of freedom
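Setting the Model B linear predictor equal to -3 for an Asian-only mother aged 40+ with no husband/partner gives $-0.25347 - 0.60485 - 0.98586 - 0.24467 - 0.30509\,n = -3$. A short sketch that sets up and solves this equation, assuming the age level missing from the output (25-29) and the HusbPart level not shown are the reference categories; the key `n_stillbirths` is only a Python-friendly name for ‘n stillbirths‘.

```python
import math

# Model B coefficients needed for this profile (copied from the output above).
COEF = {
    "(Intercept)": -0.25347,
    "n_stillbirths": -0.30509,   # Python-friendly name for `n stillbirths`
    "eth_Asian": -0.60485,
    "AgeGrp40+": -0.98586,
    "HusbPartNo": -0.24467,
}

def stillbirth_threshold(coef, target=-3.0):
    """Solve b0 + b_Asian + b_40+ + b_NoPartner + b_n * n = target for n.

    Other eth_* indicators are 0 (Asian only). Because b_n < 0, the fitted
    mean is below `target` for every n larger than the returned value.
    """
    offset = (coef["(Intercept)"] + coef["eth_Asian"]
              + coef["AgeGrp40+"] + coef["HusbPartNo"])
    return (target - offset) / coef["n_stillbirths"]

n_star = stillbirth_threshold(COEF)
smallest_count = math.floor(n_star) + 1   # smallest integer strictly above n*
print(f"n* = {n_star:.3f}; smallest whole number: {smallest_count}")
```

The threshold is about 2.99, so on this fitted model the mean falls below -3 weeks from 3 previous pregnancies without a live birth onwards.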
c) [6 marks] Explain in words and with simple notation how you could produce a Wald confidence interval for your answer in b) using a reliable method to estimate standard errors.
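One way to do this is the delta method. Writing the threshold as $n^* = g(\theta) = (-3 - \beta_0 - \beta_{\text{Asian}} - \beta_{40+} - \beta_{\text{NoPartner}})/\beta_n$, the interval is $\hat n^* \pm 1.96\sqrt{\nabla g^\top \hat V \nabla g}$, where $\hat V$ is a robust (sandwich) covariance matrix of the five coefficients, for instance from R's `sandwich::vcovHC()`. The sketch below assumes $\hat V$ is supplied; the printed output does not contain it, so the usage example plugs in a clearly labelled placeholder (model-based SEs on the diagonal, no covariances) only to show the mechanics, not to give an answer.

```python
import math

def threshold_and_gradient(theta, target=-3.0):
    """theta = [b0, b_Asian, b_40plus, b_NoPartner, b_n] (this order assumed).

    Returns n* = (target - (b0 + b_Asian + b_40plus + b_NoPartner)) / b_n
    and the gradient of n* with respect to theta.
    """
    *offset_terms, b_n = theta
    n_star = (target - sum(offset_terms)) / b_n
    grad = [-1.0 / b_n] * len(offset_terms) + [-n_star / b_n]
    return n_star, grad

def delta_method_wald_ci(theta, vcov, z=1.959964, target=-3.0):
    """Wald interval n* +/- z * se, with se^2 = grad' V grad (delta method)."""
    n_star, grad = threshold_and_gradient(theta, target)
    k = len(grad)
    var = sum(grad[i] * vcov[i][j] * grad[j] for i in range(k) for j in range(k))
    se = math.sqrt(var)
    return n_star, se, (n_star - z * se, n_star + z * se)

theta = [-0.25347, -0.60485, -0.98586, -0.24467, -0.30509]
# PLACEHOLDER ONLY: squared model-based SEs on the diagonal, no covariances.
# Replace with the robust (sandwich) covariance of these five coefficients.
se_model = [0.35700, 0.36708, 0.61525, 0.22864, 0.09877]
vcov_placeholder = [[se_model[i] ** 2 if i == j else 0.0 for j in range(5)]
                    for i in range(5)]
n_star, se, ci = delta_method_wald_ci(theta, vcov_placeholder)
print(f"n* = {n_star:.3f}, se = {se:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```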
d) [6 marks] We consider a model including the interaction terms between ‘n stillbirths‘ and all ethnicity variables. A partial summary is shown below:
Model C
Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                   -0.35170    0.40251  -0.874    0.383
‘n stillbirths‘               -0.03248    0.38871  -0.084    0.933
eth_Maori                     -0.20279    0.33713  -0.602    0.548
eth_Pasifika                  -0.09238    0.36578  -0.253    0.801
eth_Asian                     -0.61988    0.42384  -1.463    0.144
eth_EurOther                   0.31564    0.36039   0.876    0.381
[snip]
‘n stillbirths‘:eth_Maori      0.14032    0.34115   0.411    0.681
‘n stillbirths‘:eth_Pasifika  -0.46421    0.39582  -1.173    0.241
‘n stillbirths‘:eth_Asian      0.09691    0.48039   0.202    0.840
‘n stillbirths‘:eth_EurOther  -0.34506    0.33578  -1.028    0.304
(Dispersion parameter for gaussian family taken to be 5.12995)
Null deviance: 3639.1 on 695 degrees of freedom
Residual deviance: 3488.4 on 680 degrees of freedom
Test the significance of the interaction terms by producing an appropriate test statistic and p-value; make sure to specify the approximate distribution of the test statistic under the null hypothesis of no interaction.
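Model C adds four interaction terms to Model B (684 - 680 = 4 df), so one natural test is the nested F test $F = \{(D_B - D_C)/4\}/\hat\phi_C$, approximately $F_{4,680}$ under the null hypothesis of no interaction (equivalently, $4F$ is approximately $\chi^2_4$). A stdlib-only sketch; the exact F tail uses the finite-sum form of the regularised incomplete beta function, which holds when the numerator df is even.

```python
import math

# Residual deviances, df and dispersion from the Model B and Model C output.
DEV_B, DF_B = 3523.7, 684                    # no interactions (null model)
DEV_C, DF_C, DISP_C = 3488.4, 680, 5.12995   # with 4 interaction terms

def f_sf_even_numerator(f, d1, d2):
    """Exact P(F > f) for F ~ F(d1, d2) when d1 is even.

    Uses P(F > f) = I_x(d2/2, d1/2) with x = d2 / (d2 + d1 * f), and
    I_x(a, m) = x^a * sum_{j<m} [Gamma(a+j) / (Gamma(a) j!)] (1-x)^j
    for integer m = d1/2.
    """
    if d1 % 2:
        raise ValueError("closed form requires an even numerator df")
    a, m = d2 / 2.0, d1 // 2
    x = d2 / (d2 + d1 * f)
    term, total = 1.0, 1.0
    for j in range(1, m):
        term *= (a + j - 1) / j * (1.0 - x)
        total += term
    return math.exp(a * math.log(x)) * total

q = DF_B - DF_C                                  # number of interaction terms
f_stat = (DEV_B - DEV_C) / q / DISP_C
p_value = f_sf_even_numerator(f_stat, q, DF_C)
print(f"F = {f_stat:.3f} on ({q}, {DF_C}) df, p = {p_value:.3f}")
```

This gives F of about 1.72 and p of about 0.14, so on this output there is little evidence that the effect of ‘n stillbirths‘ differs by ethnicity.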
e) [6 marks] The distribution of premature births by maternal age group is shown below:
Counts of premature births by age group, original data
Premature     <20  20-24  25-29  30-34  35-39    40+
FALSE          72    148    146    148    103     12
TRUE            8     14     16     16     10      3
We create a subsampled data set consisting of all premature births and a sample of twice that number of non-premature births, stratified by age. The distribution of prematurity and age group in the subsampled data is given below.
Counts of premature births by age group, subsampled data
Premature     <20  20-24  25-29  30-34  35-39    40+
FALSE          16     28     32     32     20      6
TRUE            8     14     16     16     10      3
Explain how to fit a relative risk model of prematurity to the subsampled data that will estimate the effect of ‘n stillbirths‘ on Premature without bias (still assuming that confounding is correctly addressed) and will produce reliable standard errors.
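Because non-premature births were subsampled within age strata, one approach is to weight each record by the inverse of its sampling fraction (every premature birth was kept, so those records get weight 1) and fit a log-link binomial or Poisson-type (relative risk) model with design-based standard errors; in R, for example, via `survey::svydesign()` with `strata` and `weights`, then `svyglm()` with `family = quasipoisson(link = "log")`. The sketch below only derives the weights from the two tables above.

```python
from fractions import Fraction

AGE_GROUPS = ["<20", "20-24", "25-29", "30-34", "35-39", "40+"]
# Counts copied from the two tables above (keys are the Premature status).
ORIGINAL = {"FALSE": [72, 148, 146, 148, 103, 12], "TRUE": [8, 14, 16, 16, 10, 3]}
SUBSAMPLE = {"FALSE": [16, 28, 32, 32, 20, 6], "TRUE": [8, 14, 16, 16, 10, 3]}

def sampling_weights(original, subsample):
    """Inverse sampling fraction N/n for each (Premature, age group) cell."""
    return {
        (status, group): Fraction(big, small)
        for status in original
        for group, big, small in zip(AGE_GROUPS, original[status], subsample[status])
    }

weights = sampling_weights(ORIGINAL, SUBSAMPLE)
for group in AGE_GROUPS:
    print(f"{group:>5}: non-premature weight {float(weights[('FALSE', group)]):.3f}, "
          f"premature weight {weights[('TRUE', group)]}")
```

Reweighting the subsample with these weights reproduces the original counts in every cell, which is what makes the weighted risk estimates refer to the original cohort.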
Question 2: [Total: 30 marks]
A model for Covid risk in arrivals at the border is given by
$\operatorname{logit} P[Y_{it} = 1 \mid b_i] = \alpha_i + b_i + \beta_1 X_{i,t} + \beta_2 X_{i,t-1}$
where $Y_{it}$ indicates whether an individual from country $i$ arriving during week $t$ of the epidemic tests positive within a week of arrival (so $P[Y_{it} = 1 \mid b_i]$ is the probability of testing positive), $X_{i,t}$ is the incidence of diagnosed Covid cases per 100,000 people in country $i$ during week $t$, and $X_{i,t-1}$ is the incidence of diagnosed Covid cases per 100,000 people in country $i$ during week $t-1$. The model for the random effects $b_i$ is
$b_i \sim N(0, \tau^2)$.
a) [5 marks] This model could be fitted with separate fixed effects $\gamma_i$ instead of the random effects $b_i$. Explain the term “shrinkage” and how it connects the values of $b_i$ and $\gamma_i$.
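Shrinkage can be illustrated with the usual normal-approximation (empirical Bayes) formula, which is only an approximation for a logistic random-intercept model. If a country's fixed-effect estimate (written $\hat\gamma_i$ here) has sampling variance $v_i$, the random-effect prediction is roughly $\hat b_i \approx \frac{\tau^2}{\tau^2 + v_i}(\hat\gamma_i - \bar\gamma)$, so countries with few tested arrivals (large $v_i$) are pulled hardest towards the overall mean. The inputs in the example are hypothetical.

```python
def shrink(fixed_effects, variances, tau2):
    """Approximate random-effect predictions b_i from fixed-effect estimates.

    Normal-approximation sketch: each estimate's deviation from the
    precision-weighted mean is multiplied by tau2 / (tau2 + v_i).
    """
    weights = [1.0 / (tau2 + v) for v in variances]
    centre = sum(w * g for w, g in zip(weights, fixed_effects)) / sum(weights)
    return [tau2 / (tau2 + v) * (g - centre)
            for g, v in zip(fixed_effects, variances)]

# Hypothetical countries: same raw deviation, very different amounts of data.
gamma_hat = [1.0, 1.0, -1.0, -1.0]
var_gamma = [0.01, 1.0, 0.01, 1.0]     # small variance = many arrivals tested
b_hat = shrink(gamma_hat, var_gamma, tau2=0.25)
for g, v, b in zip(gamma_hat, var_gamma, b_hat):
    print(f"fixed effect {g:+.2f} (variance {v:.2f}) -> random effect {b:+.3f}")
```

The well-measured countries keep almost all of their deviation (factor 0.25/0.26), while the poorly measured ones are shrunk to a fifth of it (factor 0.25/1.25).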
b) [10 marks] Other predictor variables (incidence rates, test positivity rates, death rates, testing rates) are available. Describe a way to choose a model that predicts accurately, using weekly data from each country.
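One option is forward-chaining (rolling-origin) cross-validation: fit each candidate set of predictors on the weeks before a cut-off, predict the following week, move the cut-off forward, and choose the candidate with the lowest average out-of-sample log loss. The helpers below are a hypothetical sketch of the splitting and scoring only; fitting the candidate models themselves (e.g. the random-intercept logistic model above) is left to a modelling library.

```python
import math

def rolling_origin_splits(weeks, min_train, horizon=1):
    """Yield (train_weeks, test_weeks) pairs with training strictly before testing."""
    weeks = sorted(weeks)
    for cut in range(min_train, len(weeks) - horizon + 1):
        yield weeks[:cut], weeks[cut:cut + horizon]

def mean_log_loss(outcomes, probs, eps=1e-12):
    """Average negative Bernoulli log-likelihood (lower is better)."""
    total = 0.0
    for y, p in zip(outcomes, probs):
        p = min(max(p, eps), 1.0 - eps)   # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(outcomes)

for train, test in rolling_origin_splits(range(1, 9), min_train=5):
    print("fit on weeks", train, "-> score on weeks", test)
```

Splitting by week (rather than at random) mimics how the model would be used: predicting next week's risk from past data only.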
c) [5 marks] The values of $\hat\beta_1$ and $\hat\beta_2$ are approximately 1 and -0.5 respectively. A non-statistician asks if the negative value of $\hat\beta_2$ means that higher incidence leads to lower risk. How would you answer?
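A quick way to see what the two coefficients imply together is to evaluate the incidence part of the linear predictor, $\beta_1 X_{i,t} + \beta_2 X_{i,t-1}$, for a few incidence histories. The incidence values below are arbitrary illustrative units, not data.

```python
B1, B2 = 1.0, -0.5   # approximate estimates quoted in the question

def incidence_logodds(x_now, x_prev, b1=B1, b2=B2):
    """Incidence part of the linear predictor: b1 * X_{i,t} + b2 * X_{i,t-1}."""
    return b1 * x_now + b2 * x_prev

# (X_{i,t}, X_{i,t-1}) in arbitrary illustrative units.
SCENARIOS = {
    "steady at 1": (1.0, 1.0),
    "steady at 2": (2.0, 2.0),
    "rising 1 -> 2": (2.0, 1.0),
    "falling 2 -> 1": (1.0, 2.0),
}
for name, (x_now, x_prev) in SCENARIOS.items():
    print(f"{name:>15}: {incidence_logodds(x_now, x_prev):+.2f} on the log-odds scale")
```

A sustained increase from 1 to 2 still raises the log-odds (from 0.5 to 1.0); the negative $\hat\beta_2$ only means that, at the same current incidence, a country whose incidence has been falling is lower risk than one where it has been steady or rising.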
d) [10 marks] If the predictors were transformed to $X_{i,t}$ and $X_{i,t} - X_{i,t-1}$, what would be the coefficients of these two variables? What would the interpretation of these coefficients be?
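Since $\beta_1 X_{i,t} + \beta_2 X_{i,t-1} = (\beta_1 + \beta_2) X_{i,t} - \beta_2 (X_{i,t} - X_{i,t-1})$, the new coefficients are $\beta_1 + \beta_2$ and $-\beta_2$, both about 0.5 with the estimates quoted in part c). The sketch checks the identity numerically on random inputs.

```python
import random

def reparameterise(b1, b2):
    """Coefficients on (X_t, X_t - X_{t-1}) equivalent to (b1, b2) on (X_t, X_{t-1})."""
    return b1 + b2, -b2

def lp_original(b1, b2, x_now, x_prev):
    return b1 * x_now + b2 * x_prev

def lp_new(c_level, c_change, x_now, x_prev):
    return c_level * x_now + c_change * (x_now - x_prev)

c_level, c_change = reparameterise(1.0, -0.5)
print(c_level, c_change)

# Both parameterisations give the same linear predictor for any inputs.
rng = random.Random(1)
for _ in range(1000):
    x_now, x_prev = rng.uniform(0, 50), rng.uniform(0, 50)
    assert abs(lp_original(1.0, -0.5, x_now, x_prev)
               - lp_new(c_level, c_change, x_now, x_prev)) < 1e-9
```

The first coefficient is the log odds ratio per unit of incidence when the week-to-week change is held fixed (a sustained difference in incidence); the second is the log odds ratio per unit increase since the previous week, at a given current incidence.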
Question 3: [Total: 40 marks]
Data collected in STATS 201 show that students who regularly attend lectures obtain higher average grades than those who do not. One possible explanation is that lectures are useful; another is that lecture attendance is not actually useful but is an effect of student interest in statistics, and that interest in statistics affects grades.
Using variables ATTEND for regular attendance, GRADE for grades, and INTEREST for interest in statistics, answer the following questions.
a) [5 marks] Draw causal graphs that represent the two competing explanations for the correlation between attendance and grades.
b) [5 marks] Write down a regression model where the coefficient of ATTEND estimates the effect of lecture attendance on grades.
c) [10 marks] Any realistic measurement of the variable INTEREST will not be a perfect representation of the true underlying variable. Modify your causal graphs to show both the true underlying variable INTEREST and the measurement INTEREST*. Explain how this will affect the interpretation of your model in part b).
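A small simulation can illustrate the consequence: when INTEREST* is a noisy measurement of INTEREST, adjusting for INTEREST* removes only part of the confounding, so the ATTEND coefficient in a model like that of part b) remains biased. The sketch simulates hypothetical data under the second explanation (attendance has no effect) with arbitrary parameter values, and obtains the adjusted coefficient via the Frisch-Waugh-Lovell theorem rather than a regression library.

```python
import random

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def adjusted_effect(attend, grade, control):
    """Coefficient of ATTEND in GRADE ~ ATTEND + control (Frisch-Waugh-Lovell)."""
    b = slope(control, attend)
    mc, ma = sum(control) / len(control), sum(attend) / len(attend)
    resid = [a - ma - b * (c - mc) for a, c in zip(attend, control)]
    return slope(resid, grade)

def simulate(n=100_000, attend_effect=0.0, noise_sd=1.0, seed=763):
    """Hypothetical data in which INTEREST drives both ATTEND and GRADE."""
    rng = random.Random(seed)
    interest, interest_star, attend, grade = [], [], [], []
    for _ in range(n):
        i = rng.gauss(0, 1)
        a = 1.0 if i + rng.gauss(0, 1) > 0 else 0.0
        g = 60 + attend_effect * a + 5 * i + rng.gauss(0, 5)
        interest.append(i)
        interest_star.append(i + rng.gauss(0, noise_sd))   # error-prone measurement
        attend.append(a)
        grade.append(g)
    return interest, interest_star, attend, grade

interest, interest_star, attend, grade = simulate()
naive = slope(attend, grade)
adj_true = adjusted_effect(attend, grade, interest)
adj_noisy = adjusted_effect(attend, grade, interest_star)
print(f"unadjusted:           {naive:.2f}")
print(f"adjusted (INTEREST):  {adj_true:.2f}")
print(f"adjusted (INTEREST*): {adj_noisy:.2f}")
```

With these settings the unadjusted difference is about 5.6 grade points, adjusting for the true INTEREST brings it close to 0, and adjusting for INTEREST* leaves a clearly positive value (around 3.4 in expectation) even though the true effect is zero.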
d) [10 marks] Suppose a lecturer increases attendance by offering an unrelated incentive, such as handing out chocolate. Under each of the two competing explanations, extend your causal graphs from part a) to include a variable INCENTIVE, say whether the incentive would be expected to increase average grades, and explain why.
e) [10 marks] Give conditions under which the incentive is an instrumental variable and describe how it allows you to estimate the effect of ATTEND on GRADE.
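Under the usual instrumental-variable conditions (INCENTIVE changes ATTEND; INCENTIVE is independent of INTEREST and any other unmeasured causes of GRADE, e.g. because it is randomised; and INCENTIVE affects GRADE only through ATTEND), the ratio $\operatorname{cov}(\text{INCENTIVE}, \text{GRADE}) / \operatorname{cov}(\text{INCENTIVE}, \text{ATTEND})$ estimates the effect of ATTEND, assuming that effect is the same for every student (otherwise it is a local average effect among students whose attendance responds to the incentive). The simulation uses hypothetical parameter values to compare this Wald (ratio) estimator with the naive comparison under both explanations.

```python
import random

def cov(x, y):
    """Sample covariance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def simulate(true_effect, n=100_000, seed=2022):
    """Hypothetical data: INTEREST confounds ATTEND and GRADE; INCENTIVE is randomised."""
    rng = random.Random(seed)
    incentive, attend, grade = [], [], []
    for _ in range(n):
        z = 1.0 if rng.random() < 0.5 else 0.0          # coin-flip incentive
        interest = rng.gauss(0, 1)                      # unmeasured confounder
        a = 1.0 if interest + z + rng.gauss(0, 1) > 0.5 else 0.0
        g = 60 + true_effect * a + 5 * interest + rng.gauss(0, 5)
        incentive.append(z)
        attend.append(a)
        grade.append(g)
    return incentive, attend, grade

def naive_difference(attend, grade):
    """Mean GRADE difference between attenders and non-attenders."""
    return cov(attend, grade) / cov(attend, attend)

def wald_iv(incentive, attend, grade):
    """IV (ratio) estimate: cov(Z, GRADE) / cov(Z, ATTEND)."""
    return cov(incentive, grade) / cov(incentive, attend)

for effect in (3.0, 0.0):   # explanation 1 (lectures help) vs explanation 2 (no effect)
    z, a, g = simulate(effect)
    print(f"true effect {effect}: naive {naive_difference(a, g):.2f}, "
          f"IV {wald_iv(z, a, g):.2f}")
```

In both scenarios the naive comparison is inflated by INTEREST, while the IV estimate recovers the simulated effect; under the second explanation it is close to zero, matching the expectation that the incentive raises attendance but not grades.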