
STATS 763 - 2022 - Final examination

Available from 15 June 2022 at 17:00 NZST (5:00PM)

Due by 15 June 2022 at 19:30 NZST (7:30PM)

General instructions

•  This examination consists of these instructions and 3 questions on 8 pages. Attempt all questions. The exam will be marked out of a possible 100 marks.

•  Inspera requires you to upload a single file containing your answers to all three questions. The file size cannot exceed 1 Gb.

•  The duration of the examination is two hours, between 17:00 (5:00PM) and 19:00 (7:00PM) on 15 June 2022, New Zealand Standard Time.

•  This assessment was designed to be completed within 2 hours by a prepared student. However, you have 2 hours and 30 minutes in which to complete and submit it.

•  Submissions will be open until 19:30 NZST (7:30PM) to allow for scanning and uploading. It is your responsibility to ensure your assessment is successfully submitted on time.

•  All reasonable forms of answer file format will be accepted, including clearly scanned or photographed hand-written responses, PDF documents, Word or similar Libre Office documents, markdown files, etc.

•  This is an on-line open-book exam.  You are allowed any resource to answer the questions except consulting another person (see Academic Honesty Declaration). Piazza will be unavailable during the examination except to address private queries to the instructors.

•  Computing final numerical answers is not required. It is sufficient, for full marks, to produce a correct computable solution.

Question 1: [Total:  30 marks]

Data were collected from 696 randomly sampled women who gave birth over a 4 1/2 month period in a New Zealand hospital in 2011. The following table describes the data.

The outcome of interest is the difference between the date of birth and the expected date of delivery (`DOB - EDD`), measured in weeks. A moderate positive difference (late birth) is generally not an issue; a negative difference (early birth) larger in magnitude than 3 weeks identifies the baby as premature.
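The outcome definition above can be sketched in a few lines of Python. This is only an illustration: the function names `weeks_from_edd` and `is_premature` are hypothetical and are not part of the exam data or its R output.

```python
from datetime import date


def weeks_from_edd(dob: date, edd: date) -> float:
    """Return DOB - EDD in weeks (negative values mean an early birth)."""
    return (dob - edd).days / 7


def is_premature(dob: date, edd: date, threshold_weeks: float = 3.0) -> bool:
    """Flag a birth that is more than `threshold_weeks` before the EDD."""
    return weeks_from_edd(dob, edd) < -threshold_weeks
```

A birth exactly 3 weeks early is not flagged under this sketch, since the text asks for a difference larger in magnitude than 3 weeks.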

The exposure of interest is the number of previous pregnancies without live birth (`n stillbirths`), defined as Gravidity - Parity - 1. (The "-1" discounts the current pregnancy.) This number includes stillbirths and voluntary interruptions of pregnancy.
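As a minimal sketch of the exposure definition, assuming gravidity and parity are available as integer counts per woman (the function name `n_stillbirths` is a hypothetical Python stand-in for the `n stillbirths` column):

```python
def n_stillbirths(gravidity: int, parity: int) -> int:
    """Previous pregnancies without live birth: Gravidity - Parity - 1.

    The final "- 1" removes the current pregnancy from the count.
    """
    return gravidity - parity - 1
```

For example, a woman in her third pregnancy with one previous live birth would have a value of 1 under this definition.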

Adjustments for ethnicity (binary variables eth_EurOther, eth_Maori, eth_Pasifika and eth_Asian), age group (AgeGrp), and presence of a husband or partner (HusbPart) are considered sufficient to account for confounding in this observational study.

a) [6 marks]  We fit two linear least-squares models, A and B.

Using the partial output supplied below, test the null hypothesis $H_0: \beta_{\text{Gravidity}} = -\beta_{\text{Parity}}$ vs $H_1$: not $H_0$.

Justify your answer briefly.

Model A:

Coefficients:

              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    0.05010     0.38813    0.129   0.89734
Gravidity     -0.30532     0.09951   -3.068   0.00224 **
Parity         0.30649     0.12137    2.525   0.01178 *
[snip - you don’t need the missing output to answer the question]

(Dispersion parameter for gaussian family taken to be 5.159174)

Null deviance: 3639.1 on 695 degrees of freedom
Residual deviance: 3523.7161 on 683 degrees of freedom

Model B:

Coefficients:

                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)      -0.25347     0.35700   -0.710   0.47796
`n stillbirths`  -0.30509     0.09877   -3.089   0.00209 **
[snip]

(Dispersion parameter for gaussian family taken to be 5.151635)

Null deviance: 3639.1 on 695 degrees of freedom
Residual deviance: 3523.7181 on 684 degrees of freedom

b) [6 marks]  According to the fitted model below, how large would the number of previous pregnancies with no live birth (the `n stillbirths` covariate) need to be for `DOB - EDD` to be more negative than -3 weeks on average, if the expectant mother is Asian with no other ethnicity, has no husband/partner and is over 40? It is sufficient to set up the equation without solving it.

Model B (again)

Coefficients:

                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)      -0.25347     0.35700   -0.710   0.47796
`n stillbirths`  -0.30509     0.09877   -3.089   0.00209 **
eth_Maori        -0.11255     0.28660   -0.393   0.69465
eth_Pasifika     -0.32473     0.31575   -1.028   0.30411
eth_Asian        -0.60485     0.36708   -1.648   0.09987 .
eth_EurOther      0.15025     0.30503    0.493   0.62248
AgeGrp<20        -0.17129     0.32625   -0.525   0.59974
AgeGrp20-24       0.07679     0.25737    0.298   0.76552
AgeGrp30-34      -0.05538     0.25319   -0.219   0.82692
AgeGrp35-39      -0.08779     0.28121   -0.312   0.75499
AgeGrp40+        -0.98586     0.61525   -1.602   0.10953
HusbPartNo       -0.24467     0.22864   -1.070   0.28495

(Dispersion parameter for gaussian family taken to be 5.151635)

Null deviance: 3639.1 on 695 degrees of freedom
Residual deviance: 3523.7 on 684 degrees of freedom

c) [6 marks] Explain in words and with simple notation how you could produce a Wald confidence interval for your answer in b) using a reliable method to estimate standard errors.

d) [6 marks]  We consider a model including the interaction terms between `n stillbirths` and all ethnicity variables. A partial summary is shown below:

Model C

Coefficients:

                               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                    -0.35170     0.40251   -0.874     0.383
`n stillbirths`                -0.03248     0.38871   -0.084     0.933
eth_Maori                      -0.20279     0.33713   -0.602     0.548
eth_Pasifika                   -0.09238     0.36578   -0.253     0.801
eth_Asian                      -0.61988     0.42384   -1.463     0.144
eth_EurOther                    0.31564     0.36039    0.876     0.381
[snip]
`n stillbirths`:eth_Maori       0.14032     0.34115    0.411     0.681
`n stillbirths`:eth_Pasifika   -0.46421     0.39582   -1.173     0.241
`n stillbirths`:eth_Asian       0.09691     0.48039    0.202     0.840
`n stillbirths`:eth_EurOther   -0.34506     0.33578   -1.028     0.304

(Dispersion parameter for gaussian family taken to be 5.12995)

Null deviance: 3639.1 on 695 degrees of freedom
Residual deviance: 3488.4 on 680 degrees of freedom

Test the significance of the interaction term by producing an appropriate test statistic and p-value; make sure to specify the approximate distribution of the test statistic under the null hypothesis of no interaction.

e) [6 marks]  The distribution of premature births by maternal age group is shown below:

Counts of premature births by age group, original data

Premature   <20   20-24   25-29   30-34   35-39   40+
FALSE        72     148     146     148     103    12
TRUE          8      14      16      16      10     3

We create a subsampled data set consisting of all premature births and a sample of twice that number of non-premature births, stratified by age. The distribution of prematurity and age group in the subsampled data is given below.

Counts of premature births by age group, subsampled data

Premature   <20   20-24   25-29   30-34   35-39   40+
FALSE        16      28      32      32      20     6
TRUE          8      14      16      16      10     3
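The two tables imply a different sampling fraction for non-premature births in each age group, while every premature birth is retained. The sketch below only tabulates the implied inverse sampling fractions from the counts shown above; treating them as analysis weights is an assumption on our part, not something the exam text prescribes, and the variable and function names are hypothetical.

```python
# Counts copied from the two tables above, in age-group order.
age_groups = ["<20", "20-24", "25-29", "30-34", "35-39", "40+"]
original_nonpremature = [72, 148, 146, 148, 103, 12]
sampled_nonpremature = [16, 28, 32, 32, 20, 6]
premature = [8, 14, 16, 16, 10, 3]  # all premature births are kept


def inverse_sampling_fractions(original, sampled):
    """Return original/sampled for each stratum (1 / sampling fraction)."""
    return [o / s for o, s in zip(original, sampled)]


weights = dict(
    zip(age_groups,
        inverse_sampling_fractions(original_nonpremature, sampled_nonpremature))
)

for group, weight in weights.items():
    # Each sampled non-premature birth stands in for `weight` original births.
    print(f"{group:>6}: {weight:.3f}")
```

Premature births would carry a value of 1 under the same logic, because none of them were dropped.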

Explain how to fit a relative risk model of prematurity to the subsampled data that will estimate the effect of `n stillbirths` on Premature unbiasedly (still assuming that confounding is correctly addressed) and will produce reliable standard errors.

Question 2: [Total:  30 marks]

A model for Covid risk in arrivals at the border is given by

$\operatorname{logit} P[Y_{it} = 1 \mid b_i] = \alpha_i + b_i + \beta_1 X_{i,t} + \beta_2 X_{i,t-1}$

where $Y_{it}$ indicates testing positive within a week of arrival for an individual from country $i$ during week $t$ of the epidemic, $X_{i,t}$ is the incidence of diagnosed Covid cases per 100,000 people in country $i$ during week $t$, and $X_{i,t-1}$ is the incidence of diagnosed Covid cases per 100,000 people in country $i$ during week $t - 1$. The model for the random effects $b_i$ is

$b_i \sim N(0, \tau^2)$.
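To make the model concrete, the sketch below evaluates the probability implied by the linear predictor for one arrival. It assumes the intercept, random effect and slopes are already known; the function names and any parameter values passed in are hypothetical and do not come from a fitted model.

```python
import math


def inverse_logit(eta: float) -> float:
    """Numerically stable inverse of the logit link."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    exp_eta = math.exp(eta)
    return exp_eta / (1.0 + exp_eta)


def positive_probability(alpha: float, b: float, beta1: float, beta2: float,
                         x_t: float, x_prev: float) -> float:
    """P[Y = 1 | b] for incidence x_t this week and x_prev last week."""
    eta = alpha + b + beta1 * x_t + beta2 * x_prev
    return inverse_logit(eta)
```

Holding the incidences fixed, a larger country-level random effect `b` shifts the whole linear predictor up and therefore raises the probability.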

a) [5 marks]  This model could be fitted with a separate fixed effect for each country $i$ instead of the random effects $b_i$. Explain the term "shrinkage" and how it connects the estimated values of $b_i$ and of these fixed effects.

b) [10 marks]  Other predictor variables (incidence rates, test positivity rates, death rates, testing rates) are available. Describe a way to choose a model that predicts accurately, using weekly data from each country.

c) [5 marks]  The values of $\hat{\beta}_1$ and $\hat{\beta}_2$ are approximately 1 and -0.5 respectively. A non-statistician asks if the negative value of $\hat{\beta}_2$ means that higher incidence leads to lower risk. How would you answer?

d) [10 marks]  If the predictors were transformed to $X_{i,t}$ and $X_{i,t} - X_{i,t-1}$, what would be the coefficients of these two variables? What would the interpretation of these coefficients be?

Question 3: [Total:  40 marks]

Data collected in STATS 201 show that students who regularly attend lectures obtain higher average grades than those who do not regularly attend lectures. One possible explanation is that lectures are useful; another is that lecture attendance is not actually useful, but is an effect of student interest in statistics, and that interest in statistics affects grades.

Using variables ATTEND for regular attendance, GRADE for grades, and INTEREST for interest in statistics, answer the following questions.

a) [5 marks]  Draw causal graphs that represent the two competing explanations for the correlation between attendance and grades.

b) [5 marks]  Write down a regression model where the coefficient of ATTEND estimates the effect of lecture attendance on grades.

c) [10 marks]  Any realistic measurement of the variable INTEREST will not be a perfect representation of the true underlying variable. Modify your causal graphs to show both the true underlying variable INTEREST and the measurement INTEREST*. Explain how this will affect the interpretation of your model in part b).

d) [10 marks]  Suppose a lecturer increases attendance by offering an unrelated incentive, such as handing out chocolate. Under the two competing explanations, extend your causal graphs from part a) to include a variable INCENTIVE, say whether the incentive would be expected to increase average grades, and explain why.

e) [10 marks]  Give conditions under which the incentive is an instrumental variable and describe how it allows you to estimate the effect of ATTEND on GRADE.



