PQHS 459 Midterm (2024 Spring)
Part 1: True or False (40 points)
For each of the following statements, simply indicate whether the answer is True or False. If you feel as though there is insufficient information provided, you may email the instructor for clarification. Note, you do not need to give a justification for your answer. Each question is worth 2 points, and there are no penalties for incorrect answers.
1.1 T or F: A longitudinal study is said to be balanced if all subjects are measured the same number of times.
1.2 Researchers want to study the effects of long-term smoking on individual health. They compared the rates of lung disease in long-term smokers (15+ years) to the rates of lung disease in an (otherwise similar) group of new smokers (<1 year), i.e., a demographically matched control group. They estimated that long-term smokers were nearly twice as likely to have developed serious lung diseases. T or F: assuming the estimation was proper, this is an example of a longitudinal effect.
1.3 You are working on a long-format longitudinal dataset of 100 subjects. It contains a column for subject ID, a treatment indicator, the week in which the observation was taken, and the outcome. The dataset has been sorted by ID and the first several rows of this dataset are shown below. T or F: when translated into wide format, the data frame will have 100 rows and 5 columns.
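As a rough illustration of the long-to-wide conversion mentioned in 1.3, the sketch below reshapes a small made-up dataset with base R's reshape(). The data frame `long`, its values, and its column names (ID, Trt, Week, Outcome) are assumptions chosen to mirror the description above; they are not the exam's dataset.

```r
# Hypothetical long-format data: 3 subjects, 2 weekly observations each.
long <- data.frame(
  ID      = rep(1:3, each = 2),
  Trt     = rep(c(0, 1, 0), each = 2),
  Week    = rep(c(1, 2), times = 3),
  Outcome = c(5.1, 5.4, 6.0, 6.8, 4.9, 5.0)
)

# One row per subject; each week's outcome becomes its own column.
wide <- reshape(long,
                idvar     = "ID",
                timevar   = "Week",
                v.names   = "Outcome",
                direction = "wide")

dim(wide)     # rows = subjects; columns = ID, Trt, plus one per week
names(wide)
```

In this toy case the number of wide-format columns depends on how many distinct weeks appear in the long data.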
1.4 T or F: If we fit a general linear model with only a continuous covariate and would like to test it (i.e., full model with this variable + intercept vs. reduced model with intercept only), we are unable to perform likelihood ratio tests.
1.5 T or F: When choosing a working correlation structure for a general linear model, we can test whether the exchangeable correlation structure is appropriate compared to the unstructured correlation structure using a likelihood ratio test.
1.6 Suppose we have fit a general linear model, with a structure given by
where tj representing the time index (i.e., I(tj = 2) = 1 if it is time 2, and is 0 otherwise), Trti is a binary indicator for active treatment, and I(Agei > 45) = 1 for any individual who was over 45 years old when the study began.
T or F: If we would like to test whether the time trend associated with treatment is equivalent to the time trend associated with being over 45 years old (i.e., treatment and being over 45 years old modify the time trend to the same extent), we can perform a Wald test for H0 ∶ Lβ = 0 by letting
1.7 T or F: The REML estimators are obtained by optimizing a modified version of the log-likelihood, and cannot be used for likelihood ratio tests.
1.8 T or F: We would like to make inference about the population-level effect on some normally distributed data, therefore we cannot use a linear mixed-effects model.
1.9 We are comparing two non-nested general linear models using AIC, and model 1 has AIC = 1911, model 2 has AIC = 1890. T or F: We therefore claim model 2 is statistically significantly better than model 1.
1.10 We are predicting individual outcomes from a linear mixed-effects model as a weighted average between the estimated population mean (xβ̂) and the individual observation. T or F: The population average receives more weight when the estimated residual variance (i.e., sampling error) is large.
1.11 We are comparing two general linear models, with the same mean structure and fitted using maximum likelihood. Model 1 was fitted using an AR1 correlation structure and model 2 using compound symmetry. The log-likelihood for the first model is found to be −945 and the log-likelihood for the second model −960. T or F: Mr. Doe can calculate the LRT statistic W = 2 × (960 − 945) = 30 and compare it with a χ² distribution with 1 degree of freedom to get a valid p-value.
1.12 In a linear mixed model, let D = (D11, D22, D12) denote the parameters corresponding to Var(b0i), Var(b1i), Cov(b0i, b1i), respectively. T or F: The test statistic of H0 ∶ D12 = 0 has a different null distribution from that of H0 ∶ D22 = 0.
1.13 Mr. Doe fits a new linear mixed-effects model with fixed effects: yij ∼ β0 + β1Tij + β2AiTij with a random slope, where Ai = 1 if treated and 0 if in control, and Tij is the measurement time for observation j and subject i. T or F: β1 is interpreted as the expected change in outcome per 1 unit increase in time for those in the control group, while β2 is the expected change in outcome per 1 unit increase in time in the treatment group.
1.14 T or F: A linear mixed-effects model with only a random intercept and a general linear model with a compound symmetry correlation structure are equivalent.
1.15 Suppose we fit a general linear model
T or F: H0 ∶ β4 = β5 = 0 is equivalent to testing whether the longitudinal effects of age and exposure are equal to the cross-sectional effects of age and exposure, simultaneously.
1.16 An analyst fit the following models in R: model1 <- lmer(Outcome ~ Days + (Days | ID), dataset) and model2 <- lmer(Outcome ~ Days + (1 | ID), dataset). T or F: He can perform a valid likelihood ratio test using anova(model1, model2).
1.17 Mr. Doe is working on a dataset where the age effect is nonlinear, where age is measured continuously and ranges from 0 to 15. T or F: it is appropriate to directly include age² in addition to age to capture the nonlinearity.
1.18 Consider two models in R: m1 <- gls(Outcome ~ Time + Age + Year, correlation = corAR1(form = ~ 1 | ID), dataset) and m2 <- gls(Outcome ~ Time + Age * Year, correlation = corAR1(form = ~ 1 | ID), dataset). A likelihood ratio test based on m1 and m2 returns p = 0.2. T or F: the LRT is valid and suggests m1 is adequate.
1.19 Mr. Doe is working on a balanced longitudinal dataset and considers repeated measures ANOVA with treatment, time and their interactions. The only significant effect is time. T or F: he can then use t-tests for pairwise comparisons to determine which time points have outcomes that differ from the others as a post-hoc analysis (assuming multiple testing is accounted for).
1.20 Here is a toy model. T or F: the age effect in female subjects is (0.841450 − 0.365010), with a standard error of √(0.0710202² + 0.1046509²).
summary(gls(distance ~ age * Sex, correlation = corAR1(form = ~ 1 | Subject), data=dta))
Generalized least squares fit by REML
  Model: distance ~ age * Sex
  Data: dta
      AIC      BIC    logLik
  822.927 842.5957 -405.4635

Correlation Structure: AR(1)
 Formula: ~1 | Subject
 Parameter estimate(s):
      Phi
0.5686973

Coefficients:
                  Value Std.Error   t-value p-value
(Intercept)   15.407969 0.8544723 18.032146  0.0000
age            0.841450 0.0710202 11.848040  0.0000
SexFemale      2.065691 1.2593105  1.640335  0.1025
age:SexFemale -0.365010 0.1046509 -3.487886  0.0006

 Correlation:
              (Intr) age    SexFml
age           -0.914
SexFemale     -0.679  0.620
age:SexFemale  0.620 -0.679 -0.914

Standardized residuals:
       Min         Q1        Med         Q3        Max
-2.3739621 -0.5936218 -0.1317226  0.6264212  2.5379656

Residual standard error: 2.164975
Degrees of freedom: 200 total; 196 residual
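For readers who want to see how the standard error of a sum of two coefficient estimates is generally built, here is a minimal sketch. It uses the standard identity Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b). The helper name `se_sum` and the numeric inputs are hypothetical and are deliberately not taken from the output above.

```r
# Standard error of the sum of two coefficient estimates.
# se1, se2: standard errors of the two estimates
# r:        correlation between the two estimates
se_sum <- function(se1, se2, r) {
  sqrt(se1^2 + se2^2 + 2 * r * se1 * se2)
}

# Hypothetical values for illustration only.
se_uncorrelated <- se_sum(0.3, 0.4, 0)      # no covariance term
se_negative     <- se_sum(0.3, 0.4, -0.5)   # negative correlation shrinks the SE
c(se_uncorrelated, se_negative)
```

With a fitted gls object, the correlation between estimates is what the "Correlation:" block of summary() reports, and the full covariance matrix is available through vcov().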
Part 2: Single Choice (30 points)
For each of the following questions, select the best answer. There is only ONE correct choice for each question. Note, you do not need to give a justification for your answer. Each question is worth 3 points, and there are no penalties for incorrect answers. There is no partial credit; you need to select the correct answer to get full credit.
2.1 You are fitting a general linear model with unstructured working correlation. Which of the following mean models expresses a linear relationship in conditional expectation of Y and is NOT estimable with general linear models?
A. E[Y|X] = β0 + β1X
B. E[Y|X] = β0 + β1log(X)
C. E[Y|X] = β0 + β1 exp(X + α1)
D. E[Y|X] = exp(β0 + β1X)
2.2 Mr. Bean fit a general linear model using a compound symmetry working correlation structure. If cor(Yi1, Yi2) = 0.5, what can we know about cor(Yi4, Yi7)?
A. 0.5
B. 0.125
C. 0.3
D. Not enough information.
2.3 We are fitting a general linear model for a large dataset and assume we get the correct mean model E(Y|X). However, we unfortunately picked the wrong covariance model. What can we know about the estimates?
A. The estimated regression coefficients are close to their true value.
B. The sampling distribution of the regression coefficients will follow a normal distribution with mean 0.
C. The standard errors of the regression parameters are smaller than those had we used the correct covariance model.
D. If the sample size goes to infinity, our model-based variance estimator can achieve its optimal efficiency (smallest sampling variance).
2.4 Which of the following statements is correct regarding the sandwich variance estimator in general linear models?
A. It is a safeguard against working covariance misspecification and should always be used.
B. It provides a valid variance estimate for regression coefficients when the sample size is large.
C. It always gives smaller variance estimates than model-based/naive variance estimators.
D. It is not model-based and cannot be used in hypothesis testing for regression coefficients.
2.5 We have longitudinal data where every subject is measured multiple times. Consider a general linear model with an “independence” correlation structure and constant variance. Which of the following statements is incorrect?
A. The correlation model is highly likely to be wrong.
B. We can obtain the regression coefficient estimates using lm() in R.
C. The standard errors estimated using generalized least squares are invalid.
D. This is a good approach as independence correlation eases computation, and we can fix the variance using a robust variance estimator.
2.6 A student is having difficulties selecting a working correlation model for a general linear model on an unbalanced longitudinal dataset where the observations were made continuously in time. Which of the following statements is true?
A. He can try popular structures including unstructured, AR1, compound symmetry, and then select the optimal one using AIC.
B. He can try popular structures including unstructured, AR1, compound symmetry, and then select the optimal one using likelihood ratio test.
C. He needs to make the dataset balanced first, because general linear model cannot work on unbalanced longitudinal data.
D. He does not need to start with the unstructured structure even though it makes the fewest assumptions.
2.7 Suppose we fit a linear mixed model E[y|x] = β0 + β1x1 + β2x2 + β3x3 + β4x4, and wish to test H0 ∶ β1 = β2, β4 = 3, β3 = 2β2, where β0 denotes the intercept. Which of the following statements is true?
A. We cannot write the null hypothesis using matrix notation and thus cannot use a Wald test.
B. We can test the three equalities one by one, and reject H0 if one of those three p-values is less than 0.05.
C. The test statistic should be compared with a χ² distribution with 3 degrees of freedom to get a p-value.
D. We cannot use F-/t-tests because the model output in R only includes t statistics for individual coefficients.
2.8 We are analyzing the longitudinal trend of a blood biomarker, and time will only be considered as a linear effect (of course, it is continuous in this case). The interest is to make predictions over time for each subject. Student A used a general linear model while student B chose a linear mixed model. Suppose they both make predictions (biomarker levels over time) for some subject from this dataset (i.e., for each subject, there are two predicted curves, pred A and pred B). Which of the following is true?
A. The two curves pred A and pred B will coincide.
B. The observed curve will be closer to pred A.
C. The observed curve will be closer to pred B.
D. pred B and pred A will never cross.
2.9 Consider a randomized controlled trial. The outcome y is normally distributed, and is measured repeatedly at and after baseline. The measurement time variable, time, is continuous, ranging from 0 to 20. The primary interest is to assess the longitudinal treatment effect on the outcome y. The indicator variable treatment = 1 if treated and 0 if in the control arm. Consider two general linear models in R using gls(): model 1 with model = y ~ treatment + time + treatment:time, while model 2 has model = y ~ time + treatment:time. Both models use the same working correlation structure. Which of the following is true?
A. Model 1 and 2 are equivalent.
B. Model 2 is wrong because it does not estimate a treatment effect.
C. In principle, the standard error of the interaction term in model 1 is smaller than that in model 2.
D. None of the above.
2.10 We analyze a longitudinal dataset using both a linear mixed-effects model (LMM) and a general linear model (GenLinMod) with the same fixed-effects specification. Our primary interest is to assess the regression parameters in E(y|x). Which of the following statements is true?
A. We can just fit LMM and the fixed effects estimates will be the same in GenLinMod.
B. We can just fit GenLinMod and the fixed effects estimates will be the same in LMM.
C. The fixed effects estimates will never be the same in LMM and GenLinMod.
D. We do not have enough information to know if the fixed effects estimates will be the same in the two models.
Part 3: Short Answer (30 + 5 points)
For each of the following questions, write a short answer justifying your response. The number of points for each problem is specified. Partial credits will be awarded for correct work with an incorrect final answer.
3.1 (10 points) Suppose that we fit the following general linear model
E[yij | xij] = β0 + β1 Timeij + β2 Treatmenti + β3 Treatmenti × Timeij + β4 Incomeij + β5 PhysicalActivityij
3.1.1 (5 points) We hypothesize that the impact of physical activity on the outcome should be 5 times that of income. In addition, the impact of treatment is not moderated by time. Write down the null hypothesis in the form of H0 ∶ Lβ = 0, and specify the corresponding null distribution of the test statistic. You can write down H0 using either a matrix L, or using notation for coefficients such as “beta1 + beta2 = 0” (this is a demo example only). The L matrix can be written using R code syntax, e.g., L <- rbind(c(0,0,0,0), c(0,0,0,0)), as we did in our R example slides.
3.1.2 (5 points) Using the fitted model, can you interpret β2? Also, what is the point estimate of the treatment effect at Time = 5? For this question, it is acceptable to use symbols: for example, use beta1, b1, etc., for β1.
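To make the H0 ∶ Lβ = 0 notation concrete, the following is a minimal base R sketch of a large-sample Wald test. It uses the demo hypothesis beta1 + beta2 = 0 on a made-up three-coefficient model, not the hypothesis asked for in 3.1.1. The function name `wald_test`, the estimates `b`, and the covariance matrix `V` are all hypothetical.

```r
# Wald test of H0: L %*% beta = 0 (chi-square reference distribution).
# b: vector of coefficient estimates; V: their estimated covariance matrix.
wald_test <- function(L, b, V) {
  Lb   <- L %*% b
  stat <- as.numeric(t(Lb) %*% solve(L %*% V %*% t(L)) %*% Lb)
  df   <- nrow(L)   # assumes the rows of L are linearly independent
  list(statistic = stat,
       df        = df,
       p.value   = pchisq(stat, df = df, lower.tail = FALSE))
}

# Hypothetical estimates for (beta0, beta1, beta2) and a diagonal covariance.
b <- c(1.0, 0.5, -0.2)
V <- diag(c(0.04, 0.01, 0.01))

# Demo hypothesis: beta1 + beta2 = 0, written as a single row of L.
L <- rbind(c(0, 1, 1))
res <- wald_test(L, b, V)
res
```

Each row of L encodes one linear constraint, so the degrees of freedom equal the number of rows. With a fitted model, `b` and `V` would typically come from coef() and vcov().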
3.2 (20 + 5 points) Consider a balanced, randomized controlled trial with 3 treatment arms: drug 1, drug 2 and drug 3. The outcome is continuous and is measured repeatedly at time 0, 1, 2, 3, 4. We assume that time is a continuous variable. We use two indicator variables for treatment groups: d2 for drug 2, d3 for drug 3. For example, d2 = 2 if the subject receives drug 2, and 0 otherwise. We fit a linear mixed effects model in R using gls(y ~ time + time:d2 + time:d3, data = dataset, correlation = corExp(form = ~ time | ID)). Then we get the output:
3.2.1 (5 points) What is the expected outcome for subjects at time 2 who receive drug 3?
3.2.2 (5 points) Under the assumed exponential working correlation, the estimated parameter ρ = 0.762. What is the estimated correlation between measurements at time = 2 and time = 4?
3.2.3 (5 points) Mrs. Smith is considering an alternative general linear model with the same formula, but with an unstructured correlation structure and constant variance. How many parameters do we need to describe the covariance (correlation and variance)?
3.2.4 (5 points) Mr. Doe considers a different approach using a linear mixed effects model: lmer(y ~ time + time:d1 + time:d2 + time:d3 + (time | id), dataset). We want to test whether the individual longitudinal trends are heterogeneous (Hint: which term(s) capture individual longitudinal trends, or slopes?), using an applicable likelihood ratio test. Can you write out the null hypothesis H0 and the corresponding null distribution?
3.2.5 (Bonus 5 points) Go back to the linear mixed-effects model described in 3.2 above. Please find partial output from the fitted model. What is the covariance between measurements at time = 2 and time = 3 for a subject who receives drug 2?
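As a hedged illustration of an exponential correlation structure, the sketch below builds a correlation matrix under the assumption that Corr(Yj, Yk) = ρ^|tj − tk|. That is one common textbook parameterization; nlme's corExp() instead reports a range parameter d with correlation exp(−distance/d), so the two are related by ρ = exp(−1/d). The function name `exp_corr` and the value ρ = 0.5 are hypothetical and differ from the exam's estimate on purpose.

```r
# Exponential correlation matrix, assuming Corr(Y_j, Y_k) = rho^|t_j - t_k|.
exp_corr <- function(times, rho) {
  outer(times, times, function(a, b) rho^abs(a - b))
}

# Hypothetical values: rho = 0.5, measurements at times 0, 1, 2, 3, 4.
R <- exp_corr(0:4, rho = 0.5)

R[1, 3]   # correlation between the measurements at times 0 and 2
round(R, 3)
```

Under this assumed form the correlation depends only on the time separation, so any two measurements the same distance apart share the same correlation.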