STAT 3701讲解、辅导R编程语言、讲解Canvas、辅导R设计
- 首页 >> OS编程 STAT 3701 Homework 3
Show all work. Submit your solutions in a pdf document on Canvas. Include your R code (which must be
commented and properly indented) in the pdf file. Also submit one text file with all your R code (comments
and all) clearly labeled with the problem it goes with. This must be properly indented. Before every solution
with random sampling use set.seed(3701).
Question 1 (15 points).
Let’s consider a slightly different data generating model for paired data. Suppose we want to compare
students scores of a standardized test before and after taking a prep course. There are n students that will be
taking the test. Let X be the yet-to-be observed random variable of the score before the course and Y be the
yet-to-be observed random variable of the score after the course. Moreover, we let X and Y be the linear
combinations of two common latent variable, i.e., the model for the ith student is Xi = µ1 + a1Ai + b1Bi
Yi = µ2 + a2Ai + b2Bi,
where µj , aj , bj , j = 1, 2 are constants. Ai
, Bi are independent both within the same subject and across
different subjects. Ai’s are iid ∼ N(0, σ2A), Bi
is are iid N(0, σ2B).
(a) (4 points) What are the distributions of Xi and Yi
, respectively? If we let Zi = Xi −Yi
, does this data
generating model still satisfy the assumptions for paired t-test? Why?
(b) (5 points) Let µ1 = 68, µ2 = 70, σ2
A = 2, σ2
B = 1, a1 = 2, b1 = 1, a2 = 1, b2 = 2. It can
be show that E[X] = 68, E[Y ] = 70. Use simulation to produce a 99% confidence interval for
Var(X) = E[(X−68)2
] and a 99% confidence interval for Var(Y ) = E[(Y −70)2
]. Set reps = 105
.
(c) (3 points) Now we are interested in paired t-test for
H0 : µ1 = µ2
Ha : µ1 ̸= µ2,
with data assumed to be coming from our data generating model. Create a function named mypaired.pval
that generate realizations of the random p-value of this test. Your function should have the following
arguments
• mu1, mean of X
• mu2, mean of Y
• a1,b1,a2,b2 as defined in our data generating model
• sigmaA, standard deviation of A
• sigmaAB, standard deviation of B
1
• n, sample size (number of pairs)
• reps, number of replications.
The function should output a vector containing the realizations of the random p-value.
(d) (4 points) Use µ1 = 68, µ2 = 68.5, σ2
A = 2, σ2
B = 1, a1 = 2, b1 = 1, a2 = 1, b2 = 2, α = 0.05. Set
reps = 1000. Test your function by creating a simulation based estimation of the power curve with
the sample size n ∈ {20, 25, . . . , 200}. Produce a pointwise 95% confidence interval for the power
curve with the Clopper-Peason CI and include the curves for the lower bound and upper bound of CI
in the same plot of the estimated power curve.
Question 2 (15 points)
For two-independent-samples model, we assume that Xi’s are iid N(µ1, σ2
) for i = 1, . . . , n1, Yj ’s are iid
N(µ2, σ2
) for j = 1, . . . , n2 and all observations are independent. We provided the 100(1−α)% confidence
interval for µ1 − µ2 as
X¯ − Y¯ ± t1−α/2,n1+n2−2Sp
In this question, we are interested in simultaneous confidence intervals for both µ1 and µ2, i.e., instead of
a CI for µ1 − µ2, we want one CI [L1, U1] for µ1 and another CI [L2, U2] for µ2 such that [L1, U1] and
[L2, U2] should capture the corresponding means simultaneously with probability larger than 1 − α. Here
L1, L2, U1, U2 are some lower bounds and upper bounds calculated using the data and 1−α is our significance
level.
(a) (3 points) A very natural (but could be wrong potentially) idea would be
[L1, U1] = X¯ ± t1−α/2,n1−1
is the sample variance for {Xi} and S
is the sample variance for {Yi}. Calculate
the coverage probability for these simultaneous CIs, that is, calculate
(b) (4 points) Let µ1 = 68, µ2 = 70, σ = 3, n1 = n2 = 20, α = 0.05. Use simulation to estimate the
coverage probability defined in (a). Set reps = 105
. Create a 95% conservative CI for this coverage
probability. Is 1 − α in this CI?
(c) (4 points) Now consider a Bonferroni corrected simultaneous confidence interval,
Show that this set of simultaneous CIs has a coverage probability larger than 1 − α. You may want
apply the Bonfferoni inequality P(A ∩ B) ≥ 1 − (1 − P(A)) − (1 − P(B)).
·2·
(d) (4 points) Let µ1 = 68, µ2 = 70, σ = 3, n1 = n2 = 20, α = 0.05. Use simulation to estimate the
coverage probability defined in (a). Set reps = 105
. Create a 95% conservative CI for this coverage
probability.
Question 3 (20 points)
We have explored two sample procedures for the means, but in this problem we will write a function to do
these procedures for proportions. Consider two binomial random variables V and W. More specifically,
V ∼ Binom(n1, θ1) and W ∼ Binom(n2, θ2), where θ1, θ2 ∈ (0, 1). We are interested in testing
H0 : θ1 = θ2
Ha : θ1 ̸= θ2.
Let θb1 = V /n1, θb2 = W/n2 and θb = (V + W)/(n1 + n2). Then the test statistic for the test is
Under the null hypothesis, T ∼ N(0, 1) approximately.
(a) (5 points) Let θ1 = θ2 = 0.4, use simulation to evaluate the distribution of T for the cases
• n1 = n2 = 15
• n1 = n2 = 100
In each case, create a QQ-plot to compare the distribution of T with N(0, 1) and comment on the plot.
Set reps = 10000. You may use qnorm function when calculating the percentiles of the standard
normal distribution. Comment on the plots. You are not allowed to use rbinom to generate from
binomial distribution.
(b) (5 points) Besides the test, we can also provide an approximate 100(1 − α)% CI for θ1 − θ2 using the
formula
Create an R function myprop.test that generates the data, calculate the test statistic T and then
output the p-value of the test and the CI for θ1 −θ2. The function should take the following arguments
• n1, number of trials for V
• n2, number of trials for W
• theta1, success probability for V
• theta2, success probability for W
• alpha, significance level
• reps, number of replications
and the function outputs a list with three components:
·3·
• pval, vector of p-values of each replication
• upper, vector of upper bounds of CIs in each replication
• lower, vector of lower bounds of CIs in each replication
(c) (5 points) Test your function by estimating the coverage probability of the CI when θ1 = 0.6, θ2 =
0.4, n1 = n2 = 20, α = 0.05. Use reps=105
. Create a 95% conservative CI for the coverage
probability. Does it contain 1 − α?
(d) (5 points) Use your function to create a power curve when we increase the distance between θ1
and θ2. More specifically, let n1 = n2 = 100, θ2 = 0.52, α = 0.05 and θ1 takes values in
{0.52, 0.525, . . . , 0.680}. Set reps = 1000. Make a plot of the estimated power curve and also
include the lower and upper bounds of a 95% Clopper-Pearson confidence interval using dashed lines.
·4·
Show all work. Submit your solutions in a pdf document on Canvas. Include your R code (which must be
commented and properly indented) in the pdf file. Also submit one text file with all your R code (comments
and all) clearly labeled with the problem it goes with. This must be properly indented. Before every solution
with random sampling use set.seed(3701).
Question 1 (15 points).
Let’s consider a slightly different data generating model for paired data. Suppose we want to compare
students scores of a standardized test before and after taking a prep course. There are n students that will be
taking the test. Let X be the yet-to-be observed random variable of the score before the course and Y be the
yet-to-be observed random variable of the score after the course. Moreover, we let X and Y be the linear
combinations of two common latent variable, i.e., the model for the ith student is Xi = µ1 + a1Ai + b1Bi
Yi = µ2 + a2Ai + b2Bi,
where µj , aj , bj , j = 1, 2 are constants. Ai
, Bi are independent both within the same subject and across
different subjects. Ai’s are iid ∼ N(0, σ2A), Bi
is are iid N(0, σ2B).
(a) (4 points) What are the distributions of Xi and Yi
, respectively? If we let Zi = Xi −Yi
, does this data
generating model still satisfy the assumptions for paired t-test? Why?
(b) (5 points) Let µ1 = 68, µ2 = 70, σ2
A = 2, σ2
B = 1, a1 = 2, b1 = 1, a2 = 1, b2 = 2. It can
be show that E[X] = 68, E[Y ] = 70. Use simulation to produce a 99% confidence interval for
Var(X) = E[(X−68)2
] and a 99% confidence interval for Var(Y ) = E[(Y −70)2
]. Set reps = 105
.
(c) (3 points) Now we are interested in paired t-test for
H0 : µ1 = µ2
Ha : µ1 ̸= µ2,
with data assumed to be coming from our data generating model. Create a function named mypaired.pval
that generate realizations of the random p-value of this test. Your function should have the following
arguments
• mu1, mean of X
• mu2, mean of Y
• a1,b1,a2,b2 as defined in our data generating model
• sigmaA, standard deviation of A
• sigmaAB, standard deviation of B
1
• n, sample size (number of pairs)
• reps, number of replications.
The function should output a vector containing the realizations of the random p-value.
(d) (4 points) Use µ1 = 68, µ2 = 68.5, σ2
A = 2, σ2
B = 1, a1 = 2, b1 = 1, a2 = 1, b2 = 2, α = 0.05. Set
reps = 1000. Test your function by creating a simulation based estimation of the power curve with
the sample size n ∈ {20, 25, . . . , 200}. Produce a pointwise 95% confidence interval for the power
curve with the Clopper-Peason CI and include the curves for the lower bound and upper bound of CI
in the same plot of the estimated power curve.
Question 2 (15 points)
For two-independent-samples model, we assume that Xi’s are iid N(µ1, σ2
) for i = 1, . . . , n1, Yj ’s are iid
N(µ2, σ2
) for j = 1, . . . , n2 and all observations are independent. We provided the 100(1−α)% confidence
interval for µ1 − µ2 as
X¯ − Y¯ ± t1−α/2,n1+n2−2Sp
In this question, we are interested in simultaneous confidence intervals for both µ1 and µ2, i.e., instead of
a CI for µ1 − µ2, we want one CI [L1, U1] for µ1 and another CI [L2, U2] for µ2 such that [L1, U1] and
[L2, U2] should capture the corresponding means simultaneously with probability larger than 1 − α. Here
L1, L2, U1, U2 are some lower bounds and upper bounds calculated using the data and 1−α is our significance
level.
(a) (3 points) A very natural (but could be wrong potentially) idea would be
[L1, U1] = X¯ ± t1−α/2,n1−1
is the sample variance for {Xi} and S
is the sample variance for {Yi}. Calculate
the coverage probability for these simultaneous CIs, that is, calculate
(b) (4 points) Let µ1 = 68, µ2 = 70, σ = 3, n1 = n2 = 20, α = 0.05. Use simulation to estimate the
coverage probability defined in (a). Set reps = 105
. Create a 95% conservative CI for this coverage
probability. Is 1 − α in this CI?
(c) (4 points) Now consider a Bonferroni corrected simultaneous confidence interval,
Show that this set of simultaneous CIs has a coverage probability larger than 1 − α. You may want
apply the Bonfferoni inequality P(A ∩ B) ≥ 1 − (1 − P(A)) − (1 − P(B)).
·2·
(d) (4 points) Let µ1 = 68, µ2 = 70, σ = 3, n1 = n2 = 20, α = 0.05. Use simulation to estimate the
coverage probability defined in (a). Set reps = 105
. Create a 95% conservative CI for this coverage
probability.
Question 3 (20 points)
We have explored two sample procedures for the means, but in this problem we will write a function to do
these procedures for proportions. Consider two binomial random variables V and W. More specifically,
V ∼ Binom(n1, θ1) and W ∼ Binom(n2, θ2), where θ1, θ2 ∈ (0, 1). We are interested in testing
H0 : θ1 = θ2
Ha : θ1 ̸= θ2.
Let θb1 = V /n1, θb2 = W/n2 and θb = (V + W)/(n1 + n2). Then the test statistic for the test is
Under the null hypothesis, T ∼ N(0, 1) approximately.
(a) (5 points) Let θ1 = θ2 = 0.4, use simulation to evaluate the distribution of T for the cases
• n1 = n2 = 15
• n1 = n2 = 100
In each case, create a QQ-plot to compare the distribution of T with N(0, 1) and comment on the plot.
Set reps = 10000. You may use qnorm function when calculating the percentiles of the standard
normal distribution. Comment on the plots. You are not allowed to use rbinom to generate from
binomial distribution.
(b) (5 points) Besides the test, we can also provide an approximate 100(1 − α)% CI for θ1 − θ2 using the
formula
Create an R function myprop.test that generates the data, calculate the test statistic T and then
output the p-value of the test and the CI for θ1 −θ2. The function should take the following arguments
• n1, number of trials for V
• n2, number of trials for W
• theta1, success probability for V
• theta2, success probability for W
• alpha, significance level
• reps, number of replications
and the function outputs a list with three components:
·3·
• pval, vector of p-values of each replication
• upper, vector of upper bounds of CIs in each replication
• lower, vector of lower bounds of CIs in each replication
(c) (5 points) Test your function by estimating the coverage probability of the CI when θ1 = 0.6, θ2 =
0.4, n1 = n2 = 20, α = 0.05. Use reps=105
. Create a 95% conservative CI for the coverage
probability. Does it contain 1 − α?
(d) (5 points) Use your function to create a power curve when we increase the distance between θ1
and θ2. More specifically, let n1 = n2 = 100, θ2 = 0.52, α = 0.05 and θ1 takes values in
{0.52, 0.525, . . . , 0.680}. Set reps = 1000. Make a plot of the estimated power curve and also
include the lower and upper bounds of a 95% Clopper-Pearson confidence interval using dashed lines.
·4·