
STAT2004/2904/7004 2024 – Assignment 4

Due date: 25 October 2024 at 16:00

STAT2004/7004: Complete Exercises 1–4 for a maximum of 40 marks and a total of 10%.

STAT2904: Complete Exercises 1–5 for a maximum of 50 (+5 bonus) marks and a total of 10% (+1% bonus).

All students: Exercise 6 is a bonus question. A complete and correct solution of this question earns you an extra 1% for this assignment.

Note that some questions involve interpreting and communicating results in the form of an audio recording, which you upload to Blackboard as an audio file.

Reminder: while discussion of the Assignment questions (amongst yourselves, with lecturers and/or tutors) is encouraged, the final write-up must be your own. If you cannot express a solution in your own words, then you must cite your source(s).

Question 1 (Testing exponential rates) (8 marks)

Let $X_1, X_2, \dots, X_7$ be a random sample from an exponential distribution with pdf

$f_X(x) = \lambda e^{-\lambda x}, \quad x \ge 0,$

and let $Y_1, Y_2, \dots, Y_8$ be another, independent sample from an exponential distribution with pdf

$f_Y(y) = \theta e^{-\theta y}, \quad y \ge 0.$

Here, $\lambda > 0$ and $\theta > 0$ are both unknown parameters. We want to test the null hypothesis $H_0: \lambda = \theta$ versus the alternative hypothesis $H_1: \lambda \neq \theta$.

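(a) (2 marks) Show that under the null hypothesis, the maximum likelihood estimator of the common rate λ = θ is given by

$\hat\lambda_0 = \hat\theta_0 = \dfrac{15}{\sum_{i=1}^{7} X_i + \sum_{j=1}^{8} Y_j}.$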

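(b) (1 mark) Show that under the alternative hypothesis, the maximum likelihood estimators of λ and θ are given, respectively, by

$\hat\lambda = \dfrac{7}{\sum_{i=1}^{7} X_i} = \dfrac{1}{\bar X} \quad\text{and}\quad \hat\theta = \dfrac{8}{\sum_{j=1}^{8} Y_j} = \dfrac{1}{\bar Y}.$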

(c) (3 marks) Construct a generalised likelihood ratio test for testing H0 : λ = θ versus H1 : λ ≠ θ, and show that it reduces to a test based on large or small values of the test statistic

$T(\mathbf X, \mathbf Y) = \dfrac{\sum_{i=1}^{7} X_i}{\sum_{i=1}^{7} X_i + \sum_{j=1}^{8} Y_j}.$
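To see the reduction in part (c) numerically, the sketch below evaluates $\Lambda$ twice: once from the maximised log-likelihoods using the closed-form MLEs (the pooled rate $15/(\sum X_i + \sum Y_j)$ under $H_0$, and $7/\sum X_i$, $8/\sum Y_j$ under $H_1$), and once as the constant $15^{15}/(7^7 8^8)$ times $T^7(1-T)^8$ with $T = \sum X_i/(\sum X_i + \sum Y_j)$. The helper name `glr_exponential` and the simulated data are illustrative assumptions; agreement of the two values is a sanity check, not a proof.

```python
import math
import random

def glr_exponential(xs, ys):
    """Return Lambda for H0: lambda = theta, computed two ways.

    The first value uses the maximised log-likelihoods directly; the second
    writes the same quantity as a function of T = sum(xs) / (sum(xs) + sum(ys)).
    """
    m, n = len(xs), len(ys)
    sx, sy = sum(xs), sum(ys)

    def loglik(rate, total, size):
        # Log-likelihood of `size` iid Exp(rate) observations whose sum is `total`.
        return size * math.log(rate) - rate * total

    rate0 = (m + n) / (sx + sy)            # pooled MLE under H0
    lam_hat, theta_hat = m / sx, n / sy    # separate MLEs under H1
    log_lam = (loglik(rate0, sx, m) + loglik(rate0, sy, n)
               - loglik(lam_hat, sx, m) - loglik(theta_hat, sy, n))

    # The exp(-m - n) factors cancel, leaving a constant times T^m (1 - T)^n.
    t = sx / (sx + sy)
    log_lam_t = ((m + n) * math.log(m + n) - m * math.log(m) - n * math.log(n)
                 + m * math.log(t) + n * math.log(1 - t))
    return math.exp(log_lam), math.exp(log_lam_t)

# Simulated data (not from the assignment), drawn with lambda = theta = 1.
rng = random.Random(2024)
xs = [rng.expovariate(1.0) for _ in range(7)]
ys = [rng.expovariate(1.0) for _ in range(8)]
direct, via_t = glr_exponential(xs, ys)
print(direct, via_t)  # the two numbers should agree up to rounding
```

Since $T^7(1-T)^8$ is unimodal in $T$, small values of $\Lambda$ correspond to $T$ being either small or large, which is why the test is two-sided in $T$.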

It is given to you that T(X, Y) ∼ Beta(7, 8) under the null hypothesis H0 : λ = θ.

(d) (1 mark) Explain how you would set critical value(s) for your test from part (c) to control the Type I error at α = 5%, and write down your decision rule explicitly using these critical value(s).
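One way to carry out part (d) is to split α = 5% equally between the two tails of the Beta(7, 8) null distribution. The stdlib-only sketch below obtains the quantiles from the identity $I_x(a, b) = P(\text{Bin}(a+b-1, x) \ge a)$, valid for integer $a$ and $b$, combined with bisection. The names `beta_cdf_int`, `beta_ppf_int` and `decide` are hypothetical, and the equal-tailed split is a convention rather than the only valid choice. If SciPy is available, `scipy.stats.beta.ppf(0.025, 7, 8)` and `scipy.stats.beta.ppf(0.975, 7, 8)` should give the same numbers.

```python
from math import comb

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) for positive integers a, b: P(Bin(a + b - 1, x) >= a)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def beta_ppf_int(q, a, b, tol=1e-12):
    """Quantile of Beta(a, b) by bisection on the increasing CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.05
c_lower = beta_ppf_int(alpha / 2, 7, 8)
c_upper = beta_ppf_int(1 - alpha / 2, 7, 8)

def decide(t_obs):
    """Equal-tailed rule: reject H0 when T is too small or too large."""
    return "reject H0" if (t_obs < c_lower or t_obs > c_upper) else "do not reject H0"

print(f"critical values: {c_lower:.4f}, {c_upper:.4f}")
```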

(e) (1 mark) [Audio question]: Is your test from parts (c)–(d) uniformly most powerful for testing H0 : λ = θ versus H1 : λ ≠ θ at the 5% significance level? Briefly explain why, or why not.

Question 2 (Comparing ratings across groups) (8 marks)

A recent poll asked social media users to provide their opinions on a decision by a popular photo-sharing app to remove the number of “likes” from their posts. Each respondent was asked to express their opinion on the following five-point scale:

1 = Strongly disagree

2 = Disagree

3 = Neutral

4 = Agree

5 = Strongly Agree

Of the n = 198 respondents, 98 were “influencers” (with over 10,000 followers each) while the other 100 were regular users. The full dataset can be downloaded as a .csv file from Blackboard > Assessment > Assignment 4 > likes.csv.

(a) (1 mark) Visualise the data using an appropriate graph(s).

(b) (3 marks) Do the two types of users exhibit differing opinions regarding the recent changes to the photo-sharing app? Answer this question by carrying out an appropriate hypothesis test. Clearly state the null and alternative hypotheses, propose a test statistic, compute and interpret a p-value, and write your conclusions in a way that is understandable to a social scientist.
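As one possible approach to part (b), the sketch below cross-tabulates the ratings by user type and applies Pearson's χ² test of homogeneity ($H_0$: the distribution over the five response categories is the same for influencers and regular users). The column names `group` and `rating` are placeholders, since the layout of likes.csv is not described here; check the file before use. The p-value uses the closed-form χ² upper tail for even degrees of freedom (a 2 × 5 table has df = 4). Because the responses are ordinal, a rank-based test such as the Wilcoxon–Mann–Whitney test is a reasonable alternative that uses the ordering.

```python
import csv
import math
from collections import Counter

LEVELS = [1, 2, 3, 4, 5]

def read_likes(path, group_col="group", rating_col="rating"):
    """Read (group, rating) pairs from a CSV file.
    NOTE: the column names are hypothetical placeholders; check likes.csv."""
    with open(path, newline="") as fh:
        return [(row[group_col], int(row[rating_col])) for row in csv.DictReader(fh)]

def tabulate(pairs):
    """Cross-tabulate (group, rating) pairs into {group: [count for each level]}."""
    counts = Counter(pairs)
    groups = sorted({g for g, _ in pairs})
    return {g: [counts[(g, k)] for k in LEVELS] for g in groups}

def chi2_sf_even_df(x, df):
    """Upper tail of chi-square for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("closed form only valid for positive even df")
    half = x / 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))

def homogeneity_test(table):
    """Pearson chi-square test that the rating distribution is the same in every group."""
    rows = list(table.values())
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(col) for col in zip(*rows)]
    n = sum(row_tot)
    stat = 0.0
    for r, rt in zip(rows, row_tot):
        for x, ct in zip(r, col_tot):
            e = rt * ct / n
            if e > 0:  # categories nobody chose are dropped
                stat += (x - e) ** 2 / e
    df = (len(rows) - 1) * (sum(ct > 0 for ct in col_tot) - 1)
    return stat, df, chi2_sf_even_df(stat, df)
```

Typical use would be `homogeneity_test(tabulate(read_likes("likes.csv")))`, which returns the statistic, its degrees of freedom and the asymptotic p-value.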

(c) (2 marks) State and critically assess any assumption(s) you made in answering (b).

(d) (2 marks) [Audio question]: A social scientist suggests comparing the two groups using a two-sample t-test applied directly to the five-point responses. Explain to her why this is inappropriate here.

Question 3 (Tuberculosis and blood type) (14 marks)

Overfield and Klauber (1980) published the following data on the incidence of tuberculosis in relation to ABO blood groups in a sample of Eskimos:

We want to investigate whether tuberculosis incidence is related to blood type.

Let $p_{ij}$ denote the underlying proportion of the population with tuberculosis severity i ∈ {moderate/advanced, minimal, not present} and blood type j ∈ {O, A, AB, B}. For convenience, write $p = (p_{ij})$ for the 3 × 4 array of proportions.

(a) (1 mark) Write down the null and alternative hypotheses in words.

(b) (1 mark) Write down the likelihood function for p given the observed counts x.

Under the null hypothesis, $p_{ij} = p_{i\bullet} \times p_{\bullet j}$ for each $i$ and $j$, where $p_{i\bullet}$ is the overall proportion with tuberculosis severity level $i$ and $p_{\bullet j}$ is the overall proportion with blood type $j$.

(c) (3 marks) Show that under the null model the ML estimates $\hat p_{i\bullet}$ and $\hat p_{\bullet j}$ are given, respectively, by

$\hat p_{i\bullet} = x_{i\bullet}/n \quad\text{and}\quad \hat p_{\bullet j} = x_{\bullet j}/n,$

where $x_{i\bullet}$ is the observed number of cases with tuberculosis severity $i$, $x_{\bullet j}$ is the observed number of cases with blood type $j$, and $n$ is the total sample size.

(d) (1 mark) Using the results from part (c), or otherwise, what counts would we expect to see in each cell of the table if the null hypothesis is indeed true?
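Part (d) follows from part (c): under $H_0$ the expected count in cell $(i, j)$ is $n \hat p_{i\bullet} \hat p_{\bullet j} = x_{i\bullet} x_{\bullet j} / n$. The short sketch below computes these for any table of counts given as a list of rows. The helper name `expected_counts` is hypothetical, and the toy 2 × 2 table in the usage line is made up; the observed tuberculosis counts should be substituted.

```python
def expected_counts(table):
    """Expected counts under independence: E_ij = (row total i) * (column total j) / n."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[ri * cj / n for cj in col_tot] for ri in row_tot]

# Made-up toy table, only to show the call; replace with the observed 3 x 4 counts.
print(expected_counts([[1, 2], [3, 4]]))
```

The expected table has the same row and column totals as the observed one, which is a quick check on the arithmetic.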

Under the alternative hypothesis, there are no restrictions on the cell proportions $p_{ij}$ (except that they must all sum to 1).

(e) (1 mark) State the ML estimates $\hat p_{ij}$ of each cell proportion $p_{ij}$ under the alternative. (You do not have to prove that these are the MLEs.)

(f) (2 marks) Using your results from parts (b), (c) and (e), or otherwise, numerically evaluate the generalized likelihood ratio test statistic,

$\Lambda = \dfrac{\max_{p \in H_0} L(p)}{\max_{p} L(p)},$

for testing the association between tuberculosis and blood type based on the observed counts in the table above. Also, numerically compute the transformation $-2 \log \Lambda$.

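(g) (1 mark) Using your results from part (d), or otherwise, compute Pearson's χ² statistic,

$X^2 = \sum_{i,j} \dfrac{(x_{ij} - E_{ij})^2}{E_{ij}},$

where $E_{ij}$ denotes the expected count in cell $(i, j)$ under the null hypothesis.

Is Pearson's χ² statistic numerically close to the $-2 \log \Lambda$ statistic from part (f)?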
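For parts (f) and (g), both statistics come from the same expected counts. Since the null fit gives $\hat p_{i\bullet}\hat p_{\bullet j} = E_{ij}/n$ and the unrestricted fit gives $\hat p_{ij} = x_{ij}/n$, the multinomial coefficients cancel and $-2\log\Lambda = 2\sum_{i,j} x_{ij}\log(x_{ij}/E_{ij})$, while Pearson's statistic is $\sum_{i,j}(x_{ij}-E_{ij})^2/E_{ij}$. The sketch below (helper names hypothetical) computes $\Lambda$, $-2\log\Lambda$ and Pearson's statistic for a table given as a list of rows; it assumes every row and column total is positive so that all $E_{ij} > 0$.

```python
import math

def expected_counts(table):
    """Expected counts under independence: E_ij = (row total i) * (column total j) / n."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[ri * cj / n for cj in col_tot] for ri in row_tot]

def lr_and_pearson(table):
    """Return (Lambda, -2 log Lambda, Pearson X^2) for a table given as a list of rows."""
    expected = expected_counts(table)
    g2 = 0.0  # -2 log Lambda = 2 * sum x log(x / E); empty cells contribute 0
    x2 = 0.0  # Pearson's statistic
    for obs_row, exp_row in zip(table, expected):
        for x, e in zip(obs_row, exp_row):
            if x > 0:
                g2 += 2 * x * math.log(x / e)
            x2 += (x - e) ** 2 / e
    return math.exp(-g2 / 2), g2, x2

# lam, g2, x2 = lr_and_pearson(observed)  # observed: the 3 x 4 table of counts
```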

(h) (2 marks) Carry out the hypothesis test by computing and interpreting a p-value, and state your conclusion in a way that is understandable to a population health scientist.
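For part (h), a 3 × 4 table gives $(3-1)(4-1) = 6$ degrees of freedom. Because this is even, the χ² upper-tail probability has the closed form $P(\chi^2_{2m} > x) = e^{-x/2}\sum_{i=0}^{m-1}(x/2)^i/i!$. The stdlib-only sketch below uses it; the helper names are hypothetical, and `scipy.stats.chi2.sf(x, df)` gives the same value if SciPy is available.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi-square_df > x) for even df, via exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

def independence_p_value(stat, n_rows=3, n_cols=4):
    """Asymptotic p-value for -2 log Lambda or Pearson's statistic in an R x C table."""
    df = (n_rows - 1) * (n_cols - 1)
    return df, chi2_sf_even_df(stat, df)

# Sanity check: 12.592 is (approximately) the upper 5% point of chi-square with 6 df.
print(independence_p_value(12.592))
```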

Notice that one of the cells in the table contains a count of only 3. This may render the asymptotic χ² distribution inaccurate for part (h). Instead, we can consider Fisher's exact test.

(i) (2 marks) Using an alternative approach, or otherwise, re-do the analysis to account for the low counts in some of the cells. Does your conclusion from part (h) change?
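Fisher's exact test conditions on both margins. For a 3 × 4 table, one practical way to approximate such a conditional test is Monte Carlo: shuffling the blood-type labels of the $n$ individual records keeps every row and column total fixed, so each shuffled table is a draw from the conditional null distribution. The sketch below (helper names hypothetical) uses this idea but ranks tables by $-2\log\Lambda$ rather than by their null probability, as the Freeman–Halton extension of Fisher's test does, so its p-value can differ slightly from an exact Fisher–Freeman–Halton p-value. In R, `fisher.test(..., simulate.p.value = TRUE)` is an established alternative.

```python
import math
import random

def g2_stat(table):
    """-2 log Lambda = 2 * sum x_ij log(x_ij / E_ij) for a table of counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            if x > 0:
                g2 += 2 * x * math.log(x * n / (row_tot[i] * col_tot[j]))
    return g2

def permutation_test(table, n_perm=10_000, seed=1):
    """Monte Carlo p-value for independence with both margins held fixed."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # One (row, column) label pair per individual record.
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols) for _ in range(table[i][j])]
    row_labels = [i for i, _ in cells]
    col_labels = [j for _, j in cells]
    observed = g2_stat(table)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(col_labels)  # permuting one label set preserves both margins
        perm = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            perm[i][j] += 1
        if g2_stat(perm) >= observed - 1e-9:
            hits += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```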

Question 4 (Weight gain in pigs) (10 marks)

A trial was conducted in Iowa, USA, examining the effects of vitamin B12 dietary supplements and antibiotics on weight gain in pigs. Twelve adult pigs were randomly divided into four groups (one using standard pig chow, one using pig chow with added vitamin B12, one using pig chow with added antibiotics, and one using pig chow with both added vitamin B12 and antibiotics). After one week of feeding, the pigs were weighed and their weight gain (in grams) was recorded. The data are plotted below:

We can model the weight gains $\{Y_{jki}\}$ using a two-way ANOVA with interactions:

$Y_{jki} = \mu + \alpha_j + \beta_k + \delta_{jk} + \epsilon_{jki},$

where $j = 1, 2$ denotes the level of factor A (antibiotics), $k = 1, 2$ denotes the level of factor B (vitamin B12), and $i = 1, 2, 3$ indexes the observations in each group. Assume that the errors $\epsilon_{jki} \overset{\text{iid}}{\sim} N(0, \sigma^2)$ across all $j$, $k$ and $i$. The common variance $\sigma^2$ is taken to be unknown.

If we parametrize this model using the contrast constraints,

$\alpha_1 = 0, \quad \beta_1 = 0 \quad\text{and}\quad \delta_{1k} = \delta_{j1} = 0 \ \text{ for } j, k = 1, 2,$

then $\mu$ can be interpreted as the mean of the baseline group with no antibiotics and no vitamin B12, $\alpha_2$ is the mean change from adding antibiotics only, $\beta_2$ is the mean change from adding vitamin B12 only, and the interaction $\delta_{22}$ is the additional mean change from adding both antibiotics and vitamin B12 simultaneously.
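Given this interpretation, each parameter is a fixed combination of the four cell means: $\mu_{11} = \mu$, $\mu_{21} = \mu + \alpha_2$, $\mu_{12} = \mu + \beta_2$ and $\mu_{22} = \mu + \alpha_2 + \beta_2 + \delta_{22}$. Because the model with interactions fits the four cell means freely, replacing them by sample cell means gives the least-squares estimates, which coincide with the MLEs under normal errors. The sketch below encodes this; `contrast_estimates` and the dictionary layout keyed by $(j, k)$ are assumptions for illustration.

```python
from statistics import mean

def contrast_estimates(groups):
    """Return (mu, alpha2, beta2, delta22) under the contrast constraints.

    groups[(j, k)] holds the replicate weight gains for antibiotics level j and
    vitamin B12 level k, with level 1 = absent and level 2 = present.
    """
    m = {cell: mean(values) for cell, values in groups.items()}
    mu = m[(1, 1)]                                           # baseline: neither supplement
    alpha2 = m[(2, 1)] - m[(1, 1)]                           # antibiotics only
    beta2 = m[(1, 2)] - m[(1, 1)]                            # vitamin B12 only
    delta22 = m[(2, 2)] - m[(2, 1)] - m[(1, 2)] + m[(1, 1)]  # extra effect of both
    return mu, alpha2, beta2, delta22
```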

(a) (3 marks) Show that under the contrast constraints the MLE of each parameter is given by

(b) (2 marks) Show that the following sum-of-squares decomposition holds:

SSTotal = SSA + SSB + SSAB + SSresidual ,

where

Hint: Start with the following identity:
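The identity in part (b) can be checked numerically. The sketch below assumes the usual balanced-design definitions, since the page's own definitions of the sums of squares did not survive extraction: $SS_A = Kr\sum_j(\bar Y_{j\bullet\bullet}-\bar Y_{\bullet\bullet\bullet})^2$, $SS_B = Jr\sum_k(\bar Y_{\bullet k\bullet}-\bar Y_{\bullet\bullet\bullet})^2$, $SS_{AB} = r\sum_{j,k}(\bar Y_{jk\bullet}-\bar Y_{j\bullet\bullet}-\bar Y_{\bullet k\bullet}+\bar Y_{\bullet\bullet\bullet})^2$ and $SS_{\text{residual}} = \sum_{j,k,i}(Y_{jki}-\bar Y_{jk\bullet})^2$. It uses randomly generated data, not the pig data, and `two_way_ss` is a hypothetical helper name.

```python
import math
import random

def two_way_ss(y):
    """Sums of squares for a balanced J x K layout with r replicates per cell.

    y[j][k] is the list of replicate observations in cell (j, k).
    """
    J, K, r = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[j][k]) / r for k in range(K)] for j in range(J)]   # Ybar_{jk.}
    row = [sum(cell[j]) / K for j in range(J)]                        # Ybar_{j..}
    col = [sum(cell[j][k] for j in range(J)) / J for k in range(K)]   # Ybar_{.k.}
    grand = sum(row) / J                                              # Ybar_{...}
    obs = [(j, k, v) for j in range(J) for k in range(K) for v in y[j][k]]
    return {
        "A": K * r * sum((row[j] - grand) ** 2 for j in range(J)),
        "B": J * r * sum((col[k] - grand) ** 2 for k in range(K)),
        "AB": r * sum((cell[j][k] - row[j] - col[k] + grand) ** 2
                      for j in range(J) for k in range(K)),
        "residual": sum((v - cell[j][k]) ** 2 for j, k, v in obs),
        "total": sum((v - grand) ** 2 for _, _, v in obs),
    }

# Randomly generated 2 x 2 x 3 data (not the pig data), just to check the identity.
rng = random.Random(0)
y = [[[rng.gauss(10, 2) for _ in range(3)] for _ in range(2)] for _ in range(2)]
ss = two_way_ss(y)
parts = ss["A"] + ss["B"] + ss["AB"] + ss["residual"]
print(math.isclose(ss["total"], parts))  # True: the decomposition holds exactly
```

The cross-product terms vanish because each deviation component sums to zero over the index it is averaged against; the numerical check only confirms this for one data set.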

(c) (1 mark) Briefly explain why the residual sum-of-squares has distribution given by

$\dfrac{SS_{\text{residual}}}{\sigma^2} \sim \chi^2_{df_{\text{residual}}},$

where $df_{\text{residual}} = JK(r-1) = 8$. [Here, $J = 2$ is the number of levels of factor A, $K = 2$ is the number of levels of factor B, and $r = 3$ is the number of replications in each group.]

(d) (1 mark) Briefly argue why the residual sum-of-squares SSresidual is independent of the interaction sum-of-squares SSAB.

Using similar calculations to part (c), it can also be shown that under the null hypothesis $H_0$: all interactions $\delta_{jk} = 0$, the interaction sum-of-squares has distribution given by

$\dfrac{SS_{AB}}{\sigma^2} \sim \chi^2_{df_{AB}},$

where $df_{AB} = (J - 1)(K - 1) = 1$.

(e) (1 mark) Using parts (c), (d) and the above result, or otherwise, argue why the null distribution of the so-called F-ratio,

$F = \dfrac{SS_{AB}/df_{AB}}{SS_{\text{residual}}/df_{\text{residual}}},$

is an F distribution with numerator degrees of freedom $df_{AB}$ and denominator degrees of freedom $df_{\text{residual}}$.
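Because $df_{AB} = 1$, the null distribution $F(1, 8)$ of the F-ratio is that of $T^2$ with $T \sim t_8$, so $P(F > f) = 1 - 2\int_0^{\sqrt f} t_8(u)\,du$. The stdlib-only sketch below evaluates this with Simpson's rule. It is restricted to a numerator df of 1, the helper names are hypothetical, and `scipy.stats.f.sf(f, 1, 8)` should agree if SciPy is available. The sums of squares must come from the completed ANOVA table, which is not reproduced here.

```python
import math

def t_pdf(u, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + u * u / nu) ** (-(nu + 1) / 2)

def f1_p_value(f_obs, df_den, steps=2000):
    """P(F > f_obs) for F ~ F(1, df_den), via F = T**2 and Simpson's rule on [0, sqrt(f_obs)]."""
    b = math.sqrt(f_obs)
    h = b / steps  # `steps` must be even for Simpson's rule
    s = t_pdf(0.0, df_den) + t_pdf(b, df_den)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df_den)
    return 1 - 2 * s * h / 3

def interaction_f_test(ss_ab, ss_res, df_res=8):
    """F-ratio for H0: all delta_jk = 0 when df_AB = 1, together with its p-value."""
    f = (ss_ab / 1) / (ss_res / df_res)
    return f, f1_p_value(f, df_res)
```

For example, `interaction_f_test(ss_ab, ss_res)` takes the interaction and residual sums of squares from the completed table.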

A partially completed two-way ANOVA table for the pig weight-gain dataset is given below:

(f) (2 marks) Using your results from parts (b) and (e), or otherwise, complete the above ANOVA table. Hence, summarise the main finding(s) of this experiment and write a short conclusion.

Question 5 (STAT2904 only) (10 marks)

Let $X_1, X_2, \dots, X_n$ be iid random variables from a Pareto distribution with pdf

where θ, ν > 0 are two unknown parameters.

(a) (4 marks) Find the MLEs for θ and ν.

(b) (1 mark) If it is given to you that θ = 1, does that change the MLE for ν?

(c) (5 marks) Using parts (a) and (b), or otherwise, construct a generalized likelihood ratio test (GLRT) for testing

H0 : θ = 1, ν unknown                 versus             H1 : θ ≠ 1, ν unknown,

and show that it reduces to a test based on either small or large values of the statistic T(X) given by

To finish specifying this test, we need to set the critical values for T(X) that determine what is “too small” or “too large”. However, the distribution of T(X) is too difficult to derive analytically. Instead, we can use simulations to help us find these critical values.

STAT2904 Bonus Questions (5 marks):

(d) (2 marks) For a sample size of n = 22, say, simulate one set of observations $x_1, x_2, \dots, x_{22}$ from the Pareto distribution with θ = 1 and ν = 2.1. From this realisation, compute the value of the observed test statistic

(e) (1 mark) Repeat the simulation setting from part (d) 10,000 times, each time computing and saving the observed test statistic T(x).

(f) (1 mark) Estimate the upper and lower 2.5%-tiles of the distribution of T(X) using the simulated values from part (e).

(g) (1 mark) Investigate numerically how the cutoff values from part (f) change if you set the nuisance parameter ν to another value (e.g., try ν = 1.3, 2.7, 3.4, etc.).
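The simulation in parts (d)–(g) can be sketched as below. Because the Pareto pdf and the formula for T(X) did not survive extraction, the sketch ASSUMES the parametrisation $f(x) = \nu\theta^\nu/x^{\nu+1}$ for $x \ge \theta$ and, as a stand-in for the page's T(X), uses the GLR statistic itself, which under that assumption works out to $\Lambda = (1 - n\log X_{(1)}/\sum_i \log X_i)^n$. Substitute the statistic from part (c) through the `stat` argument. Draws use the inverse CDF $X = \theta(1-U)^{-1/\nu}$, and all function names are hypothetical.

```python
import math
import random
from statistics import quantiles

def rpareto(n, theta, nu, rng):
    """Inverse-CDF draws from the ASSUMED pdf nu * theta**nu / x**(nu + 1), x >= theta."""
    return [theta * (1.0 - rng.random()) ** (-1.0 / nu) for _ in range(n)]

def glr_lambda(x):
    """GLR statistic for H0: theta = 1 under the assumed parametrisation (stand-in for T)."""
    n = len(x)
    logs = [math.log(v) for v in x]
    return (1.0 - n * min(logs) / sum(logs)) ** n

def simulate_cutoffs(stat=glr_lambda, n=22, nu=2.1, reps=10_000, seed=2904):
    """Simulate `reps` samples under H0 (theta = 1) and return the 2.5% and 97.5% points."""
    rng = random.Random(seed)
    sims = [stat(rpareto(n, 1.0, nu, rng)) for _ in range(reps)]
    cuts = quantiles(sims, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]

# Part (g): repeat with different values of the nuisance parameter nu.
results = {}
for nu in (1.3, 2.1, 2.7, 3.4):
    results[nu] = simulate_cutoffs(nu=nu)
    print(f"nu = {nu}: 2.5% point = {results[nu][0]:.4f}, 97.5% point = {results[nu][1]:.4f}")
```

A single realisation for part (d) is `glr_lambda(rpareto(22, 1.0, 2.1, random.Random(1)))`. Because the same seed is reused for every ν, any change in the cutoffs across ν comes from ν itself rather than from Monte Carlo noise.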

Question 6 (Bonus question for all students) (4 marks)

Let $Y_1$ and $Y_2$ be a random sample of size two from a Uniform(λ, λ + 1) distribution. To test the hypothesis H0 : λ = 0 versus H1 : λ > 0, two competing tests are proposed:

• Geoff’s Test: reject H0 in favour of H1 if Y2 ≥ 0.95.

• Alan’s Test: reject H0 in favour of H1 if Y1 + Y2 ≥ c for some critical value c.

(a) Find the value of c such that Alan’s Test has the same significance level as Geoff’s Test.

(b) Prove or disprove: Alan’s Test is more powerful than Geoff’s Test.

(c) Construct a test that has the same significance level but is more powerful than both Alan's and Geoff's tests.
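A Monte Carlo sketch for comparing the two tests, assuming $Y_1$ and $Y_2$ are independent Uniform(λ, λ + 1) draws. Geoff's size is $P(Y_2 \ge 0.95 \mid \lambda = 0) = 0.05$. Under $H_0$, $Y_1 + Y_2$ has the triangular density on [0, 2], so $P(Y_1 + Y_2 \ge c) = (2 - c)^2/2$ for $1 \le c \le 2$; setting this equal to 0.05 gives the `C_ALAN` value used below, which is worth checking against your own working for part (a). The function names and the grid of λ values are illustrative choices.

```python
import math
import random

# Under H0, P(Y1 + Y2 >= c) = (2 - c)**2 / 2 for 1 <= c <= 2; solve (2 - c)**2 / 2 = 0.05.
C_ALAN = 2 - math.sqrt(0.1)

def geoff(y1, y2):
    """Geoff's test: reject H0 when Y2 >= 0.95 (ignores Y1)."""
    return y2 >= 0.95

def alan(y1, y2, c=C_ALAN):
    """Alan's test: reject H0 when Y1 + Y2 >= c."""
    return y1 + y2 >= c

def rejection_rate(test, lam, reps=100_000, seed=6):
    """Monte Carlo estimate of P(reject H0) when the true parameter is lam."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y1, y2 = lam + rng.random(), lam + rng.random()
        hits += test(y1, y2)
    return hits / reps

for lam in (0.0, 0.05, 0.1, 0.2):
    print(f"lambda = {lam}: Geoff {rejection_rate(geoff, lam):.3f}, Alan {rejection_rate(alan, lam):.3f}")
```

At λ = 0 both estimated rates should be close to 0.05; comparing the two columns at λ > 0 is a numerical hint for part (b), not a proof.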




