You are given some historical data describing the scores attained by a random sample of 50 Scottish school children in a reading test. You know that the original scores were recorded on a continuous scale (and constrained to take positive values) but the values in the data set are recorded only to within certain intervals. The data are shown below.

Table 1: Table of scores

Interval    Number of scores
20 - 30      1
30 - 40      8
40 - 50     19
50 - 60     15
60 - 70      3
70 - 80      4

Denote these data as y. Numerous recent studies on other groups have suggested that the distribution of scores in the test may be modelled by a Gamma(α, β) distribution with shape parameter α = 20 and rate parameter β = 0.5. You wish to fit a Gamma distribution to the (censored) scores in the table to investigate whether the historical data are consistent with current beliefs. As you do not wish the recent data to prejudice your analysis, you assign non-informative, independent priors π(α) ∝ 1 and π(β) ∝ β⁻¹.
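As a rough first look at whether the table is compatible with current beliefs, the counts expected under Gamma(20, 0.5) can be compared with the observed counts. The sketch below is in Python for illustration only (the assignment itself asks for R). It relies on the closed-form CDF that is available when the shape parameter is an integer, and the names `TABLE` and `erlang_cdf` are hypothetical helpers, not part of the assignment. It is not a substitute for the Bayesian analysis requested in the tasks that follow.

```python
import math

# Interval-censored data from Table 1: (lower, upper, count).
TABLE = [(20, 30, 1), (30, 40, 8), (40, 50, 19), (50, 60, 15), (60, 70, 3), (70, 80, 4)]
ALPHA, BETA = 20, 0.5  # shape and rate suggested by the recent studies


def erlang_cdf(x, shape, rate):
    """P(X <= x) for Gamma(shape, rate) when shape is a positive integer."""
    term, total = 1.0, 1.0  # k = 0 term of sum_{k < shape} (rate * x)^k / k!
    for k in range(1, shape):
        term *= rate * x / k
        total += term
    return 1.0 - math.exp(-rate * x) * total


n = sum(count for _, _, count in TABLE)
expected = [
    n * (erlang_cdf(hi, ALPHA, BETA) - erlang_cdf(lo, ALPHA, BETA))
    for lo, hi, _ in TABLE
]

for (lo, hi, count), e in zip(TABLE, expected):
    print(f"{lo}-{hi}: observed {count:2d}, expected {e:5.1f}")
```

For reference, a Gamma(20, 0.5) distribution has mean α/β = 40 and variance α/β² = 80, so any systematic shift between the observed and expected columns gives an informal indication of where the two disagree.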

1. Describe how you could use data augmentation and Markov chain Monte Carlo methods to sample from π(α, β, x|y), where x ∈ ℝ⁵⁰ denotes the precise scores of the 50 children in the sample. You may wish to use a mixture of Gibbs and Metropolis methods in your algorithm. [4]

2. Implement your algorithm in R and use it to investigate π(α, β|y). Comment on the extent to which the historical data set confirms or contradicts the findings of the more recent analyses. You should present both univariate and bivariate summaries of the posterior distribution. Use your algorithm to estimate the posterior probability that:

• the highest score achieved is greater than 75;

• the lowest score achieved is less than 25.

[6]

3. Investigate whether your Markov chain mixes well and discuss features of the posterior distribution that may impact on the mixing of the chain in this case. [3]

4. Discuss any assumptions that are made in your analysis. [2]
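
One possible shape for the sampler in tasks 1 and 2 is sketched below, in Python for illustration only (the assignment requires R). Under the stated priors, augmenting with the exact scores x gives three blocks. First, β | α, x ~ Gamma(nα, Σxᵢ) with n = 50, which can be drawn directly (a Gibbs step). Second, α | β, x has no standard form, so a random-walk Metropolis step is used. Third, each xᵢ | α, β, y follows a Gamma(α, β) density truncated to its interval. The sketch swaps the exact truncated-Gamma draw for an independence Metropolis step with a uniform proposal on the interval, purely to stay within the standard library. All function names, the step size, chain length, burn-in and starting values are assumptions chosen for illustration, not tuned or recommended settings.

```python
import math
import random

# Interval-censored data from Table 1: (lower, upper, count).
TABLE = [(20, 30, 1), (30, 40, 8), (40, 50, 19), (50, 60, 15), (60, 70, 3), (70, 80, 4)]
BOUNDS = [(lo, hi) for lo, hi, count in TABLE for _ in range(count)]
N = len(BOUNDS)  # 50 children


def log_gamma_kernel(x, alpha, beta):
    """Log Gamma(alpha, rate=beta) density at x, up to terms free of x."""
    return (alpha - 1.0) * math.log(x) - beta * x


def log_cond_alpha(alpha, beta, sum_log_x):
    """Log full conditional of alpha given beta and x (flat prior on alpha > 0)."""
    return N * (alpha * math.log(beta) - math.lgamma(alpha)) + (alpha - 1.0) * sum_log_x


def run_chain(n_iter=6000, burn_in=1000, step_alpha=1.0, seed=1):
    rng = random.Random(seed)
    x = [0.5 * (lo + hi) for lo, hi in BOUNDS]  # start at interval midpoints
    alpha, beta = 20.0, 0.5                     # start at the recent-study values
    draws = []
    for it in range(n_iter):
        # 1. Latent scores: independence Metropolis, uniform proposal on each interval.
        for i, (lo, hi) in enumerate(BOUNDS):
            prop = rng.uniform(lo, hi)
            log_ratio = (log_gamma_kernel(prop, alpha, beta)
                         - log_gamma_kernel(x[i], alpha, beta))
            if math.log(1.0 - rng.random()) < log_ratio:
                x[i] = prop
        # 2. Rate: Gibbs draw from Gamma(shape = N * alpha, rate = sum(x)).
        #    random.gammavariate takes (shape, scale), hence the reciprocal.
        beta = rng.gammavariate(N * alpha, 1.0 / sum(x))
        # 3. Shape: random-walk Metropolis, rejecting proposals with alpha <= 0.
        prop = alpha + rng.gauss(0.0, step_alpha)
        if prop > 0.0:
            s = sum(math.log(v) for v in x)
            log_ratio = log_cond_alpha(prop, beta, s) - log_cond_alpha(alpha, beta, s)
            if math.log(1.0 - rng.random()) < log_ratio:
                alpha = prop
        if it >= burn_in:
            draws.append((alpha, beta, max(x), min(x)))
    return draws


draws = run_chain()
m = len(draws)
p_high = sum(1 for d in draws if d[2] > 75) / m  # P(highest score > 75 | y)
p_low = sum(1 for d in draws if d[3] < 25) / m   # P(lowest score < 25 | y)
print("posterior mean of alpha:", sum(d[0] for d in draws) / m)
print("posterior mean of beta: ", sum(d[1] for d in draws) / m)
print("P(max > 75 | y):", p_high)
print("P(min < 25 | y):", p_low)
```

The α and β draws can be expected to be strongly correlated, since the data mainly pin down the mean α/β. This is the kind of feature task 3 asks about, and trace plots or autocorrelation estimates computed from `draws` would be a natural check. In R, the exact truncated-Gamma update could instead be done by inverse-CDF sampling with `pgamma` and `qgamma`.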

Your findings should be presented in the form of a short report, which should:

• have a clear and logical structure;

• include an introduction and clearly stated conclusions that can be understood by any numerate scientist;

• include detail of your mathematical calculations so that your results could be reproduced by another statistician;

• include clearly labelled and correctly referenced tables and diagrams, as appropriate;

• include the R code you used in an appendix (you do not need to explain individual R commands but some comments should be included to indicate the purpose of each section of code);

• include citation and referencing for any material (books, papers, websites etc) used.

Notes

• This assignment counts for 15% of the course assessment.

• You may have face-to-face discussions with me or your colleagues, but your report must be your own work. Plagiarism is a serious academic offence and carries a range of penalties, some very serious. Copying a friend’s report or code, or copying text into your report from another source (such as a book or website) without citing and referencing that source, is plagiarism. Collusion is also a serious academic offence. You must not share a copy of your report (as a hard copy or in electronic form) or your computer code with anyone else. Penalties for plagiarism or collusion can include voiding of your mark for the course.

• Computer Labs will run at each campus, during which you may work on this assignment and ask questions. To benefit most from these labs, please spend time working on the assignment beforehand.

• Your report should be submitted through Turnitin by Friday, March 30, 17:00 (GMT). A link to the submission page is available through the ‘Assessment’ section of the course Vision page. Please use the submission link appropriate for the campus where you are studying (Edinburgh or Malaysia). For late submissions, 2 marks will be deducted for each day (or part of a day) late. Submissions that are more than 5 days late will receive 0 marks.