MATH3871/MATH5960
Assignment 1
This assignment covers material in Lectures 1–3. The assignment is worth 15% of the final course
grade. 5% of the grade will be allocated for neat and concise presentation. Please refer to the
following instructions:
Assignment to be submitted via Moodle by 7 October 11:55PM AEDT.
Include in your assignment any relevant R code, R output, and mathematical derivations.
Embed the code and plots into your assignment (please don’t attach R markdown or other
R script files).
The total number of submitted pages should not exceed 6 A4 pages. Any pages submitted
in excess of 6 pages will not be graded.
Print, sign and attach this cover sheet with your assignment (not included in page count).
Refer to the course handout for grading of late submissions.
Plagiarism Statement
I declare that this assessment item is my own work, except where acknowledged, and has not
been submitted for academic credit elsewhere. I acknowledge that the assessor of this item may,
for the purpose of assessing this item reproduce this assessment item and provide a copy to
another member of UNSW; and/or communicate a copy of this assessment item to a plagiarism
checking service (which may then retain a copy of the assessment item on its database for the
purpose of future plagiarism checking).
I certify that I have read and understood UNSW Rules in respect of Student Academic
Misconduct.
Name (print clearly):
Student Number:
Signature:
Date:
1. Inference: Let θ be the true proportion of people over the age of 40 in your community
with hypertension. Consider the following thought experiment:
(a) Though you may have little or no expertise in this area, give an initial point estimate
of θ.
(b) Now suppose a survey to estimate θ is established in your community, and of the first
5 randomly selected people, 4 are hypertensive. How does this information affect
your initial estimate of θ?
(c) Finally, suppose that at the survey’s completion, 400 of 1000 people have emerged
as hypertensive. Now what is your estimate of θ?
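One way to make this thought experiment concrete is a conjugate Beta-Binomial update, sketched below. The Beta(3, 7) prior (prior mean 0.3) and the names `a`, `b` and `beta_mean` are illustrative assumptions, not part of the assignment. Any other initial belief could be substituted.

```r
# Minimal sketch, assuming a Beta(a, b) prior on theta.
# a = 3, b = 7 is a hypothetical choice giving a prior mean of 0.3.
a <- 3
b <- 7

# Mean of a Beta(a, b) distribution.
beta_mean <- function(a, b) a / (a + b)

# (a) Initial point estimate: the prior mean.
prior_mean <- beta_mean(a, b)

# (b) After 4 hypertensive out of 5: posterior is Beta(a + 4, b + 1).
post_mean_5 <- beta_mean(a + 4, b + 1)

# (c) After 400 hypertensive out of 1000: posterior is Beta(a + 400, b + 600).
post_mean_1000 <- beta_mean(a + 400, b + 600)

print(c(prior = prior_mean, after_5 = post_mean_5, after_1000 = post_mean_1000))
```

With only 5 observations the estimate sits between the prior mean and the sample proportion. With 1000 observations the data dominate and the choice of prior matters very little.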
2. Multivariate Priors: Let $x_1, \ldots, x_n \in \mathbb{R}^d$ be n iid d-dimensional vectors. Suppose that
we wish to model $x_i \sim N_d(\mu, \Sigma)$ for $i = 1, \ldots, n$, where $\mu \in \mathbb{R}^d$ is an unknown mean vector
and $\Sigma$ is a known positive definite covariance matrix.
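(a) Adopting the conjugate prior $\mu \sim N_d(\mu_0, \Sigma_0)$, show that the resulting posterior
distribution for $\mu \mid x_1, \ldots, x_n$ is $N_d(\hat{\mu}, \hat{\Sigma})$, where
$$\hat{\mu} = (\Sigma_0^{-1} + n\Sigma^{-1})^{-1} (\Sigma_0^{-1}\mu_0 + n\Sigma^{-1}\bar{x})$$
and
$$\hat{\Sigma} = (\Sigma_0^{-1} + n\Sigma^{-1})^{-1}.$$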
(b) Derive Jeffreys’ prior $\pi_J(\mu)$ for $\mu$.
Hint: If you need help with vector differentiation, you can find out about this in various
places on the internet. One such place is https://en.wikipedia.org/wiki/Matrix_calculus.
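The posterior mean and covariance stated in part (a) can be evaluated numerically as a sanity check on a derivation. The sketch below is illustrative only: the function name `posterior_normal_mean`, the toy covariance matrix, the seed and the diffuse prior are all assumptions, and the code assumes that both Σ and Σ0 are invertible.

```r
# Minimal sketch: posterior for mu when x_i ~ N_d(mu, Sigma) with Sigma known
# and prior mu ~ N_d(mu0, Sigma0). Assumes Sigma and Sigma0 are invertible.
posterior_normal_mean <- function(x, Sigma, mu0, Sigma0) {
  # x: n-by-d matrix with one observation per row
  n <- nrow(x)
  xbar <- colMeans(x)
  prec0 <- solve(Sigma0)               # prior precision
  prec <- solve(Sigma)                 # sampling precision
  cov_post <- solve(prec0 + n * prec)  # posterior covariance
  mean_post <- cov_post %*% (prec0 %*% mu0 + n * (prec %*% xbar))
  list(mean = drop(mean_post), cov = cov_post)
}

# Toy example (hypothetical values): d = 2, n = 100, true mean zero.
set.seed(2)
Sigma <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)
x <- matrix(rnorm(200), ncol = 2) %*% chol(Sigma)
post <- posterior_normal_mean(x, Sigma, mu0 = c(0, 0), Sigma0 = diag(100, 2))
print(post$mean)
print(post$cov)
```

With a diffuse prior such as the one used here, the posterior mean should be very close to the sample mean and the posterior covariance close to Σ/n, which gives a quick check that the formula has been coded correctly.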
3. Importance Sampling: There are many ways to compute or estimate $\pi$. A very simple
estimation procedure is via importance sampling. Suppose that samples $x_1, \ldots, x_n$
were obtained uniformly inside a square with side length 2r (see diagram), where each
$x_i = (x_i^{(1)}, x_i^{(2)})$ for $i = 1, \ldots, n$.
[Diagram: a circle of radius r inside a square of side length 2r.]
Now define $b_i = 1$ if $x_i$ is also inside the circle of radius r, and $b_i = 0$ otherwise. Then
$\hat{p} = \frac{1}{n}\sum_{i=1}^{n} b_i$ is an estimate of the ratio of the area of the circle to the area of the square.
Given that we know the true value of p for this setting, we can then obtain an estimate
of $\pi$.
(a) Show that the estimate of $\pi$ is given by $4\hat{p}$.
(b) Estimate $\pi$ using n = 1,000 samples.
(c) Using the central limit theorem, determine the Monte Carlo sampling variability of
$\hat{\pi}$ (i.e. derive the asymptotic distribution of $\hat{\pi}$ as n gets large).
(d) Construct a histogram of 1,000 estimates of $\hat{\pi}$, each based on n = 1,000 samples.
Superimpose the Monte Carlo sampling variability distribution from part (c) under
the assumption that the true value of p is 0.7854, and verify that it matches the
experimental result.
(e) Without using the true value of p, based on the Monte Carlo sampling variability,
determine what sample size, n, is needed if we are required to estimate $\pi$ to within 0.01
with at least 95% probability.
(Hint: You will need to use a value for p in order to obtain this value. Choose the
value of p that gives the most conservative value of n, so that you can be sure that
you have estimated $\pi$ to the desired accuracy.)
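The sampling scheme described in Question 3 can be sketched in R as follows. This is a minimal illustration rather than a model answer: the function name `estimate_pi`, the choice r = 1, the seed, and centring the square at the origin are all assumptions made for the example.

```r
# Minimal sketch: estimate pi from points drawn uniformly in a square of
# side 2r centred at the origin, with a circle of radius r inside it.
estimate_pi <- function(n, r = 1) {
  x1 <- runif(n, min = -r, max = r)      # first coordinate of each point
  x2 <- runif(n, min = -r, max = r)      # second coordinate of each point
  b <- as.numeric(x1^2 + x2^2 <= r^2)    # b_i = 1 if the point is in the circle
  p_hat <- mean(b)                       # proportion of points in the circle
  list(p_hat = p_hat, pi_hat = 4 * p_hat)
}

set.seed(1)   # arbitrary seed, for reproducibility only
est <- estimate_pi(1000)
print(est$pi_hat)
```

Calling `replicate(1000, estimate_pi(1000)$pi_hat)` would produce the repeated estimates needed for a histogram such as the one requested in part (d).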