讲解CIS-PROB-STAT、辅导R、讲解product rule、辅导R编程语言

- 首页 >> Web
CIS-PROB-STAT 2019
Problem set #1
Due Tues July 23
Please strive to make your writing as clear and comprehensible as possible, so that whoever is reading your solution
(e.g. a TA or professor) will appreciate its beautiful clarity and will be able (even in a sleep-deprived state) to read
through smoothly once from beginning to end, nodding with approval and easy comprehension the whole time, without
pausing or backtracking or straining to think hard, ending with a complete understanding of your work. Any
calculations should be carried through to obtain explicit numerical answers. You should include just what is necessary
for a clear understanding of your results and how you obtained them; for example, include relevant R commands and
output but not complete unedited transcripts.
Good luck, and I hope you find this interesting and enjoyable! Don’t worry if you don’t know how to do any given
question or even how to get started; they are not supposed to be easy and different students have different amounts
of experience. The important thing for any of these problems is for you to spend at least a little time trying to work
on the problem, and then to get a hint or help if you need it; if you are starting to spend a long time without making
progress, please ask questions to each other, TA’s, and me! Also a problem may not have been stated in a clear way,
and questions of clarification could be really helpful for everyone. I can have some more office hours after classes.
1. [Monty Hall and Bayes] You are on a game show faced with 3 doors. Behind one of the doors is a car, and behind
the other two doors are goats; you prefer the car. Assume the position of the car is randomized to be equally likely
to be behind any door. You choose door #1. The host prolongs the drama by revealing that there is a goat behind
door #3. The host then asks you if you would like to switch your choice to door #2. Is it to your advantage to
switch? Your goal is to answer the host’s question by finding the conditional probability that the car is behind
door #2 given the relevant information. Actually as stated so far, not enough information is given to determine the
relevant probabilities, so consider the following two variations for the details of how the host decides which door to
open:
(a) This is the typical scenario assumed.* If the car is behind door #2, the host opens door #3; if the car is behind
door #3, he opens door #2; if the car is behind door #1, he opens door #2 or door #3 with probability ? each.
Suppose the host opens door #3. Should you switch to door #2?
(b) For this variation, suppose the host tosses a coin to choose which door (#2 or #3) to open, regardless of what is
behind those doors. Suppose it happens that door #3 is chosen, and it happens that you observe a goat behind
door #3. Should you switch to door #2?
[[Hint: This could be set up in different ways; I’ll try to describe one. To be definite about it, let’s not think of our
own choice to open door #1 as random; we know we will choose door #1. Now it’s like a frog about to take two
hops. The first hop determines the door where the car is hidden; we could call these 3 events C1, C2, and C3. These
3 events are assumed to have probability 1
3
each. From there, the second hop leads to the opening of a door, and we
are told that after two hops the frog ended up in a state where door #3 was opened and revealed a goat. Given that,
what is the probability that the frog passed through each of the 3 possible landing places for his first hop?]]
, If you find this question interesting, I think you will enjoy taking a look at this page.
2. Your favorite number is 3, and the idea of three 3’s makes you deliriously happy. You wonder if you roll a die 100
times, what is the probability of getting three consecutive 3’s at least once among those rolls. In this problem we’ll
see a way to calculate probabilities like this. So imagine you are rolling a die, repeatedly. We’ll say the first roll
occurs at “time 1,” the second roll at “time 2,” and so on,.
(a) Define T to be the first time at which we see any number other than 3. Write down the probability mass
function of T. (That is, find P{T = t} for t = 1,2,....)
*The idea here is that the host wants to open one door that is not the door you already chose, that is, he wants to open door 2 or 3. In this standard
variation of the game, the host knows where the car is, and he will not open that door. So, for example, if the car is behind door #2, then the only
choice the host has is to open door #3. The only case where the host has any choice is when the car is behind door #1, and in this scenario the host
tosses a coin to decide between opening door #2 or #3.CIS-PROB-STAT 2019 PS#1
(b) Let Ak denote the event that there is at least one run of three consecutive 3’s in k rolls of the die, and let ak
denote the probability P(Ak). Find an expression for ak
in terms of ak1, ak2, and ak3. [[Hint: Condition on
T. That is, use the Law of Total Probability?
to write as P(Ak) = ∑∞t=1 P{T = t}P(Ak
|T = t). Another example
of a helpful observation is that P(Ak
| T = 1) = ak?1.]]
(c) Use R and the relationship you found in part (b) to determine the value of a100.
(d) How many times would you need to roll the die in order to have probability greater than 0.5 of getting at least
one run of three consecutive 3’s?
3. Check part of the previous question by programing and running a simulation to approximate the probability of at
least one run of three consecutive 3’s in 100 rolls of a die.
Hints: To save you some thinking and programming effort, I’ve written a function that returns the
length of the longest run of consecutive 3’s in a vector. So hopefully given this function and what we have done so
far, it will not be hard for you to repeat (many times) the experiment of generating a vector of 100 random die rolls,
and for each such vector, simply run the function on that vector and record the resulting value. Then
you can approximate the desired probability by the fraction of repetitions giving a maxRunOf3s value of at least 3.
Here is an example to help you use maxRunOf3s (and understand how it works, if you want):
4. [[Margin of error of polls, using R]] We are familiar with political polls that find a certain percentage of people
favoring a given candidate, with the poll claiming a certain “margin of error.” Suppose we take a random sample of
size n from the population and find that the fraction in the sample who favor the given candidate is 0.56. Letting ?
denote the unknown fraction of the population who favor the candidate, and letting X denote the number of people
in our sample who favor the candidate, we are imagining that we have just observed X = 0.56n (so the observed
sample fraction is 0.56). Our assumed probability model is X ~ B(n,?). Suppose our prior distribution for ? is
uniform on the set {0, 0.001, .002,..., 0.999, 1}.
This is an important result and we saw it in class but may not have emphasized it as much as it deserves. E.g. you can find it in the notes on the top
of page 24 and, for the Bayesian frog question, near the bottom of page 31.
2CIS-PROB-STAT 2019 PS#1
(a) Use R to graph the posterior distribution in three cases: when n = 100, n = 400, and n = 1600.
(b) For each of the three cases, find the posterior probability P{ > 0.5 | X}
(c) For each of the three cases, find an interval of values that contains just over 95% of the posterior probability.
[You may find the function useful for this.] For each case, calculate the margin of error (defined to be
half the width of the interval, that is, the “±” value) and describe how the margin of error seems to depend on
the sample size (something like, when the sample size goes up by a factor of 4, the margin of error goes or down?> by a factor of about ).
[[A numerical tip: if you are looking in the notes, you might be led to try to use an expression like, for example,for the likelihood. But this can lead to numerical “underflow” problems because
the answers get so small. The problem can be alleviated by using the function instead for the likelihood (as
we did in class and in the R script), because that incorporates a large combinatorial proportionality factor, such as

1600
896 
that makes the numbers come out to be probabilities that are not so tiny. For example, as a replacement for
the expression above, you would use.]]
5. [[Examples of maximum likelihood estimators]] For data that comes from a discrete distribution, the likelihood
function is the probability of the data as a function of the unknown parameter. For data that comes from a continuous
distribution, the likelihood function is the probability density function evaluated at the data, as a function of
the unknown parameter, and the maximum likelihood estimator (MLE) is the parameter value that maximizes the
likelihood function. For both of the questions below, write down the likelihood function and find the maximum
likelihood estimator, including a justification that you have found the maximum (this involves something beyond
finding a place where a derivative is 0).
(a) If X ~ Bin(n,), write down the likelihood function and show that the MLE for is Xn.
(b) The exponential distribution with parameter λ (denoted by Exp(λ)) is a continuous distribution having pdf.
Suppose T1,T2,...,Tn are independent random variables with Ti ~ Exp(λ) for all i. Define S = T1 +T2 +···+
Tn. Write down the likelihood function, and show that the MLE for λ is n
S
, the reciprocal of the average of the
Ti’s.
[[To start thinking about part (a) it may help to remember the class when we were doing inference about ? in a
poll of size n = 100 with the observed data X = 56. For that example we calculated and plotted the likelihoods
for = 0,.001,.002,...,.998,.999,1, and it looked like the value that gave the highest likelihood was 0.56. Well,
in that example. Here we are thinking of the likelihood as a function of the continuous variable ?
over the interval [0,1] and showing mathematically that =
X
n maximizes the likelihood. So start by writing down
the likelihood function, that is, writing the binomial probability for getting X successes in n independent trials each
having success probability . Think of this as a function of (in any given example, n and X will be fixed numbers,
like 100 and 56), and use calculus to find the that maximizes this function. You should get the answer
Just as a hint about doing the maximization, you could maximize the likelihood itself, or equivalently you could
maximize the log likelihood (which you may find slightly simpler). If you do not want to take logs, you can still do
the required differentiation using the product rule.]]
3

站长地图