MAT 4378 – MAT 5317, Analysis of categorical data
Assignment 3
Due date: in class on Monday, November 18, 2019
Remark: You can use R for your computations for Questions 2 to 4. If you use
R please provide the output. However, the R output is not an answer to a question.
Please provide one or two sentences to properly answer the question.
1. Consider a ratio estimator h(θ̂1, θ̂2) = θ̂1/θ̂2, where the estimated variance-covariance
2. A carefully controlled experiment was conducted to study the effect of the size of
the deposit level on the likelihood that a returnable one-liter soft drink bottle
will be returned. The data to follow show the number of bottles that were
returned (wi) out of 500 sold (ni) at each of six deposit levels (xi,
in cents):
Deposit level (cents), xi     2    5   10   20   25   30
Number sold, ni             500  500  500  500  500  500
Number returned, wi          72  103  170  296  406  449
An analyst believes that a logistic regression model is appropriate for studying
the relation between the size of the deposit and the probability a bottle will be
returned.
(a) Find the maximum likelihood estimates for β0 and β1. Give the estimated
regression model.
(b) Obtain a scatter plot of the sample proportions against the level of the
deposit, and superimpose the estimated logistic response onto the plot.
Does the fitted logistic response function appear to fit well?
(c) Obtain exp(β̂1) and interpret this number.
(d) What is the estimated probability that a bottle will be returned when the
deposit is 15 cents?
(e) Estimate the amount of deposit for which 75% of the bottles are expected
to be returned.
(f) In part (e), we have an estimate x̂ = g(β̂0, β̂1) for the deposit level
at which π = 75% of the bottles are returned. This estimator is
a non-linear function of β̂0 and β̂1. Use the delta method to find an asymptotic
estimated standard error for this estimate. Hint: It will be helpful to
use the function vcov on your glm object. Furthermore, to multiply the
matrices A and B in R, use A %*% B.
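The steps in Question 2 can be sketched in R as follows. This is one possible workflow rather than a model answer: the data frame bottles and the names fit, p15, x.hat, grad and se.x are illustrative choices that do not appear in the assignment. The delta-method step assumes g(β0, β1) = (logit(π) − β0)/β1, which is what solving the logistic model for x gives.

```r
# Bottle-return data copied from the table in Question 2
bottles <- data.frame(
  deposit  = c(2, 5, 10, 20, 25, 30),
  sold     = rep(500, 6),
  returned = c(72, 103, 170, 296, 406, 449)
)

# Logistic regression on grouped counts: cbind(successes, failures)
fit <- glm(cbind(returned, sold - returned) ~ deposit,
           family = binomial(link = "logit"), data = bottles)
b <- unname(coef(fit))   # b[1] = intercept, b[2] = slope
V <- vcov(fit)           # estimated covariance matrix of (b[1], b[2])

# Estimated odds ratio for a one-cent increase in the deposit
odds.ratio <- exp(b[2])

# Estimated probability of return at a 15-cent deposit
p15 <- predict(fit, newdata = data.frame(deposit = 15), type = "response")

# Inverse prediction: deposit at which the fitted probability equals pi0
pi0   <- 0.75
x.hat <- (qlogis(pi0) - b[1]) / b[2]

# Delta method: row vector of partial derivatives of g with respect to
# the intercept and the slope, then sqrt(grad V grad')
grad <- matrix(c(-1 / b[2], -(qlogis(pi0) - b[1]) / b[2]^2), nrow = 1)
se.x <- sqrt(as.numeric(grad %*% V %*% t(grad)))

c(estimate = x.hat, std.error = se.x)
```

A quick sanity check is to plug x.hat back into predict(); the fitted probability should come out as 0.75.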
3. A marketing research firm was engaged by an automobile manufacturer to conduct
a pilot study to examine the feasibility of using logistic regression for
ascertaining the likelihood that a family will purchase a new car during the
next year. A random sample of 33 suburban families was selected. Data on
annual family income (x1, in thousands of dollars) and the current age of the
oldest family automobile (x2, in years) were obtained. A follow-up interview
conducted 12 months later was used to determine whether the family actually
purchased a new car (y = 1) or did not purchase a new car (y = 0) during the
year. The data are in the file CarPurchase.csv.
(a) Find the maximum likelihood estimates of β0, β1, and β2. State the estimated
logistic regression model.
(b) Obtain exp(β̂1) and exp(β̂2) and interpret these numbers.
(c) What is the estimated probability that a family with annual income of $50
thousand and an oldest car of 3 years will purchase a new car next year?
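A minimal sketch of the same workflow with two explanatory variables is given below. The file CarPurchase.csv is not reproduced here and its column names are not stated, so the sketch runs on simulated data; the data frame car, its columns income, age and purchase, and the coefficients used to simulate it are all hypothetical. With the real file, the data frame would instead come from read.csv and the column names would need to match.

```r
# Simulated stand-in for CarPurchase.csv (hypothetical names and values)
set.seed(1)
n <- 33
car <- data.frame(
  income = round(runif(n, min = 10, max = 100)),  # x1, thousands of dollars
  age    = sample(1:6, n, replace = TRUE)         # x2, years
)
car$purchase <- rbinom(n, size = 1,
                       prob = plogis(-4 + 0.05 * car$income + 0.5 * car$age))

# Logistic regression with a binary (0/1) response
fit.car <- glm(purchase ~ income + age,
               family = binomial(link = "logit"), data = car)

# Estimated odds ratios for a one-unit increase in each variable,
# holding the other variable fixed
exp(coef(fit.car))[c("income", "age")]

# Estimated purchase probability at income = 50 and age = 3
p.hat <- predict(fit.car, newdata = data.frame(income = 50, age = 3),
                 type = "response")
p.hat
```

Because the data are simulated, the printed numbers say nothing about the real study; only the sequence of calls carries over.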
4. Rather than finding the probability of success at an explanatory variable value,
it is often of interest to find the value of an explanatory variable given a desired
probability of success. This is referred to as inverse prediction. One application
of inverse prediction involves finding the amount of pesticide or herbicide needed
to have a desired kill rate when applied to pests or plants. The lethal dose level
xπ (commonly called “LDz”, where z = 100π) is defined as
xπ = (cloglog(π) − β0)/β1
for the complementary log-log regression model
cloglog(π) = β0 + β1 x.
(a) Show how xπ is derived by solving for x in the complementary log-log
regression model.
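(b) We can obtain a 95% confidence interval for xπ as the set of all values of x
such that
|β̂0 + β̂1 x − cloglog(π)| / sqrt(Var(β̂0) + x² Var(β̂1) + 2x Cov(β̂0, β̂1)) < z0.975,
where Var and Cov denote the estimated variances and covariance of β̂0 and β̂1,
and z0.975 is the 0.975 quantile of the standard normal distribution.
Describe how this confidence interval for xπ is derived. (Note that there is
generally no closed-form solution for the confidence interval limits, which
leads to the use of iterative numerical procedures.)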
(c) Turner et al. (1992) use logistic regression to estimate the rate at which
picloram, a herbicide, kills tall larkspur, a weed. Their data were collected
by applying four different levels of picloram to separate plots, and the
number of weeds killed out of the number of weeds within the plot was
recorded. The data are in the file picloram.csv. Complete the following:
(i) We will use a cloglog model instead of a logistic regression model. Give
the estimated complementary log-log model.
(ii) Compute exp(β̂1) and interpret this number within the context of the problem.
(iii) Plot the observed proportion of killed weeds and the estimated model.
Describe how well the model fits the data.
Note: Here are some commands that you might find helpful. We are
assuming that the dataframe is called picloram.data and that the
fitted model is called mod.
## Plot the observed proportions against the picloram level
with(picloram.data, plot(x = picloram, y = kill/total,
    xlab = "Picloram", ylab = "Proportion of weeds killed",
    panel.first = grid(col = "gray", lty = "dotted")))
# Put the estimated response on the plot
curve(expr = predict(object = mod,
    newdata = data.frame(picloram = x), type = "response"),
    col = "red", add = TRUE)
(iv) Estimate the 0.9 kill rate level “LD90” for picloram. Add lines to the
plot in (iii) to illustrate how it is found (the segments() function can
be useful for this purpose).
(v) We are assuming that your fitted model is the glm object mod. Use
the following commands to compute a 95% confidence interval for the
0.9 kill rate. Note: The function uniroot solves for the root of a
function over an interval.
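# Point estimate of LD90 from the fitted cloglog model
b0 <- summary(mod)$coefficients[1,1]
b1 <- summary(mod)$coefficients[2,1]
LD.x <- (log(-log(1-0.9)) - b0)/b1
# Wald statistic at x minus the standard normal critical value
root.func <- function(x, mod.obj, pi0, alpha) {
    beta.hat <- mod.obj$coefficients
    cov.mat <- vcov(mod.obj)
    # Estimated variance of beta.hat[1] + beta.hat[2]*x
    var.den <- cov.mat[1,1] + x^2*cov.mat[2,2] +
        2*x*cov.mat[1,2]
    abs(beta.hat[1] + beta.hat[2]*x - log(-log(1-pi0)))/
        sqrt(var.den) - qnorm(1-alpha/2)
}
# Search for a root between the smallest picloram level and LD.x
lower <- uniroot(f = root.func, interval =
    c(min(picloram.data$picloram), LD.x),
    mod.obj = mod, pi0 = 0.9, alpha = 0.05)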
# Search for a root between LD.x and the largest picloram level
upper <- uniroot(f = root.func, interval =
    c(LD.x, max(picloram.data$picloram)),
    mod.obj = mod, pi0 = 0.9, alpha = 0.05)
lower$root
upper$root
(vi) In part (v), we found a 95% CI for x0.9. Explain in a few sentences
how these commands give us the lower and the upper bound of the
confidence interval.
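The commands in part (v) cannot be run without picloram.csv, so the sketch below repeats the same idea on a small made-up data set. Everything in it is illustrative: the data frame dose.data and its counts are invented (they are not the Turner et al. data), and the names mod.demo, wald.stat, f and ld90 are assumptions introduced here.

```r
# Hypothetical dose-response counts (NOT the picloram data)
dose.data <- data.frame(dose  = 0:4,
                        total = rep(200, 5),
                        kill  = c(25, 62, 126, 187, 200))

# Complementary log-log regression on the observed proportions
mod.demo <- glm(kill / total ~ dose, weights = total,
                family = binomial(link = "cloglog"), data = dose.data)

# Wald statistic for H0: beta0 + beta1 * x = cloglog(pi0), viewed as a
# function of the candidate dose x
wald.stat <- function(x, mod.obj, pi0) {
  b  <- unname(coef(mod.obj))
  V  <- vcov(mod.obj)
  se <- sqrt(V[1, 1] + x^2 * V[2, 2] + 2 * x * V[1, 2])
  abs(b[1] + b[2] * x - log(-log(1 - pi0))) / se
}

pi0  <- 0.9
crit <- qnorm(0.975)
b    <- unname(coef(mod.demo))
ld90 <- (log(-log(1 - pi0)) - b[1]) / b[2]

# The statistic is 0 at ld90 and grows away from it, so f changes sign
# on each side of ld90; uniroot locates those two crossings.
f <- function(x) wald.stat(x, mod.demo, pi0) - crit
lower <- uniroot(f, interval = c(min(dose.data$dose), ld90))$root
upper <- uniroot(f, interval = c(ld90, max(dose.data$dose)))$root

c(lower = lower, LD90 = ld90, upper = upper)
```

Note that uniroot stops with an error unless f has opposite signs at the two ends of the interval, so the search ranges must extend far enough from the point estimate for the statistic to exceed the critical value.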