Final Exam Takehome Exam (Due by 3:00 p.m. Friday, May 3) - Math/Stat 511 Spring 2019
DO NOT WORK IN GROUPS FOR THIS EXAM OR DISCUSS THIS EXAM IN ANY
MANNER WITH CLASSMATES
1. A lawn care company was interested in reducing the amount of time that their employees spent on
lawn care activities for individual customers. The company decided to conduct an A/B test to assess
whether trimming the grass near the fence edges had any influence on customer attrition. The company
decided to randomly divide customers into a control group (lawn care with grass trimmed near fence
edges) and a reduced time group (lawn care with grass not trimmed near fence edges). Out of their
238 clients, 119 were assigned to the control group and 119 were assigned to the reduced time group.
One year later, it was assessed whether customers were still using the company’s services. The results
of the study are shown in the table below.
Use one of the methods that we have discussed in class to help provide a recommendation as to whether
you feel: 1) the company should collect more data, 2) the company could reasonably reduce the time
on each client without expecting a substantial increase in attrition rates, 3) the company should stick
with the regular service. To get full credit you must discuss practical significance and whether or
not the study was informative. Use your own judgement concerning what would constitute practical
significance in this case.
Make sure to include any code you use in your answer.
               still_using   stopped_using
regular             87             32
time_reduced        63             56
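
One way to put the table on a scale where practical significance can be judged is an interval for the difference in attrition rates between the two groups. The sketch below assumes a large-sample (Wald) interval for a difference in proportions; the exam does not say which method was covered in class, so this choice and the function name `attrition_diff_ci` are illustrative assumptions. Only the counts come from the table.

```python
from math import sqrt
from statistics import NormalDist


def attrition_diff_ci(stopped_a, n_a, stopped_b, n_b, level=0.95):
    """Wald interval for (attrition rate in group b) - (attrition rate in group a)."""
    p_a = stopped_a / n_a
    p_b = stopped_b / n_b
    diff = p_b - p_a
    # Standard error of a difference between two independent proportions.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided normal critical value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Counts from the table: 32 of 119 regular clients stopped,
# 56 of 119 time_reduced clients stopped.
diff, lower, upper = attrition_diff_ci(32, 119, 56, 119)
print(f"difference in attrition: {diff:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Comparing both ends of the interval with whatever increase in attrition is judged practically important indicates whether the study was informative or whether more data would be needed.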
2. A researcher is interested in comparing soil quality after growing winter wheat, quinoa, chickpeas and
barley. The researcher grows these crops on a single farm, using a randomized complete block design
(RCBD) with 6 blocks and 4 treatments per block. The data can be found in the file quinoa.csv in
the homework data folder. The primary questions of interest are: a) is there any evidence of differences
in soil nitrogen after growing each of the four crops and b) if so, which crops are different from one
another in terms of nitrogen content. Read in the data file quinoa.csv in the homework data folder on
blackboard to prepare for the analysis. Whatever method you choose for analysis, make sure that the
method uses all the data for estimating the error variance.
Include in your answer:
a. A discussion of statistical evidence to answer the researcher’s question concerning any evidence
of difference in soil nitrogen
b. A discussion of statistical evidence to answer the researcher’s question concerning which crops
show evidence of a difference in nitrogen (if relevant)
c. A plot of the average nitrogen levels for each crop with 95% confidence intervals.
d. A discussion of which model assumptions appear plausible for this data. Include plots and describe
what you are assessing with each plot included.
Make sure to include any code you use in your answer.
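
The requirement that the method use all the data for estimating the error variance points toward a single pooled mean squared error from a model with block and treatment effects. The sketch below works through the RCBD sums-of-squares decomposition by hand. The nitrogen values are made up for illustration (they are not the contents of quinoa.csv), and the function name `rcbd_anova` is hypothetical.

```python
def rcbd_anova(table):
    """Additive RCBD decomposition; rows are blocks, columns are treatments."""
    b, t = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (b * t)
    block_means = [sum(row) / t for row in table]
    trt_means = [sum(table[i][j] for i in range(b)) / b for j in range(t)]

    ss_block = t * sum((m - grand) ** 2 for m in block_means)
    ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
    # Residual: observation minus its block and treatment effects.
    ss_error = sum(
        (table[i][j] - block_means[i] - trt_means[j] + grand) ** 2
        for i in range(b)
        for j in range(t)
    )
    ss_total = sum((y - grand) ** 2 for row in table for y in row)

    df_trt, df_error = t - 1, (b - 1) * (t - 1)
    mse = ss_error / df_error  # pooled error variance from all observations
    return {
        "ss_block": ss_block, "ss_trt": ss_trt,
        "ss_error": ss_error, "ss_total": ss_total,
        "df_trt": df_trt, "df_error": df_error,
        "mse": mse,
        "f_trt": (ss_trt / df_trt) / mse,
        "se_trt_mean": (mse / b) ** 0.5,  # standard error of one crop mean
        "trt_means": trt_means,
    }


# Hypothetical data: 6 blocks x 4 crops (wheat, quinoa, chickpeas, barley).
nitrogen = [
    [12.1, 14.3, 16.8, 12.9],
    [11.4, 13.9, 17.2, 12.2],
    [13.0, 15.1, 18.0, 13.6],
    [12.6, 14.0, 16.5, 13.1],
    [11.9, 14.8, 17.6, 12.4],
    [12.3, 14.5, 17.1, 13.3],
]
result = rcbd_anova(nitrogen)
print(f"F for crops = {result['f_trt']:.2f} on "
      f"{result['df_trt']} and {result['df_error']} df")
```

A 95% interval for each crop mean would be the mean plus or minus a t critical value on the error degrees of freedom times `se_trt_mean`; the t quantile is left out here because the Python standard library does not provide one.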
3. Researchers were interested in determining whether horse survival (yes/no) was influenced by colloid
administration. A retrospective review of medical records of horses with enterocolitis (digestive tract
inflammation) treated with two colloid types, natural plasma or synthetic hetastarch, was conducted.
Data collected included whether or not the horse survived until discharge, along with two important
risk factors PCV (packed cell volume) and TS (total solids). A total of 92 horses were included in
the review. The data can be found in the file hetastarch.csv in the homework data folder. The
variable Outcome indicates whether or not the horse survived until discharge (1=no, 0=yes). With
major consideration to the response variable type, use an appropriate model for this data to gather
evidence concerning whether horses that receive different fluids show a differential in the risk of death
after accounting for important risk factors. Note that the risk factors in this dataset have missing
values. One of the simplest methods for dealing with missing data is to use imputation of the mean.
While this method is not necessarily the best way of dealing with missing data, the amount of missing
data is relatively small and it was thought that it would not have a major effect on the analysis in
this case. Thus, for this analysis, fill in missing values of risk factors with the mean of all non-missing
values.
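
The imputation step described above can be sketched as follows. Missing entries are represented as `None`, and the PCV readings are made up for illustration; the real values would come from hetastarch.csv. The function name `impute_mean` is hypothetical.

```python
def impute_mean(values):
    """Replace None entries with the mean of the non-missing entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical PCV readings with two missing values.
pcv = [40.0, None, 52.0, 46.0, None]
pcv_filled = impute_mean(pcv)
print(pcv_filled)
```

The same function would be applied separately to each risk factor (PCV and TS) before fitting the model.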
In your answer, make sure to include the following:
a. State the scope of statistical inference (population/causal) given what you know about the study.
b. Include a statement that summarises what is plausibly consistent with the data when considering
the association between survival and colloid administration. This may include the degree of
evidence against the null hypothesis (optional), but should summarize the magnitude of effects
that are consistent with the data. Additionally, you should discuss whether this range covers both
effects that are meaningful and not meaningful in size (use your own judgement here).
Make sure to include any code you use in your answer.
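
If a logistic regression is the model chosen for the binary Outcome variable, the range of effects consistent with the data in part b can be reported as an odds ratio with a confidence interval. The sketch below assumes a fitted coefficient and its standard error for the colloid indicator are already available; the numbers passed in are hypothetical, not results from hetastarch.csv, and `odds_ratio_ci` is an illustrative name.

```python
from math import exp
from statistics import NormalDist


def odds_ratio_ci(beta, se, level=0.95):
    """Convert a log-odds coefficient and its standard error to an odds ratio interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # The interval is built on the log-odds scale, then exponentiated.
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


# Hypothetical coefficient and standard error for the colloid-type indicator.
or_hat, or_low, or_high = odds_ratio_ci(beta=0.8, se=0.5)
print(f"odds ratio {or_hat:.2f} (95% CI {or_low:.2f} to {or_high:.2f})")
```

An interval that contains both odds ratios near 1 and odds ratios large enough to matter clinically would suggest the data cannot distinguish a negligible effect from a meaningful one.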