ECON 322: Econometric Analysis 1
Final data project: Winter 2019
General instructions
This last assignment is due on Sunday, April 7, before 11:30pm on Learn. It is a small research project,
and you will be evaluated on your ability to correctly use the different concepts covered during the
term. It is worth approximately 3 assignments (8.5 points out of 25). The drop box will not close, so
you will be allowed to submit the project late. The rule for late submissions is as follows: if you are
between 0.01 and 60 minutes late, you get a 2/10 penalty; between 60.01 and 120 minutes late, a
5/10 penalty; and more than two hours late, a 10/10 penalty. To avoid penalties, do not wait until
the last minute to upload it. Note that no justification is accepted for not submitting this assignment
(I remind you that doctor's notes for being sick around the due date are not accepted for assignments).
The final assignment must be organized like a report. I want the code and R output, along with your
comments and discussion, in the same pdf file. If you upload your document in any other format
(.doc, .docx, ...), I will not mark it. If you want to see what I expect from you, download the
document Assignment7 W17Sol.pdf that I uploaded in the folder “Final Project”; it is the suggested
solution to the Winter 2017 project. If you only include the code and output with no discussion (one
sentence is not considered a discussion), you get 0 out of 10 points. To obtain the full mark, you need
to justify what you are doing (choice of the model, tests, etc.), analyze the results (interpretation of
the coefficients, discussion of the validity of your results, etc.), and show me that you know how
to use the different concepts covered in class. The more concepts you use, the higher your mark will be.
The project
For this project, we use the data “Fatalities” from the “AER” package. To get the data, install the
“AER” package, then load the data using the following commands:
library(AER)
data(Fatalities)
The data frame will be called “Fatalities”. For a complete description of the variables, you can use
the help() function as follows:
help(Fatalities)
The main objective of the project is to measure the impact of different alcohol policies on car
fatalities. As you will discover by looking at the help file, there are several measures of car fatalities.
However, we assume that the same model can be used to explain any of the measures. You can therefore
select the model independently of which car fatality measure is used on the left-hand side. You
may want to comment on that assumption: do you think it is reasonable?
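To judge whether one model can reasonably serve every measure, it may help to look at the measures side by side first. The sketch below assumes that every fatality measure is a column whose name contains "fatal" (this matches the columns listed in help(Fatalities) as far as we know, but check the help file); it simply lists them and prints their summary statistics so you can compare their scales.

```r
library(AER)
data(Fatalities)

# Collect every column whose name contains "fatal" (the fatality measures)
fatality_vars <- grep("fatal", names(Fatalities), value = TRUE)
print(fatality_vars)

# Summary statistics: the measures differ a lot in scale (totals vs subgroups)
print(summary(Fatalities[, fatality_vars]))
```

Large differences in scale between, say, total and age-group counts are one thing worth mentioning when you discuss the common-model assumption.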
You will see that all qualitative variables are stored as factors. We covered factors in one of
your tutorials, so you should be able to figure out how to work with this format. For example, you can
run a regression of fatal on jail, which is “yes” or “no”, and let R create the dummy variable for you:
lm(fatal~jail, Fatalities)
##
## Call:
## lm(formula = fatal ~ jail, data = Fatalities)
##
## Coefficients:
## (Intercept) jailyes
## 1034.4 -424.4
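To see what R does behind the scenes, the sketch below prints the factor levels, shows the design matrix that lm() builds from jail, and checks that a hand-made 0/1 dummy gives the same coefficients. The copy `dat` and the column name `jail_dummy` are illustrative names, not part of the assignment.

```r
library(AER)
data(Fatalities)

# jail is a factor; its first level ("no") is the omitted base category
print(levels(Fatalities$jail))

# model.matrix() shows the regressors lm() uses: an intercept column plus a
# 0/1 dummy named "jailyes" (rows with a missing jail value are dropped)
print(head(model.matrix(~ jail, data = Fatalities)))

# Building the dummy by hand reproduces the coefficients of lm(fatal ~ jail)
dat <- Fatalities
dat$jail_dummy <- as.numeric(dat$jail == "yes")
fit_factor <- lm(fatal ~ jail, data = dat)
fit_manual <- lm(fatal ~ jail_dummy, data = dat)
print(all.equal(unname(coef(fit_factor)), unname(coef(fit_manual))))
```

The coefficient on jailyes is therefore the difference in mean fatalities between jail = “yes” and jail = “no” observations.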
The main policy variables we are interested in are beertax, drinkage, breath and jail. breath is equal
to “yes” if the police are authorized to administer pre-arrest breath tests for alcohol, and jail is equal
to “yes” if the law requires a jail sentence for a first conviction.
Keep in mind that the objective is to test whether the different policies have an impact on car
fatalities.
Part I
You will see that the dataset is a panel of 48 states over 7 years (1982 to 1988). We have not covered
panel data methods, so for now we ignore the panel structure and proceed as if we had cross-sectional
data. In Parts II and III, I will show you ways to deal with this structure.
The objective of this part is to build a model that allows you to test the effect of the different
policies on different types of car fatalities. For now, we assume that time and state have no impact on
the results. In other words, we assume that there are no unobserved differences across years and states
that are related both to policy adoption and to car fatalities. You may want to add a paragraph
discussing this assumption and how it may affect your results.
The choice of the variables you control for must be based only on your economic intuition.
You need to control for variables that are likely to be correlated with the number of car accidents and,
at the same time, with the decision to adopt the policy. You may also consider adding variables that
you think are related to the number of accidents, even if they are unrelated to policy adoption
by the states. As you should know, adding such variables is likely to reduce the standard
errors of the coefficients.
You may begin by assuming that the different policies are correlated with each other, so the model
should include the three policy indicators together. The smallest model to consider is therefore:
lm(fatal~jail+breath+beertax, Fatalities)
##
## Call:
## lm(formula = fatal ~ jail + breath + beertax, data = Fatalities)
##
## Coefficients:
## (Intercept) jailyes breathyes beertax
## 1222.97 -572.14 -383.41 58.92
Of course, it would be unacceptable for you to choose this simple model. Once you have decided
which variables to include, you have to think about how they should enter your model (in logs,
with a squared term, with interactions, etc.). You may want to apply some of the tests we saw in class
to choose among the few models you want to consider. You may also want to consider interacting the
policy variables. It is possible, for example, that the effect of the beer tax differs between states where
jail = yes and states where jail = no.
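As a sketch of the formula syntax only (the specification below is hypothetical and deliberately too small to be your final model), an interaction between beertax and jail lets the beer tax slope differ by jail law, and logs or squared terms can be written directly in the formula.

```r
library(AER)
data(Fatalities)

# beertax * jail expands to beertax + jail + beertax:jail, so the slope on
# beertax is allowed to differ between jail = "no" and jail = "yes" states
m_int <- lm(fatal ~ beertax * jail + breath, data = Fatalities)
b <- coef(m_int)

slope_no  <- b[["beertax"]]                           # beer tax effect, jail = "no"
slope_yes <- b[["beertax"]] + b[["beertax:jailyes"]]  # beer tax effect, jail = "yes"
print(c(jail_no = slope_no, jail_yes = slope_yes))

# The t test on beertax:jailyes tests whether the two slopes are equal
# (conventional standard error; use a robust one if you reject homoscedasticity)
print(summary(m_int)$coefficients["beertax:jailyes", ])

# Logs and squared terms go directly into the formula; I() protects the ^
m_sq <- lm(log(fatal) ~ jail + breath + beertax + I(beertax^2), data = Fatalities)
print(coef(m_sq))
```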
To build your model (functional form and variable selection), use the total number of car fatalities
(the variable fatal). The following is a to-do list. These are not parts that you have to answer
individually; they are elements that I expect to find somewhere in your report.
- Discussion of which variables should be included and why.
- Discussion of how each variable should enter the model (in logs, with interactions, squared, etc.).
  It may not be obvious for all variables, but try your best.
- Estimation of some models (if you have more than one in mind... well, you should certainly try more
  than one).
- Tests for correct specification (Chapter 9) and homoscedasticity (Chapter 8). It is very important
  that you use robust tests only if you reject homoscedasticity. You will be penalized if you don't.
- Anything else to check before moving on to the interpretation (for example, are there any outliers?).
- Interpretation of the results and a discussion of the possible weaknesses of the model. Here I want the
  interpretation of all coefficients of the policy variables (and their interactions, if any). We are not
  interested in the coefficients of the control variables.
- Estimation of the same model (same right-hand side) for a few car fatality measures. Choose 3
  of the 10 measures available in the dataset that you are interested in analyzing, and interpret the
  results.
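The diagnostic steps in this list can be sketched as follows. This is a minimal illustration, assuming a hypothetical specification (the controls unemp, income and pop are placeholders for your own choice) and a 5% significance level. library(AER) also attaches lmtest (resettest, bptest, coeftest) and sandwich (vcovHC), which provide the RESET and Breusch-Pagan tests and the robust covariance matrix used here.

```r
library(AER)  # also attaches lmtest and sandwich
data(Fatalities)

# Hypothetical candidate specification, for illustration only
f <- fatal ~ jail + breath + beertax + drinkage + unemp + log(income) + log(pop)
m <- lm(f, data = Fatalities)

# Correct specification: RESET test adding squares and cubes of fitted values
reset_p <- resettest(m, power = 2:3, type = "fitted")$p.value

# Homoscedasticity: Breusch-Pagan test (studentized version by default)
bp_p <- bptest(m)$p.value
print(c(RESET = reset_p, BP = bp_p))

# Use heteroskedasticity-robust inference only if homoscedasticity is rejected
if (bp_p < 0.05) {                                   # 5% level is an assumption
  print(coeftest(m, vcov. = vcovHC(m, type = "HC1")))
} else {
  print(coeftest(m))
}

# Influential observations: Cook's distance above the 4/n rule of thumb.
# Names of cd are row names, so they map back to Fatalities even with NAs.
cd <- cooks.distance(m)
influential <- names(cd)[cd > 4 / nobs(m)]
print(Fatalities[influential, c("state", "year")])

# Same right-hand side for three fatality measures (illustrative choice)
measures <- c("fatal", "nfatal", "afatal")
fits <- lapply(measures, function(y) lm(update(f, paste(y, "~ .")), data = Fatalities))
names(fits) <- measures
print(lapply(fits, function(x) coef(x)[c("jailyes", "breathyes", "beertax", "drinkage")]))
```

Keeping the formula in one object `f` and swapping only the left-hand side with update() guarantees that the three measures are estimated with exactly the same right-hand side.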
There is no such thing as the right model. The evaluation is based only on how you justify
your choices and how well you use the different concepts that we have covered during the term.
Part II
We saw in class that past information can be used as a proxy for state characteristics that may have
pushed states to adopt the policy. As an exercise, estimate your model (the same one used in Part I) for
1985 only, with fatal as the dependent variable, and add the 1982 value of fatal as a proxy. Compare the
results with and without the proxy. You have to estimate the model using this dataset:
dat1985 <- subset(Fatalities, year==1985)
and create the proxy as follows:
fatal1982 <- subset(Fatalities, year==1982)$fatal
Also compare your results with what you obtained in Part I. What do you think is the weakness
of this approach?
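The proxy above is matched to states purely by row position, which works only if the 1982 and 1985 subsets list the states in the same order. The sketch below checks that before attaching the proxy to the 1985 data, then compares the policy coefficients with and without it. The right-hand side jail + breath + beertax is a placeholder for your own Part I model.

```r
library(AER)
data(Fatalities)

dat1985 <- subset(Fatalities, year == 1985)
dat1982 <- subset(Fatalities, year == 1982)

# The proxy is matched by row position, so confirm both subsets list the
# states in the same order before attaching it
stopifnot(identical(as.character(dat1982$state), as.character(dat1985$state)))
dat1985$fatal1982 <- dat1982$fatal

# Placeholder Part I right-hand side; replace with your own model
m_noproxy <- lm(fatal ~ jail + breath + beertax, data = dat1985)
m_proxy   <- update(m_noproxy, . ~ . + fatal1982)

# Compare the policy coefficients with and without the lagged outcome
policy <- c("jailyes", "breathyes", "beertax")
print(cbind(without_proxy = coef(m_noproxy)[policy],
            with_proxy    = coef(m_proxy)[policy]))
```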
Part III
In the final part of the project, I want you to estimate a fixed-effects model. It sounds fancy,
but it is just a regression with many dummy variables. When we deal with panel data, we want to
control for time trends and for unobserved heterogeneity across states. If we don't, we are likely to get
a biased estimator of the policy effect, for the same reason the estimate would be biased if we did not
control for observed variables that are related to both accidents and policy adoption. Adding time and
state fixed effects means adding dummy variables for years and states. We have 7 years and 48
states, so we want 6 time dummies and 47 state dummies (one of each is omitted to avoid perfect
multicollinearity with the intercept). For example, the New York dummy is 1 if the observation is from
New York state and 0 otherwise. Similarly, the 1982 dummy is 1 if the observation is from 1982 and 0 otherwise.
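Written as an equation, the model described above looks roughly like the following, where i indexes states, t indexes years, x_it is a placeholder for whatever controls you chose in Part I (not part of the handout), 1982 is the omitted year and state 1 (the first level of the state factor) is the omitted state. This gives the 6 year dummies and 47 state dummies mentioned above.

```latex
\text{fatal}_{it} = \beta_0 + \beta_1\,\text{jail}_{it} + x_{it}'\gamma
  + \sum_{\tau=1983}^{1988} \delta_\tau\,\mathbf{1}\{t=\tau\}
  + \sum_{j=2}^{48} \alpha_j\,\mathbf{1}\{i=j\} + u_{it}
```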
In R, it is easy to add year and state fixed effects: all you need is for these variables to be defined
as factors. You do not want year to be stored as an integer, because lm() would then treat it as an
ordinary numeric regressor (a single linear trend) instead of creating one dummy per year. We can
check the type as follows:
is(Fatalities$year, "factor")
## [1] TRUE
is(Fatalities$state, "factor")
## [1] TRUE
They are factors, so we are good to go. All you have to do is add year and state to your model
from Part I. For example:
res <- lm(fatal~jail+year+state, Fatalities)
length(coef(res))
## [1] 55
You see that there are 55 coefficients: an intercept, the coefficient on jail, and 53 coefficients for
the time and state dummies (6 + 47). When you print the results, you should never report the
coefficients of the time and state dummy variables. Here is how to use stargazer to omit all
coefficients of the time and state fixed effects from the printed results:
library(stargazer)
stargazer(res, type="text", omit=c("year","state"), digits=5)
##
## ===============================================
##                         Dependent variable:
##                     ---------------------------
##                                fatal
## -----------------------------------------------
## jailyes                       0.45451
##                              (36.26195)
##
## Constant                    952.64660***
##                              (38.31339)
##
## -----------------------------------------------
## Observations                    335
## R2                            0.99068
## Adjusted R2                   0.98888
## Residual Std. Error     95.20186 (df = 280)
## F Statistic         551.20140*** (df = 54; 280)
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
The first degrees of freedom of the F statistic (df = 54) gives the number of regressors excluding the intercept, that is, 55 - 1.
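The dummy count and the degrees of freedom can also be checked directly. This sketch refits the example regression from above and counts coefficients by name prefix, relying on R's default naming of factor dummies (year1983, ..., and one name per state starting with "state").

```r
library(AER)
data(Fatalities)

res <- lm(fatal ~ jail + year + state, data = Fatalities)
cf  <- names(coef(res))

n_year  <- sum(startsWith(cf, "year"))   # 7 years minus the base year
n_state <- sum(startsWith(cf, "state"))  # 48 states minus the base state
print(c(year = n_year, state = n_state, total = length(cf)))

# F statistic degrees of freedom: (slopes excluding intercept, residual df)
fs <- summary(res)$fstatistic
print(c(numdf = fs[["numdf"]], dendf = fs[["dendf"]]))
# residual df = observations - coefficients = 335 - 55
```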
Estimate your model from Part I with the year and state fixed effects added to it. Do this for all
3 measures of car fatalities you analyzed. Interpret your results and compare them with what you
obtained in Part I.
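One way to organize this step is sketched below. It assumes a placeholder right-hand side (jail + breath + beertax) and an illustrative choice of three measures (fatal, nfatal, afatal); substitute your own. The three fixed-effects regressions are printed side by side with stargazer, omitting the year and state dummies as in the example above.

```r
library(AER)
library(stargazer)
data(Fatalities)

# Placeholder Part I right-hand side plus year and state fixed effects
rhs <- "jail + breath + beertax + year + state"
measures <- c("fatal", "nfatal", "afatal")   # illustrative choice of 3 measures

fe_fits <- lapply(measures, function(y)
  lm(as.formula(paste(y, "~", rhs)), data = Fatalities))

# Report only the policy coefficients; the fixed-effect dummies are omitted
stargazer(fe_fits, type = "text", omit = c("year", "state"),
          column.labels = measures, digits = 3)
```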