ECF5410 - Take Home Exercise 3
Rev. 2023-03-17
Follow the instructions below and turn in both your code and your results:
1. Load the mathpnl data set, which comes from Leslie Papke, consists of data at the school district level, and was featured in the Wooldridge (2010) textbook.
Tip: Install the wooldridge package and run mathpnl <- wooldridge::mathpnl to save the data frame. You may also want to pipe (%>%) it into as_tibble().
We are only going to be working with a few variables.
- distid: the district identifier (our “individual” for fixed effects)
- year: the year the data is from
- math4: the percentage of 4th grade students who are “satisfactory” or better in math
- expp: expenditure per pupil
- lunch: the percentage of students eligible for free lunch
- intid: this will be used to help plotting in Q5
2. Panel data is often described as “N by T”, where N is the number of different individuals and T is the number of time periods. Write code that outputs what N and T are in this data.
Tip: you can count the number of observations for each distid & year by using distinct() and nrow() or count().
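As an illustration of the “N by T” idea, here is a minimal sketch on a made-up toy panel (not the mathpnl data). It uses base R only; the tidyverse route in the tip, distinct() followed by nrow(), counts the same thing. The names toy, N, and T_periods are hypothetical.

```r
# Hypothetical toy panel: 3 districts observed over 2 years (not mathpnl).
toy <- data.frame(
  distid = c(1, 1, 2, 2, 3, 3),
  year   = c(1992, 1993, 1992, 1993, 1992, 1993)
)

# N is the number of distinct individuals, T the number of distinct periods.
# The name T_periods avoids overwriting R's built-in T (an alias for TRUE).
N <- length(unique(toy$distid))
T_periods <- length(unique(toy$year))

cat("N =", N, "and T =", T_periods, "\n")
```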
3. A balanced panel is one in which each individual shows up in every single time period. You can check whether a data set is a balanced panel by seeing whether the number of unique time periods each individual ID shows up in is the same as the number of unique time periods, or whether the number of unique individual IDs in each time period is the same as the total number of unique individual IDs.
Take a second to think about why these procedures would check that this is a balanced panel.
Then, check whether this data set is a balanced panel.
Tip: We can use distinct() for N & T and then use table() for a cross tabulation.
Tip2: Please do not output the whole cross-tab into your document - it will be too long.
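The balance check described in step 3 can be sketched on a small made-up panel (again, not the mathpnl data) where one district is deliberately missing a year. This is only an illustration in base R; the object names are hypothetical, and the cross-tab is summarised with all() rather than printed.

```r
# Hypothetical toy panel: district 3 is missing in 1993, so it is unbalanced.
toy <- data.frame(
  distid = c(1, 1, 2, 2, 3),
  year   = c(1992, 1993, 1992, 1993, 1992)
)

n_periods <- length(unique(toy$year))

# Cross-tabulate distid by year; each cell counts the rows for that pair.
tab <- table(toy$distid, toy$year)

# Every district appearing exactly once in every year means all cells equal 1.
is_balanced <- all(tab == 1)

# The check from the text: unique periods per district vs. total unique periods.
periods_per_id <- tapply(toy$year, toy$distid, function(y) length(unique(y)))
same_answer <- all(periods_per_id == n_periods)

cat("Balanced according to the cross-tab:", is_balanced, "\n")
cat("Balanced according to periods per district:", same_answer, "\n")
```

Summarising the table with all() keeps the output to a single line, which matters when the real cross-tab has hundreds of rows.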
4. Create a scatter plot with lunch on the x-axis & math4 on the y-axis. What does the relationship look like? Is it intuitive?
5. Now create another plot with distid as the colour aesthetic. What can you see now?
Tip: With the full dataset there are too many districts to draw any insights, so filter the data such that intid == 9 before passing it on to ggplot().
Tip2: As distid is numeric, ggplot() will treat it as a continuous variable. Try providing distid as a factor instead.
6. Given the new plot, should the relationship apply to the majority of the distids? Explain.
7. Run an OLS regression, with no fixed effects, of math4 on expp and lunch. Store the results as m1.
8. Modify the model in step 7 to include fixed effects for distid “by hand”. That is, subtract out the within-distid mean of math4, expp, and lunch, creating new variables math4_demean, expp_demean, and lunch_demean, and re-estimate the model using those variables, storing the result as m2.
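To make the “by hand” within transformation concrete, here is a minimal sketch on made-up numbers (not the mathpnl data). It uses base R's ave(), which returns each group's mean repeated across that group's rows; a group_by() and mutate() pipeline would be the tidyverse counterpart. The data values and the name m2_toy are hypothetical.

```r
# Hypothetical toy panel (not mathpnl): 3 districts, 3 years each.
toy <- data.frame(
  distid = rep(1:3, each = 3),
  expp   = c(10, 12, 14, 20, 21, 25, 30, 33, 34),
  lunch  = c(50, 48, 47, 30, 31, 28, 10, 12, 11),
  math4  = c(40, 45, 47, 60, 61, 68, 80, 83, 86)
)

# Subtract the within-district mean from each variable.
toy$math4_demean <- toy$math4 - ave(toy$math4, toy$distid)
toy$expp_demean  <- toy$expp  - ave(toy$expp,  toy$distid)
toy$lunch_demean <- toy$lunch - ave(toy$lunch, toy$distid)

# Re-estimate the regression on the demeaned variables.
m2_toy <- lm(math4_demean ~ expp_demean + lunch_demean, data = toy)
print(coef(m2_toy))
```

After demeaning, each variable averages to zero within every district, so the fitted intercept is zero up to rounding error.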
9. Next run an OLS regression using dummy variables for each distid. Save this as m3.
Tip: Again, as distid is numeric, pass it to lm() as a factor.
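The dummy-variable approach can be sketched on a made-up toy panel (not the mathpnl data) to show what factor() does inside lm(). The sketch also compares the slope estimates with those from demeaned variables; all names and numbers here are hypothetical.

```r
# Hypothetical toy panel (not mathpnl): 3 districts, 3 years each.
toy <- data.frame(
  distid = rep(1:3, each = 3),
  expp   = c(10, 12, 14, 20, 21, 25, 30, 33, 34),
  lunch  = c(50, 48, 47, 30, 31, 28, 10, 12, 11),
  math4  = c(40, 45, 47, 60, 61, 68, 80, 83, 86)
)

# factor(distid) expands into one dummy per district (minus a baseline).
m_dummy <- lm(math4 ~ expp + lunch + factor(distid), data = toy)

# Same model via demeaning, for comparison of the slope coefficients.
toy$math4_demean <- toy$math4 - ave(toy$math4, toy$distid)
toy$expp_demean  <- toy$expp  - ave(toy$expp,  toy$distid)
toy$lunch_demean <- toy$lunch - ave(toy$lunch, toy$distid)
m_demean <- lm(math4_demean ~ expp_demean + lunch_demean, data = toy)

slopes_dummy  <- unname(coef(m_dummy)[c("expp", "lunch")])
slopes_demean <- unname(coef(m_demean)[c("expp_demean", "lunch_demean")])

# all.equal() compares within numerical tolerance rather than exactly.
slopes_match <- isTRUE(all.equal(slopes_dummy, slopes_demean))
cat("Slopes agree across the two approaches:", slopes_match, "\n")
```

Without factor(), lm() would treat distid as a single numeric regressor with one slope, which is not a fixed effect.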
10. Now we will use a specially designed function to estimate a model with fixed effects. Use feols() from the fixest package to estimate the model from step 7 but with fixed effects for distid. Save the result as m4.
11. Using msummary(), make a regression table including m1 through m4 so you can compare them all.
Write down two interesting things you notice from the table. There are multiple possible answers here.
Tip: As there are a lot of dummy variables in m3, provide msummary() with the argument coef_omit = "distid" to remove them from the regression table.
Submit a PDF (knitted RMarkdown) document with your answers and code on Moodle by Thursday, March 30, 9:00 AM.
moodle/week 4/Take Home Exercise