2019.11.02

Assignment 3
Exercise 1
For this question we will use a dataset from a randomized experiment conducted by Marianne Bertrand and Sendhil Mullainathan, who sent 4,870 fictitious resumes to employers in response to job adverts in Boston and Chicago in 2001. The resumes differ in various attributes, including the names of the applicants, and different resumes were randomly allocated to job openings. Some of the names are distinctly white-sounding and some distinctly black-sounding. The researchers wanted to learn whether black-sounding names obtain fewer callbacks for interviews than white-sounding names. Load the data set: data/bm.dta
(a) The data set contains two dummy variables (0-1 variables): female (female) and whether the applicant has computer skills (computerskills). Tabulate these variables by black. Do gender and computer skills look balanced, i.e. random, across race groups?
(b) Do a similar tabulation for education and the number of jobs previously held (ofjobs). These variables take on 5 and 7 different values, respectively. Do education and the number of previous jobs look balanced across race groups?
(c) Look at the mean and standard deviation of years of experience (yearsexp) separately for black and white applicants. Does this variable look similar by race?
(d) What do you make of the overall results on resume characteristics? Why do we care about whether these variables look similar across the race groups?
(e) The variable of interest in the data set is call, which indicates a callback for an interview. Do you find differences in callback rates by race?
(f) What do you conclude from the results of the Bertrand and Mullainathan experiment?
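
As a hint for the tabulations in (a) and the callback comparison in (e), the following is a minimal Python sketch of computing the share of a 0-1 variable within each race group. The helper name `share_by_group` and the toy rows are made up for illustration; they are not values from bm.dta, and loading the .dta file itself is left out.

```python
from collections import defaultdict

def share_by_group(rows, group_key, value_key):
    """Return {group value: mean of a 0-1 variable} for a list of dict rows."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    return {g: totals[g] / counts[g] for g in counts}

# Toy rows for illustration only; these are NOT values from bm.dta.
toy_rows = [
    {"black": 0, "female": 1, "call": 1},
    {"black": 0, "female": 0, "call": 0},
    {"black": 0, "female": 1, "call": 0},
    {"black": 0, "female": 1, "call": 1},
    {"black": 1, "female": 1, "call": 0},
    {"black": 1, "female": 0, "call": 0},
    {"black": 1, "female": 1, "call": 1},
    {"black": 1, "female": 1, "call": 0},
]

# Balance check: the share of female applicants within each race group.
female_share = share_by_group(toy_rows, "black", "female")

# Outcome comparison: callback rate within each race group and the gap.
call_rate = share_by_group(toy_rows, "black", "call")
call_gap = call_rate[1] - call_rate[0]

print(female_share, call_rate, call_gap)
```

If randomization worked, shares of resume characteristics such as female should be close across groups, while any gap in call is the quantity of interest.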

Exercise 2
Consider again the dataset from the experiment by Bertrand and Mullainathan (bm.dta).
(a) Develop a regression to examine whether the difference in interview callbacks between black-sounding and white-sounding CVs is statistically significant.
(b) Execute the regression. What do you conclude?
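
One possible setup, sketched below in Python, regresses the callback indicator on a constant and the black dummy. In such a regression the slope equals the difference in mean callback rates between the two groups, so a t-test on the slope tests whether the gap is zero. The function name `ols_on_dummy` and the toy data are assumptions for illustration, and the standard error shown is the conventional (homoskedastic) one; with a binary outcome a heteroskedasticity-robust standard error may be preferable.

```python
from math import sqrt

def ols_on_dummy(y, d):
    """OLS of y on a constant and a 0-1 dummy d.

    Returns (intercept, slope, conventional standard error of the slope).
    """
    n = len(y)
    mean_y = sum(y) / n
    mean_d = sum(d) / n
    sxx = sum((di - mean_d) ** 2 for di in d)
    sxy = sum((di - mean_d) * (yi - mean_y) for di, yi in zip(d, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_d
    residuals = [yi - intercept - slope * di for yi, di in zip(y, d)]
    sigma2 = sum(e ** 2 for e in residuals) / (n - 2)  # residual variance
    se_slope = sqrt(sigma2 / sxx)
    return intercept, slope, se_slope

# Toy data for illustration only; these are NOT values from bm.dta.
call = [1, 0, 0, 1, 0, 0, 1, 0]
black = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope, se_slope = ols_on_dummy(call, black)
t_stat = slope / se_slope

# The slope on the dummy coincides with the difference in group means.
mean_black = sum(c for c, b in zip(call, black) if b == 1) / black.count(1)
mean_white = sum(c for c, b in zip(call, black) if b == 0) / black.count(0)
diff_means = mean_black - mean_white

print(intercept, slope, se_slope, t_stat, diff_means)
```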
Exercise 3
For this question, download the data set cps.dta, which comes from the responses to the monthly US Current Population Survey (CPS) in 2001, a large labour market survey. This data set contains data on 8,891 individuals living in Boston and Chicago. We want to use these data to compare the skills and employment outcomes of real black and white workers (as opposed to made-up CVs), and to see how they differ from the findings in the exercises involving the bm.dta dataset.

(a) The data set contains a variable education, which takes on four values (high school dropouts, high school graduates, some college, and college degree and more). Use the education variable to create a new dummy indicating some college or more (i.e. those in the "some college" category plus those in the "college degree and more" category). What fraction of respondents has at least some college education?

(b) Conduct a regression analysis of the chances of being employed for people with different racial backgrounds.
(c) Conduct a regression analysis of the chances of having a college education for people with different racial backgrounds.
(d) On the basis of your evidence what can you conclude about racial discrimination in the US labor market? What are potential caveats? What analysis could you undertake to address some of these caveats?
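
For part (a), the dummy can be built by mapping the two upper education categories to 1 and the rest to 0; the fraction with at least some college is then the mean of the dummy. The sketch below is in Python and assumes that education is stored as text labels. The labels, the helper name `some_college_dummy`, and the toy values are hypothetical, and the actual coding in cps.dta may differ.

```python
# Assumed category labels; the actual coding of education in cps.dta may
# differ (for example, integer codes with value labels).
SOME_COLLEGE_OR_MORE = {"some college", "college degree and more"}

def some_college_dummy(education):
    """Return 1 if the education category is some college or more, else 0."""
    return 1 if education in SOME_COLLEGE_OR_MORE else 0

# Toy values for illustration only; these are NOT observations from cps.dta.
toy_education = [
    "high school dropout",
    "high school graduate",
    "some college",
    "college degree and more",
    "high school graduate",
]

somecol = [some_college_dummy(e) for e in toy_education]

# The mean of a 0-1 dummy is the fraction of observations with value 1.
fraction_somecol = sum(somecol) / len(somecol)

print(somecol, fraction_somecol)
```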

Exercise 4
Consider once more the cps.dta dataset. In Exercise 3 we argued that not accounting for education might bias our estimate of the impact of racial background. We then examined the issue by looking at college-educated and non-college-educated workers separately.
(a) Can you propose an alternative strategy using a multivariate regression approach?
(b) There are potentially two models you might have used in part (a): one that implies that the effect of race is the same for both educational groups, and one that implies that the effect of race differs across educational groups. Can you propose and conduct a hypothesis test that could help us decide which model is more appropriate?
(c) On the basis of your regressions in (b), what is the racial gap for college educated people? What is the racial gap for not college educated people? How does this compare to your findings in Exercise 3?
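
As a sketch of how the two models in (b) might be written, assuming an employment indicator and dummies called employed, black and college (hypothetical names; the variables in cps.dta may be named differently), and with coefficient symbols chosen here for illustration:

```latex
\begin{align*}
\text{employed}_i &= \beta_0 + \beta_1\,\text{black}_i + \beta_2\,\text{college}_i + u_i \\
\text{employed}_i &= \gamma_0 + \gamma_1\,\text{black}_i + \gamma_2\,\text{college}_i
                     + \gamma_3\,(\text{black}_i \times \text{college}_i) + u_i
\end{align*}
```

Under this setup the first model restricts the racial gap to be the same in both education groups, and it is the special case of the second model with $\gamma_3 = 0$. A t-test of $H_0: \gamma_3 = 0$ is therefore one way to choose between them. In the second model the racial gap is $\gamma_1$ for those without college education and $\gamma_1 + \gamma_3$ for those with college education.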

Exercise 5
Use wage1.dta. Examine once more the relationship between education and wages. Would you say the relationship is different for men and women?

Exercise 6
Use the production2.dta dataset. This data set contains data on output (value added) and inputs at the industry level for 459 industries in 1958 and 1993. Suppose the relationship between output and inputs is described by a Cobb-Douglas production function

$$Y_i = A K_i^{\alpha} L_i^{\beta}$$

where $Y_i$ is a measure of output, $K_i$ is the capital stock, and $L_i$ is employment. Answer all questions for the year 1958 only.

(a) Transform the production function to a linear equation by taking logs. Estimate the parameters $\alpha$ and $\beta$ by an OLS regression using total value added as your measure of output.
(b) Test whether your estimates are consistent with the production function exhibiting constant returns to scale, i.e.

$H_0: \alpha + \beta = 1$

against the alternative

$H_1: \alpha + \beta \neq 1$

Do you reject the hypothesis at the 5% level? What is the p-value of your test?
 
(c) An alternative way to test the hypothesis of constant returns to scale is to impose this restriction on the parameters and transform your regression model. Derive the necessary transformation, and show how the constant returns hypothesis amounts to a t-test in this transformed model. Carry out this test. Verify that your result matches what you found in (b).  
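
One way the transformation in (c) can be derived is sketched below. It assumes the exponents on capital and labour are written $\alpha$ and $\beta$, and that an additive error term $u_i$ enters the log equation; both the symbols and the error term are assumptions of this sketch.

```latex
\begin{align*}
\log Y_i &= \log A + \alpha \log K_i + \beta \log L_i + u_i \\
\log Y_i - \log L_i &= \log A + \alpha(\log K_i - \log L_i)
                       + (\alpha + \beta - 1)\log L_i + u_i \\
\log\frac{Y_i}{L_i} &= \log A + \alpha \log\frac{K_i}{L_i} + \theta \log L_i + u_i,
\qquad \theta = \alpha + \beta - 1
\end{align*}
```

The second line subtracts $\log L_i$ from both sides and adds and subtracts $\alpha \log L_i$ on the right. Constant returns to scale, $\alpha + \beta = 1$, is then equivalent to $\theta = 0$, which is the usual t-test on the coefficient of $\log L_i$ in the transformed regression.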

(d) What is the average size in numbers employed across all industries in 1958? Suppose an industry of average size employs an additional 1000 workers. What does the model estimated in part (a) imply about the effect this will have on value added?

(e) Do you think it is reasonable to assume that our estimates in (a) are unbiased? Justify your answer.
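
For part (d), a log-log model implies that output changes by the factor $((L + \Delta L)/L)^{\beta}$ when employment rises from $L$ to $L + \Delta L$ with capital held fixed, where $\beta$ denotes the coefficient on log employment. The Python sketch below computes this factor alongside the first-order approximation $\beta \, \Delta L / L$. The function names and the numbers are hypothetical and are not estimates from production2.dta; the units of employment in the data (for example, thousands of workers) also need to be checked before plugging in 1000 workers.

```python
def output_ratio(beta_labour, employment, extra_workers):
    """Ratio of new to old output implied by a log-log model when
    employment increases by extra_workers and capital is held fixed."""
    return ((employment + extra_workers) / employment) ** beta_labour

def approx_pct_change(beta_labour, employment, extra_workers):
    """First-order approximation: elasticity times the proportional
    change in employment."""
    return beta_labour * extra_workers / employment

# Hypothetical inputs for illustration only (same units for both
# employment figures); NOT estimates from production2.dta.
beta_hat = 0.5
mean_employment = 4.0
extra = 1.0

exact = output_ratio(beta_hat, mean_employment, extra) - 1.0
approx = approx_pct_change(beta_hat, mean_employment, extra)

print(exact, approx)
```

The approximation is close only for small proportional changes in employment, so it is worth reporting which version is used.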

Exercise 7
Use the dataset attend.dta to analyze whether attending lectures has a causal effect on final exam performance. The dataset contains 674 observations on college students who took a particular course. Most variables in the dataset should be self-explanatory. The ACT is a college entry test. GPA is grade point average, the average performance in all courses.
(a) Run a regression of stndfnl, the standardized final exam score, on attend, the number of lectures attended (note: the data is from the US where they call a lecture a class). What is the association between attendance and exam performance? Is the effect large or small?
(b) What is your main worry about the uncontrolled regression in (a) if you are interested in the causal effect of attendance on exam performance? How would you address this worry?

(c) Enter each of the following variables one at a time as a control in your regression: termgpa, priGPA, ACT. For each of these controls, answer the following questions:
1. Does entering the control variable help solve the problem you discussed in part (b) and get you closer to a causal effect of lecture attendance?
2. Does entering the control variable create potential new problems in interpreting the coefficient on attend causally? Why?
3.What happens to the coefficient on attend? Interpret this result.  

(d) Drawing on your discussion in (c), which of the control variables termgpa, priGPA, ACT would you like to have in your regression in order to uncover the causal effect of lecture attendance? Why? Run your preferred specification and discuss the result.  

(e) Students who are diligent in attending lectures may also be more diligent about other aspects of their coursework, like completing homework. There is a variable hwrte in the dataset indicating the percentage of homework turned in. Add this variable as a regressor to your preferred specification from (d). What role does this variable take on in your regression and is it a regressor you want in order to uncover the causal effect of lecture attendance on student performance? What is the effect of hwrte on exam performance? What happens to the coefficient on attend? Interpret your results.  

(f) There is a variable skipped in the dataset indicating the number of skipped lectures. Add this variable as a regressor to your preferred specification from (d). What happens and why?  
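
A hint for part (f): if every student faced the same total number of lectures, then skipped is an exact linear function of attend and the constant, and the regressors are perfectly collinear. The Python sketch below illustrates this with a hypothetical total of 32 lectures and toy attendance figures (neither is taken from attend.dta); the determinant of $X'X$ is zero, so the OLS coefficients are not uniquely determined. Whether this is exactly what happens in the data should be checked.

```python
def gram_matrix(columns):
    """Return X'X for a design matrix given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

TOTAL_LECTURES = 32  # hypothetical total, assumed equal for all students

# Toy data for illustration only; these are NOT values from attend.dta.
attend = [30, 25, 32, 18, 28]
skipped = [TOTAL_LECTURES - a for a in attend]  # skipped = total - attend
hwrte = [90, 70, 100, 50, 80]                   # not a linear function of attend
const = [1] * len(attend)

# Constant, attend and skipped are linearly dependent: X'X is singular.
det_collinear = det3(gram_matrix([const, attend, skipped]))

# Replacing skipped by a regressor that is not collinear gives a nonzero determinant.
det_other = det3(gram_matrix([const, attend, hwrte]))

print(det_collinear, det_other)
```

Regression software typically responds to this situation by dropping one of the collinear variables.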

Exercise 8
Download the data TeachingRatings.dta. This data set contains data on the teaching evaluations of 463 professors at the University of Texas, and various attributes of the professor and course.
(a) Run a regression of course_eval on beauty. Beauty is an index based on subjective scoring. What is the slope coefficient of the regression?
(b) The number in (a) will not have much meaning to anyone who doesn't know anything about the data. How would you assess whether the effect of beauty on course_eval is big or small?
(c) Is the slope coefficient in (a) statistically significantly different from zero?  
(d) Run a regression of course_eval on beauty and female. What is the coefficient on beauty now? Explain in detail why the result is different from your result in (a).  
(e) What is the $R^2$ of the regression in (d)? How does it compare to the corresponding regression which does not include female? Which one is higher and why?
(g) Do you think the effect of beauty can be interpreted causally in either of your regressions? Explain why or why not.
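
One common way to approach (b) is to rescale the slope into standard deviation units: the change in course_eval, measured in standard deviations of course_eval, associated with a one standard deviation increase in beauty. The Python sketch below illustrates the calculation. The function names and the toy numbers are assumptions for illustration and are not values from TeachingRatings.dta.

```python
from statistics import mean, stdev

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on a constant and x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def standardized_effect(slope, x, y):
    """Change in y, in standard deviations of y, for a one standard
    deviation increase in x."""
    return slope * stdev(x) / stdev(y)

# Toy data for illustration only; NOT values from TeachingRatings.dta.
beauty = [-1.0, 0.0, 1.0, 0.5, -0.5]
course_eval = [3.6, 4.1, 4.3, 4.4, 3.9]

slope = ols_slope(beauty, course_eval)
effect_in_sd = standardized_effect(slope, beauty, course_eval)

print(slope, effect_in_sd)
```

In a regression with a single regressor this standardized effect equals the sample correlation between the two variables, which gives a quick sanity check on the calculation.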