ECON 318 Homework 6
Due 11/11 in class
Note: (1) Homework should be submitted in PDF/Word format generated from R Markdown. (2) Please
include your answers, analysis, code, reasoning, and the key steps, for instance the tables/plots produced by
R. Simply writing down the solution earns 0 points. (3) Plagiarism is not accepted. Any homework found to be
similar to another will receive zero points.
Q1 [15pt]
Suppose you want to estimate the seasonal effect on revenue. As usual, the regression includes a constant
term. How many dummy variables are needed to perform this analysis?
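To make the counting concrete, here is a minimal R sketch, assuming revenue is observed in four seasons; the data are hypothetical and simulated. R's default treatment coding keeps the intercept and creates one dummy fewer than the number of categories, and the rank check shows what goes wrong when all four season dummies sit next to the constant.

```r
# Hypothetical quarterly revenue data: 4 seasons observed over 3 years (12 rows)
season <- factor(rep(c("Winter", "Spring", "Summer", "Fall"), times = 3),
                 levels = c("Winter", "Spring", "Summer", "Fall"))
set.seed(1)                                   # arbitrary seed for the simulated revenue
revenue <- 100 + as.integer(season) * 5 + rnorm(12)

# Default treatment coding: intercept plus (4 - 1) = 3 season dummies
X <- model.matrix(~ season)
print(colnames(X))

# Constant plus all four dummies: the dummies sum to the constant column
X_all <- cbind(const = 1,
               sapply(levels(season), function(s) as.integer(season == s)))
print(qr(X_all)$rank)   # 4 rather than 5: perfect collinearity (dummy variable trap)

# Intercept = mean of the base season (Winter); other coefficients = differences from it
fit <- lm(revenue ~ season)
print(coef(fit))
```

Dropping one category is a choice of base group, not a loss of information: any season can serve as the base, and the remaining coefficients are read relative to it.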
Q2 [20pt]
Use the data in gpa2 and GPA2_description for this exercise.
1. Using all observations, regress colgpa on hsperc and sat.
2. Reestimate the model using only the first 2,070 observations.
3. Find the ratio of the standard errors on hsperc from parts 1 and 2. What do you find? Why?
4. Add female, verbmath, and their interaction to the regression, using all observations.
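The mechanics of parts 1-4 can be sketched in R as below. Because gpa2 itself is not reproduced here, this sketch uses a simulated stand-in data frame `sim` with the same column names (hypothetical data and coefficients, with an assumed full-sample size of 4,137; substitute the real gpa2). Part 3 relies on the standard result that an OLS standard error shrinks roughly in proportion to 1/sqrt(n), so the ratio should land near sqrt(2070/4137).

```r
set.seed(318)                 # arbitrary seed so the sketch is reproducible
n_full <- 4137                # assumed full-sample size; use nrow(gpa2) with the real data
n_sub  <- 2070                # "first 2,070 observations" from part 2

# Hypothetical stand-in for gpa2: same column names, simulated values
sim <- data.frame(
  hsperc   = runif(n_full, 0, 100),
  sat      = rnorm(n_full, mean = 1000, sd = 140),
  female   = rbinom(n_full, size = 1, prob = 0.45),
  verbmath = runif(n_full, 0.5, 1.2)
)
sim$colgpa <- 1.4 - 0.013 * sim$hsperc + 0.0015 * sim$sat + rnorm(n_full, sd = 0.55)

# Part 1: all observations
fit_full <- lm(colgpa ~ hsperc + sat, data = sim)
# Part 2: first n_sub rows only
fit_sub <- lm(colgpa ~ hsperc + sat, data = sim[seq_len(n_sub), ])

# Part 3: ratio of the standard errors on hsperc, compared with sqrt(n_sub / n_full)
se_full <- coef(summary(fit_full))["hsperc", "Std. Error"]
se_sub  <- coef(summary(fit_sub))["hsperc", "Std. Error"]
ratio   <- se_full / se_sub
print(c(se_ratio = ratio, sqrt_n_ratio = sqrt(n_sub / n_full)))

# Part 4: female * verbmath expands to female + verbmath + female:verbmath
fit_int <- lm(colgpa ~ hsperc + sat + female * verbmath, data = sim)
print(summary(fit_int))
```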
Q3 [20pt]
Load the ggplot2 package and type data(diamonds) to load the data set. The definitions of table and depth
are shown in Figure 1.
library(ggplot2)
data(diamonds)    # load the diamonds data set that ships with ggplot2
dia <- diamonds   # work with a copy named dia
1. A diamond's quality can be measured by cut, ordered from best to worst as Ideal, Premium, Very Good,
Good, and Fair. Create a dummy D1 equal to 1 for Ideal or Premium cuts, and a dummy D2 equal to 1 for
Very Good or Good cuts.
2. Regress price on carat, depth, table, D1, D2, and all interaction terms between the dummies and the
quantitative variables (carat, depth, and table). Interpret your results.
Figure 1: Definitions of a diamond's table and depth.
3. Create a random sample of size 1,000 from the diamonds data. Draw a scatterplot of carat vs. log(price),
color-coded by cut.
4. List the distinct categories of color. What is their ordering?
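A minimal R sketch of parts 1-4, assuming the `dia` object created above (it is recreated here so the block runs on its own; the seed is arbitrary). Fair is the omitted cut category, and the formula `(carat + depth + table) * (D1 + D2)` expands to the five main effects plus all six dummy-by-quantitative interactions, with no D1:D2 term.

```r
library(ggplot2)
dia <- diamonds                      # recreated so this sketch runs on its own

# Part 1: D1 = 1 for Ideal or Premium, D2 = 1 for Very Good or Good; Fair is the base group
dia$D1 <- as.integer(dia$cut %in% c("Ideal", "Premium"))
dia$D2 <- as.integer(dia$cut %in% c("Very Good", "Good"))

# Part 2: five main effects plus all six dummy x quantitative interactions
fit <- lm(price ~ (carat + depth + table) * (D1 + D2), data = dia)
print(summary(fit))

# Part 3: random sample of 1,000 rows and a scatterplot colored by cut
set.seed(318)                        # arbitrary seed for reproducibility
dia_s <- dia[sample(nrow(dia), 1000), ]
p <- ggplot(dia_s, aes(x = carat, y = log(price), color = cut)) +
  geom_point(alpha = 0.6)
print(p)

# Part 4: categories of color and whether the factor is ordered
print(levels(dia$color))
print(is.ordered(dia$color))
```

Building explicit 0/1 dummies matters here: cut is an ordered factor, so passing it to lm() directly would use polynomial contrasts by default rather than the D1/D2 grouping the question asks for.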
Q4 [45pt] (Just answer the questions; no R commands needed)
According to past Bond films, the average number of people killed by Bond varies substantially across
Bond actors, as shown in the following graph. In particular, Pierce Brosnan ranks #1 on this list. To study
whether a film's revenue is affected by the number of people Bond kills, we performed a regression analysis
on the available data set. The data cover 23 past Bond films featuring all six Bond actors. For each film, we
have information on the adjusted worldwide gross (in thousands of dollars), the average rating (on a 1-10
scale, with 10 being the best), rating, film budget, the number of people killed by Bond and by others, the
Bond actor, and the year of the film. To start, we estimate the following model to see whether the number of
people Bond killed in a film affects its worldwide gross:

$$\log(\text{Gross}) = \beta_0 + \beta_1\,\text{Bond kills} + \beta_2\,\text{Pierce} + u$$

where log(Gross) is the logarithm of the worldwide gross, Bond kills is the number of people Bond killed in
the film, and Pierce is a dummy variable equal to 1 if the Bond actor is Pierce Brosnan. The following table
shows the estimation results.
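Table 1:
==========================================
                      Dependent variable:
                      log(Gross)
------------------------------------------
`Bond kills`          0.02**
                      (0.002, 0.04)
Pierce                -0.53**
                      (-1.03, -0.04)
Constant              13.05***
                      (12.79, 13.31)
------------------------------------------
Observations          23
R2                    0.21
Adjusted R2           0.14
Residual Std. Error   0.32 (df = 20)
F Statistic           2.73* (df = 2; 20)
==========================================
Note: *p<0.1; **p<0.05; ***p<0.01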
Answer questions 1-4 below using the regression results above.
1. Does the number of people Bond killed significantly affect the worldwide gross at the 5% level? Interpret
the estimated coefficient of Bond kills.
2. Interpret the estimated coefficient of Pierce.
3. Is the regression significant overall at the 5% level?
4. What does the adjusted R^2 measure?
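For questions 1-4, the standard log-level formulas below may help. This is a sketch that plugs in the rounded estimates from Table 1, so the numbers are approximate; the closed-form p-value uses the fact that an F variable with (2, m) degrees of freedom satisfies P(F > f) = (1 + 2f/m)^(-m/2).

```latex
% Exact percentage change in Gross when the dummy Pierce switches from 0 to 1
\%\Delta\,\text{Gross} = 100\left[\exp(\hat\beta_2) - 1\right]
  = 100\left[\exp(-0.53) - 1\right] \approx -41.1\%

% Approximate percentage change in Gross for one additional Bond kill
\%\Delta\,\text{Gross} \approx 100\,\hat\beta_1 = 100 \times 0.02 = 2\%

% p-value of the overall F test: F = 2.73 with (2, 20) degrees of freedom
P(F_{2,20} > 2.73) = \left(1 + \tfrac{2 \times 2.73}{20}\right)^{-10} \approx 0.089

% Adjusted R^2 with n = 23 observations and k = 2 regressors
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
          = 1 - (1 - 0.21)\,\frac{22}{20} \approx 0.13
```

The small gap between 0.13 and the reported 0.14 most likely comes from R^2 being rounded to 0.21 in the table.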
Suppose you believe that the 1990s were the boom years for Bond films, so you add a time dummy variable,
decade90, to the model.
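Table 2:
==============================================================
                      Dependent variable: log(Gross)
                      ----------------------------------------
                      (1)                 (2)
--------------------------------------------------------------
`Bond kills`          0.02**              0.02**
                      (0.002, 0.04)       (0.002, 0.04)
Pierce                -0.53**             -0.42
                      (-1.03, -0.04)      (-1.15, 0.30)
decade90                                  -0.16
                                          (-0.89, 0.58)
Constant              13.05***            13.05***
                      (12.79, 13.31)      (12.78, 13.31)
--------------------------------------------------------------
Observations          23                  23
R2                    0.21                0.22
Adjusted R2           0.14                0.10
Residual Std. Error   0.32 (df = 20)      0.32 (df = 19)
F Statistic           2.73* (df = 2; 20)  1.80 (df = 3; 19)
==============================================================
Note: *p<0.1; **p<0.05; ***p<0.01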
5. Based on the results for model 2, why do you think the dummy Pierce becomes insignificant?
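One standard way to think about question 5 is the sampling-variance formula for an OLS slope, where R_j^2 is the R^2 from regressing regressor j on all the other regressors. This is a sketch of the mechanism only; whether Pierce and decade90 are strongly correlated across these 23 films has to be checked against the data.

```latex
\operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{\mathrm{SST}_j\,(1 - R_j^2)}

% Width of the interval for Pierce in Table 2
\text{model (1): } -0.04 - (-1.03) = 0.99, \qquad
\text{model (2): } 0.30 - (-1.15) = 1.45
```

Since the residual standard error stays at 0.32 in both models, a wider interval for Pierce in model (2) is consistent with a larger R_j^2 for Pierce once decade90 enters, that is, with overlap between the two dummies.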
Now suppose you estimate a new model that includes the interaction term Bond kills:Pierce, which equals
the product of the dummy Pierce and the variable Bond kills.
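Written out, the model with the interaction can be sketched as follows (the coefficient labels are notation introduced here, with beta_3 on the interaction). It shows why the Bond kills slope differs between Pierce and non-Pierce films.

```latex
\log(\text{Gross}) = \beta_0 + \beta_1\,\text{Bond kills} + \beta_2\,\text{Pierce}
                   + \beta_3\,(\text{Bond kills} \times \text{Pierce}) + u

\frac{\partial \log(\text{Gross})}{\partial\,\text{Bond kills}} = \beta_1 + \beta_3\,\text{Pierce}
= \begin{cases} \beta_1 & \text{non-Pierce films} \\ \beta_1 + \beta_3 & \text{Pierce films} \end{cases}
```

With the rounded model (3) estimates in Table 3, the implied slope for Pierce films is 0.02 + (-0.02) = 0.00, though both numbers are rounded to two decimals.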
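Table 3:
==================================================================================
                      Dependent variable: log(Gross)
                      ------------------------------------------------------------
                      (1)                 (2)                 (3)
----------------------------------------------------------------------------------
`Bond kills`          0.02**              0.02**              0.02**
                      (0.002, 0.04)       (0.002, 0.04)       (0.004, 0.04)
Pierce                -0.53**             -0.42               0.05
                      (-1.03, -0.04)      (-1.15, 0.30)       (-1.37, 1.47)
decade90                                  -0.16
                                          (-0.89, 0.58)
`Bond kills`:Pierce                                           -0.02
                                                              (-0.06, 0.02)
Constant              13.05***            13.05***            13.01***
                      (12.79, 13.31)      (12.78, 13.31)      (12.72, 13.29)
----------------------------------------------------------------------------------
Observations          23                  23                  23
R2                    0.21                0.22                0.24
Adjusted R2           0.14                0.10                0.12
Residual Std. Error   0.32 (df = 20)      0.32 (df = 19)      0.32 (df = 19)
F Statistic           2.73* (df = 2; 20)  1.80 (df = 3; 19)   2.04 (df = 3; 19)
==================================================================================
Note: *p<0.1; **p<0.05; ***p<0.01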
6. Interpret the estimated coefficient of Bond kills:Pierce. Comparing models 1 and 3, do you think it is a
good idea to include the interaction term? Why or why not?
Now we turn to the effect of Bond kills on the average rating, which ranges from 1 to 10. Because each actor
represents a different era and style and may appeal to a specific group or generation of viewers, we include
a set of actor dummy variables in the model. Moreover, we believe that not only Bond kills but also the number
of people killed by others (for instance, by supporting characters) matters.
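The rating model described above can be sketched as follows. The dummy notation D_a and the coefficients delta_a are introduced here for illustration; the set A lists the five actors that appear in Table 4, and the sixth actor is the omitted base category.

```latex
\text{Rating} = \beta_0 + \beta_1\,\text{Bond kills} + \beta_2\,\text{Others kills}
              + \sum_{a \in A} \delta_a\, D_a + u

A = \{\text{George Lazenby}, \text{Pierce Brosnan}, \text{Roger Moore},
      \text{Sean Connery}, \text{Timothy Dalton}\}

% Residual degrees of freedom: n minus the number of estimated coefficients
23 - (1 + 2 + 5) = 15
```

Each delta_a is the expected rating difference between actor a and the base actor, holding both kill counts fixed; the residual degrees of freedom match the df = 15 reported in Table 4.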
7. Interpret the estimated coefficients of Bond kills and Others kills, and compare the two estimates.
8. Which actor is the base category?
9. According to the estimates (ignoring significance for the moment), who was the best and who was the
worst at boosting ratings among the six Bond actors?
Table 4:
=================================================
                             Dependent variable:
                             Rating
-------------------------------------------------
`Bond kills`                 0.05**
                             (0.02)
`Others kills`               -0.01**
                             (0.004)
`Bond actor`George Lazenby   0.16
                             (0.66)
`Bond actor`Pierce Brosnan   -1.82***
                             (0.48)
`Bond actor`Roger Moore      -0.76*
                             (0.40)
`Bond actor`Sean Connery     0.53
                             (0.45)
`Bond actor`Timothy Dalton   -0.69
                             (0.48)
Constant                     6.65***
                             (0.42)
-------------------------------------------------
Observations                 23
R2                           0.68
Adjusted R2                  0.53
Residual Std. Error          0.51 (df = 15)
F Statistic                  4.48*** (df = 7; 15)
=================================================
Note: *p<0.1; **p<0.05; ***p<0.01