BUSI 395 HW10
Due Date: Nov 26th, 2024 11:59PM
Instructions
Please submit to Gradescope and tag all your answers to each question. Untagged questions may cause a delay in grading.
Question 0. Honor Code (1 point)
Write down the honor code pledge, and indicate the resources you have used and with whom you have discussed your homework.
Question 1. True/False and Multiple Choice Questions (6 points)
1. Examine the Q-Q plot below, where the sample quantiles are plotted against the theoretical quantiles of a standard normal distribution.
Figure 1
Based on the plot, which of the following best describes the data distribution?
A. The data is symmetric and has heavy tails.
B. The data is left-skewed with light tails.
C. The data is right-skewed with heavy tails.
D. The data is symmetric with light tails.
2. Which of the following is true about adjusted R²?
A. It always equals R².
B. It increases as predictors are added, regardless of their contribution to the model.
C. It adjusts R² to account for the number of predictors and penalizes unnecessary variables.
D. It decreases as the number of observations increases.
3. When selecting between models, which criterion penalizes the model for having too many predictors?
A. Residual Standard Error (RSE)
B. Adjusted R2
C. R²
D. F-statistic
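Questions 1.2 and 1.3 both turn on the standard adjusted-R² correction, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. A minimal sketch; the R² values used below are made-up illustrations, not results from any data in this assignment.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a model with p predictors (plus an intercept) fit to n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical illustration: adding a fourth, nearly useless predictor
# nudges R^2 up from 0.800 to 0.805, yet adjusted R^2 goes down.
base = adjusted_r2(0.800, n=20, p=3)
bigger = adjusted_r2(0.805, n=20, p=4)
print(f"3 predictors: adjusted R^2 = {base:.4f}")
print(f"4 predictors: adjusted R^2 = {bigger:.4f}")
```

Because the factor (n - 1)/(n - p - 1) grows with p, a new predictor must raise R² by enough to offset that penalty before adjusted R² improves.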
Question 2. Gross Earnings of Movies (8 points)
A motion picture industry analyst wants to estimate the gross earnings generated by a movie. The estimate will be based on different variables involved in the film’s production. The independent variables considered are:
• X1 (COST): Production cost of the movie (in millions of dollars).
• X2 (PROM): Total costs of all promotional activities (in millions of dollars).
A third variable that the analyst wants to consider is the qualitative variable of whether or not the movie is based on a book published before the release of the movie. This qualitative variable is handled using an indicator variable:
• X3 (BOOK): 1 if the movie is based on a book published before its release, and 0 otherwise.
The analyst obtains information on a random sample of 20 Hollywood movies made within the last five years (the inference is to be made only about the population of movies in this particular category). The data are summarized in Table 1. The dependent variable Y (EARN) represents the gross earnings of the movie (in millions of dollars).
The regression model can be presented as:
EARN = a + b1 × COST + b2 × PROM + b3 × BOOK + error term.
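Once the coefficients are estimated, the fitted equation is just a function of the three regressors. A minimal sketch: the coefficient values below are placeholders, not the estimates in Table 2, which should be substituted in. With COST = PROM = 0 and BOOK = 0 every slope term drops out, so the prediction equals the intercept a; switching BOOK from 0 to 1 shifts the prediction by exactly b3, holding COST and PROM fixed.

```python
from dataclasses import dataclass


@dataclass
class EarnModel:
    """Fitted coefficients of EARN = a + b1*COST + b2*PROM + b3*BOOK ($ millions)."""
    a: float
    b1: float
    b2: float
    b3: float

    def predict(self, cost: float, prom: float, book: int) -> float:
        # BOOK is an indicator: 1 if based on a previously published book, else 0.
        if book not in (0, 1):
            raise ValueError("BOOK must be 0 or 1")
        return self.a + self.b1 * cost + self.b2 * prom + self.b3 * book


# Placeholder coefficients for illustration only; replace with the Table 2 estimates.
model = EarnModel(a=10.0, b1=2.0, b2=3.0, b3=5.0)

print(model.predict(0, 0, 0))                            # equals the intercept a
print(model.predict(4, 2, 1) - model.predict(4, 2, 0))   # equals b3
```

Whether the intercept is a meaningful prediction depends on whether near-zero COST and PROM values lie inside the range of the sampled movies; if they do not, it is an extrapolation.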
The regression output is shown in Table 2 below.
1. How useful is the model overall? Are all three independent variables relevant? (2 points)
2. What gross earnings does the model predict for a movie that costs nothing to produce or promote and is not based on a book? How meaningful is this figure? (3 points)
3. An authors’ association claims that the existence of a book increases gross earnings on average by at least $7.5 million. Can you reject this hypothesis? (3 points)
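The authors' claim is one-sided, so it is naturally tested as H0: b3 ≥ 7.5 against H1: b3 < 7.5 with t = (b3_hat - 7.5)/SE(b3_hat) on 20 - 3 - 1 = 16 residual degrees of freedom. A minimal sketch, assuming the 5% level (the question does not state one); the estimate and standard error below are placeholders for the BOOK row of Table 2.

```python
# One-sided 5% critical value t_{0.05,16} = 1.746, from a standard t table
# (16 residual df = 20 observations - 3 predictors - 1 intercept).
T_CRIT_ONE_SIDED_5PCT_DF16 = 1.746


def left_tailed_t(b_hat: float, se: float, b0: float,
                  t_crit: float = T_CRIT_ONE_SIDED_5PCT_DF16):
    """Left-tailed t-test of H0: beta >= b0 against H1: beta < b0.

    Returns the t statistic and whether H0 is rejected.
    """
    t = (b_hat - b0) / se
    return t, t < -t_crit


# Placeholder estimate and standard error; substitute the BOOK row of Table 2.
t_stat, reject = left_tailed_t(b_hat=4.0, se=1.5, b0=7.5)
print(f"t = {t_stat:.3f}, reject H0: {reject}")
```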
Table 1: Data on Earnings
Table 2: Regression Results: Earnings on COST, PROM, and BOOK
Question 3. Production Time Analysis: Erie Steel Ltd (8 points)
Erie Steel Ltd has just accepted an order to produce 500 pieces of a new component. Each piece will require 7 operations in the production process. The product manager, Roger Blough, has promised delivery within three weeks, which means that the production time between starting the job and having the batch ready for shipping would have to be no more than 15 days (360 hours), assuming it could be started at 10 am tomorrow.
“Can we start this order tomorrow at 10 am, Patricia, and can it be done within 360 hours?” Roger inquired of his production scheduler, Patricia Williams.
“Yes, there’s no problem with starting at 10 am tomorrow, but I do not know whether we will be able to finish it in 360 hours. We have not done a job exactly like this before. If you are worried, why don’t you designate it as a ‘rush’ order? We save an average of 50 hours through having a ‘progress-chaser’ assigned to an order.”
Roger was reluctant to commit himself to the ‘rush’ designation. The allocation of a ‘progress chaser’ would cost an extra $1,000, and he was not convinced that it would actually make any difference. He had some data on the previous 20 orders of a similar nature (Table 3) and decided to see if he could somehow estimate the required time.
Table 3: Data on Orders
TIME = time to complete the job (in hours)
PIECES = number of pieces in the job
PS = number of operations per piece
RUSH = a dummy variable equal to 1 if the job is a ‘rush’ order and 0 otherwise
The results of a regression analysis of the data are shown below:
Table 4: Regression Results
Adjusted R² = 0.8067, standard error of estimate = 88.909, 20 observations fitted.
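The summary line reports only adjusted R² and the standard error of estimate, but with n = 20 observations and k = 3 predictors (PIECES, PS, RUSH) the residual degrees of freedom, the unadjusted R², and the residual sum of squares follow by inverting the usual formulas. A minimal arithmetic sketch using only the figures above:

```python
n, k = 20, 3          # observations and predictors (PIECES, PS, RUSH)
adj_r2 = 0.8067       # adjusted R^2 reported with Table 4
s = 88.909            # standard error of estimate (hours)

df_resid = n - k - 1                          # residual degrees of freedom
r2 = 1 - (1 - adj_r2) * df_resid / (n - 1)    # invert the adjusted-R^2 formula
sse = s ** 2 * df_resid                       # since s^2 = SSE / df_resid

print(f"residual df = {df_resid}")
print(f"R^2 = {r2:.4f}")
print(f"SSE = {sse:,.1f} hours^2")
```

The 16 residual degrees of freedom are also the ones used for the t-tests on individual coefficients in this question.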
1. What is the estimated model (equation) that relates production time to the number of pieces in an order, the number of operations per piece, and whether they were “rushed”? What is the average effect of designating an order as a “rush”? (2 points)
2. Can you reject Patricia’s claim that the average effect of a ‘rush’ is to reduce the time by 50 hours at the 5% significance level? (3 points)
3. Can you refute Roger’s claim that the ‘rush’ designation makes no difference at the 5% significance level? (3 points)
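Both claims about the RUSH coefficient are two-sided tests of H0: b_RUSH = b0, with b0 = -50 for Patricia's claim and b0 = 0 for Roger's, using t = (b_hat - b0)/SE on 16 residual degrees of freedom. A minimal sketch; the RUSH estimate and standard error below are placeholders, to be replaced by the Table 4 values.

```python
# Two-sided 5% critical value t_{0.025,16} = 2.120, from a standard t table.
T_CRIT_TWO_SIDED_5PCT_DF16 = 2.120


def two_sided_t(b_hat: float, se: float, b0: float,
                t_crit: float = T_CRIT_TWO_SIDED_5PCT_DF16):
    """Two-sided t-test of H0: beta == b0; returns (t statistic, reject H0?)."""
    t = (b_hat - b0) / se
    return t, abs(t) > t_crit


# Placeholder RUSH estimate and standard error; substitute the Table 4 values.
b_rush, se_rush = -30.0, 20.0

for label, b0 in [("Patricia: b_RUSH = -50", -50.0), ("Roger: b_RUSH = 0", 0.0)]:
    t, reject = two_sided_t(b_rush, se_rush, b0)
    print(f"{label}: t = {t:.2f}, reject at 5%: {reject}")
```

Equivalently, a hypothesized value is rejected exactly when it falls outside the 95% confidence interval b_hat ± 2.120 × SE.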
Question 4. Lab-Related Questions (20 points)
Instructions: Please see the Regressions.ipynb file and the data file 23 4m subprime.csv to answer the following questions:
1. Produce the pairs plot of the data and turn it in.
2. Produce the histogram and the QQ plot of the response variable and turn them in.
3. Turn in the correlation matrix.
4. Estimate the regression model. Turn in your code. (It should be one line!)
5. Turn in the big summary table. Probably the easiest way to do this is to turn in a screenshot of the table.
6. Write down (or type) the estimated equation.
7. Extract the R², the adjusted R², and the RMSE values and display them using the print function. Turn in the code and the outputs.
8. Extract the fitted values and the residuals from your model. Plot the fitted values (on the x-axis) against the residuals (on the y-axis). Turn in the code and the plot.
9. Produce the histogram, boxplot, and QQ plot of the residuals. Turn in the plots.
10. What is the predicted APR for an individual with: LTV = 0.5, Credit Score = 600, Stated Income = 80 (80k), and Home Value = 250 (250k)? Do NOT do this by hand; code it in Python. Turn in your code and the output. Please think it through.
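The fitting and extraction steps in items 4, 7, 8, and 10 can be sketched with NumPy alone. This is a minimal sketch under stated assumptions: the data below are randomly generated stand-ins for the subprime file (whose exact column names are not given here), and the predictor order LTV, Credit Score, Stated Income, Home Value is assumed. The lab notebook may instead use a formula interface such as statsmodels' `ols`; the quantities computed are the same. RMSE is taken here as the residual standard error sqrt(SSE / (n - p - 1)); some texts divide by n instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the assignment's file): 200 rows, 4 predictors.
n = 200
X_raw = np.column_stack([
    rng.uniform(0.2, 1.0, n),    # LTV
    rng.uniform(500, 800, n),    # Credit Score
    rng.uniform(20, 200, n),     # Stated Income ($k)
    rng.uniform(50, 500, n),     # Home Value ($k)
])
apr = 20 + 1.5 * X_raw[:, 0] - 0.015 * X_raw[:, 1] + rng.normal(0, 0.5, n)

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), X_raw])
beta, *_ = np.linalg.lstsq(X, apr, rcond=None)

fitted = X @ beta          # fitted values (x-axis of the residual plot)
resid = apr - fitted       # residuals (y-axis of the residual plot)
p = X_raw.shape[1]         # number of predictors, excluding the intercept

sse = np.sum(resid ** 2)
sst = np.sum((apr - apr.mean()) ** 2)
r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
rmse = np.sqrt(sse / (n - p - 1))

print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}, RMSE = {rmse:.4f}")

# Prediction for LTV = 0.5, Credit Score = 600, Stated Income = 80, Home Value = 250.
# The leading 1.0 multiplies the intercept.
x_new = np.array([1.0, 0.5, 600.0, 80.0, 250.0])
print(f"predicted APR = {x_new @ beta:.3f}")
```

The two easy slips in item 10 are dropping the intercept term and entering the inputs in different units from the data (80 and 250, not 80,000 and 250,000).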