Project II: Using R for Chapter 5
Goal: Exploring Correlation and Regression
In this project, you will use graphical and statistical methods to describe relationships in the data your class gathered on the colors of Skittles in a package. This is a cumulative project, so make sure to keep a copy of your work as you go along.
Step 1. Get the Excel file that has the results from our class. Save this file and any other files you may need on your computer.
Step 2. Open RStudio and import the data.
Step 3. Let’s get ready to work with our file by attaching the data. If you called the data MyData, you can do this by typing attach(MyData).
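As a rough sketch of Steps 2 and 3, the lines below build a small stand-in data frame and attach it. The column names (Red, Orange, Yellow, Green, Purple, Total) and all values are made up for illustration; the class Excel file will have its own names and numbers. The commented-out readxl call is one possible way to read an Excel file, assuming the readxl package is installed, and its path is only a placeholder.

```r
# Hypothetical stand-in for the class spreadsheet (values and column names are made up).
# One way to read the real file, assuming the readxl package is installed:
#   MyData <- readxl::read_excel("path/to/class_file.xlsx")  # placeholder path
MyData <- data.frame(
  Red    = c(12,  9, 14, 11, 10, 13),
  Orange = c(10, 13, 11, 12, 14,  9),
  Yellow = c(13, 12, 10, 14, 11, 12),
  Green  = c(11, 14, 12,  9, 13, 12),
  Purple = c(14, 11, 13, 12, 10, 15)
)
# Total candies per bag = sum of the five colour counts in each row
MyData$Total <- rowSums(MyData[, c("Red", "Orange", "Yellow", "Green", "Purple")])

str(MyData)            # check that every column was read as a number

attach(MyData)         # columns can now be used by name, e.g. Red instead of MyData$Red
red_mean <- mean(Red)  # average number of red candies per bag
print(red_mean)
detach(MyData)         # detach when finished so the names do not linger
```

Attaching is convenient for short sessions, but the same columns can always be reached as MyData$Red, which avoids confusion if another object with the same name exists.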
Let’s begin the assignment!
1. Import and attach our data on the number of colored Skittles per bag.
2. Install and run the package called “corrplot”.
3. Create the correlation table and a correlation plot with the frequency of each color and the frequency of the total number of candies in each bag. Copy and paste the correlation table from RStudio under 1 a) below. Copy and paste the correlation plot under 1 b) below. (5 points)
4. Answer the two questions that are given under 2) below. (2 points)
5. We are interested in comparing the frequency of the color per bag (our independent variable) with the total number of candies per bag (our dependent variable). For each color do the following:
a. Create a picture that has the scatterplot on the left and the residual plot on the right. (1 point per picture)
b. On your scatterplot you should have a title, labeled axes, and a regression line. (3 points per picture)
c. On your residual plot you should have a title and labeled axes. (2 points per picture)
Once you have all five pictures, copy and paste them under the correct headings of 3) below. (30 points)
6. Find the summary of each of the scatterplots (the entire output of summary() applied to lm(dependent variable ~ independent variable)), which tells you the regression line, the standard error, and the coefficient of determination. Copy and paste the summary for each scatterplot under the correct headings of 4) below. (10 points)
7. For each regression line, find the predicted number of Skittles in a bag using the values from your sample and calculate the residual. Enter the number of each color of Skittle and the predicted value in the corresponding places of 5) below. Round your predicted value and residual to four decimal places. (5 points)
8. Answer the following questions/prompts on this data under the Questions section below. Limit all responses to 5 sentences. (8 points)
9. Save this document either as a Word Document (.docx file) or PDF (.pdf file) with the name “Last Name, First Name – Project II”. Submit your finished project using the Dropbox link from Project I. Project II will be due at the beginning of class on March 27th.

R Output:
1.a) Copy and paste your correlation table from Step #3 below:
b) Copy and paste your correlation plot from Step #3 below:
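The correlation table and plot could be produced along these lines. This is only a sketch on made-up data with assumed column names; cor() is base R, and corrplot() comes from the corrplot package, which has to be installed first (install.packages("corrplot")). The plot is skipped with a message if the package is missing.

```r
# Hypothetical stand-in data (made-up values, assumed column names)
MyData <- data.frame(
  Red    = c(12,  9, 14, 11, 10, 13),
  Orange = c(10, 13, 11, 12, 14,  9),
  Yellow = c(13, 12, 10, 14, 11, 12),
  Green  = c(11, 14, 12,  9, 13, 12),
  Purple = c(14, 11, 13, 12, 10, 15)
)
MyData$Total <- rowSums(MyData)  # total candies per bag

# Correlation table: pairwise Pearson correlations between all columns
cor_table <- cor(MyData)
print(round(cor_table, 3))

# Correlation plot: drawn only if the corrplot package is available
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_table, method = "circle")
} else {
  message("corrplot is not installed; run install.packages(\"corrplot\") first")
}
```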
2.a) Which two variables have the strongest positive correlation? (Hint: the two variables cannot be the same.)
b) Which two variables have the strongest negative correlation?
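One way to read the strongest pairs off a correlation table without picking a variable paired with itself is to blank out the diagonal first. The sketch below does this on made-up data with assumed column names; it illustrates the idea and is not the required method.

```r
# Hypothetical stand-in data (made-up values, assumed column names)
MyData <- data.frame(
  Red    = c(12,  9, 14, 11, 10, 13),
  Orange = c(10, 13, 11, 12, 14,  9),
  Yellow = c(13, 12, 10, 14, 11, 12),
  Green  = c(11, 14, 12,  9, 13, 12),
  Purple = c(14, 11, 13, 12, 10, 15)
)
MyData$Total <- rowSums(MyData)

cor_table <- cor(MyData)

# A variable always has correlation 1 with itself, so hide the diagonal
off_diag <- cor_table
diag(off_diag) <- NA

# Row/column positions of the largest and smallest remaining correlations.
# The smallest value is the strongest negative correlation only if it is below zero.
strongest_pos <- which(off_diag == max(off_diag, na.rm = TRUE), arr.ind = TRUE)
strongest_neg <- which(off_diag == min(off_diag, na.rm = TRUE), arr.ind = TRUE)

print(strongest_pos)
print(strongest_neg)
```

Because the table is symmetric, each pair normally shows up twice, once as (row, column) and once as (column, row).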
3.a) Scatterplot and Residual Plot of Frequency of Red Skittles vs. Total Skittles in Bag
b) Scatterplot and Residual Plot of Frequency of Orange Skittles vs. Total Skittles in Bag
c) Scatterplot and Residual Plot of Frequency of Yellow Skittles vs. Total Skittles in Bag
d) Scatterplot and Residual Plot of Frequency of Green Skittles vs. Total Skittles in Bag
e) Scatterplot and Residual Plot of Frequency of Purple Skittles vs. Total Skittles in Bag
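A side-by-side picture for one color might be sketched as follows, using base R graphics. The data and column names are made up, and the residuals are plotted against the independent variable, which is one common convention (plotting against fitted values is another). The same pattern would be repeated for the other four colors.

```r
# Hypothetical stand-in data (made-up values, assumed column names)
MyData <- data.frame(
  Red    = c(12,  9, 14, 11, 10, 13),
  Orange = c(10, 13, 11, 12, 14,  9),
  Yellow = c(13, 12, 10, 14, 11, 12),
  Green  = c(11, 14, 12,  9, 13, 12),
  Purple = c(14, 11, 13, 12, 10, 15)
)
MyData$Total <- rowSums(MyData)

# Regression of total candies (dependent) on red candies (independent)
fit_red <- lm(Total ~ Red, data = MyData)

old_par <- par(mfrow = c(1, 2))  # one row, two columns: scatterplot left, residuals right

# Left panel: scatterplot with title, labeled axes, and regression line
plot(MyData$Red, MyData$Total,
     main = "Red Skittles vs. Total Skittles",
     xlab = "Number of red Skittles in bag",
     ylab = "Total Skittles in bag")
abline(fit_red, col = "red")

# Right panel: residual plot with title and labeled axes
plot(MyData$Red, resid(fit_red),
     main = "Residuals: Red vs. Total",
     xlab = "Number of red Skittles in bag",
     ylab = "Residual")
abline(h = 0, lty = 2)  # reference line at zero

par(old_par)  # restore the previous plotting layout
```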

4.a) Summary of Frequency of Red Skittles vs. Frequency of Total Skittles

b) Summary of Frequency of Orange Skittles vs. Frequency of Total Skittles

c) Summary of Frequency of Yellow Skittles vs. Frequency of Total Skittles


d) Summary of Frequency of Green Skittles vs. Frequency of Total Skittles


e) Summary of Frequency of Purple Skittles vs. Frequency of Total Skittles
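For any one color, the summary output could be obtained as sketched below, again on made-up data with assumed column names. The full printout is what gets pasted; the last three lines only show where the regression line, residual standard error, and coefficient of determination sit inside the summary object.

```r
# Hypothetical stand-in data (made-up values, assumed column names)
MyData <- data.frame(
  Red    = c(12,  9, 14, 11, 10, 13),
  Orange = c(10, 13, 11, 12, 14,  9),
  Yellow = c(13, 12, 10, 14, 11, 12),
  Green  = c(11, 14, 12,  9, 13, 12),
  Purple = c(14, 11, 13, 12, 10, 15)
)
MyData$Total <- rowSums(MyData)

fit_red <- lm(Total ~ Red, data = MyData)
fit_summary <- summary(fit_red)
print(fit_summary)                        # the entire summary output

line_coefs <- coef(fit_red)               # intercept and slope of the regression line
std_error  <- fit_summary$sigma           # residual standard error
r_squared  <- fit_summary$r.squared       # coefficient of determination
print(line_coefs)
print(std_error)
print(r_squared)
```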

5.
Color | Color Frequency in Sample | Total Frequency in Sample | Predicted Value (round to 4 decimal places) | Residual (round to 4 decimal places)
Red | | | |
Orange | | | |
Yellow | | | |
Green | | | |
Purple | | | |
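Each row of the table could be filled in roughly as below. The sketch fits the line on made-up class data, then plugs in a hypothetical personal sample (my_red and my_total are invented values); the residual is the observed total minus the predicted total.

```r
# Hypothetical stand-in class data (made-up values, assumed column names)
MyData <- data.frame(
  Red    = c(12,  9, 14, 11, 10, 13),
  Orange = c(10, 13, 11, 12, 14,  9),
  Yellow = c(13, 12, 10, 14, 11, 12),
  Green  = c(11, 14, 12,  9, 13, 12),
  Purple = c(14, 11, 13, 12, 10, 15)
)
MyData$Total <- rowSums(MyData)

fit_red <- lm(Total ~ Red, data = MyData)

# Hypothetical values from one student's own bag
my_red   <- 12   # red Skittles in the sample bag
my_total <- 60   # total Skittles in the sample bag

# Predicted total from the regression line, then residual = observed - predicted
predicted <- unname(predict(fit_red, newdata = data.frame(Red = my_red)))
residual  <- my_total - predicted

print(round(c(predicted = predicted, residual = residual), 4))
```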

Questions:
1. Write a conclusion about what the correlation coefficient means for the scatterplot comparing the number of red Skittles and the total number of Skittles in a bag. Does this mean that our regression line is a good way to estimate how many Skittles are in a pack given how many of them are red? Explain.

2. Write a conclusion about the differences between the predicted total of Skittles in a bag and the total Skittles in your sample. Was there anything interesting? Explain.

3. BONUS QUESTION: The answer must be perfect to get any points.
a. Write a sentence about what the coefficient of determination means for the scatterplot that had the highest coefficient of determination. Write a sentence about what the coefficient of determination means for the scatterplot that had the lowest coefficient of determination. Be sure to specify in the sentence which plot the coefficient of determination belongs to. (6 bonus points)

b. Look at the coefficient of determination and standard error for each scatterplot. If you had to pick one measure, which one would you pick to best describe the data in general? Why? (4 points)
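To see what the two measures in the bonus question are built from, the sketch below recomputes them by hand for one regression on made-up data (assumed column names). The coefficient of determination is the share of the variation in the totals that the line accounts for, and the residual standard error is a typical size of a residual in the units of the dependent variable.

```r
# Hypothetical stand-in data (made-up values, assumed column names)
MyData <- data.frame(
  Red    = c(12,  9, 14, 11, 10, 13),
  Orange = c(10, 13, 11, 12, 14,  9),
  Yellow = c(13, 12, 10, 14, 11, 12),
  Green  = c(11, 14, 12,  9, 13, 12),
  Purple = c(14, 11, 13, 12, 10, 15)
)
MyData$Total <- rowSums(MyData)

fit_red <- lm(Total ~ Red, data = MyData)
n <- nrow(MyData)

sse <- sum(resid(fit_red)^2)                        # variation left over around the line
sst <- sum((MyData$Total - mean(MyData$Total))^2)   # total variation in the totals

r_squared_by_hand <- 1 - sse / sst                  # proportion of variation explained
std_error_by_hand <- sqrt(sse / (n - 2))            # residual standard error (2 estimated coefficients)

print(c(r_squared = r_squared_by_hand, std_error = std_error_by_hand))
```

With a single independent variable, the coefficient of determination also equals the square of the correlation coefficient between the two variables.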