DATA5100: Data Mining: R Programming
Short Project 2
Scenario
The only data researcher/chemical analyst at the Blane Research Company has resigned. Before resigning, this person was analyzing the wine dataset in order to make research recommendations for local growers. The dataset contains the results of a chemical analysis of wines grown in the same region of Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines; together with the class label, these make up the 14 attributes listed below.
Today is your first day as a data researcher/chemical analyst at the Blane Research Company. To get you up to speed, your supervisor has directed you to take the wine.csv file and analyze it using R.
Assignment Instructions
Download the wine.csv file from ulearn. Follow the steps below, take a screenshot of the output of each step, and place the screenshots in a Word document with a full description of each one. When you have completed the assignment, name your file ShortProject2.doc and submit it via the Short Project 2 submission link.
The attributes are as follows:
1) Class
2) Alcohol
3) Malic acid
4) Ash
5) Alcalinity of ash
6) Magnesium
7) Phenols
8) Flavanoids
9) Nonflavanoid phenols
10) Proanthocyanins
11) Color intensity
12) Hue
13) OD280/OD315 of diluted wines
14) Proline
1) Load data into R
1. Open RStudio and set the directory where you saved wine.csv as the working directory.
2. Load wine.csv into an R object named data.
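The two loading steps above might look like the following minimal sketch. So that it runs anywhere, it first writes a tiny stand-in wine.csv to a temporary folder; the folder name `wine_dir`, the two made-up rows, and the assumption that the file has a header row are illustrative only and are not taken from the real file.

```r
# Stand-in for the real download so the sketch runs anywhere: a temporary
# folder holding a tiny wine.csv. The two rows are made-up placeholders,
# not real wine measurements.
wine_dir <- tempfile("wine_demo_")
dir.create(wine_dir)
writeLines(
  c("Class,Alcohol,Ash", "1,13.0,2.4", "2,12.5,2.2"),
  file.path(wine_dir, "wine.csv")
)

# Step 1: make the folder containing wine.csv the working directory.
# With the real file, pass your own folder instead of wine_dir.
old_wd <- setwd(wine_dir)
getwd()

# Step 2: load the file into an object named data.
# header = TRUE assumes the first row of wine.csv holds the column names.
data <- read.csv("wine.csv", header = TRUE)
str(data)

# Restore the previous working directory (only needed for this demo).
setwd(old_wd)
```

If the real wine.csv turns out to have no header row, `read.csv("wine.csv", header = FALSE)` followed by assigning `names(data)` from the attribute list would be one way to adapt this.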
2) Visualize descriptive statistics with the boxplot and histogram
1. Create a standard boxplot of the Alcohol, Malic acid, Ash, and Alcalinity of ash columns in data.
2. Create a modified boxplot of the data in which the line and box represent the same values as in the standard boxplot, but the whiskers extend at most 1.5 × IQR beyond the 1st and 3rd quartiles. Any values that fall beyond the whiskers are outliers and are shown as circles.
3. Create a histogram of the Ash column in data.
4. Create a histogram of the Ash column in data, but use 100 breaks.
5. Plot a kernel density plot of the Ash column in data.
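One possible way to produce the five plots above with base R graphics is sketched here. The data frame is a randomly generated stand-in so the code runs without wine.csv, and the column names (`Malic.acid`, `Alcalinity.of.ash`) are an assumption about how `read.csv()` would name the columns; check `names(data)` on the real file.

```r
# Synthetic stand-in so the sketch runs without wine.csv; these random numbers
# are NOT real wine measurements. With the real file use:
#   data <- read.csv("wine.csv")
set.seed(1)
data <- data.frame(
  Alcohol           = rnorm(50, mean = 13, sd = 1),
  Malic.acid        = rnorm(50, mean = 2,  sd = 1),
  Ash               = rnorm(50, mean = 2,  sd = 0.3),
  Alcalinity.of.ash = rnorm(50, mean = 20, sd = 3)
)
cols <- c("Alcohol", "Malic.acid", "Ash", "Alcalinity.of.ash")

# 1. Standard boxplot: range = 0 extends the whiskers to the minimum and
#    maximum, so no points are drawn as outliers.
b_std <- boxplot(data[, cols], range = 0, main = "Standard boxplot")

# 2. Modified boxplot: range = 1.5 (R's default) limits each whisker to the
#    most extreme point within 1.5 x IQR of the box; points beyond are circles.
b_mod <- boxplot(data[, cols], range = 1.5, main = "Modified boxplot")

# 3. Histogram of Ash with R's default choice of bins.
h_default <- hist(data$Ash, main = "Histogram of Ash", xlab = "Ash")

# 4. Histogram of Ash asking for 100 breaks. A single number is only a
#    suggestion: hist() rounds it to "pretty" break points.
h_100 <- hist(data$Ash, breaks = 100,
              main = "Histogram of Ash (100 breaks)", xlab = "Ash")

# 5. Kernel density estimate of Ash (Gaussian kernel by default).
d_ash <- density(data$Ash)
plot(d_ash, main = "Kernel density of Ash")
```

Each call draws on the current graphics device, so in RStudio the plots appear one after another in the Plots pane, which is convenient for taking the screenshots.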
3) Explore and Compare Multiple Variables
1. Compute the covariance of the Ash and Alcohol columns in data.
2. Create a covariance matrix of the Ash and Alcohol columns in data.
3. Create a scatterplot of the data.
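The three comparison steps above could be approached as in this sketch. The data frame is again a randomly generated stand-in with assumed column names, and reading "a scatterplot of the data" as a scatterplot matrix of all columns is one interpretation; a single Ash versus Alcohol plot is shown as an alternative.

```r
# Synthetic stand-in so the sketch runs without wine.csv; these random numbers
# are NOT real wine measurements. With the real file use:
#   data <- read.csv("wine.csv")
set.seed(1)
data <- data.frame(
  Alcohol   = rnorm(50, mean = 13,  sd = 1),
  Ash       = rnorm(50, mean = 2,   sd = 0.3),
  Magnesium = rnorm(50, mean = 100, sd = 15)
)

# 1. Sample covariance (n - 1 denominator) between two columns: one number.
cov_ash_alcohol <- cov(data$Ash, data$Alcohol)
cov_ash_alcohol

# 2. Covariance matrix of the same two columns: variances on the diagonal,
#    the covariance from step 1 off the diagonal.
cov_matrix <- cov(data[, c("Ash", "Alcohol")])
cov_matrix

# 3a. Scatterplot matrix: one panel for every pair of columns.
pairs(data, main = "Scatterplot matrix")

# 3b. Alternative: a single scatterplot of one pair of columns.
plot(data$Ash, data$Alcohol, xlab = "Ash", ylab = "Alcohol",
     main = "Alcohol vs. Ash")
```

On the real file, the scatterplot matrix may be easier to read if the Class column is used to color the points rather than plotted as a variable of its own.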
4) Find Similar Data Objects
1. Create a distance matrix of data.
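A distance matrix as requested above can be sketched with base R's `dist()`. The stand-in data frame, its column names, the choice of Euclidean distance, and the decision to drop the Class label before measuring distances are all assumptions made for illustration; the assignment text does not specify them.

```r
# Synthetic stand-in so the sketch runs without wine.csv; these random numbers
# are NOT real wine measurements. With the real file use:
#   data <- read.csv("wine.csv")
set.seed(1)
data <- data.frame(
  Class   = rep(1:2, each = 5),
  Alcohol = rnorm(10, mean = 13,  sd = 1),
  Ash     = rnorm(10, mean = 2,   sd = 0.3),
  Proline = rnorm(10, mean = 700, sd = 200)
)

# Assumption: Class is a label, not a measurement, so it is left out.
features <- data[, names(data) != "Class"]

# Euclidean distance between every pair of rows (wines).
d <- dist(features, method = "euclidean")

# dist() stores only the lower triangle; as.matrix() gives the full
# symmetric n x n matrix with zeros on the diagonal.
dist_matrix <- as.matrix(d)
round(dist_matrix[1:5, 1:5], 2)

# Optional variant: standardize the columns first so that a column with
# large values (here Proline) does not dominate the distances.
d_scaled <- dist(scale(features))
```

Rows that are close together in `dist_matrix` are the most similar wines under the chosen distance; passing a different `method` to `dist()` (for example `"manhattan"`) changes that notion of similarity.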