

AD699: Data Mining for Business Analytics

Fall 2018

Homework #5

Topic: Clustering

Due by 11:59 p.m. on Monday, December 3

Task: k-means clustering

The dataset Cereals.csv contains nutritional information, store display, and consumer ratings for 77 breakfast cereals. Descriptions of the variables can be found in a text file that accompanies this assignment prompt.

I. Read this dataset into your R environment. Show the steps that you used to accomplish this.

II. Remove all cereals with missing values. Show the steps that you used to accomplish this.

III. Should this data be normalized? Why or why not? If so, normalize your data, and show the steps that you took to do so.

IV. Use the k-means algorithm to separate the breakfast cereals into clusters. To determine the optimal number of clusters to use, consider using an elbow chart or another method of analysis of your preference. (Figure 15.6 in our textbook shows an elbow chart. The textbook does not provide template code, but a quick online search will yield sample code for an elbow chart.)

V. The local elementary school has asked that you identify the healthiest cluster from among the clusters that you’ve found. Which cluster will you select, and why? Which cereals are in this cluster?
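As a rough illustration of how steps I through V could fit together in R, here is a minimal sketch, not a model answer. It rests on several assumptions that should be checked against the variable description file: that missing values are read in as NA, that the cereal names sit in a column called `name`, and that every numeric column is a meaningful clustering input (a numerically coded categorical column, if there is one, may need to be dropped first). The helper `run_cereal_clustering`, the seed, and the default `k = 3` are hypothetical choices made for this example; the value of k should come from reading the elbow chart.

```r
# Hypothetical helper covering steps II-V; the name, seed, and default k
# are illustrative choices, not part of the assignment.
run_cereal_clustering <- function(cereals, k = 3, seed = 699) {
  # II. Drop every row that has at least one missing value (assumes NA coding).
  complete <- na.omit(cereals)

  # III. k-means relies on Euclidean distance, so variables with larger
  # numeric ranges would dominate. z-score each numeric column to avoid that.
  numeric_cols <- sapply(complete, is.numeric)
  scaled <- scale(complete[, numeric_cols, drop = FALSE])

  # IV. Elbow data: total within-cluster sum of squares for k = 1..k_max.
  set.seed(seed)
  k_max <- min(10, nrow(scaled) - 1)
  wss <- sapply(1:k_max, function(i) {
    kmeans(scaled, centers = i, nstart = 25)$tot.withinss
  })

  # Fit the final model with the chosen k.
  fit <- kmeans(scaled, centers = k, nstart = 25)

  # V. Cluster means on the original scale, to compare nutritional profiles.
  profile <- aggregate(complete[, numeric_cols, drop = FALSE],
                       by = list(cluster = fit$cluster), FUN = mean)

  list(data = complete, scaled = scaled, wss = wss, fit = fit,
       profile = profile)
}

# I. Read the file (assumed to be in the working directory) and run the steps.
if (file.exists("Cereals.csv")) {
  cereals <- read.csv("Cereals.csv", stringsAsFactors = FALSE)
  result <- run_cereal_clustering(cereals, k = 3)

  # Elbow chart: look for the k after which the curve flattens.
  plot(seq_along(result$wss), result$wss, type = "b",
       xlab = "Number of clusters k",
       ylab = "Total within-cluster sum of squares")

  print(result$profile)
  # Cereal names per cluster (assumes a column called `name`).
  print(split(result$data$name, result$fit$cluster))
}
```

The cluster profile is reported on the original scale rather than in z-scores so that a judgment about which cluster is "healthiest" can be made in the units of the data. Which variables count toward that judgment is left to the analyst.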

