AD699: Data Mining for Business Analytics
Fall 2018
Homework #5
Topic: Clustering
Due by 11:59 p.m. on Monday, 03DEC
Task: k-means clustering
The dataset Cereals.csv contains nutritional information, store display, and consumer ratings for 77 breakfast
cereals. Descriptions of the variables can be found in a text file that accompanies this assignment prompt.
I. Read this dataset into your R environment. Show the steps that you used to accomplish this.
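A minimal sketch of step I in base R, assuming Cereals.csv has a header row and sits in the current working directory (otherwise pass the full path). The helper name `read_cereals` is hypothetical; `read.csv()`, `str()`, and `head()` are standard base R functions.

```r
# Hypothetical helper: read the cereal data into a data frame.
# Assumes a comma-separated file with a header row.
read_cereals <- function(path = "Cereals.csv") {
  # stringsAsFactors = FALSE keeps text columns (such as cereal names) as character
  read.csv(path, header = TRUE, stringsAsFactors = FALSE)
}

# Usage (assumes Cereals.csv is in the working directory):
# cereals <- read_cereals()
# str(cereals)   # check column types and the number of rows
# head(cereals)  # look at the first few records
```

Checking `str()` right after reading is a cheap way to confirm that numeric columns were not read in as text.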
II. Remove all cereals with missing values. Show the steps that you used to accomplish this.
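One way to do step II, assuming missing entries come through `read.csv()` as `NA` (blank or `NA` cells). If the file instead marks missing values with a sentinel code, convert that code to `NA` first. `complete.cases()` is base R; the helper `drop_missing` is hypothetical, and the built-in `airquality` data set, which contains missing values, stands in for the cereal data.

```r
# Hypothetical helper: keep only rows with no NA in any column.
drop_missing <- function(df) {
  keep <- complete.cases(df)               # TRUE for rows with no missing values
  message(sum(!keep), " row(s) removed")   # report how many rows were dropped
  df[keep, , drop = FALSE]
}

# Demonstration on a built-in data set that contains NAs
aq <- drop_missing(airquality)
anyNA(aq)  # FALSE: no missing values remain
```

With the cereal data this would be `cereals <- drop_missing(cereals)`, after which `nrow(cereals)` shows how many cereals remain. `na.omit(cereals)` keeps the same rows in one call.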
III. Should this data be normalized? Why or why not? If so, normalize your data and show the steps that
you took to do so.
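k-means assigns points to clusters by Euclidean distance, so a variable with a large numeric range (for example, one recorded in milligrams next to one recorded in grams) can dominate the result unless the columns are put on a common scale. A minimal sketch using z-scores via base R's `scale()`, assuming only numeric columns are normalized and text columns such as the cereal name are left as they are; which numeric columns belong in the analysis is still your call. The helper `normalize_numeric` is hypothetical, and `mtcars` stands in for the cereal data.

```r
# Hypothetical helper: convert each numeric column to z-scores,
# (x - mean(x)) / sd(x). Non-numeric columns are left unchanged.
normalize_numeric <- function(df) {
  num_cols <- vapply(df, is.numeric, logical(1))
  df[num_cols] <- lapply(df[num_cols], function(x) as.numeric(scale(x)))
  df
}

cars_z <- normalize_numeric(mtcars)
round(colMeans(cars_z), 10)      # all 0 after centering
round(apply(cars_z, 2, sd), 10)  # all 1 after scaling
```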
IV. Use the k-means algorithm to separate the breakfast cereals into clusters. To determine the optimal
number of clusters, consider using an elbow chart or another method of your choice. (Figure 15.6 in
our textbook shows an elbow chart. The textbook does not provide template code for one, but a quick
online search will turn up sample code.)
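A sketch of an elbow chart built with base R's `kmeans()`: fit k-means for k = 1, ..., 10, record the total within-cluster sum of squares (`tot.withinss`), and plot it against k. The point where the curve bends and starts to flatten suggests a reasonable number of clusters. The helper `elbow_wss`, the range of k, the seed, and `nstart = 25` (run several random starts and keep the best) are illustrative choices, and scaled `mtcars` stands in for the normalized cereal data.

```r
# Hypothetical helper: total within-cluster sum of squares for k = 1..k_max.
elbow_wss <- function(x, k_max = 10, seed = 699) {
  set.seed(seed)  # k-means starts from random centers; fix the seed for reproducibility
  vapply(seq_len(k_max), function(k) {
    kmeans(x, centers = k, nstart = 25)$tot.withinss
  }, numeric(1))
}

x <- scale(mtcars)  # stand-in for the normalized numeric cereal columns
wss <- elbow_wss(x)
plot(seq_along(wss), wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares",
     main = "Elbow chart")

# After choosing k from the chart (3 here is only a placeholder):
km <- kmeans(x, centers = 3, nstart = 25)
table(km$cluster)  # number of observations in each cluster
```

At k = 1 the within-cluster sum of squares equals the total sum of squares, so the curve always starts at its maximum; the elbow is read from how quickly it falls after that.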
V. The local elementary school has asked you to identify the healthiest of the clusters you found.
Which cluster will you select, and why? Which cereals are in this cluster?
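One way to approach step V: attach the cluster labels to the unnormalized data, average each variable within each cluster, and compare the cluster profiles against whatever nutritional criteria you judge to define "healthy" (that criterion is yours to justify). The helpers `cluster_profile` and `cluster_members` are hypothetical, and `mtcars`, whose row names play the role of cereal names, stands in for the real data.

```r
# Hypothetical helper: mean of every numeric column within each cluster,
# in the original (unscaled) units so the values are easy to interpret.
cluster_profile <- function(df, clusters) {
  num <- df[vapply(df, is.numeric, logical(1))]
  aggregate(num, by = list(cluster = clusters), FUN = mean)
}

# Hypothetical helper: labels of the observations assigned to cluster k.
cluster_members <- function(labels, clusters, k) {
  labels[clusters == k]
}

set.seed(699)
km <- kmeans(scale(mtcars), centers = 3, nstart = 25)
profile <- cluster_profile(mtcars, km$cluster)
print(profile)  # compare clusters column by column
cluster_members(rownames(mtcars), km$cluster, k = 1)
```

With the cereal data, the labels would come from the column holding cereal names (check the variable description file for its exact name), and `k` would be the cluster you select.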