BIA B452F Assignment 2
Weighting: 20%
Due Date: 23 April 2019 (Tuesday)
Learning outcomes:
Explain and select analytic techniques for business intelligence and big data analysis.
Apply data visualization tools and predictive analytics to summarize and analyze business data.
Important note
Note that these questions may not have a single correct answer. Answers from different students
may differ and could all be equally valid.
Classification of Tumors
This assignment uses a dataset available from the UCI Machine Learning Repository. The data were obtained
from Dr. William H. Wolberg of the University of Wisconsin Hospitals, Madison, and the dataset is known as
“Wisconsin Breast Cancer Original” (BreastCancer.csv). The attributes of the dataset are as follows:
Attribute         Description
Id                Sample code number: id number
Cl.thickness      Clump Thickness: 1 - 10
Cell.size         Uniformity of Cell Size: 1 - 10
Cell.shape        Uniformity of Cell Shape: 1 - 10
Marg.adhesion     Marginal Adhesion: 1 - 10
Epith.c.size      Single Epithelial Cell Size: 1 - 10
Bare.nuclei       Bare Nuclei: 1 - 10
Bl.cromatin       Bland Chromatin: 1 - 10
Normal.nucleoli   Normal Nucleoli: 1 - 10
Mitoses           Mitoses: 1 - 10
Class             Class: benign / malignant
The aim of the classification is to distinguish between benign and malignant tumors based on the available
measurements (attributes): clump thickness, uniformity of cell size, uniformity of cell shape, marginal
adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli and mitoses.
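A first look at the data could be sketched in base R as below. This is only a minimal illustration, not a model answer: the column names follow the attribute table, but the six rows in `toy` are made-up stand-in values, and the helper name `explore` is hypothetical. With the real file, `toy` would be replaced by the result of `read.csv("BreastCancer.csv")`, assuming the file sits in the working directory.

```r
# Toy stand-in rows (made-up values, NOT the real dataset).
# Real use, assuming the file is in the working directory:
#   toy <- read.csv("BreastCancer.csv", stringsAsFactors = TRUE)
toy <- data.frame(
  Id           = c(1, 2, 3, 4, 5, 6),
  Cl.thickness = c(5, 3, 8, 1, 10, 4),
  Cell.size    = c(1, 1, 10, 1, 7, 2),
  Bare.nuclei  = c(1, 2, 10, NA, 10, 1),
  Class        = c("benign", "benign", "malignant",
                   "benign", "malignant", "benign"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: basic checks before any modelling.
explore <- function(df) {
  # Every column except the identifier and the label is a predictor.
  predictors <- setdiff(names(df), c("Id", "Class"))
  list(
    n_rows        = nrow(df),
    # Number of missing values in each predictor.
    missing       = colSums(is.na(df[predictors])),
    # Class balance: how many benign vs. malignant samples.
    class_counts  = table(df$Class),
    # Mean of each predictor within each class, ignoring missing values.
    mean_by_class = aggregate(df[predictors],
                              by = list(Class = df$Class),
                              FUN = mean, na.rm = TRUE)
  )
}

res <- explore(toy)
print(res$missing)
print(res$class_counts)
print(res$mean_by_class)
```

Checking missing values and class balance first is a design choice: both affect how the data should be cleaned and split before any classifier is trained.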
Tasks
1. Perform an exploratory analysis on the data. (10 marks)
2. Develop an SVM model to classify the data. (35 marks)
3. Develop a Neural Network model to classify the data. (35 marks)
4. Evaluate and compare the performance of the SVM and Neural Network models. (10 marks)
5. Write a brief report to explain in detail your model building process and disseminate your analysis
results and findings. (10 marks)
Grading Criteria
Each submission will be graded based on the data analysis and classification models. Here are our grading
criteria:
Appropriate data preparation and exploration.
Appropriate model training, validation and testing.
Performance of the classification models.
data and presenting
the analysis results.
Clearly written, understandable captions that communicate model building process, analysis results
and findings.
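The training, testing and performance criteria above could be approached along the following lines. This is a hedged sketch under stated assumptions, not a prescribed method: the data are synthetic stand-ins, the 70/30 split, the seed, the radial kernel and the hidden-layer size of 3 are arbitrary illustrative choices, and the helper names `stratified_split` and `evaluate` are hypothetical. The model-fitting steps run only if the `e1071` and `nnet` packages are installed.

```r
set.seed(452)  # arbitrary seed so the split is reproducible

# Synthetic stand-in for the cleaned data (made-up values, NOT the real dataset).
n   <- 200
cls <- factor(rep(c("benign", "malignant"), times = c(130, 70)))
toy <- data.frame(
  Cl.thickness = pmin(10, pmax(1, round(rnorm(n, ifelse(cls == "malignant", 7, 3), 2)))),
  Cell.size    = pmin(10, pmax(1, round(rnorm(n, ifelse(cls == "malignant", 6, 2), 2)))),
  Class        = cls
)

# Stratified split: sample a fraction p of the row indices within each class.
stratified_split <- function(y, p = 0.7) {
  unlist(lapply(split(seq_along(y), y),
                function(idx) sample(idx, floor(p * length(idx)))),
         use.names = FALSE)
}

# Confusion matrix, accuracy, sensitivity and specificity,
# treating "malignant" as the positive class.
evaluate <- function(actual, predicted, positive = "malignant") {
  cm <- table(Actual = actual,
              Predicted = factor(predicted, levels = levels(actual)))
  negative <- setdiff(levels(actual), positive)
  list(confusion   = cm,
       accuracy    = sum(diag(cm)) / sum(cm),
       sensitivity = cm[positive, positive] / sum(cm[positive, ]),
       specificity = cm[negative, negative] / sum(cm[negative, ]))
}

train_idx <- stratified_split(toy$Class)
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

results <- list()
if (requireNamespace("e1071", quietly = TRUE)) {
  svm_fit <- e1071::svm(Class ~ ., data = train, kernel = "radial")
  results$svm <- evaluate(test$Class, predict(svm_fit, test))
}
if (requireNamespace("nnet", quietly = TRUE)) {
  nn_fit <- nnet::nnet(Class ~ ., data = train, size = 3, trace = FALSE)
  results$nnet <- evaluate(test$Class, predict(nn_fit, test, type = "class"))
}

# Compare the models on the same held-out test set.
for (m in names(results)) {
  cat(m, "accuracy:", round(results[[m]]$accuracy, 3), "\n")
}
```

Scoring both models with the same `evaluate` function on the same test rows keeps the comparison fair. Sensitivity is reported alongside accuracy because a missed malignant case is usually the more costly error in this setting.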
Submission Details
This is an individual assignment. You may NOT work in groups. Your completed analysis report for the
assignment should be uploaded to OLE (Assignment 2) for Turnitin checking. The R programs and datasets
should be zipped and uploaded to OLE (Assignment 2.1) before the deadline. A printed copy of the analysis
report must be submitted to the tutor’s assignment box at 8/F, Block A by 5:00 p.m. on the given due
date.