Assignment 5
Due: 3/6
Note: Show all your work.
Problem 1 (10 points) Consider the following confusion matrix.
                     predicted class
                     C1      C2
actual class   C1    628     137
               C2    59      394
Note: C1 is positive and C2 is negative.
Compute sensitivity, specificity, precision, accuracy, F-measure, and F2.
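The requested measures all follow from the four cells of the matrix. The Python sketch below assumes the standard definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), precision = TP/(TP+FP), and F-beta = (1 + beta^2) * precision * recall / (beta^2 * precision + recall), with F-measure as beta = 1 and F2 as beta = 2). The function name `classification_metrics` is hypothetical and not part of the assignment; the code is meant as a way to check hand calculations, not as a replacement for showing your work.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute common measures from a 2x2 confusion matrix.

    tp, fn: actual positives predicted as positive / negative
    fp, tn: actual negatives predicted as positive / negative
    """
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)

    def f_beta(beta):
        # Weighted harmonic mean of precision and recall.
        b2 = beta ** 2
        return (1 + b2) * precision * sensitivity / (b2 * precision + sensitivity)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_beta(1.0),
        "f2": f_beta(2.0),
    }


# Cells read from the confusion matrix above, with C1 as the positive class.
metrics = classification_metrics(tp=628, fn=137, fp=59, tn=394)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F2 weights recall more heavily than precision, it sits closer to the sensitivity value than the F-measure does.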
Problem 2 (10 points) Suppose you built two classifier models M1 and M2 from the
same training dataset and tested them on the same test dataset using 10-fold cross-validation.
The error rates obtained over 10 iterations (in each iteration the same
training and test partitions were used for both M1 and M2) are given in the table
below. Determine whether there is a significant difference between the two models
using the statistical method discussed in Section 6 of the online lecture Module 4 (also
in Section 8.5.5, pp. 372-373 of the textbook). Use a significance level of 1%. If there
is a significant difference, which one is better?
Iteration   M1     M2
1           0.21   0.13
2           0.12   0.10
3           0.09   0.20
4           0.15   0.20
5           0.03   0.15
6           0.07   0.05
7           0.13   0.14
8           0.14   0.21
9           0.05   0.23
10          0.14   0.17
Note: When you calculate var(M1 – M2), calculate a sample variance (not a
population variance).
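The comparison can be sketched in Python as a paired t-test on the per-iteration error differences. This assumes the referenced method is the usual paired test, t = mean(d) / sqrt(var(d) / k) with k - 1 degrees of freedom, where d is the list of M1 - M2 differences. The function name `paired_t_statistic` is hypothetical, and the critical value 3.250 (two-sided, 1% level, 9 degrees of freedom) is taken from a standard t-distribution table rather than from the assignment.

```python
import math
import statistics


def paired_t_statistic(errors_a, errors_b):
    """Return (t statistic, degrees of freedom) for paired error rates."""
    if len(errors_a) != len(errors_b):
        raise ValueError("both models need one error rate per iteration")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    k = len(diffs)
    mean_diff = statistics.mean(diffs)
    # statistics.variance divides by k - 1, i.e. the sample variance.
    var_diff = statistics.variance(diffs)
    t_stat = mean_diff / math.sqrt(var_diff / k)
    return t_stat, k - 1


# Error rates from the table above.
m1 = [0.21, 0.12, 0.09, 0.15, 0.03, 0.07, 0.13, 0.14, 0.05, 0.14]
m2 = [0.13, 0.10, 0.20, 0.20, 0.15, 0.05, 0.14, 0.21, 0.23, 0.17]

t_stat, dof = paired_t_statistic(m1, m2)

# Assumption: two-sided critical value for alpha = 0.01 and 9 degrees
# of freedom, looked up in a standard t table.
T_CRITICAL = 3.250
significant = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, degrees of freedom = {dof}, significant = {significant}")
```

The test is two-sided because the question asks only whether the models differ; the sign of the mean difference matters only if the null hypothesis is rejected.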
Problem 3 (20 points). For this problem, you are required to run, in Weka, the Naive
Bayes, J48, SimpleLogistic, RandomForest, neural network (Multilayer Perceptron),
and OneR classification algorithms on the german-bank.arff dataset and compare the
performance of the models built by these six algorithms. Make sure that you select
“Cross-validation” for “Test options.” If you have to choose one model, which one
would you choose and why? Note that the neural network algorithm will take longer
than the other algorithms.