STA 141A, Homework 3
Due May 21st 2019 (by 8 am)

Name:

Student ID:

Section:

Names of your study mates:

Please submit your work on Canvas as a compiled R Markdown file (knitted to PDF or HTML).

All code in this assignment should be cleanly written and well commented, with appropriate use of functions/arguments. Imagine you are sending this code to your colleagues or supervisors for review—which they can only do if they can understand it.

In this homework, you will compare the k-nearest neighbour (kNN) classification method, linear discriminant analysis (LDA), and logistic regression in a two-class classification problem.

This homework is designed to exercise your self-learning ability. Note that LDA and kNN are not discussed in the lectures. You are expected to use the Internet or other resources to learn how to apply these basic methods to a data set. You will find the lda() function (in the MASS library) and the knn() function (in the class library) useful.

Consider the iris data.

1. Extract the data corresponding to the flower types versicolor and virginica, numbering a total of 100 flowers. Set aside the first 10 observations for each flower type as test data and use the remaining data consisting of 80 observations (with flower types as class labels) as training data.
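
For reference, a minimal sketch of one way to set up this split in R (object names such as iris2, train, and test are only illustrative):

```r
library(MASS)    # provides lda()
library(class)   # provides knn()

# Keep only versicolor and virginica, and drop the unused 'setosa'
# factor level so the response has exactly two classes.
iris2 <- droplevels(subset(iris, Species %in% c("versicolor", "virginica")))

# First 10 observations of each flower type form the test set;
# the remaining 80 observations form the training set.
test_idx <- c(which(iris2$Species == "versicolor")[1:10],
              which(iris2$Species == "virginica")[1:10])
test  <- iris2[test_idx, ]
train <- iris2[-test_idx, ]
```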

2. Use Linear Discriminant Analysis (LDA) for classifying the test data, using Sepal.Length and Sepal.Width as the predictor variables (or features).
   (a) Report the class-specific means of the predictor variables for the training data.
   (b) Compute the confusion matrix for the test data, and the misclassification error rate.
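
A possible workflow with MASS::lda(), assuming the train and test objects from the sketch above:

```r
# Fit LDA on the training data using the two sepal measurements.
lda_fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = train)
lda_fit$means                        # class-specific means of the predictors

# Classify the test data, then summarise the errors.
lda_pred <- predict(lda_fit, newdata = test)$class
table(Predicted = lda_pred, Actual = test$Species)   # confusion matrix
mean(lda_pred != test$Species)                       # misclassification error rate
```
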
3. Fit a logistic regression model to the training data, using the variables Sepal.Length and Sepal.Width as predictors.
   (a) Obtain the estimates and their standard errors for the model parameters.
   (b) Compute the confusion matrix for the test data, and the misclassification error rate.
   (c) Are both the predictor variables necessary for the purpose of classification? Justify.
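
One common approach with glm(); the 0.5 probability cutoff is a modelling choice, not a requirement:

```r
# Logistic regression with both sepal measurements as predictors.
# With a two-level factor response, glm() models the probability of the
# second factor level ("virginica" here).
glm_fit <- glm(Species ~ Sepal.Length + Sepal.Width,
               data = train, family = binomial)
summary(glm_fit)$coefficients        # estimates and standard errors

# Classify the test data with a 0.5 cutoff on the predicted probabilities.
glm_prob <- predict(glm_fit, newdata = test, type = "response")
glm_pred <- ifelse(glm_prob > 0.5, "virginica", "versicolor")
table(Predicted = glm_pred, Actual = test$Species)   # confusion matrix
mean(glm_pred != test$Species)                       # misclassification error rate
```
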
4. Fit a logistic regression model to the training data, using the variable Sepal.Length as a one-dimensional predictor.
   (a) Obtain the estimates and their standard errors for the model parameters.
   (b) Compute the confusion matrix for the test data, and the misclassification error rate.
   (c) Compare the results with those in Question 3. Does your result in 4(b) support the answer to 3(c)?
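
Only the model formula changes relative to the two-predictor sketch; the test-set evaluation is identical:

```r
# Refit using Sepal.Length alone; evaluate on the test data as before.
glm_fit1 <- glm(Species ~ Sepal.Length, data = train, family = binomial)
summary(glm_fit1)$coefficients       # estimates and standard errors
```
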
5. Use the k-Nearest Neighbors (kNN) classification method to classify the test data, using only Sepal.Length as the predictor variable. Perform this analysis using k = 1 and k = 5. In each case, compute the confusion matrix for the test data, and the misclassification error rate.
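
A sketch using class::knn(); note that knn() expects matrix-like predictor inputs, so the single predictor is kept as a one-column data frame:

```r
# knn() takes the training and test predictors plus the training labels.
x_train <- train[, "Sepal.Length", drop = FALSE]
x_test  <- test[,  "Sepal.Length", drop = FALSE]

for (k in c(1, 5)) {
  knn_pred <- knn(train = x_train, test = x_test, cl = train$Species, k = k)
  print(table(Predicted = knn_pred, Actual = test$Species))   # confusion matrix
  print(mean(knn_pred != test$Species))                       # misclassification error rate
}
```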

6. Write a very brief summary (maximum of 200 words) about the comparative performance of the three different classification methods for this data set.