6CCS3PRE & 7CCSMPNN Pattern Recognition

Coursework Assignment 1

This coursework is assessed. A type-written report needs to be submitted online through KEATS

by the deadline specified on the module's KEATS webpage. Only include in your report the

information specifically requested below: this is indicated by underlined italicised text.

Part 0: Creating Personalised Data

To answer some of the questions in this coursework you will need to create a test dataset that is

unique to you. As the answers you get will depend on the test set you use, you must ensure that

you generate the test set correctly, otherwise your answers will be wrong!

To create this test set, Xtest, use your 7-digit KCL student ID. This is the number that appears on

your College ID card; it is NOT the k-number you use to log into College computers.

Define s1, s2, s3, s4, s5, s6, s7 to be the 1st to 7th digits of your KCL student ID number, then

define Xtest using the following MATLAB code:

Stest=[s1, s2, s3, s4, s5, s6, s7; s2, s3, s4, s5, s6, s7, s1; s3, s4, s5, s6, s7, s1, s2; s4, s5, s6, s7, s1, s2, s3];

Stest=bsxfun(@rdivide,Stest,[2.3;4;1.5;4]);

Xtest=bsxfun(@plus,Stest,[4;2;1;0]);

Hence, if your KCL student ID was 1234567, Xtest would be:

4.4348 4.8696 5.3043 5.7391 6.1739 6.6087 7.0435

2.5000 2.7500 3.0000 3.2500 3.5000 3.7500 2.2500

3.0000 3.6667 4.3333 5.0000 5.6667 1.6667 2.3333

1.0000 1.2500 1.5000 1.7500 0.2500 0.5000 0.7500

Each column of Xtest is a sample taken from a 4-dimensional feature-space.

In your report give your 7-digit KCL student ID, and report the values in the array Xtest.
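
The construction above can be checked with a short script. The following is a minimal sketch, assuming the example ID 1234567 from the text. The variable names id and s are illustrative and not part of the assignment, and circshift is used here only as a compact way of writing the four rotated rows.

```matlab
% Example ID from the text; replace with your own 7-digit ID.
id = '1234567';
s = id - '0';                       % digit characters -> 1x7 numeric row vector

% Row r of Stest is s rotated left by r-1 places, as in the definition above.
Stest = [s; circshift(s, [0 -1]); circshift(s, [0 -2]); circshift(s, [0 -3])];

% Scale each row, then offset each row (same operations as in the text).
Stest = bsxfun(@rdivide, Stest, [2.3; 4; 1.5; 4]);
Xtest = bsxfun(@plus, Stest, [4; 2; 1; 0]);

disp(Xtest)                         % 4x7 array, one test sample per column
```

For the example ID this reproduces the array printed above, e.g. Xtest(1,1) is 1/2.3 + 4 = 4.4348 to four decimal places.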

Part 1: Bayesian Decision Theory and Density Estimation

Introduction

Download file cw1_parzen_density_est.m from the module's KEATS webpage. Execute this script

in MATLAB. The output shows the Parzen density estimate of a univariate dataset. The density

estimation is repeated several times using progressively more data points. You will get slightly

different results each time you run this script, due to the different random samples that are

taken each time. However, in general, as the number of samples increases, so does the accuracy

of the estimate.

To perform Parzen density estimation, the cw1_parzen_density_est script calls the inbuilt

MATLAB function “mvksdensity”. For 1-D data, if each sample is an element of a row vector “d”,

then the Parzen estimate of the probability density, p, at a new sample value

dnew is calculated using the command: p=mvksdensity(d',dnew')'

Note the transpose operator in MATLAB is written using an apostrophe (e.g. x' is the transpose

of x). It is used in the command above because mvksdensity expects the number of rows in the

first two input arguments to equal the number of samples, and the number of columns to equal

the number of features. Type “help mvksdensity” at the MATLAB prompt to get a full description

of this function. In general, to get a description of any function in MATLAB, you can type “help

function_name” in the MATLAB command window. Alternatively, you can use the online

documentation available here: http://uk.mathworks.com/help/
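
To make the role of the window concrete, the sketch below computes a Gaussian-kernel Parzen estimate by hand and compares it with mvksdensity. It assumes that mvksdensity is using its default normal kernel; the samples d, the query points dnew and the bandwidth h are hypothetical values chosen only for illustration.

```matlab
d    = [-0.5, 0.2, 1.1, 1.4];   % hypothetical samples (row vector)
dnew = [0, 0.5, 1];             % hypothetical query points (row vector)
h    = 0.6;                     % window width ("Bandwidth")

% Direct Gaussian-kernel Parzen estimate:
%   p(x) = 1/(n*h) * sum_i phi((x - d_i)/h),  phi = standard normal pdf
u = bsxfun(@minus, dnew', d) / h;                    % 3x4 scaled differences
p_manual = mean(exp(-0.5 * u.^2) / (sqrt(2*pi) * h), 2)';

% Same estimate via the toolbox function; note the transposes, since
% rows must index samples and columns must index features.
p_toolbox = mvksdensity(d', dnew', 'Bandwidth', h)';

disp([p_manual; p_toolbox])     % the two rows should agree closely
```

Each query point receives a contribution from every sample, weighted by how many bandwidths away that sample lies.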

The following vectors are samples taken from three univariate class-conditional probability

distributions:

d1=[-3.34,0.11,1.07,0.82,-0.51,-1.24,1.15,1.29,-3.38,-1.12,1.35,-0.14,1.21,-2.11,0.48,2.16,0.91,-0.78,1.13,1.32];

d2=[3.27,4.57,4.12,4.99,4.40,4.08,5.96,3.37,4.0,3.56,4.81,3.02,3.01,2.62,3.77,7.01,2.84,2.79,4.41,2.08,6.66,6.65,4.65,5.78,5.81,5.65,3.73,4.31,4.84,3.70,4.73,2.98,3.95,3.58];

d3=[3.66,6.16,10.07,6.43,7.17,8.17,7.33,6.24,7.02,6.52,7.27,7.86,9.27,11.58,5.12,10.12,9.07,11.57,9.12,9.88,6.71,8.18,9.29,6.56,10.40,7.39,8.30,8.77,8.66,7.78,10.00,6.14,8.74];

Use the “mvksdensity” command to calculate the Parzen window estimate of the probability

density function for all three classes over the range [-4:0.01:12], i.e. estimate p(x|wj) for j=1 to 3.

Plot these probability density estimates on a single set of axes. Note that [-4:0.01:12] produces a

row vector containing the numbers from -4 to 12 in steps of 0.01. The MATLAB function “plot”

can be used to plot the probability densities. You should get results like the following:

[Figure: Parzen density estimates of p(x|wj) for the three classes.]

Note that you can specify additional parameters for mvksdensity. One parameter, “Bandwidth”,

controls the width of the window used in the Parzen density estimation. If you re-calculate and

re-plot the class-conditional probability densities with varying values of “Bandwidth” you will see

that a larger bandwidth results in smoother estimates of the densities.
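
The effect of “Bandwidth” can be seen with a small experiment. This is a sketch only: the sample d is hypothetical (drawn with randn rather than taken from the assignment data), and the three bandwidth values are arbitrary choices for illustration.

```matlab
rng(0);                                  % fix the seed so the plot is repeatable
d  = [randn(1,30), 4 + randn(1,30)];     % hypothetical bimodal sample (row vector)
x  = -4:0.01:8;                          % points at which to evaluate the density
bw = [0.2, 0.6, 1.5];                    % illustrative bandwidths

figure; hold on;
for i = 1:numel(bw)
    % Rows = samples, columns = features, hence the transposes.
    p = mvksdensity(d', x', 'Bandwidth', bw(i))';
    plot(x, p, 'DisplayName', sprintf('Bandwidth = %.1f', bw(i)));
end
hold off;
xlabel('x'); ylabel('estimated density'); legend('show');
```

The narrowest window follows individual samples and gives a spiky curve, while the widest one blurs the two modes together.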

Use the number of samples in d1, d2, and d3 to estimate the prior probability, P(wj), for each

class.

Using the Parzen estimates of p(x|wj) calculated using a “Bandwidth” of 0.6, and the estimates of

the prior probabilities from above, plot (on a single set of axes) estimates of p(wj|x) for all three

classes. You should get results that look like the following:

[Figure: estimated posterior probabilities p(wj|x) for the three classes.]

Using the Parzen estimates of p(x|wj) calculated using a “Bandwidth” of 0.6, and the prior

probabilities for each class, calculate the values of p(wj|x) (for all three classes) when x takes

each of the following values: [-2,0,2,4,6,8,10]. You should get the following answer:

1.0000 0.9996 0.5591 0.0006 0.0000 0.0000 0.0000

0.0000 0.0004 0.4381 0.9375 0.4790 0.0437 0.0000

0.0000 0.0000 0.0028 0.0619 0.5210 0.9563 1.0000

Each column shows the three posteriors for one of the 7 samples.

Using the Bayes Decision Rule, determine the class of new samples with values [-2,0,2,4,6,8,10]. You

should get the answer:

class =

1 1 1 2 3 3 3
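
The steps of this introduction (density estimates, priors, posteriors, decision) can be put together as follows. This is one possible sketch rather than the module's reference solution; the variable names lik, prior, post and predicted are illustrative, and the values it prints should be compared against the answers quoted above rather than taken on trust.

```matlab
% Samples from the text (classes 1 to 3).
d1 = [-3.34,0.11,1.07,0.82,-0.51,-1.24,1.15,1.29,-3.38,-1.12, ...
      1.35,-0.14,1.21,-2.11,0.48,2.16,0.91,-0.78,1.13,1.32];
d2 = [3.27,4.57,4.12,4.99,4.40,4.08,5.96,3.37,4.0,3.56,4.81,3.02, ...
      3.01,2.62,3.77,7.01,2.84,2.79,4.41,2.08,6.66,6.65,4.65,5.78, ...
      5.81,5.65,3.73,4.31,4.84,3.70,4.73,2.98,3.95,3.58];
d3 = [3.66,6.16,10.07,6.43,7.17,8.17,7.33,6.24,7.02,6.52,7.27,7.86, ...
      9.27,11.58,5.12,10.12,9.07,11.57,9.12,9.88,6.71,8.18,9.29,6.56, ...
      10.40,7.39,8.30,8.77,8.66,7.78,10.00,6.14,8.74];
xnew = [-2, 0, 2, 4, 6, 8, 10];
h = 0.6;                                   % "Bandwidth" from the text

% Class-conditional density estimates p(x|wj): one row per class.
lik = [mvksdensity(d1', xnew', 'Bandwidth', h)'; ...
       mvksdensity(d2', xnew', 'Bandwidth', h)'; ...
       mvksdensity(d3', xnew', 'Bandwidth', h)'];

% Priors P(wj) estimated from the relative number of samples per class.
n = [numel(d1); numel(d2); numel(d3)];
prior = n / sum(n);

% Bayes' rule: p(wj|x) = p(x|wj) P(wj) / sum_i p(x|wi) P(wi).
joint = bsxfun(@times, lik, prior);
post  = bsxfun(@rdivide, joint, sum(joint, 1));

% Bayes decision rule: choose the class with the largest posterior.
[~, predicted] = max(post, [], 1);

disp(post); disp(predicted);
```

Because the evidence term in the denominator is the same for every class, the decision could equally be made from joint alone; normalising is only needed to report the posteriors themselves.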

Assessed Exercise

Download the file iris_class1_2_3_4D.mat from the module's KEATS webpage. Load this dataset

into MATLAB. This dataset contains 150 samples from 3 classes. Each sample is a four-dimensional

feature vector. These are the columns of X. The class label associated with each

sample is given by the corresponding element of vector t. Also calculate Xtest, as described in

Part 0.

Using methods analogous to those described in the introduction to this section (however, in this

case you will not be able to plot graphs as you are working in a four-dimensional feature space),

calculate the Parzen estimates of p(x|wj) using a “Bandwidth” of 0.6, and the prior probabilities

for each class, for the dataset X. Hence, calculate the values of p(wj|x) (for all three classes) when

x takes each of the values given in Xtest.

Report the values of p(wj|x) for all three classes and all 7 test samples. Using Bayes Decision Rule,

determine the predicted class of the 7 test samples. Write down the classification of each test

sample in your report.

Part 2: k-Nearest-Neighbour Classifier

Introduction

Using the same dataset (d1, d2, and d3) and the same new samples (-2,0,2,4,6,8,10) as used in the

introduction to Part 1, determine the class of each of the new samples using the k-nearest

neighbours classifier (kNN), both for k=1 and k=5. Use the Euclidean distance between the values

as your measure of similarity.

You can do this manually, but it is laborious. It is easier to write some simple code to do this for

you. MATLAB provides inbuilt functions for kNN (see “fitcknn” or “knnclassify”) or you may wish

to write your own code (in which case you may find the function “sort” useful!).

You should find that the classification of each new sample using k=1 is:

1 1 2 2 2 3 3

and for k=5, the predicted classification is:

1 1 1 2 2 3 3
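
A hand-written kNN using “sort” might look like the sketch below. It assumes the 1-D data from the introduction to Part 1, uses the absolute difference as the Euclidean distance in one dimension, and breaks any voting tie in favour of the lowest class label (the behaviour of mode); the names X, t, ks and preds are illustrative.

```matlab
d1 = [-3.34,0.11,1.07,0.82,-0.51,-1.24,1.15,1.29,-3.38,-1.12, ...
      1.35,-0.14,1.21,-2.11,0.48,2.16,0.91,-0.78,1.13,1.32];
d2 = [3.27,4.57,4.12,4.99,4.40,4.08,5.96,3.37,4.0,3.56,4.81,3.02, ...
      3.01,2.62,3.77,7.01,2.84,2.79,4.41,2.08,6.66,6.65,4.65,5.78, ...
      5.81,5.65,3.73,4.31,4.84,3.70,4.73,2.98,3.95,3.58];
d3 = [3.66,6.16,10.07,6.43,7.17,8.17,7.33,6.24,7.02,6.52,7.27,7.86, ...
      9.27,11.58,5.12,10.12,9.07,11.57,9.12,9.88,6.71,8.18,9.29,6.56, ...
      10.40,7.39,8.30,8.77,8.66,7.78,10.00,6.14,8.74];

X = [d1, d2, d3];                                   % all training samples
t = [ones(1,numel(d1)), 2*ones(1,numel(d2)), 3*ones(1,numel(d3))];  % labels
xnew = [-2, 0, 2, 4, 6, 8, 10];
ks = [1, 5];

preds = zeros(numel(ks), numel(xnew));              % one row per value of k
for j = 1:numel(ks)
    for i = 1:numel(xnew)
        dist = abs(X - xnew(i));                    % 1-D Euclidean distance
        [~, idx] = sort(dist);                      % nearest samples first
        preds(j, i) = mode(t(idx(1:ks(j))));        % majority vote among k nearest
    end
end
disp(preds)
```

The first row of preds holds the k=1 classifications and the second row the k=5 classifications, which can be compared with the two answers given above.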

Assessed Exercise

Download the file iris_class1_2_3_4D.mat from the module's KEATS webpage. Load this dataset

into MATLAB. This dataset contains 150 samples from 3 classes. Each sample is a four-dimensional

feature vector. These are the columns of X. The class label associated with each

sample is given by the corresponding element of vector t. Also calculate Xtest, as described in

Part 0.

Determine the class of each of the 7 samples in Xtest using the k-nearest neighbours classifier

(kNN), for k=3 and k=7. In your report write down the class you determined for each sample in

Xtest, for each value of k.

Part 3: Discriminant Functions

Assessed Exercise

Answer tutorial question 14 in the section of the tutorial on Discriminant Functions, except use a

margin vector b = [s1, s2, s3, s4, s5, s6]^t, where s1, s2, s3, s4, s5, s6 are the first six digits of your

7-digit KCL student ID.

This question is about using the Sequential Widrow-Hoff Learning Algorithm to find a linear

discriminant function to classify the data given in the table in question 12 of the same section of

the tutorial. You can do this manually, but it is laborious and error-prone. It may be easier for you

to write some simple code to do this for you.

In your report provide a table showing the calculations performed at each iteration. Note, if you fail

to use your student ID to define margin vector b, you will receive a mark of zero, even if the

method you use is correct.
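
For orientation, the general shape of a sequential Widrow-Hoff update is sketched below. Everything numeric here is hypothetical: the samples, the initial weights a, the margin vector b and the learning rate eta are placeholders, not the tutorial's data or settings, so the tutorial question should be consulted for the actual values and for the conventions (augmentation, sample normalisation) it expects.

```matlab
% Hypothetical 2-D samples (one per column) and their class labels.
X = [0 1 3 3; ...
     2 2 1 2];
t = [1 1 2 2];

% Augment each sample with a leading 1, then negate the class-2 samples
% (sample normalisation). Each row of Y is one vector y_k.
Y = [ones(1, size(X, 2)); X]';
Y(t == 2, :) = -Y(t == 2, :);

a   = [1; 0; 0];        % hypothetical initial weight vector
b   = [1; 1; 1; 1];     % hypothetical margin vector, one entry per sample
eta = 0.1;              % hypothetical learning rate

% One pass through the data, updating after every sample:
%   a <- a + eta * (b_k - a'*y_k) * y_k
for k = 1:size(Y, 1)
    y = Y(k, :)';
    g = a' * y;                          % current output for sample k
    a = a + eta * (b(k) - g) * y;        % Widrow-Hoff (LMS) correction
    fprintf('iter %d: a''*y = %.3f, a_new = %s\n', k, g, mat2str(a', 4));
end
```

Printing the output and the updated weights at every step gives the rows of an iteration table directly; with these placeholder values the weights after one pass are [0.84; -0.48; -0.12].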



