6CCS3PRE & 7CCSMPNN Pattern Recognition
Coursework Assignment 1
This coursework is assessed. A type-written report needs to be submitted online through KEATS
by the deadline specified on the module's KEATS webpage. Only include in your report the
information specifically requested below: this is indicated by underlined italicised text.
Part 0: Creating Personalised Data
To answer some of the questions in this coursework you will need to create a test dataset that is
unique to you. As the answers you get will depend on the test set you use, you must ensure that
you generate the test set correctly, otherwise your answers will be wrong!
To create this test set, Xtest, use your 7-digit KCL student ID (this is the number that appears on
your College ID card; it is NOT the k-number you use to log into College computers).
Define s1, s2, s3, s4, s5, s6, s7 to be the 1st to 7th digits of your KCL student ID number, then
define Xtest using the following MATLAB code:
Stest=[s1, s2, s3, s4, s5, s6, s7; ...
       s2, s3, s4, s5, s6, s7, s1; ...
       s3, s4, s5, s6, s7, s1, s2; ...
       s4, s5, s6, s7, s1, s2, s3];
Stest=bsxfun(@rdivide,Stest,[2.3;4;1.5;4]);
Xtest=bsxfun(@plus,Stest,[4;2;1;0]);
Hence, if your KCL student ID was 1234567, Xtest would be:
4.4348 4.8696 5.3043 5.7391 6.1739 6.6087 7.0435
2.5000 2.7500 3.0000 3.2500 3.5000 3.7500 2.2500
3.0000 3.6667 4.3333 5.0000 5.6667 1.6667 2.3333
1.0000 1.2500 1.5000 1.7500 0.2500 0.5000 0.7500
Each column of Xtest is a sample taken from a 4-dimensional feature-space.
In your report give your 7-digit KCL student ID, and report the values in the array Xtest.
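As a quick check that you have built Xtest correctly, the sketch below reproduces the example for the ID 1234567. It is only one possible way to enter the digits: the variable names id and s, and the use of circshift to rotate the digits, are illustrative choices and are not required by the coursework.

```matlab
% Illustrative only: derive the seven digits from the ID typed as text,
% then build Xtest exactly as defined above.
id = '1234567';                 % example ID from this handout; use your own
s  = id - '0';                  % character codes minus '0' give the digits
assert(numel(s) == 7 && all(s >= 0 & s <= 9), 'ID must have 7 digits');

% Row k of Stest is the digit sequence rotated left by k-1 places.
Stest = [s; circshift(s, [0 -1]); circshift(s, [0 -2]); circshift(s, [0 -3])];
Stest = bsxfun(@rdivide, Stest, [2.3; 4; 1.5; 4]);   % scale each row
Xtest = bsxfun(@plus, Stest, [4; 2; 1; 0]);          % shift each row
disp(Xtest)
```

For the example ID the displayed array should agree with the values printed above, e.g. 4.4348 in the top-left corner and 0.7500 in the bottom-right.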
Part 1: Bayesian Decision Theory and Density Estimation
Introduction
Download file cw1_parzen_density_est.m from the module's KEATS webpage. Execute this script
in MATLAB. The output shows the Parzen density estimation of a univariate dataset. The density
estimation is repeated several times using progressively more data points. You will get slightly
different results each time you run this script, due to the different random samples that are
taken each time. However, in general, as the number of samples increases, so does the accuracy
of the estimate.
To perform Parzen density estimation, the cw1_parzen_density_est script calls the inbuilt
MATLAB function “mvksdensity”. For 1-d data, if each sample is an element of a row vector “d”,
then the Parzen estimate of the probability density, p, at a new value dnew
is calculated using the command: p=mvksdensity(d',dnew')'
Note the transpose operator in MATLAB is written using an apostrophe (e.g. x' is the transpose
of x). It is used in the command above because mvksdensity expects the number of rows in the
first two input arguments to equal the number of samples, and the number of columns to equal
the number of features. Type “help mvksdensity” at the MATLAB prompt to get a full description
of this function. In general, to get a description of any function in MATLAB, you can type “help
function_name” in the MATLAB command window. Alternatively, you can use the online
documentation available here: http://uk.mathworks.com/help/
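To see what the estimate computes and why the samples-as-rows convention matters, the sketch below hand-codes a Parzen estimate without calling the toolbox. It is not the script from KEATS: it assumes a Gaussian window of width h, and the toy data and the names d, dnew, h, u and p are made up for illustration. With the toolbox installed, mvksdensity(d',dnew','Bandwidth',h)' is expected to give matching values, assuming its default normal kernel.

```matlab
d    = [1.0, 2.0, 4.0];        % toy samples (made up), stored as a row vector
dnew = [0.0, 2.0, 3.0];        % points at which to evaluate the density
h    = 0.6;                    % window width (bandwidth), an assumed value

% u(i,j) = (dnew(j) - d(i)) / h : rows index samples, columns index new points
u = bsxfun(@minus, dnew, d') / h;

% Average, over the samples, of Gaussian windows centred on each sample
p = mean(exp(-0.5 * u.^2) / (h * sqrt(2*pi)), 1);
disp(p)
```

Each entry of p is a density value, not a probability, so it can exceed 1 when the window is narrow.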
The following vectors are samples taken from three univariate class-conditional probability
distributions:
d1=[-3.34,0.11,1.07,0.82,-0.51,-1.24,1.15,1.29,-3.38,-1.12,1.35,-0.14,1.21,-2.11,0.48,2.16,0.91, ...
-0.78,1.13,1.32];
d2=[3.27,4.57,4.12,4.99,4.40,4.08,5.96,3.37,4.0,3.56,4.81,3.02,3.01,2.62,3.77,7.01,2.84,2.79,4.41, ...
2.08,6.66,6.65,4.65,5.78,5.81,5.65,3.73,4.31,4.84,3.70,4.73,2.98,3.95,3.58];
d3=[3.66,6.16,10.07,6.43,7.17,8.17,7.33,6.24,7.02,6.52,7.27,7.86,9.27,11.58,5.12,10.12,9.07,11.57, ...
9.12,9.88,6.71,8.18,9.29,6.56,10.40,7.39,8.30,8.77,8.66,7.78,10.00,6.14,8.74];
Use the “mvksdensity” command to calculate the Parzen window estimate of the probability
density function for all three classes over the range [-4:0.01:12], i.e. estimate p(x|wj) for j=1 to 3.
Plot these probability density estimates on a single set of axes. Note that [-4:0.01:12] produces a
row vector containing the numbers from -4 to 12 in steps of 0.01. The MATLAB function “plot”
can be used to plot the probability densities. You should get results like the following:
Note that you can specify additional parameters for mvksdensity. One parameter, “Bandwidth”,
controls the width of the window used in the Parzen density estimation. If you re-calculate and
re-plot the class-conditional probability densities with varying values of “Bandwidth” you will see
that a larger bandwidth results in smoother estimates of the densities.
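A minimal sketch of the density-estimation and plotting step is given below. It assumes the Statistics and Machine Learning Toolbox is installed so that mvksdensity is available. The two bandwidth values, the variable names x, bw, p1, p2 and p3, and the plot labels are illustrative choices only.

```matlab
d1=[-3.34,0.11,1.07,0.82,-0.51,-1.24,1.15,1.29,-3.38,-1.12,1.35,-0.14,1.21,-2.11,0.48,2.16,0.91, ...
    -0.78,1.13,1.32];
d2=[3.27,4.57,4.12,4.99,4.40,4.08,5.96,3.37,4.0,3.56,4.81,3.02,3.01,2.62,3.77,7.01,2.84,2.79,4.41, ...
    2.08,6.66,6.65,4.65,5.78,5.81,5.65,3.73,4.31,4.84,3.70,4.73,2.98,3.95,3.58];
d3=[3.66,6.16,10.07,6.43,7.17,8.17,7.33,6.24,7.02,6.52,7.27,7.86,9.27,11.58,5.12,10.12,9.07,11.57, ...
    9.12,9.88,6.71,8.18,9.29,6.56,10.40,7.39,8.30,8.77,8.66,7.78,10.00,6.14,8.74];

x = -4:0.01:12;                      % evaluation grid, a row vector
for bw = [0.3, 0.6]                  % two illustrative window widths
    % mvksdensity expects samples and query points as rows, hence the transposes
    p1 = mvksdensity(d1', x', 'Bandwidth', bw)';
    p2 = mvksdensity(d2', x', 'Bandwidth', bw)';
    p3 = mvksdensity(d3', x', 'Bandwidth', bw)';

    figure;
    plot(x, p1, x, p2, x, p3);       % all three estimates on one set of axes
    legend('p(x|w1)', 'p(x|w2)', 'p(x|w3)');
    xlabel('x'); ylabel('density estimate');
    title(sprintf('Bandwidth = %g', bw));
end
```

Comparing the two figures should show the smoothing effect described above: the curves for the larger bandwidth have fewer bumps.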
Use the number of samples in d1, d2, and d3 to estimate the prior probability, P(wj), for each
class.
Using the Parzen estimates of p(x|wj) calculated using a “Bandwidth” of 0.6, and the estimates of
the prior probabilities from above, plot (on a single set of axes) estimates of p(wj|x) for all three
classes. You should get results that look like the following:
Using the Parzen estimates of p(x|wj) calculated using a “Bandwidth” of 0.6, and the prior
probabilities for each class, calculate the values of p(wj|x) (for all three classes) when x takes
each of the following values: [-2,0,2,4,6,8,10]. You should get the following answer:
1.0000 0.9996 0.5591 0.0006 0.0000 0.0000 0.0000
0.0000 0.0004 0.4381 0.9375 0.4790 0.0437 0.0000
0.0000 0.0000 0.0028 0.0619 0.5210 0.9563 1.0000
Each column shows the three posteriors for one of the 7 samples, and each row corresponds to one class.
Using the Bayes decision rule, determine the class of new samples with values [-2,0,2,4,6,8,10]. You
should get the answer:
class =
1 1 1 2 3 3 3
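The chain from priors to posteriors to the Bayes decision can be sketched as follows. This is one possible arrangement, assuming mvksdensity is available. The names xnew, bw, n, prior, lik, joint, post and cls are illustrative (cls is used rather than class, which is also the name of a MATLAB function).

```matlab
d1=[-3.34,0.11,1.07,0.82,-0.51,-1.24,1.15,1.29,-3.38,-1.12,1.35,-0.14,1.21,-2.11,0.48,2.16,0.91, ...
    -0.78,1.13,1.32];
d2=[3.27,4.57,4.12,4.99,4.40,4.08,5.96,3.37,4.0,3.56,4.81,3.02,3.01,2.62,3.77,7.01,2.84,2.79,4.41, ...
    2.08,6.66,6.65,4.65,5.78,5.81,5.65,3.73,4.31,4.84,3.70,4.73,2.98,3.95,3.58];
d3=[3.66,6.16,10.07,6.43,7.17,8.17,7.33,6.24,7.02,6.52,7.27,7.86,9.27,11.58,5.12,10.12,9.07,11.57, ...
    9.12,9.88,6.71,8.18,9.29,6.56,10.40,7.39,8.30,8.77,8.66,7.78,10.00,6.14,8.74];

xnew = [-2, 0, 2, 4, 6, 8, 10];      % new samples to classify
bw   = 0.6;                          % bandwidth stated in the exercise

% Priors P(wj) estimated from the relative class frequencies
n     = [numel(d1), numel(d2), numel(d3)];
prior = n / sum(n);

% Class-conditional densities p(x|wj): one row per class, one column per sample
lik = [mvksdensity(d1', xnew', 'Bandwidth', bw)'; ...
       mvksdensity(d2', xnew', 'Bandwidth', bw)'; ...
       mvksdensity(d3', xnew', 'Bandwidth', bw)'];

joint = bsxfun(@times, lik, prior');              % p(x|wj) P(wj)
post  = bsxfun(@rdivide, joint, sum(joint, 1));   % divide by the evidence p(x)

[~, cls] = max(post, [], 1);         % Bayes decision rule: largest posterior wins
disp(post)
disp(cls)
```

Every column of post sums to 1 by construction. If your MATLAB installation behaves like the one used to prepare this handout, post and cls should reproduce the table and class list given above.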
Assessed Exercise
Download the file iris_class1_2_3_4D.mat from the module's KEATS webpage. Load this dataset
into MATLAB. This dataset contains 150 samples from 3 classes. Each sample is a four-dimensional
feature vector. These are the columns of X. The class label associated with each
sample is given by the corresponding element of vector t. Also calculate Xtest, as described in
Part 0.
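The data layout described here (samples as columns of X, labels in t) can be illustrated with a toy example. The numbers below are made up and two-dimensional; they are not the KEATS data. The sketch also assumes t is a row vector, which you should check after loading the file. The names classes and Xj are illustrative.

```matlab
% Toy stand-in for the loaded variables (made-up values, NOT the KEATS data)
X = [5.1 4.9 6.0 6.2 7.1 6.8;    % each column is one sample,
     3.5 3.0 2.9 2.2 3.0 3.2];   % each row is one feature
t = [1 1 2 2 3 3];               % class label of each column of X

classes = unique(t);
for j = classes
    Xj = X(:, t == j);           % samples of class j, still one per column
    fprintf('class %d: %d samples, prior estimate %.4f\n', ...
            j, size(Xj, 2), size(Xj, 2) / numel(t));
end
```

Remember that mvksdensity expects one sample per row, so a matrix such as Xj would need transposing before being passed to it.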
Using methods analogous to those described in the introduction to this section (however, in this
case you will not be able to plot graphs as you are working in a four-dimensional feature space),
calculate the Parzen estimates of p(x|wj) using a “Bandwidth” of 0.6, and the prior probabilities
for each class, for the dataset X. Hence, calculate the values of p(wj|x) (for all three classes) when
x takes each of the values given in Xtest.
Report the values of p(wj|x) for all three classes and all 7 test samples. Using Bayes Decision Rule,
determine the predicted class of the 7 test samples. Write down the classification of each test
sample in your report.
Part 2: k-Nearest-Neighbour Classifier
Introduction
Using the same dataset (d1, d2, and d3) and the same new samples (-2,0,2,4,6,8,10) as used in the
introduction to Part 1, determine the class of each of the new samples using the k-nearest
neighbours classifier (kNN), both for k=1 and k=5. Use the Euclidean distance between the values
as your measure of similarity.
You can do this manually, but it is laborious. It is easier to write some simple code to do this for
you. MATLAB provides inbuilt functions for kNN (see “fitcknn” or “knnclassify”) or you may wish
to write your own code (in which case you may find the function “sort” useful!).
You should find that the classification of each new sample using k=1 is:
1 1 2 2 2 3 3
and for k=5, the predicted classification is:
1 1 1 2 2 3 3
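If you choose to write your own code, a minimal sketch using “sort” might look like the following. It uses only base MATLAB functions. The names train, labels, xnew, ks and preds are illustrative, and breaking voting ties with mode (which returns the smallest label) is an assumption rather than something the coursework prescribes.

```matlab
d1=[-3.34,0.11,1.07,0.82,-0.51,-1.24,1.15,1.29,-3.38,-1.12,1.35,-0.14,1.21,-2.11,0.48,2.16,0.91, ...
    -0.78,1.13,1.32];
d2=[3.27,4.57,4.12,4.99,4.40,4.08,5.96,3.37,4.0,3.56,4.81,3.02,3.01,2.62,3.77,7.01,2.84,2.79,4.41, ...
    2.08,6.66,6.65,4.65,5.78,5.81,5.65,3.73,4.31,4.84,3.70,4.73,2.98,3.95,3.58];
d3=[3.66,6.16,10.07,6.43,7.17,8.17,7.33,6.24,7.02,6.52,7.27,7.86,9.27,11.58,5.12,10.12,9.07,11.57, ...
    9.12,9.88,6.71,8.18,9.29,6.56,10.40,7.39,8.30,8.77,8.66,7.78,10.00,6.14,8.74];

train  = [d1, d2, d3];                                   % all training values
labels = [ones(1, numel(d1)), 2*ones(1, numel(d2)), 3*ones(1, numel(d3))];
xnew   = [-2, 0, 2, 4, 6, 8, 10];                        % samples to classify
ks     = [1, 5];                                         % values of k to try

preds = zeros(numel(ks), numel(xnew));                   % one row per value of k
for i = 1:numel(ks)
    for j = 1:numel(xnew)
        dist = abs(train - xnew(j));                     % Euclidean distance in 1-d
        [~, order] = sort(dist);                         % nearest neighbours first
        preds(i, j) = mode(labels(order(1:ks(i))));      % majority vote among the k nearest
    end
    fprintf('k=%d: %s\n', ks(i), mat2str(preds(i, :)));
end
```

The two printed rows should match the classifications given above. For feature vectors stored as columns of a matrix X, the distance line would instead become something like sqrt(sum(bsxfun(@minus, X, x).^2, 1)), where x is one new sample as a column vector.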
Assessed Exercise
Download the file iris_class1_2_3_4D.mat from the module's KEATS webpage. Load this dataset
into MATLAB. This dataset contains 150 samples from 3 classes. Each sample is a four-dimensional
feature vector. These are the columns of X. The class label associated with each
sample is given by the corresponding element of vector t. Also calculate Xtest, as described in
Part 0.
Determine the class of each of the 7 samples in Xtest using the k-nearest neighbours classifier
(kNN), for k=3 and k=7. In your report write down the class you determined for each sample in
Xtest, for each value of k.
Part 3: Discriminant Functions
Assessed Exercise
Answer tutorial question 14 in the section of the tutorial on Discriminant Functions, except use a
margin vector b = [s1, s2, s3, s4, s5, s6]^t, where s1, s2, s3, s4, s5, s6 are the first six digits of your
7-digit KCL student ID.
This question is about using the Sequential Widrow-Hoff Learning Algorithm to find a linear
discriminant function to classify the data given in the table in question 12 of the same section of
the tutorial. You can do this manually, but it is laborious and error-prone. It may be easier for you
to write some simple code to do this for you.
In your report provide a table showing the calculations performed at each iteration. Note, if you fail
to use your student ID to define margin vector b, you will receive a mark of zero, even if the
method you use is correct.
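If you decide to write code, the general shape of a sequential Widrow-Hoff loop is sketched below. Everything numeric in it is a hypothetical placeholder: the samples in Y, the margin vector b, the initial weights a, the learning rate eta and the number of iterations are NOT the tutorial values, and you must substitute the data from the tutorial question and your own margin vector. The sketch assumes augmented notation y = [1; x] with sample normalisation (class-2 samples negated).

```matlab
% Hypothetical placeholders throughout: replace Y, b, a, eta and nIter
% with the values required by the tutorial question.
Y   = [ 1  1 -1 -1;      % each column is an augmented sample y_k = [1; x];
        0  1 -3 -2;      % columns 3 and 4 belong to class 2 and are negated
        2  2  1  1];     % (sample normalisation)
b   = [1 1 1 1];         % margin vector, one entry per sample
a   = [1; 0; 0];         % initial weight vector
eta = 0.1;               % learning rate
nIter = 8;               % two passes through the four samples

A = zeros(numel(a), nIter);              % weight vector after each iteration
fprintf('iter  k   g=a''*y   a_new''\n');
for it = 1:nIter
    k = mod(it - 1, size(Y, 2)) + 1;     % cycle through the samples in order
    y = Y(:, k);
    g = a' * y;                          % current output for sample k
    a = a + eta * (b(k) - g) * y;        % sequential Widrow-Hoff update
    A(:, it) = a;
    fprintf('%4d  %d  %8.4f   %s\n', it, k, g, mat2str(a', 4));
end
```

Each printed line corresponds to one row of the kind of table requested in the report: the iteration number, the sample used, the value of a'*y before the update, and the updated weights.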