

CSE 5523: Machine Learning - Homework #3

Due: 11:59pm 04/02/2024

Homework Policy: Please submit your solutions in a single PDF file named HW 1 name.number.pdf (e.g., HW 1 zhang.12807.pdf) to Carmen. You may write your solutions on paper and scan them, or type your solutions directly and save them as a PDF file. Submissions in any other format will not be graded. Working in groups is fine, but each member must submit their own writeup. Please list the members of your group on your solutions. For coding problems, please append your code to your submission and report your results (values, plots, etc.) in your written solution. You will lose points if you only include them in your code submissions.

1) Kernelized Ridge Regression [30 points].

Recall that the error function for ridge regression (linear regression with L2 regularization) is:

$E(w) = (\Phi w - y)^T (\Phi w - y) + \lambda w^T w$

and its closed-form solution and model are:

$\hat{w} = (\Phi^T \Phi + \lambda I)^{-1} \Phi^T y$ and $\hat{f}(x) = \hat{w}^T \phi(x) = y^T \Phi (\Phi^T \Phi + \lambda I)^{-1} \phi(x)$
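As a reference point before kernelizing, the closed-form solution above can be sketched in NumPy. The toy data, the function names `ridge_fit` and `ridge_predict`, and the choice of λ are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def ridge_fit(Phi, y, lam):
    """Return w_hat = (Phi^T Phi + lam * I)^{-1} Phi^T y."""
    d = Phi.shape[1]
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

def ridge_predict(w_hat, Phi_new):
    """Return f_hat(x) = w_hat^T phi(x) for each row phi(x) of Phi_new."""
    return Phi_new @ w_hat

# Toy data (assumed for illustration): 50 samples, 3 features.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 3))
y = Phi @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

lam = 1.0
w_hat = ridge_fit(Phi, y, lam)
y_fit = ridge_predict(w_hat, Phi)
```

At the minimizer, the gradient of E(w) vanishes, so `Phi.T @ (Phi @ w_hat - y) + lam * w_hat` should be numerically zero; this is a quick way to check an implementation.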

Now we want to kernelize ridge regression and allow non-linear models.

(a) Use the following matrix inverse lemma to derive the closed-form solution and model for kernelized ridge regression:

$(P + QRS)^{-1} = P^{-1} - P^{-1} Q (R^{-1} + S P^{-1} Q)^{-1} S P^{-1}$

where P is an n × n invertible matrix, R is a k × k invertible matrix, Q is an n × k matrix and S is a k × n matrix. Make sure that your kernelized model only depends on the feature vectors ϕ(x) through inner products with other feature vectors.

Hint: you may apply the matrix inverse lemma by letting $P = \lambda I$, $Q = \Phi^T$, $R = I$, and $S = \Phi$.
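Before using the lemma in a derivation, it can help to confirm it numerically with the substitution from the hint. The sketch below does only that check; the matrix sizes, the random Φ, and the value of λ are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 6, 4                      # N samples, d features (arbitrary sizes)
lam = 2.0
Phi = rng.normal(size=(N, d))

# Substitution from the hint: P = lam * I, Q = Phi^T, R = I, S = Phi.
P = lam * np.eye(d)              # d x d, invertible
Q = Phi.T                        # d x N
R = np.eye(N)                    # N x N, invertible
S = Phi                          # N x d

P_inv = np.linalg.inv(P)
lhs = np.linalg.inv(P + Q @ R @ S)
rhs = P_inv - P_inv @ Q @ np.linalg.inv(np.linalg.inv(R) + S @ P_inv @ Q) @ S @ P_inv

lemma_holds = np.allclose(lhs, rhs)
```

Note that the left-hand side inverts a d × d matrix while the right-hand side inverts an N × N matrix, which is what makes the lemma useful for moving from feature space to sample space.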

(b) Apply kernelized ridge regression to the steel ultimate tensile strength dataset. The training data and test data are provided in steel composition train.csv and steel composition test.csv, respectively. We recommend normalizing the data before applying the models. Report the RMSE (Root Mean Square Error) of the models on the training data. Try the following kernels (set λ = 1):

(i) Polynomial kernel $k(u, v) = (\langle u, v \rangle + 1)^2$

(ii) Polynomial kernel $k(u, v) = (\langle u, v \rangle + 1)^3$

(iii) Polynomial kernel $k(u, v) = (\langle u, v \rangle + 1)^4$

(iv) Gaussian kernel $k(u, v) = \exp\left(-\frac{\|u - v\|^2}{2\sigma^2}\right)$ (set σ = 1)
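A minimal sketch of how part (b) might be organized is given below. It assumes the standard dual form α = (K + λI)⁻¹ y with predictions K α, which you should check against your own derivation from part (a), and it assumes the common 2σ² convention in the Gaussian kernel's exponent. Synthetic data stands in for the steel CSV files, whose column layout is not specified here; all function names are hypothetical.

```python
import numpy as np

def poly_kernel(U, V, degree):
    """k(u, v) = (<u, v> + 1)^degree for all pairs of rows of U and V."""
    return (U @ V.T + 1.0) ** degree

def gaussian_kernel(U, V, sigma=1.0):
    """k(u, v) = exp(-||u - v||^2 / (2 sigma^2)); the 2*sigma^2 scaling is an assumption."""
    sq = (U**2).sum(axis=1)[:, None] + (V**2).sum(axis=1)[None, :] - 2.0 * U @ V.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def krr_fit(K, y, lam):
    """Dual coefficients alpha = (K + lam * I)^{-1} y (assumed standard form)."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in for the training set (not the steel data).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

# Normalize features with training-set statistics.
mu, sd = X.mean(axis=0), X.std(axis=0)
Xn = (X - mu) / sd

results = {}
for name, K in [
    ("poly2", poly_kernel(Xn, Xn, 2)),
    ("poly3", poly_kernel(Xn, Xn, 3)),
    ("poly4", poly_kernel(Xn, Xn, 4)),
    ("gauss", gaussian_kernel(Xn, Xn, sigma=1.0)),
]:
    alpha = krr_fit(K, y, lam=1.0)
    results[name] = rmse(y, K @ alpha)   # training RMSE
```

To predict on held-out points, normalize them with the same `mu` and `sd`, build the cross-kernel between the new points and `Xn`, and multiply by `alpha`.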

2) Graphical Models [20 points].

In this problem, you will explore the independence properties of directed graphical models and practice translating them to factored probability distributions and back.

(a) Draw a directed graphical model for each of the following factored distributions. Take advantage of plate notation when convenient, and represent as many independencies with your graph as possible (i.e., don’t draw a fully connected graph!).

(i) $P(y_1, y_2, y_3, y_4, y_5) = P(y_1) P(y_2 \mid y_1) \prod_{k=3}^{5} P(y_k \mid y_{k-1}, y_{k-2})$

(ii) $P(x_1, \ldots, x_N, y_1, \ldots, y_N) = P(y_1) \prod_{k=2}^{N} P(y_k \mid y_{k-1}) \prod_{k=1}^{N} P(x_k \mid y_k)$
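To make factorization (ii) concrete, the sketch below evaluates it for a tiny discrete model and confirms that the joint sums to 1 over all configurations. It assumes the first product runs over k = 2..N and the second over k = 1..N, and the probability tables are made-up values for illustration only.

```python
import itertools
import numpy as np

def chain_joint(x, y, prior, trans, emit):
    """P(x_1..x_N, y_1..y_N) = P(y_1) * prod_{k>=2} P(y_k | y_{k-1}) * prod_k P(x_k | y_k)."""
    p = prior[y[0]]
    for k in range(1, len(y)):
        p *= trans[y[k - 1], y[k]]      # P(y_k | y_{k-1})
    for k in range(len(y)):
        p *= emit[y[k], x[k]]           # P(x_k | y_k)
    return p

# Made-up tables for binary variables.
prior = np.array([0.6, 0.4])                  # P(y_1)
trans = np.array([[0.7, 0.3], [0.2, 0.8]])    # trans[i, j] = P(y_k = j | y_{k-1} = i)
emit = np.array([[0.9, 0.1], [0.5, 0.5]])     # emit[i, j] = P(x_k = j | y_k = i)

N = 3
total = sum(
    chain_joint(x, y, prior, trans, emit)
    for x in itertools.product(range(2), repeat=N)
    for y in itertools.product(range(2), repeat=N)
)
```

Each factor corresponds to one node and its parents in the graph, so reading the parent sets off the code (`y[k - 1]` for `y[k]`, `y[k]` for `x[k]`) is one way to sanity-check a drawing.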

(b) For each directed model below, write down the factorized joint distribution over all variables.

3) Clustering [50 points].

Download the image mandrill.png from Carmen. In this problem you will apply the k-means algorithm to image compression. In this context it is also known as the Lloyd-Max algorithm.

(a) First, partition the image into blocks of size M × M and reshape each block into a vector of length $3M^2$ (see hw3p3.py). The 3 comes from the fact that this is a color image, and so there are three intensities for each pixel. Assume that M, like the image dimensions, is a power of 2.
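The provided hw3p3.py is not reproduced here, so the following is only one possible way to do the block partitioning with NumPy reshapes. The function names and the ordering of pixels within each vector are assumptions; any consistent ordering works as long as the inverse mapping matches.

```python
import numpy as np

def image_to_blocks(img, M):
    """Split an (H, W, C) image into non-overlapping M x M blocks, one row vector per block."""
    H, W, C = img.shape
    # Axes after reshape: (block row, row in block, block col, col in block, channel).
    blocks = img.reshape(H // M, M, W // M, M, C).swapaxes(1, 2)
    return blocks.reshape(-1, M * M * C)

def blocks_to_image(vectors, H, W, M, C=3):
    """Inverse of image_to_blocks: reassemble block vectors into an (H, W, C) image."""
    blocks = vectors.reshape(H // M, W // M, M, M, C).swapaxes(1, 2)
    return blocks.reshape(H, W, C)

# Small random image as a stand-in for mandrill.png.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
vecs = image_to_blocks(img, M=2)          # shape (16, 12): 16 blocks, 3 * 2^2 values each
restored = blocks_to_image(vecs, 8, 8, M=2)
```

The round trip `blocks_to_image(image_to_blocks(img, M), H, W, M)` should return the original image exactly, which is a useful check before adding clustering.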

Next, write a program that will cluster the vectors from (a) using the k-means algorithm. You should implement the k-means algorithm yourself. Please initialize the cluster means to be randomly selected data points, sampled without replacement.
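The steps described above can be sketched as follows. This is a plain Lloyd-style loop under a few assumptions not stated in the assignment: a fixed iteration cap, a stopping rule based on unchanged centers, and leaving an empty cluster's center where it was.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Return (centers, labels, objective history) for k-means on the rows of X."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialize with k distinct data points (sampled without replacement).
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    objective = []
    for _ in range(n_iters):
        # Squared Euclidean distances, shape (n, k), via ||x||^2 - 2 x.c + ||c||^2.
        d2 = (X**2).sum(axis=1)[:, None] - 2.0 * X @ centers.T + (centers**2).sum(axis=1)[None, :]
        d2 = np.maximum(d2, 0.0)
        labels = d2.argmin(axis=1)
        # Objective: sum of squared distances to the assigned centers.
        objective.append(float(d2[np.arange(len(X)), labels].sum()))
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:          # assumption: keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels, objective

# Two well-separated toy clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)), rng.normal(5.0, 0.1, size=(20, 2))])
centers, labels, objective = kmeans(X, k=2)
```

The recorded objective should never increase from one iteration to the next; plotting `objective` against its index gives the curve requested in the deliverables.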

Finally, reconstruct a quantized version of the original image by replacing each block in the original image by the nearest centroid. Test your code using M = 2 and k = 64.
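The quantization step amounts to a nearest-centroid lookup for every block vector. A minimal sketch, with a hypothetical function name and toy inputs, might look like this:

```python
import numpy as np

def quantize(vectors, centers):
    """Replace each row of `vectors` by its nearest row of `centers` (Euclidean distance)."""
    vectors = np.asarray(vectors, dtype=float)
    centers = np.asarray(centers, dtype=float)
    d2 = (vectors**2).sum(axis=1)[:, None] - 2.0 * vectors @ centers.T + (centers**2).sum(axis=1)[None, :]
    return centers[d2.argmin(axis=1)]

# Toy check: three 2-D "blocks" and two centroids.
vectors = np.array([[0.0, 0.0], [10.0, 10.0], [1.0, 1.0]])
centers = np.array([[0.0, 0.0], [9.0, 9.0]])
quantized = quantize(vectors, centers)
```

For the image, the quantized vectors would then be reshaped back into M × M blocks and, if an 8-bit image is wanted, rounded and cast to `uint8`.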

Deliverables:

• A plot of the k-means objective function value versus iteration number.

• A description of how the compressed image looks compared to the original. Which regions are best preserved, and which are not?

• A picture of the difference of the two images. You should add a neutral gray (128, 128, 128) to the difference before generating the image.

• The relative mean absolute error of the compressed image, defined as

where $\tilde{I}$ and $I$ are the compressed and original images, respectively, viewed as 3-D arrays. This quantity can be viewed as the average error in pixel intensity relative to the range of pixel intensities.

• Please submit your code, as usual.
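The last two numeric deliverables can be sketched as below. The exact error formula is not reproduced in this text, so the code assumes it is the mean absolute difference over all array entries divided by an intensity range of 255 (8-bit images); the sign convention of the difference image (compressed minus original) is likewise an assumption.

```python
import numpy as np

def relative_mae(compressed, original, intensity_range=255.0):
    """Mean absolute per-entry error divided by the intensity range (assumed to be 255)."""
    c = np.asarray(compressed, dtype=float)   # cast first to avoid uint8 wraparound
    o = np.asarray(original, dtype=float)
    return float(np.mean(np.abs(c - o)) / intensity_range)

def difference_image(compressed, original):
    """Compressed minus original, shifted by neutral gray 128 and clipped to [0, 255]."""
    diff = np.asarray(compressed, dtype=float) - np.asarray(original, dtype=float)
    return np.clip(np.rint(diff + 128.0), 0, 255).astype(np.uint8)

# Toy images: the "compressed" image is brighter by 51 everywhere.
original = np.zeros((2, 2, 3), dtype=np.uint8)
compressed = np.full((2, 2, 3), 51, dtype=np.uint8)
err = relative_mae(compressed, original)
diff_img = difference_image(compressed, original)
```

Casting to floating point before subtracting matters: subtracting two `uint8` arrays directly wraps around modulo 256 and would corrupt both the error and the difference image.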

(b) (Optional, ungraded) Play around with M and k.