May Examination Period 2024

ECS659P Resit

Neural Networks and Deep Learning

Duration: 2 hours (+1 for uploads)

Question 1

(a) Consider a regression dataset $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(n)}, y^{(n)})$, where each observation $x^{(i)}$ and target $y^{(i)}$ is a real number.

Suppose that the function $f$ given by $f(x) = 2\log(x) + x$ is a perfect predictive model, so that $y^{(i)} = f(x^{(i)})$ for every $i$.

Define a function $\phi : \mathbb{R} \to \mathbb{R}^2$ that can transform the original regression dataset into a regression dataset $(\phi(x^{(1)}), y^{(1)}), (\phi(x^{(2)}), y^{(2)}), \ldots, (\phi(x^{(n)}), y^{(n)})$ that can be used to recover the function $f$ using linear regression.

In other words, define a function $\phi$ such that

$$f(x) = \phi(x) \cdot w,$$

where $\cdot$ denotes the dot product and $w \in \mathbb{R}^2$ is a vector of parameters. [13 marks]
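One way to check a candidate feature map numerically is sketched below. It assumes the choice $\phi(x) = [\log(x), x]$ with $w = [2, 1]$, which is one possible answer rather than the only one; the helper names and the sample inputs are hypothetical.

```python
import math


def f(x):
    """Target function from the question: f(x) = 2 log(x) + x."""
    return 2 * math.log(x) + x


def phi(x):
    """Candidate feature map (an assumption): x -> [log(x), x]."""
    return [math.log(x), x]


def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


w = [2.0, 1.0]              # candidate parameter vector (an assumption)
xs = [0.5, 1.0, 2.0, 10.0]  # hypothetical positive sample inputs

# Largest absolute gap between f(x) and phi(x) . w over the samples.
max_gap = max(abs(f(x) - dot(phi(x), w)) for x in xs)
print(max_gap)
```

A gap of zero (up to floating-point rounding) on every sample is consistent with $f(x) = \phi(x) \cdot w$, although a numerical check on a few points is not a proof.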

(b) Consider a regression dataset with 3 examples and 2 features per observation. Let $X \in \mathbb{R}^{3 \times 2}$ denote the observation matrix that contains one row for each observation and one column for each feature, so that

$$X = \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ -1 & 1 \end{bmatrix}.$$

Let $y \in \mathbb{R}^3$ denote the target vector that contains one row for each target, so that

$$y = \begin{bmatrix} 4 \\ -1 \\ 1 \end{bmatrix}.$$

Compute the mean squared error of a linear regression model that employs a weight vector

$$w = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$$

and a bias

b = 1.

[12 marks]
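The computation can be sketched in plain Python as follows. The entries of X and y are taken from the question; the weight vector is read here as $[1, 3]^T$ (top entry 1, bottom entry 3), which is an assumption about the source's layout. The function names are hypothetical.

```python
def predict(X, w, b):
    """Linear model: one prediction per row, row . w + b."""
    return [sum(x_j * w_j for x_j, w_j in zip(row, w)) + b for row in X]


def mean_squared_error(X, y, w, b):
    """Average of squared differences between predictions and targets."""
    preds = predict(X, w, b)
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


X = [[0, 2], [1, -1], [-1, 1]]
y = [4, -1, 1]
w = [1, 3]  # assumed reading of the weight vector
b = 1

preds = predict(X, w, b)            # [7, -1, 3]
mse = mean_squared_error(X, y, w, b)
print(preds, mse)                   # errors 3, 0, 2 give (9 + 0 + 4) / 3
```

If the weight vector is read the other way round, only the line defining `w` changes.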

Question 2

(a) Let $[C_1, C_2, \ldots, C_k]$ denote an image (rank 3 tensor) composed of $k$ channels, where each channel $C_i$ is a matrix of a fixed shape.

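Let $A$ be an image given by

$$A = \left[ \begin{bmatrix} 1 & 2 & 3 & 4 \\ 3 & 5 & 1 & 2 \\ 3 & 2 & 1 & 0 \\ 1 & 2 & 2 & 1 \end{bmatrix}, \begin{bmatrix} 2 & 1 & 3 & 4 \\ 1 & 1 & 2 & 5 \\ 3 & 2 & 1 & 6 \\ 1 & 3 & 3 & 4 \end{bmatrix} \right].$$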

Compute the output image B of a max-pooling layer that receives A as input and uses a window of size 2 × 2 and a stride of 2. [12 marks]
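A minimal sketch of channel-wise max pooling in plain Python is given below, applied to the two 4 × 4 channels of A as read from the question. The function name `max_pool2d` is hypothetical, and the sketch assumes no padding.

```python
def max_pool2d(channel, size=2, stride=2):
    """Max-pool one channel (a list of rows) with a square window, no padding."""
    rows, cols = len(channel), len(channel[0])
    out = []
    for i in range(0, rows - size + 1, stride):
        out_row = []
        for j in range(0, cols - size + 1, stride):
            # Collect the entries of the window whose top-left corner is (i, j).
            window = [channel[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            out_row.append(max(window))
        out.append(out_row)
    return out


A = [
    [[1, 2, 3, 4], [3, 5, 1, 2], [3, 2, 1, 0], [1, 2, 2, 1]],
    [[2, 1, 3, 4], [1, 1, 2, 5], [3, 2, 1, 6], [1, 3, 3, 4]],
]

# Pooling acts on each channel independently, so the channel count is unchanged.
B = [max_pool2d(channel) for channel in A]
print(B)  # [[[5, 4], [3, 2]], [[2, 5], [3, 6]]]
```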

(b) Consider a 3 × 32 × 32 image that goes through a convolutional layer with 64 kernels, each a 3 × 7 × 7 image. What is the shape of the corresponding output if:

1. The convolutional layer uses padding 3 and stride 1.

2. The convolutional layer uses padding 0 and stride 1.

3. The convolutional layer uses padding 3 and stride 2.

Assume the conventional ordering of dimensions (number of channels, height, width). [13 marks]
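The three cases can be checked with the usual output-size formula, floor((n + 2p - k) / s) + 1 per spatial dimension. The sketch below assumes that rounding-down convention (the one used by common deep learning libraries) and a square kernel; the function name is hypothetical.

```python
def conv2d_output_shape(in_shape, num_kernels, kernel_size, padding, stride):
    """Output shape (channels, height, width) of a convolutional layer.

    Assumes a square kernel, symmetric zero padding, and floor division
    when the stride does not divide the padded extent evenly.
    """
    _, height, width = in_shape
    out_h = (height + 2 * padding - kernel_size) // stride + 1
    out_w = (width + 2 * padding - kernel_size) // stride + 1
    # One output channel per kernel.
    return (num_kernels, out_h, out_w)


cases = [(3, 1), (0, 1), (3, 2)]  # (padding, stride) for cases 1, 2, 3
shapes = [conv2d_output_shape((3, 32, 32), 64, 7, p, s) for p, s in cases]
print(shapes)  # [(64, 32, 32), (64, 26, 26), (64, 16, 16)]
```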

Question 3

(a) Suppose that a 1 × 128 × 128 (grayscale) image is flattened and given directly to a multilayer perceptron. Suppose that this multilayer perceptron has 64 units in its first layer.

How many weights does this first layer have? How many biases does this first layer have? [10 marks]
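The count can be sketched as follows, assuming a standard fully connected layer in which every unit has one weight per input and one bias. The function name is hypothetical.

```python
def dense_layer_params(num_inputs, num_units):
    """Return (weights, biases) for a fully connected layer."""
    weights = num_inputs * num_units  # one weight per (input, unit) pair
    biases = num_units                # one bias per unit
    return weights, biases


num_inputs = 1 * 128 * 128  # flattened grayscale image
weights, biases = dense_layer_params(num_inputs, 64)
print(weights, biases)  # 1048576 64
```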

(b) Consider a recurrent layer that has two matrices of parameters: $A \in \mathbb{R}^{32 \times 32}$ and $B \in \mathbb{R}^{32 \times 64}$. Suppose that this recurrent layer does not employ biases and uses a tanh activation function.

Write the equation that this layer would use to compute the current hidden state vector $h_t \in \mathbb{R}^{32 \times 1}$ based on the previous hidden state vector $h_{t-1} \in \mathbb{R}^{32 \times 1}$, the current observation $x_t \in \mathbb{R}^{64 \times 1}$, and the parameter matrices $A \in \mathbb{R}^{32 \times 32}$ and $B \in \mathbb{R}^{32 \times 64}$.

Hint: Ensure that the matrix-vector multiplications are valid. [15 marks]
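A shape check for one candidate update, $h_t = \tanh(A h_{t-1} + B x_t)$, is sketched below. This is the standard bias-free form for a simple recurrent layer and is offered as an assumption about the intended answer; the parameter values are placeholders chosen only to exercise the shapes.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def recurrent_step(A, B, h_prev, x_t):
    """One step of a bias-free recurrent layer: tanh(A h_prev + B x_t)."""
    from_hidden = matvec(A, h_prev)  # (32 x 32) times (32 x 1) -> 32 entries
    from_input = matvec(B, x_t)      # (32 x 64) times (64 x 1) -> 32 entries
    return [math.tanh(p + q) for p, q in zip(from_hidden, from_input)]


hidden_size, input_size = 32, 64
A = [[0.01] * hidden_size for _ in range(hidden_size)]  # placeholder values
B = [[0.01] * input_size for _ in range(hidden_size)]   # placeholder values
h_prev = [0.0] * hidden_size
x_t = [1.0] * input_size

h_t = recurrent_step(A, B, h_prev, x_t)
print(len(h_t))  # 32
```

Both products yield vectors with 32 entries, so the sum inside tanh is well defined and the new hidden state has the same shape as the previous one.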

Question 4

(a) Consider a linear regression model with two weights and no bias.

Suppose that the weight vectors $[2, 4]^T$ and $[3, 3]^T$ achieve the same mean squared error on the training dataset. If weight decay were employed with $\lambda > 0$, which of these weight vectors would be preferred by optimization? [9 marks]
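The comparison can be sketched by evaluating the penalty term for each vector. The sketch assumes weight decay adds λ times the squared Euclidean norm of the weights to the loss; some texts use λ/2 instead, which does not change the ordering. The value of `lam` is a hypothetical placeholder.

```python
def l2_penalty(w, lam):
    """Weight decay penalty: lam times the squared Euclidean norm of w."""
    return lam * sum(w_i ** 2 for w_i in w)


lam = 0.1  # hypothetical positive value; any lam > 0 gives the same ordering
candidates = [[2, 4], [3, 3]]
penalties = [l2_penalty(w, lam) for w in candidates]

# With equal mean squared error, the smaller penalty gives the smaller total loss.
preferred = min(candidates, key=lambda w: l2_penalty(w, lam))
print(penalties, preferred)
```

The squared norms are 4 + 16 = 20 and 9 + 9 = 18, so under this assumption the second vector incurs the smaller penalty.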

(b) Consider a loss function $L : \mathbb{R}^2 \to \mathbb{R}$ given by

$$L(w_1, w_2) = w_1^2 + w_2^2,$$

and note that the corresponding gradient function $\nabla L : \mathbb{R}^2 \to \mathbb{R}^2$ is given by

$$\nabla L(w_1, w_2) = [2 w_1, 2 w_2]^T.$$

Let $w = [2, 4]^T$ be the initial point for gradient descent with the goal of minimizing $L$. What are the next two points?

Assume a learning rate $\eta = 0.25$. [16 marks]
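The iteration can be sketched in a few lines, assuming the plain update rule w ← w − η · gradient with no momentum or other modifications. The function names are hypothetical.

```python
def grad_L(w):
    """Gradient of L(w1, w2) = w1^2 + w2^2, i.e. [2 w1, 2 w2]."""
    return [2 * w_i for w_i in w]


def gradient_descent(w0, lr, steps):
    """Return the initial point followed by `steps` gradient descent iterates."""
    points = [list(w0)]
    w = list(w0)
    for _ in range(steps):
        g = grad_L(w)
        w = [w_i - lr * g_i for w_i, g_i in zip(w, g)]
        points.append(w)
    return points


points = gradient_descent([2.0, 4.0], lr=0.25, steps=2)
print(points)  # [[2.0, 4.0], [1.0, 2.0], [0.5, 1.0]]
```

With η = 0.25 each step multiplies the current point by 1 − 2η = 0.5, which is why the iterates halve toward the minimizer at the origin.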