MATH377: Financial and Actuarial Modelling in R
Tutorial 5
Exercise 1. Let X = (X1, X2) be a bivariate normally distributed random vector with mean vector µ = (1.5, 1) and covariance matrix
a) Evaluate the density function of X at x = (1, 1) and x = (0, 2).
b) Compute P(−2 ≤ X1 ≤ 4, X2 > 1).
c) Plot the 3D surface of this bivariate normal density and its contours. Hint: You can modify the code from the lecture notes that plots the log-likelihood of a normal distribution. However, to use outer() you may need to pass a function similar to this one: f <- function(x, y) dmvnorm(cbind(x, y), mu, sigma). Finally, use the functions contour() and persp() to create the plots.
d) Generate 5000 observations from X and create a scatter plot of the generated sample.
e) Compute the empirical mean vector, covariance matrix, and correlation matrix for the generated sample in d).
f) Using your simulated sample in d), approximate the 95% quantile of X1 · X2.
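The steps in Exercise 1 could be sketched roughly as follows. Two things here are assumptions rather than part of the exercise: the covariance matrix sigma is only a placeholder (the matrix is not reproduced above), and dbvnorm() is a hypothetical hand-written density used in place of dmvnorm() from the mvtnorm package, so that the sketch needs nothing beyond MASS. Part b) is approximated by Monte Carlo from the simulated sample; pmvnorm() from mvtnorm, with lower = c(-2, 1) and upper = c(4, Inf), would be one way to obtain a numerical value instead.

```r
library(MASS)  # provides mvrnorm()

mu <- c(1.5, 1)
# ASSUMPTION: placeholder covariance matrix, swap in the one from the exercise
sigma <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)

# Hypothetical helper: bivariate normal density evaluated at each row of x
dbvnorm <- function(x, mu, sigma) {
  x <- matrix(x, ncol = 2)
  centred <- sweep(x, 2, mu)                             # subtract the mean from each row
  quad <- rowSums((centred %*% solve(sigma)) * centred)  # (x - mu)' sigma^{-1} (x - mu)
  exp(-0.5 * quad) / (2 * pi * sqrt(det(sigma)))
}

# a) density at x = (1, 1) and x = (0, 2)
dens_a <- dbvnorm(rbind(c(1, 1), c(0, 2)), mu, sigma)

# c) density surface and contours on a grid, built with outer()
f  <- function(x, y) dbvnorm(cbind(x, y), mu, sigma)
xs <- seq(-2, 5, length.out = 60)
ys <- seq(-4, 6, length.out = 60)
z  <- outer(xs, ys, f)
contour(xs, ys, z, xlab = "x1", ylab = "x2")
persp(xs, ys, z, theta = 30, phi = 25,
      xlab = "x1", ylab = "x2", zlab = "density")

# d) simulate 5000 observations and plot them
set.seed(377)
x <- mvrnorm(5000, mu = mu, Sigma = sigma)
plot(x, xlab = "X1", ylab = "X2", pch = 20, cex = 0.4)

# e) empirical mean vector, covariance matrix and correlation matrix
emp_mean <- colMeans(x)
emp_cov  <- cov(x)
emp_cor  <- cor(x)

# b) Monte Carlo estimate of P(-2 <= X1 <= 4, X2 > 1)
p_b <- mean(x[, 1] >= -2 & x[, 1] <= 4 & x[, 2] > 1)

# f) empirical 95% quantile of the product X1 * X2
q95 <- quantile(x[, 1] * x[, 2], probs = 0.95)
```

With a sample of this size, emp_mean and emp_cov should land close to mu and sigma, which gives a quick sanity check on the simulation before using it for b) and f).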
Exercise 2. Consider the cars data set in R.
a) Compute the correlation between speed and dist, and create a scatter plot to compare speed vs dist. Do you see any relationship?
b) Fit a linear regression model to explain distance in terms of speed.
c) Add the regression line to your plot in a). Hint: this can be done using the abline() function applied to your regression model in b).
d) Predict dist for values of speed of 28 and 30.
e) Does the model seem to satisfy the assumptions of mean zero, constant variance, and normality for the residuals?
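One possible workflow for Exercise 2 is sketched below, using only base R and the built-in cars data. The object names (fit, new_speeds, pred) are illustrative, and the choice of plot() diagnostics plus shapiro.test() for part e) is an assumption about how the residual checks might be carried out, not the only option. Note that speeds in cars run from 4 to 25, so predicting at 28 and 30 extrapolates beyond the observed data.

```r
data(cars)

# a) correlation and scatter plot
r_sd <- cor(cars$speed, cars$dist)
plot(dist ~ speed, data = cars, xlab = "speed", ylab = "dist")

# b) simple linear regression of dist on speed
fit <- lm(dist ~ speed, data = cars)
summary(fit)

# c) add the fitted line to the scatter plot
abline(fit, col = "red")

# d) predictions at speed = 28 and speed = 30 (outside the observed range)
new_speeds <- data.frame(speed = c(28, 30))
pred <- predict(fit, newdata = new_speeds)

# e) residual diagnostics: residuals vs fitted, QQ plot, scale-location, leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
res_mean <- mean(residuals(fit))     # numerically zero by construction for OLS with an intercept
sw <- shapiro.test(residuals(fit))   # formal normality check on the residuals
```

The residuals-vs-fitted and scale-location panels are the ones to inspect for non-constant variance, while the QQ plot and the Shapiro-Wilk p-value in sw speak to normality.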
Exercise 3. Consider the Boston data set available in the MASS package.
a) Create a scatter plot of lstat vs medv. Do you see any relationship?
b) Fit a linear regression model to explain medv in terms of lstat.
c) Add the regression line to your plot in a).
d) In a linear model, we can specify that the relationship between the independent variable and the dependent variable takes the form of an nth-degree polynomial. One way to specify this in R is by using I(). Fit a linear model with medv ~ lstat + I(lstat^2), then predict medv for values of lstat of 0 and 40, and add these values as a line in your plot in a).
e) Use an information criterion to conclude which of the models in b) and d) describes the data better.
f) An alternative way to produce the same model as in d) is by using medv ~ poly(lstat, 2, raw = TRUE). In the previous line of code, 2 can be changed to other integer values to specify polynomials of different degrees. Fit a linear model with a 5th-degree polynomial for lstat, then predict medv for values of lstat of 0 and 40, and add a line in your plot in a) using these values.
g) Use an information criterion to conclude which of the three models best describes the data.
h) Fit a linear model with an 8th-degree polynomial. Conclude, based on the information criterion, whether this model is a better choice (recall the concept of overfitting).
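The model comparison in Exercise 3 might be set up along these lines. This is a sketch under a few assumptions: the fitted curves are drawn over a grid of lstat values running from 0 to 40 (reading "values of lstat of 0 and 40" as the endpoints of the range to predict over), AIC() and BIC() are used as the information criteria, and the object names are illustrative.

```r
library(MASS)  # provides the Boston data set

# a) scatter plot of medv against lstat
plot(medv ~ lstat, data = Boston, pch = 20, cex = 0.5)

# b), c) straight-line fit and its regression line
fit1 <- lm(medv ~ lstat, data = Boston)
abline(fit1, col = "red")

# Grid of lstat values from 0 to 40 used to draw the polynomial fits
grid <- data.frame(lstat = seq(0, 40, length.out = 200))

# d) quadratic fit specified with I()
fit2 <- lm(medv ~ lstat + I(lstat^2), data = Boston)
lines(grid$lstat, predict(fit2, newdata = grid), col = "blue")

# f) 5th-degree polynomial specified with poly(..., raw = TRUE)
fit5 <- lm(medv ~ poly(lstat, 5, raw = TRUE), data = Boston)
lines(grid$lstat, predict(fit5, newdata = grid), col = "darkgreen")

# h) 8th-degree polynomial
fit8 <- lm(medv ~ poly(lstat, 8, raw = TRUE), data = Boston)

# e), g), h) information criteria side by side (smaller is better)
aic_tab <- AIC(fit1, fit2, fit5, fit8)
bic_tab <- BIC(fit1, fit2, fit5, fit8)
print(aic_tab)
print(bic_tab)
```

Because BIC penalises extra parameters more heavily than AIC for a sample of this size, looking at both tables makes it easier to judge whether the gain from the 8th-degree model justifies its added complexity.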