2024.08.06

Statistical inference

1. Explain the difference between a ‘proper score’ and a ‘strictly proper score’.

2. Discuss some possible problems with assessing predictions using a ‘non-proper score’.
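One way to make question 2 concrete is to compare expected scores under a known truth. The sketch below assumes the true distribution is N(0, 1) and the forecast is N(0, s^2), and contrasts the Dawid-Sebastiani score (which is proper) with a hypothetical variant that drops the log term (which is not). The function names are illustrative only.

```r
# Expected scores when the truth is N(0, 1) and the forecast is N(0, s^2).
# For y ~ N(0, 1), E[y^2] = 1, so both expectations have a closed form.

# Dawid-Sebastiani score (lower is better): (y - m)^2 / s^2 + 2 * log(s)
expected_ds <- function(s) 1 / s^2 + 2 * log(s)

# Hypothetical non-proper variant: the same score without the log(s) penalty
expected_nonproper <- function(s) 1 / s^2

s_grid <- seq(0.5, 5, by = 0.25)

best_ds <- s_grid[which.min(expected_ds(s_grid))]
best_nonproper <- s_grid[which.min(expected_nonproper(s_grid))]

print(best_ds)         # the proper score is minimised at the true sd
print(best_nonproper)  # the non-proper score favours the largest sd on the grid
```

Under the non-proper variant, a forecaster is rewarded for inflating the stated uncertainty without limit, so the ranking of forecasts no longer reflects how close they are to the truth.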

3. Explain how one can estimate the integral:

using Monte Carlo importance sampling. Define a suitable sampling distribution and write an expression for the importance sampling estimator Â.
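The integral for this question is not reproduced in the text, so the following is only a generic sketch of the technique with a hypothetical integrand. For A equal to the integral of h(x), and samples x_1, ..., x_n drawn from a density g that is positive wherever h is non-zero, the importance sampling estimator is Â = (1/n) Σ h(x_i) / g(x_i). Here h(x) = exp(-x^4) and g is the standard normal density, both chosen purely for illustration.

```r
set.seed(1)

# Hypothetical integrand (NOT the integral from the question):
# A = integral of exp(-x^4) over the real line.
h <- function(x) exp(-x^4)

# Sampling distribution: standard normal, with density g(x) = dnorm(x)
n <- 1e5
x <- rnorm(n)

# Importance weights h(x) / g(x); the estimator is their sample mean
w <- h(x) / dnorm(x)
A_hat <- mean(w)
A_se <- sd(w) / sqrt(n)  # Monte Carlo standard error

# Closed-form value for this particular integrand: 2 * Gamma(5/4)
A_exact <- 2 * gamma(5 / 4)

print(c(estimate = A_hat, std_error = A_se, exact = A_exact))
```

The normal density has heavier tails than exp(-x^4), so the weights stay bounded and the estimator has finite variance. That property is the main thing to check when choosing a sampling distribution.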

Poisson distribution

Consider the Poisson distribution with parameter λ. The function exceed defined below is supposed to take a data.frame with numerical columns lambda and response as input and output a data.frame with the same columns and values, plus a new column named probability containing the probability that a value from a Poisson(λ) distribution is larger than the given response value.

exceed <- function(data) {
  data["probability", ] <- ppois(data$response, data$lambda)
}

Below is an example of how the function works in its given form.

print(exceed(data.frame(lambda = 5:10, response = 10:15)))
## Warning in matrix(value, n, p): data length differs from size of matrix: [6 !=
## 1 x 2]
## [1] 0.9863047 0.9799080 0.9730002 0.9658193 0.9585337 0.9512596

and how the function is supposed to work

##   lambda response probability
## 1      5       10   0.9863047
## 2      6       11   0.9799080
## 3      7       12   0.9730002
## 4      8       13   0.9658193
## 5      9       14   0.9585337
## 6     10       15   0.9512596

Identify the coding errors in the exceed function. Write the correct form and test it.
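A possible corrected version is sketched below. Two structural problems seem clear: the assignment indexes a row rather than a column, and the function does not return the data frame. The tail direction is less clear-cut, since the description asks for P(X > response) while the listed target output equals the default ppois() value, P(X ≤ response). The sketch therefore exposes the choice through a hypothetical argument named upper, which is not part of the original function.

```r
# Sketch of a corrected version. `upper` is a hypothetical argument added
# here to make the tail choice explicit.
exceed <- function(data, upper = TRUE) {
  # Fix 1: assign a new column (data$probability), not a new row.
  # Fix 2: with upper = TRUE, lower.tail = FALSE gives P(X > response).
  data$probability <- ppois(data$response, data$lambda, lower.tail = !upper)
  # Fix 3: return the modified data frame explicitly.
  data
}

input <- data.frame(lambda = 5:10, response = 10:15)

result <- exceed(input)
print(result)

# With upper = FALSE the values are the lower-tail probabilities P(X <= response)
print(exceed(input, upper = FALSE))
```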

Scottish weather data analysis

The Global Historical Climatology Network provides historical weather data collected from all over the globe. Here, we will use a subset of the daily resolution data set containing data from eight weather stations in Scotland, covering the time period from 1 January 1960 to 31 December 2018 (see ghcnd_stations.Rdata and ghcnd_values.Rdata). Some of the measurements are missing, either due to instrument problems or data collection issues.

The ghcnd_stations data frame has 5 variables:

• ID: The identifier code for each station

• Name: The human-readable station name

• Latitude: The latitude of the station location, in degrees

• Longitude: The longitude of the station location, in degrees

• Elevation: The station elevation, in metres above sea level

The ghcnd_values data frame has 7 variables:

• ID: The station identifier code for each observation

• Year: The year the value was measured

• Month: The month the value was measured

• Day: The day of the month the value was measured

• DecYear: “Decimal year”, the measurement date converted to a fractional value, where whole numbers correspond to 1 January, and fractional values correspond to later dates within the year. This is useful for both plotting and modelling.

• Element: One of “TMIN” (minimum temperature), “TMAX” (maximum temperature), or “PRCP” (precipitation), indicating what each value in the Value variable represents

• Value: Daily measured temperature (in degrees Celsius) or precipitation (in mm)

The aim is to estimate a basic weather/climate model for Scotland using all but one weather station, and assess the resulting spatial and temporal predictions on the remaining station. Start by filtering the data so that you only have data of type Element == "TMAX". Then use pivot_wider() to create one column per station, with the station ID as the column name. If there are any missing values, use drop_na() to remove the corresponding rows of the data frame. To allow separate model estimation and prediction, split the data into two separate data frames: one with data from odd years, and one with data from even years.
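The preparation steps above could look roughly as follows. Because the .Rdata files are not available here, the sketch builds a small synthetic stand-in for ghcnd_values. "STATION_A" is a hypothetical ID, and the object names tmax_wide, data_odd and data_even are assumptions.

```r
library(dplyr)
library(tidyr)

# Synthetic stand-in for ghcnd_values (the real data are in ghcnd_values.Rdata).
# "STATION_A" is hypothetical; "UKE00105874" is the Braemar ID from the text.
ghcnd_values <- expand.grid(
  ID = c("STATION_A", "UKE00105874"),
  Year = 1960:1963, Month = 1:2, Day = 1:3,
  Element = c("TMIN", "TMAX"),
  stringsAsFactors = FALSE
)
ghcnd_values$DecYear <- ghcnd_values$Year +
  (ghcnd_values$Month - 1) / 12 + (ghcnd_values$Day - 1) / 365
ghcnd_values$Value <- seq_len(nrow(ghcnd_values)) / 10

# Introduce one missing TMAX measurement at Braemar on 1 January 1960
is_missing <- with(
  ghcnd_values,
  ID == "UKE00105874" & Element == "TMAX" & Year == 1960 & Month == 1 & Day == 1
)
ghcnd_values$Value[is_missing] <- NA

# Keep TMAX, spread stations into columns, drop days with any missing station
tmax_wide <- ghcnd_values %>%
  filter(Element == "TMAX") %>%
  pivot_wider(names_from = ID, values_from = Value) %>%
  drop_na()

# Split by year parity
data_odd <- tmax_wide %>% filter(Year %% 2 == 1)
data_even <- tmax_wide %>% filter(Year %% 2 == 0)

print(head(tmax_wide))
```

Note that drop_na() removes a whole day as soon as any one station is missing, so the wide table can be considerably shorter than the original data.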

Model estimation

Plot the temperature maxima and analyze their behavior, e.g., the seasonal effect across the years for each station. Use only half of the daily weather data for the model estimation (the other half should be used for prediction). Using the function lm(), build a spatial weather prediction/interpolation model that predicts the weather at Braemar (ID == "UKE00105874") from the data of the other 7 weather stations. Define and discuss the model. Present and discuss the results of the estimation, including interpretation of the parameter estimates and the effect of additional covariates.
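As a minimal sketch of the estimation step, the code below fits two candidate models on synthetic data. STATION_A and STATION_B are hypothetical IDs standing in for the other seven stations, and the annual harmonic terms in DecYear are one assumed choice of additional covariates, not something the task prescribes.

```r
set.seed(42)

# Synthetic wide-format estimation data. STATION_A and STATION_B are
# hypothetical IDs standing in for the other seven stations.
n <- 400
DecYear <- sort(runif(n, 1960, 1970))
season <- cos(2 * pi * DecYear)  # annual cycle used to simulate the data
est <- data.frame(
  DecYear = DecYear,
  STATION_A = 12 - 6 * season + rnorm(n),
  STATION_B = 11 - 5 * season + rnorm(n)
)
est$UKE00105874 <- 1 + 0.5 * est$STATION_A + 0.4 * est$STATION_B +
  rnorm(n, sd = 0.5)

# Baseline: Braemar as a linear combination of the other stations
fit_base <- lm(UKE00105874 ~ STATION_A + STATION_B, data = est)

# Extended: add annual harmonic covariates in DecYear (an assumed choice)
fit_season <- lm(
  UKE00105874 ~ STATION_A + STATION_B +
    cos(2 * pi * DecYear) + sin(2 * pi * DecYear),
  data = est
)

print(coef(summary(fit_base)))
print(anova(fit_base, fit_season))
```

The anova() comparison gives one way to discuss whether the additional covariates explain variation beyond what the neighbouring stations already capture.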

Model assessment

Use predict() to predict the daily weather at Braemar, using the half of the data that was not used for model estimation. Add the prediction and score information as new columns of the data object, so that you can use group_by(), mutate(), filter(), summarise() and ggplot2 methods to help with the calculations and presentation of results.

Compute the Absolute Error and Dawid-Sebastiani scores for the predictions, and analyse how the average score behaves for each of the 12 months of the year (January, February, March, etc). Present and discuss the results of the prediction and the scores. Discuss aspects of the model that could be improved.
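The prediction and scoring steps might be sketched as follows, again on synthetic data with a hypothetical STATION_A column. The sketch assumes a Gaussian predictive distribution whose variance is the sum of the squared standard error of the fitted mean and the residual variance, and uses the Dawid-Sebastiani score in the form (y - μ)^2 / σ^2 + 2 log σ. The column names pred_mean, pred_sd, abs_error and ds are assumptions.

```r
library(dplyr)

set.seed(7)

# Synthetic stand-in data; STATION_A is a hypothetical station ID.
make_data <- function(years) {
  d <- expand.grid(Day = c(1, 15), Month = 1:12, Year = years)
  season <- cos(2 * pi * (d$Month - 7) / 12)
  d$STATION_A <- 12 + 6 * season + rnorm(nrow(d))
  d$UKE00105874 <- 1 + 0.8 * d$STATION_A + rnorm(nrow(d), sd = 0.7)
  d
}
est <- make_data(seq(1961, 1979, by = 2))   # odd years: estimation
pred <- make_data(seq(1960, 1978, by = 2))  # even years: prediction

fit <- lm(UKE00105874 ~ STATION_A, data = est)

# se.fit = TRUE also returns the residual scale needed for the predictive sd
p <- predict(fit, newdata = pred, se.fit = TRUE)

scores <- pred %>%
  mutate(
    pred_mean = p$fit,
    pred_sd = sqrt(p$se.fit^2 + p$residual.scale^2),
    abs_error = abs(UKE00105874 - pred_mean),
    ds = (UKE00105874 - pred_mean)^2 / pred_sd^2 + 2 * log(pred_sd)
  )

# Average scores for each of the 12 months
monthly <- scores %>%
  group_by(Month) %>%
  summarise(
    mean_abs_error = mean(abs_error),
    mean_ds = mean(ds),
    .groups = "drop"
  )

print(monthly)
```

The absolute error only assesses the point prediction, whereas the Dawid-Sebastiani score also penalises a predictive standard deviation that is too small or too large, so the two monthly profiles need not agree.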