For all problems below, please use x = 4017.
1. (30 points) Nearly 100,000 observations of air temperature and specific humidity are available. From these observations, scientists have estimated that temperature is approximately Normally distributed with a mean of 280 K and a standard deviation of 6 K. The specific humidity has been found to vary approximately as:
h = (C/1000) * exp(C*T/500)          (1)
where T is the temperature (K) and C is itself random: it is Normally distributed with mean x and standard deviation x/10.
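Taking natural logarithms of equation (1) gives an additive form, which also turns out to be a practical necessity. A worked check at the mean values C = x = 4017 and T = 280 K:

```latex
\ln h = \ln\!\left(\frac{C}{1000}\right) + \frac{C}{500}\,T

\frac{C\,T}{500} = \frac{4017 \times 280}{500} = 2249.52,
\qquad \ln\!\left(\frac{4017}{1000}\right) \approx 1.39,
\qquad \ln h \approx 2250.9
```

The largest double-precision number is about e^{709.8}, so evaluating h directly (for example with R's exp()) returns Inf. Generating and comparing ln h instead of h avoids this.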
a) Generate a sample of size 30 of temperature and humidity from this information.
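Below is a minimal R sketch for part a). It rests on two assumptions the problem does not state: T and C are drawn independently, and the seed is arbitrary and fixed only for reproducibility. With x = 4017, C·T/500 is roughly 2250, so h itself overflows to Inf in double precision. The sketch therefore stores log h, computed from the logarithm of equation (1).

```r
# Sketch for part a): one sample of size 30.
# Assumptions: T and C independent; seed chosen arbitrarily.
set.seed(1)
x <- 4017                                 # value given at the top of the problem set
n <- 30

T_K <- rnorm(n, mean = 280, sd = 6)       # temperature, K
C   <- rnorm(n, mean = x, sd = x / 10)    # random coefficient in equation (1)

# Equation (1) on the log scale: log h = log(C/1000) + C*T/500.
# exp() of values near 2250 returns Inf, so h is kept as log_h.
log_h <- log(C / 1000) + C * T_K / 500

sample_df <- data.frame(T = T_K, C = C, log_h = log_h)
head(sample_df)
```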
b) Using your sample, assess whether there is a trend (of any kind) in the temperature data.
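One way to approach part b) is sketched below. It treats the sample order 1..30 as the time axis, which is an assumption because the draws have no natural time order. It computes Kendall's tau between the index and temperature (the idea behind the Mann-Kendall trend test) and a t-test on an OLS slope. It also plots a lowess smoother so that non-monotone patterns can be inspected visually.

```r
# Sketch for part b): trend checks on the temperature sample.
# Assumption: sample index = time order.
set.seed(1)
T_K <- rnorm(30, mean = 280, sd = 6)
idx <- seq_along(T_K)

# Rank-based monotone trend test: Kendall's tau between time index and T
mk <- cor.test(idx, T_K, method = "kendall")

# Parametric check for a linear trend: t-test on the OLS slope
lin <- summary(lm(T_K ~ idx))$coefficients["idx", ]

# Visual check for trends "of any kind" (non-monotone structure)
plot(idx, T_K, xlab = "Sample index", ylab = "Temperature (K)")
lines(lowess(idx, T_K))

c(tau = unname(mk$estimate), p_kendall = mk$p.value,
  slope = lin[["Estimate"]], p_slope = lin[["Pr(>|t|)"]])
```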
c) Using your sample, estimate a relationship that predicts the probability that h > h75 given T, where h75 is the 75th percentile of the h values in your sample. Present the regression diagnostics and justify your choice of model (linear, nonlinear, GLM, etc.).
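The response in part c) is binary (exceedance or not), so a binomial GLM with a logit link is one natural choice. The R sketch below assumes the same simulated sample as above. Because log is monotone, h > h75 holds exactly when log h exceeds the 75th percentile of log h, so the comparison can be made on the log scale without overflow. The diagnostics shown are deviance residuals and a likelihood-ratio (analysis of deviance) test. They are a starting point, not a complete set.

```r
# Sketch for part c): logistic regression for P(h > h75 | T).
set.seed(1)
x <- 4017
T_K   <- rnorm(30, 280, 6)
C     <- rnorm(30, x, x / 10)
log_h <- log(C / 1000) + C * T_K / 500

# h > h75  <=>  log_h > 75th percentile of log_h (log is monotone)
exceed <- as.integer(log_h > quantile(log_h, 0.75))

fit <- glm(exceed ~ T_K, family = binomial(link = "logit"))
summary(fit)

# Diagnostics: deviance residuals vs fitted probabilities
plot(fitted(fit), residuals(fit, type = "deviance"),
     xlab = "Fitted probability", ylab = "Deviance residual")

# Likelihood-ratio test of the T effect (null vs residual deviance)
anova(fit, test = "Chisq")

# Predicted exceedance probability at an example temperature of 285 K
predict(fit, newdata = data.frame(T_K = 285), type = "response")
```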
d) Now consider a relationship between h and T. Given equation (1) and the stated probability distributions of T and C, what would be a good form for a model relating h to T? You may consider transformations, local regression, or any other method you would like to apply. DO NOT FIT THIS REGRESSION MODEL. Assuming that the model you have formulated is linear between some predictor and some response variable, predict the value of the response corresponding to the lowest temperature in your data set by constructing an appropriate weighted average of the response values.
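One reading of "an appropriate weighted average" uses the fact that an ordinary least-squares prediction at x0 is a linear smoother: ŷ(x0) = Σ w_i y_i, with hat-matrix weights w_i = 1/n + (x0 − x̄)(x_i − x̄)/Σ_j (x_j − x̄)². These weights sum to 1. The sketch below assumes the formulation log h versus T suggested by taking logs of equation (1), and it only forms the weights without calling lm(). The slope C/500 is random, so the error variance grows with T, and weighted least-squares or local (kernel) weights would be equally defensible alternatives.

```r
# Sketch for part d): OLS prediction at the lowest T written as a weighted average.
# Assumption: response = log h, predictor = T (from the log of equation (1)).
set.seed(1)
x <- 4017
T_K   <- rnorm(30, 280, 6)
C     <- rnorm(30, x, x / 10)
log_h <- log(C / 1000) + C * T_K / 500

x0 <- min(T_K)                            # lowest temperature in the sample
dx <- T_K - mean(T_K)

# Row of the OLS hat matrix evaluated at x0; the weights sum to 1
w <- 1 / length(T_K) + (x0 - mean(T_K)) * dx / sum(dx^2)

y0_hat <- sum(w * log_h)                  # predicted log h at the lowest temperature
c(weight_sum = sum(w), log_h_at_Tmin = y0_hat)
```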
e) Would your approach and answer to d) change if equation (1) included a random error term on the right-hand side? How and why? Do not solve.
2. (30 points) Twenty groundwater wells are located in a square region exactly 10 km by 10 km. The well locations were drawn at random, uniformly in both the x and y coordinates. Water levels have been recorded at each well for 30 years; the data can be obtained by executing the code below:
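# Optionally call set.seed() first so that the generated data can be reproduced.
S = runif(1)                                           # common year-1 mean for all wells
loc = matrix(runif(40, 0, 10), ncol = 2, nrow = 20)    # (x, y) coordinates of the 20 wells, km
plot(loc)
d = dist(loc, diag = TRUE, upper = TRUE)               # inter-well distances
c = exp(-d / max(d))                                   # exponential spatial correlation
c = as.matrix(c)
diag(c) = rep(1, 20)                                   # as.matrix() of a dist object has zeros on the diagonal
library(MASS)                                          # provides mvrnorm(); no machine-specific lib.loc needed
data = matrix(ncol = 20, nrow = 30)
data[1, ] = mvrnorm(mu = rep(S, 20), Sigma = c)        # year 1: spatially correlated levels
for (i in 2:30) {
  # years 2-30: AR(1) with coefficient 0.95 at each well, independent innovations
  data[i, ] = 0.95 * data[i - 1, ] + rnorm(20, 0, sqrt(1 - 0.95^2))
}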
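The generating code implies a simple moment structure that is useful when checking the assumptions in parts a) and b). Write Z_{t,j} for data[t, j] and c_{ij} for the entries of c. The innovations are drawn independently for every well and year, and Var(Z_{1,j}) = c_{jj} = 1. Conditional on S:

```latex
Z_{t,j} = 0.95\,Z_{t-1,j} + \varepsilon_{t,j},
\qquad \varepsilon_{t,j} \sim N\!\left(0,\; 1 - 0.95^2\right)

E[Z_{t,j}] = 0.95^{\,t-1} S,
\qquad \operatorname{Var}(Z_{t,j}) = 0.95^2\,\operatorname{Var}(Z_{t-1,j}) + \left(1 - 0.95^2\right) = 1

\operatorname{Cov}(Z_{t,i}, Z_{t,j}) = 0.95^{\,2(t-1)}\, c_{ij} \quad (i \ne j),
\qquad 0.95^{28} \approx 0.24 \text{ at } t = 15
```

By year 15, the spatial correlation inherited from year 1 has therefore decayed to roughly a quarter of its initial value.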
a. Is there any evidence of common patterns in this data set? What methods could you use to explore this? Apply one of them, explain why you chose it, and report the results.
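One common choice for part a) is principal component (EOF) analysis of the 30 × 20 matrix, with years as observations and wells as variables. The minimal R sketch below regenerates the data with a fixed seed, which is an assumption made only so the example is reproducible. A large variance share for the first component, together with same-sign loadings across wells, would point to a shared pattern. The sketch does not claim what your data will show.

```r
# Sketch for part a): PCA / EOF analysis of the well data.
library(MASS)
set.seed(1)                                        # assumption: fixed seed for reproducibility
S <- runif(1)
loc <- matrix(runif(40, 0, 10), ncol = 2, nrow = 20)
d <- as.matrix(dist(loc))                          # diagonal is 0, so exp() gives 1
data <- matrix(nrow = 30, ncol = 20)
data[1, ] <- mvrnorm(mu = rep(S, 20), Sigma = exp(-d / max(d)))
for (i in 2:30) data[i, ] <- 0.95 * data[i - 1, ] + rnorm(20, 0, sqrt(1 - 0.95^2))

# Rows = years (observations), columns = wells (variables)
pca <- prcomp(data, center = TRUE, scale. = TRUE)
var_share <- pca$sdev^2 / sum(pca$sdev^2)

round(head(var_share, 5), 3)          # variance explained by the leading components
round(cumsum(var_share)[1:5], 3)      # cumulative share
round(pca$rotation[, 1], 2)           # PC1 loadings, one per well
```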
b. What is your estimate of the water level in year 15 at the location with coordinates (5, 5)? Clearly explain the procedure you used to develop this estimate, including a brief discussion of competing methods you considered, why you chose the one you did, the assumptions of that method, and whether they were satisfied when you applied it. What is the uncertainty of this estimate?
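Ordinary kriging is one candidate for part b). Unlike inverse-distance weighting, it also produces an estimation variance. The R sketch below solves the ordinary-kriging system [K 1; 1ᵀ 0][w; μ] = [k0; 1] and reports σ² = C(0) − wᵀk0 − μ. The helper name ok_krige is hypothetical. The covariance model is an assumption: an exponential form with sill equal to the sample variance of the year-15 levels and range equal to the maximum inter-well distance, which mirrors exp(-d/max(d)) in the generating code. In practice these parameters would come from an empirical variogram fit, possibly with a nugget.

```r
# Sketch for part b): ordinary kriging at (5, 5) for year 15.
library(MASS)
set.seed(1)                                        # assumption: fixed seed for reproducibility
S <- runif(1)
loc <- matrix(runif(40, 0, 10), ncol = 2, nrow = 20)
d <- as.matrix(dist(loc))
data <- matrix(nrow = 30, ncol = 20)
data[1, ] <- mvrnorm(mu = rep(S, 20), Sigma = exp(-d / max(d)))
for (i in 2:30) data[i, ] <- 0.95 * data[i - 1, ] + rnorm(20, 0, sqrt(1 - 0.95^2))

# Hypothetical helper: ordinary kriging with an ASSUMED exponential covariance
ok_krige <- function(coords, z, target, sill, range_km) {
  n <- nrow(coords)
  cov_fn <- function(h) sill * exp(-h / range_km)
  K  <- cov_fn(as.matrix(dist(coords)))                  # well-to-well covariances
  k0 <- cov_fn(sqrt(colSums((t(coords) - target)^2)))    # well-to-target covariances
  A  <- rbind(cbind(K, 1), c(rep(1, n), 0))              # system with unbiasedness constraint
  sol <- solve(A, c(k0, 1))
  w  <- sol[1:n]                                         # kriging weights (sum to 1)
  mu <- sol[n + 1]                                       # Lagrange multiplier
  list(estimate = sum(w * z), variance = sill - sum(w * k0) - mu, weights = w)
}

z15 <- data[15, ]
ok <- ok_krige(loc, z15, target = c(5, 5), sill = var(z15), range_km = max(d))
c(estimate = ok$estimate, kriging_sd = sqrt(ok$variance))
```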
c. Now consider estimating the water level in year 31 at the same location. Do not attempt to compute this estimate. Sketch two possible algorithms you could use to develop it, and comment very briefly on the possible advantage of one over the other.
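Two possible orderings can serve as a starting point for part c). The first fits a time-series model at each well, forecasts year 31, and then interpolates the 20 forecasts spatially to (5, 5). The second interpolates each year to (5, 5) first and then forecasts that single series. The R sketch below shows only the shared per-series forecasting step: an AR(1) model fitted with base R's arima() to a simulated stand-in series, which is hypothetical and not the problem data. method = "ML" is used so that the estimated AR coefficient is constrained to the stationary region.

```r
# Sketch of one building block for part c): one-step AR(1) forecast of a single series.
set.seed(1)
z <- numeric(30)                                   # hypothetical stand-in for one well
z[1] <- rnorm(1)
for (t in 2:30) z[t] <- 0.95 * z[t - 1] + rnorm(1, 0, sqrt(1 - 0.95^2))

fit <- arima(z, order = c(1, 0, 0), method = "ML") # AR(1) with a mean term
fc  <- predict(fit, n.ahead = 1)                   # one step ahead = "year 31"
c(phi = unname(coef(fit)["ar1"]), forecast = fc$pred[1], se = fc$se[1])
```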
(30 points)