
2019.02.18

School of Mathematics and Statistics

University of Sheffield

MAS6006 Statistical Consultancy, 2018/19

Project 1: Emulators in Computer Modelling

Background

You are a statistician working at an engineering consultancy firm. Your company regularly uses computer modelling in its work, but often has problems with the computational expense of running the models: a single run of a computer model at one choice of input values can take hours of CPU time. This causes problems whenever a model must be run at a large number of different input values, for example, when searching for the input value that optimises the model output, or when assessing the sensitivity of the model prediction to uncertainty in the choice of input values. Simply investing in more computing resources isn’t thought to be the solution: any extra computing power will instead be used to improve the accuracy of the models.

The models are deterministic: if run twice at the same input value, they will produce the same output value. The approach currently used within the company is to run a model at as many input values as is feasible, and then use multi-dimensional linear interpolation to predict (instantly) the model output at any desired new input value.

This method requires evaluations of the model over a regular grid of input values. For example, if a model has three inputs, each scaled to take values between 0 and 1, one could choose the four evenly spaced values 0, 1/3, 2/3 and 1, and then run the model 4 × 4 × 4 = 64 times, once at each possible combination of these values for the three inputs. This can still be costly in terms of how many times the model must be run: if the model has $d$ inputs, and it is to be run over a regular grid with $n$ values per input, this requires $n^d$ model runs in total.
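To show how quickly the $n^d$ count grows, here is a small Python sketch (Python is used only for illustration; `grid_runs` and `grid_values` are hypothetical helper names, not part of the brief).

```python
from typing import List


def grid_runs(n_per_input: int, n_inputs: int) -> int:
    """Model runs needed for a full regular grid: n values on each of d inputs gives n**d."""
    if n_per_input < 2 or n_inputs < 1:
        raise ValueError("need at least 2 values per input and at least 1 input")
    return n_per_input ** n_inputs


def grid_values(n_per_input: int) -> List[float]:
    """Evenly spaced values on [0, 1]; n = 4 gives 0, 1/3, 2/3, 1."""
    return [i / (n_per_input - 1) for i in range(n_per_input)]


if __name__ == "__main__":
    print(grid_runs(4, 3))  # the three-input example: 4 x 4 x 4
    for n in range(2, 6):
        print(f"d = 8, n = {n}: {grid_runs(n, 8)} runs")
```

With eight inputs, going from 2 to 5 values per input takes the cost from 256 to 6561, 65536 and then 390625 runs, so each extra grid value per input multiplies an already expensive budget.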

Your line manager has come across another technique that she thinks may work better, “Gaussian process emulation”, and she wants you to investigate it. She has found the method described in the following paper:

O’Hagan, A. (2006). Bayesian analysis of computer code outputs: a tutorial. Reliability Engineering and System Safety, 91, 1290-1300.

She has provided you with a data set to evaluate this method, which includes some predictions obtained with the linear interpolation approach, and she has asked for a short report that describes your findings.

The data

A data set from a computer model has been provided. The computer model takes a vector of 8 inputs, each continuous over the interval [0, 1], and returns a scalar output. The model is deterministic: if run twice at the same input value, it will return the same output value, so there is no noise in the data. There are two files.

1. training.csv contains 100 runs for fitting the emulator. Each row is one run of the model. The first 8 columns give the values of the model inputs, and the 9th column gives the model output.

2. test.csv contains 100 runs for testing the emulator. Each row is one run of the model. The first 8 columns give the values of the model inputs, and the 9th column gives the model output. The 10th column gives an estimate of the model output obtained using the linear interpolation method. The interpolation method used a regular grid of $3^8 = 6561$ model runs (so a different training data set from the one in training.csv).
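The column layout above can be read with a short parser. This is a minimal Python sketch that assumes comma-separated numeric values and treats a non-numeric first row as a header (the brief does not say whether the files have one); `Run` and `read_runs` are hypothetical names.

```python
import csv
from typing import Iterable, List, NamedTuple, Optional, Tuple

N_INPUTS = 8  # per the brief: 8 inputs, each on [0, 1]


class Run(NamedTuple):
    x: Tuple[float, ...]       # model inputs (columns 1-8)
    y: float                   # model output (column 9)
    y_interp: Optional[float]  # interpolation estimate (column 10, test.csv only)


def _numeric(row: List[str]) -> bool:
    try:
        for v in row:
            float(v)
    except ValueError:
        return False
    return True


def read_runs(lines: Iterable[str]) -> List[Run]:
    """Parse training.csv (9 columns) or test.csv (10 columns) into Run records."""
    rows = [r for r in csv.reader(lines) if r]
    if rows and not _numeric(rows[0]):
        rows = rows[1:]  # ASSUMPTION: a non-numeric first row is a header
    runs = []
    for row in rows:
        values = [float(v) for v in row]
        if len(values) not in (N_INPUTS + 1, N_INPUTS + 2):
            raise ValueError(f"expected 9 or 10 columns, got {len(values)}")
        x = tuple(values[:N_INPUTS])
        if not all(0.0 <= xi <= 1.0 for xi in x):
            raise ValueError(f"input outside [0, 1]: {x}")
        y_interp = values[N_INPUTS + 1] if len(values) == N_INPUTS + 2 else None
        runs.append(Run(x, values[N_INPUTS], y_interp))
    return runs
```

Usage, assuming the files sit in the working directory: `with open("test.csv", newline="") as f: test = read_runs(f)`. The range check on the inputs catches files that have been rescaled or have columns in an unexpected order.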

Using the emulator method

You should start with the article your line manager has suggested, but you can make use of any other literature you wish. You will need to find an R package to implement the method. The method goes by various names, so you may wish to try several searches: “Gaussian process emulator”, “Gaussian process meta-model”, “Gaussian process regression”. Use one package only: comparing different packages is outside the scope of the project. (Gaussian process methods have been implemented in other languages, but you are required to use R for this project.)
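The core calculation of a Gaussian process emulator is small enough to sketch directly: condition a Gaussian process on the training runs, then return a predictive mean and variance at a new input. This plain-Python sketch is a simplified illustration, not O’Hagan’s full method and not any R package's API. It assumes a squared-exponential covariance with fixed, hand-picked length-scales, plugs in the sample mean as a constant prior mean, and adds a tiny "nugget" only for numerical stability; a real package would estimate these from the data. `GPEmulator` and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_exp_cov(a: Vector, b: Vector, lengthscales: Vector, variance: float) -> float:
    """Squared-exponential covariance: variance * exp(-sum(((a_i - b_i) / l_i) ** 2))."""
    return variance * math.exp(-sum(((ai - bi) / li) ** 2 for ai, bi, li in zip(a, b, lengthscales)))


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with A = L L^T; A must be symmetric positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def solve_lower(L: List[List[float]], b: Vector) -> List[float]:
    """Solve L z = b by forward substitution."""
    z: List[float] = []
    for i, bi in enumerate(b):
        z.append((bi - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper_t(L: List[List[float]], z: Vector) -> List[float]:
    """Solve L^T w = z by back substitution (L is lower-triangular)."""
    n = len(z)
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (z[i] - sum(L[k][i] * w[k] for k in range(i + 1, n))) / L[i][i]
    return w


class GPEmulator:
    """Simplified GP emulator: constant (sample-mean) prior mean, fixed hyperparameters."""

    def __init__(self, X, y, lengthscales, variance=1.0, nugget=1e-8):
        self.X = [list(x) for x in X]
        self.ls = list(lengthscales)
        self.var = variance
        self.mu = sum(y) / len(y)  # plug-in constant mean (simplification)
        n = len(self.X)
        K = [[sq_exp_cov(self.X[i], self.X[j], self.ls, variance) for j in range(n)] for i in range(n)]
        for i in range(n):
            K[i][i] += nugget * variance  # tiny jitter for numerical stability only
        self.L = cholesky(K)
        # alpha = K^{-1} (y - mu), computed with two triangular solves
        self.alpha = solve_upper_t(self.L, solve_lower(self.L, [yi - self.mu for yi in y]))

    def predict(self, x: Vector) -> Tuple[float, float]:
        """Predictive mean and variance of the model output at input x."""
        k = [sq_exp_cov(x, xi, self.ls, self.var) for xi in self.X]
        mean = self.mu + sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = solve_lower(self.L, k)  # k^T K^{-1} k = ||L^{-1} k||^2
        return mean, max(self.var - sum(vi * vi for vi in v), 0.0)


def toy_model(x: Vector) -> float:
    """Cheap stand-in for an expensive deterministic computer model."""
    return math.sin(3 * x[0]) + x[1] ** 2


if __name__ == "__main__":
    X = [(a, b) for a in (0.0, 0.5, 1.0) for b in (0.0, 0.5, 1.0)]
    em = GPEmulator(X, [toy_model(x) for x in X], lengthscales=(0.3, 0.3))
    print(em.predict((0.5, 0.5)), toy_model((0.5, 0.5)))    # at a training input
    print(em.predict((0.25, 0.25)), toy_model((0.25, 0.25)))  # between training inputs
```

Because the toy model is deterministic, the predictive mean reproduces the training outputs and the variance is essentially zero there, growing between training inputs. Python is used here only to illustrate the idea; the project itself still requires an R package.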

The report

The maximum report length is 6 pages, excluding references. You do not need to include any R code in your report.

Your report must be structured according to the guidelines in Chapter 5 of the module handbook (you should also carefully follow the advice and instructions in Chapters 4, 6, 7 and 8).

Do not provide a detailed technical account of Gaussian process emulators in your report. You should, however, include an overview of how the method works. Your target readers are not statisticians, but they do have degree-level mathematics.

Your line manager is not interested in an analysis of the computer model itself, so do not write about the relationship between the model inputs and the output. She is interested in the performance of the emulator compared with the linear interpolation method, and wants to know about the advantages and disadvantages of emulators, including the circumstances in which the company might use them in other projects.
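One way to compare the two methods on the 100 test runs is to score both sets of predictions against the true outputs. The Python sketch below assumes root mean squared error and maximum absolute error as the summaries (the brief does not prescribe any metric), plus a coverage check that only the emulator can pass, since linear interpolation gives no uncertainty. The function names are hypothetical.

```python
import math
from typing import Sequence


def _check(truth: Sequence[float], other: Sequence[float]) -> None:
    if not truth or len(truth) != len(other):
        raise ValueError("need two non-empty sequences of equal length")


def rmse(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared prediction error over the test runs."""
    _check(truth, pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def max_abs_error(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Worst-case absolute prediction error over the test runs."""
    _check(truth, pred)
    return max(abs(t - p) for t, p in zip(truth, pred))


def interval_coverage(truth: Sequence[float], mean: Sequence[float],
                      var: Sequence[float], k: float = 2.0) -> float:
    """Fraction of test runs whose true output lies within mean +/- k * sd.

    Only an emulator supplies a predictive variance; interpolation does not.
    """
    _check(truth, mean)
    _check(truth, var)
    inside = sum(abs(t - m) <= k * math.sqrt(v) for t, m, v in zip(truth, mean, var))
    return inside / len(truth)
```

For example, `rmse(y_test, y_interp)` and `rmse(y_test, emulator_mean)` put both methods on the same scale. If the emulator's predictive distributions were well calibrated and Gaussian, roughly 95% of test outputs would be expected inside the ±2 sd intervals, so much lower coverage would suggest the emulator is overconfident.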

For reference, the multivariate linear interpolation method was implemented in MATLAB, using the function interpn(). You do not need to write about this in your report.
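For comparison with the emulator, here is a pure-Python sketch of linear interpolation on a regular grid over [0, 1]^d (tensor-product, or multilinear, interpolation), which is assumed here to correspond to what interpn() computes with its default linear method. It is not the company's MATLAB code; `build_grid` and `multilinear` are hypothetical names.

```python
import itertools
import math
from typing import Callable, Dict, Sequence, Tuple

Grid = Dict[Tuple[int, ...], float]


def build_grid(model: Callable[[Sequence[float]], float], n: int, d: int) -> Grid:
    """Run `model` at every point of the regular grid {0, 1/(n-1), ..., 1}^d (n**d runs)."""
    if n < 2:
        raise ValueError("need at least 2 grid values per input")
    return {
        idx: model([i / (n - 1) for i in idx])
        for idx in itertools.product(range(n), repeat=d)
    }


def multilinear(grid: Grid, n: int, x: Sequence[float]) -> float:
    """Multilinear interpolation at x in [0, 1]^d from a grid built by build_grid."""
    lower, frac = [], []
    for xi in x:
        if not 0.0 <= xi <= 1.0:
            raise ValueError("inputs must lie in [0, 1]")
        pos = xi * (n - 1)                    # position in grid-index units
        i = min(int(math.floor(pos)), n - 2)  # lower cell corner; x = 1 uses the last cell
        lower.append(i)
        frac.append(pos - i)                  # fractional position within the cell
    total = 0.0
    # Weighted average over the 2^d vertices of the enclosing grid cell.
    for corner in itertools.product((0, 1), repeat=len(x)):
        w = 1.0
        for t, c in zip(frac, corner):
            w *= t if c else 1.0 - t
        total += w * grid[tuple(i + c for i, c in zip(lower, corner))]
    return total
```

Each prediction combines the 2^d corners of one grid cell (256 for d = 8), so predictions are instant once the grid exists; the expense lies entirely in the n^d model runs needed to build it, and unlike an emulator the method reports no uncertainty.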

Submit your report on MOLE in the usual way. Separately, you should email your R code (script, knitr or RMarkdown file) to j.oakley@sheffield.ac.uk.

Asking for help

In addition to your line manager, you also have a mentor within the company, who is a more senior statistician. Your mentor has not used Gaussian process emulators before, but may be able to advise you if you need help with the technical aspects of the project. Of course, like all your other colleagues, your mentor is busy, and may not have the time if you ask too much of him!

All questions should be posted on MOLE, on the Project 1 discussion board. Please do not ask for help by email. The discussion board is moderated, so your message will not appear on the board until we have approved it (so you can ask anything you like). State in each message to whom your question is addressed:

- your line manager, for questions about the project brief;
- your mentor, for technical questions;
- the module leader (Keith), for administrative questions.

Otherwise, the project must be entirely your own work. Do not ask anyone else for help, or show your work to anyone else.