Assessed Practical: Brexit

On June 23rd, 2016, the UK held a national referendum to decide whether the country should leave the
EU (‘Brexit’). The result, a win for the Leave campaign, surprised many political commentators, who had
expected that people would vote to Remain. Immediately, people began to look for patterns that could explain
the Leave vote: cities had generally voted to Remain, while small towns had voted to Leave. England and
Wales voted to Leave, while Northern Ireland and especially Scotland voted to Remain.

Figure 6: EU referendum vote by electoral ward. Yellow indicates Remain, blue indicates Leave.

In the next few days, the Guardian newspaper presented some apparent demographic trends in the vote, based
on the ages, incomes, education and class of different electoral wards
(ng-interactive/2016/jun/23/eu-referendum-live-results-and-analysis). The Guardian’s analysis stopped at
showing these results graphically and commenting on the apparent patterns. We will go one better by doing
some real statistical analysis of the data.

I have scraped the data from the Guardian’s plots into a data file (brexit.csv), which you can download from
MINERVA.

There are 6 attributes in the data. The 5 possible input variables are:

• abc1: proportion of individuals who are in the ABC1 social classes (middle to upper class)

• medianIncome: the median income of all residents

• medianAge: median age of residents

• withHigherEd: proportion of residents with any university-level education

• notBornUK: the proportion of residents who were born outside the UK

These are normalised so that the lowest value is zero and the highest value is one.
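The brief does not say exactly how the inputs were rescaled. A linear min-max rescaling is the usual way to map the lowest value to zero and the highest to one, so the sketch below assumes that. The helper name rescale01 and the example values are made up for illustration.

```r
# Min-max rescaling: subtract the minimum, divide by the range.
# Assumption: this is the transformation behind "lowest value is zero,
# highest value is one"; the brief does not confirm it.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

ages <- c(30, 35, 40, 50)   # made-up values, not taken from brexit.csv
print(rescale01(ages))      # 0.00 0.25 0.50 1.00
```

Because every input then lies on the same 0 to 1 scale, a coefficient can be read as the change in log-odds between the lowest and highest ward for that input.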

The output variable is called voteBrexit, and gives a TRUE/FALSE answer to the question ‘did this electoral
ward vote for Brexit?’ (i.e. did more than 50% of people vote to Leave?).

Tasks (week 6):

1. Fit a logistic regression model using all of the available inputs. Identify the direction of each effect
from the fitted coefficients. Compare these with the plots shown on the Guardian website. Do they
agree?

2. Present the value of each coefficient estimate with a 95% confidence interval. Which input would you
say has the strongest effect?

3. Using AIC, perform model selection to determine which factors are useful to predict the result of
the vote. Use a ‘greedy’ input selection procedure, as follows: (i) select the best model with 1 input;
(ii) fixing that input, select the best two-input model (i.e. try all the other 4 inputs with the one you
selected first); (iii) select the best three-input model containing the first two inputs you chose, etc. At
each stage evaluate the quality of fit using AIC and stop if this gets worse.
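As a rough illustration of the week 6 tasks, the sketch below fits the full model with glm(), tabulates Wald 95% intervals with confint.default(), and then runs the greedy AIC search described in task 3. It is one possible approach, not a model answer. It assumes brexit.csv sits in the working directory with the column names listed above. If the file is missing, a synthetic stand-in is simulated purely so the code runs, and any results from it mean nothing. The names inputs, fit_aic, chosen and history are illustrative.

```r
inputs <- c("abc1", "medianIncome", "medianAge", "withHigherEd", "notBornUK")

if (file.exists("brexit.csv")) {
  brexit <- read.csv("brexit.csv")
} else {
  # Synthetic stand-in (assumption, NOT the real data): uniform inputs and
  # an arbitrary made-up relationship, only so the sketch is runnable.
  set.seed(1)
  brexit <- as.data.frame(matrix(runif(300 * 5), ncol = 5,
                                 dimnames = list(NULL, inputs)))
  eta <- 1 + 3 * brexit$medianAge - 4 * brexit$withHigherEd
  brexit$voteBrexit <- runif(300) < plogis(eta)
}
brexit$voteBrexit <- as.logical(brexit$voteBrexit)

# Tasks 1 and 2: full model; the sign of each coefficient gives the
# direction of the effect on the log-odds of a Leave vote.
full <- glm(reformulate(inputs, response = "voteBrexit"),
            family = binomial, data = brexit)
ci <- cbind(estimate = coef(full), confint.default(full))  # Wald intervals
print(round(ci, 3))

# Task 3: greedy forward selection by AIC.
fit_aic <- function(vars) {
  AIC(glm(reformulate(vars, response = "voteBrexit"),
          family = binomial, data = brexit))
}

chosen <- character(0)   # inputs selected so far, in order
history <- numeric(0)    # AIC after each accepted step
best_aic <- Inf          # Inf so that the best 1-input model is always taken

repeat {
  candidates <- setdiff(inputs, chosen)
  if (length(candidates) == 0) break
  # Try adding each remaining input to the inputs already fixed.
  aics <- sapply(candidates, function(v) fit_aic(c(chosen, v)))
  if (min(aics) >= best_aic) break   # stop when AIC no longer improves
  chosen <- c(chosen, names(which.min(aics)))
  best_aic <- min(aics)
  history <- c(history, best_aic)
}

print(chosen)
print(history)
```

confint.default() gives Wald intervals (estimate plus or minus 1.96 standard errors); calling confint(full) instead would give profile-likelihood intervals, which can differ slightly. The search fits at most 5 + 4 + 3 + 2 + 1 = 15 models, rather than all 31 non-empty subsets of the inputs.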

Tasks (week 7):

1. Use the rpart package to create a decision tree classification model. Visualise your model and interpret
the fitted model.

2. Compare your decision tree model and your logistic regression model. Do they attribute high importance
to the same factors? How do you interpret each model to explain the referendum vote?

3. Which model would you use if you were explaining the results for a newspaper article, and why?
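A minimal starting point for the week 7 tasks might look like the sketch below, which fits a classification tree with rpart() and draws it with the package's own plot() and text() methods. As before, it assumes brexit.csv is in the working directory with the columns described earlier, and falls back to a meaningless synthetic stand-in if the file is absent. The Remain/Leave factor called vote is an illustrative addition, not part of the data file.

```r
library(rpart)

inputs <- c("abc1", "medianIncome", "medianAge", "withHigherEd", "notBornUK")

if (file.exists("brexit.csv")) {
  brexit <- read.csv("brexit.csv")
} else {
  # Synthetic stand-in (assumption, NOT the real data), only so this runs.
  set.seed(1)
  brexit <- as.data.frame(matrix(runif(300 * 5), ncol = 5,
                                 dimnames = list(NULL, inputs)))
  eta <- 1 + 3 * brexit$medianAge - 4 * brexit$withHigherEd
  brexit$voteBrexit <- runif(300) < plogis(eta)
}

# rpart needs a factor response for classification; label the two classes.
brexit$vote <- factor(as.logical(brexit$voteBrexit),
                      levels = c(FALSE, TRUE), labels = c("Remain", "Leave"))

tree <- rpart(reformulate(inputs, response = "vote"),
              data = brexit, method = "class")
print(tree)   # text listing of the splits and the class counts at each node

if (nrow(tree$frame) > 1) {          # plotting fails on a root-only tree
  plot(tree, margin = 0.1)           # draw the branches
  text(tree, use.n = TRUE)           # add split rules and class counts
  print(tree$variable.importance)    # rpart's importance score per input
}
```

The inputs used in the top splits, together with tree$variable.importance, can be set against the signs and sizes of the logistic regression coefficients when answering task 2.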