You will work in a group (of approximately 3 students) to produce a single report.
Rules for submitting group reports
Your group will submit, via the Submit your take home assessment Moodle page, one
PDF file containing your report and one R program file
Each group member must click their respective Submit buttons in order for the group’s
submission to be successful and final. By ticking the submission declaration box in Moodle
you are agreeing to the following declaration
Declaration: I am aware of the UCL Statistical Science Department's regulations on plagiarism
for assessed coursework. I have read the guidelines in the student handbook and understand
what constitutes plagiarism. I hereby affirm that the work my group is submitting for this
in-course assessment is entirely our own
The Turn-It-In® plagiarism detection system may be used to scan your submission for
evidence of plagiarism and collusion
You will work together within your group and the usual plagiarism and collusion regulations
do not apply to this form of interaction. However, they do apply to collusion with other groups
or plagiarism of work from other groups or from other sources
Any plagiarism will normally result in zero marks for all students involved, and may also
mean that your overall examination mark is recorded as non-complete. Guidelines as to what
constitutes plagiarism may be found in Departmental Student Handbooks. The relevant
excerpt from the Statistical Science handbook is also posted on Moodle
Late submission will incur a penalty unless there are extenuating circumstances (e.g.
medical) supported by appropriate documentation. Penalties are set out in the latest
editions of the Statistical Science Department student handbooks, available from the
departmental web pages
Failure to submit this in-course assessment may mean that your overall examination mark is
recorded as “non-complete”, i.e. you will not obtain a pass for the course
All members of a group will be awarded the same mark for the assignment
I may ask you as a group to come and discuss your output with me
You will receive, via Moodle, feedback on your work and a provisional grade – grades are
provisional until confirmed by the Statistics Examiners' Meeting in June 2018
Simon Harden, s.harden@ucl.ac.uk, 4/3/2018
Department of Statistical Science
Take home assessment: Group Task
As a group, you will describe and analyse monthly wind speed readings from two weather stations
for the period 1950 - 2003. The data are described in detail at the end of this document. You do
not need to investigate their source further – just report on the data as they are presented to you.
Also do not introduce other data, maps etc into your work. Your group will prepare and submit a
single, short, structured report that addresses each of the tasks set out below. All the summary
statistics in the report should come from an R program, about which details are also set out below,
or be readable from plots in the report. Your report and program will be marked by me (Simon
Harden) and you may be required to discuss them with me. You will receive group-specific
feedback on your submission
To complete this assignment successfully you need to start work very soon after the
assignment is set and to plan your time carefully. It is quite possible to complete the
assignment by the end of the second term
The data for your group are available in Moodle as a CSV file. The data for each group are
different and will give different results when analysed
Tasks
1. Describe the data using techniques such as summary statistics and plots. Your description
should include both univariate and multivariate analysis
2. Carry out two sample t-tests comparing average readings for:
a. 1950 - 1976 and 1977 - 2003, for the first of your locations
b. Summer (June, July and August) and winter (December, January and February), for
the second of your locations
c. Your two locations over the whole period
For each of the three hypothesis tests you should report the results in a sentence that is
comprehensible to a non-statistician. Note also the specific tasks described in the R
program section
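As a rough illustration of tasks 2a and 2b, the sketch below runs two-sample t-tests with R's built-in t.test() function. The data frame `station` and its simulated values are hypothetical stand-ins for one station's readings, not part of the assignment data. The column names follow the data description in this document. By default t.test() does not assume equal variances (Welch's test); whether that is the appropriate choice is for the group to decide.

```r
# Hypothetical example: simulated readings stand in for one station's data.
set.seed(1)
station <- data.frame(
  Year  = rep(1950:2003, each = 12),
  Month = rep(1:12, times = 54),
  Ave   = round(rnorm(54 * 12, mean = 6, sd = 1), 1)
)

# Task 2a: compare readings for 1950 - 1976 with those for 1977 - 2003.
early <- station$Ave[station$Year <= 1976]
late  <- station$Ave[station$Year >= 1977]
test_period <- t.test(early, late)  # Welch test (var.equal = FALSE by default)

# Task 2b: compare summer (June to August) with winter (December to February).
summer <- station$Ave[station$Month %in% 6:8]
winter <- station$Ave[station$Month %in% c(12, 1, 2)]
test_season <- t.test(summer, winter)

# The returned objects hold the quantities usually quoted in a report.
print(test_period$p.value)
print(test_season$conf.int)
```

The p-value and confidence interval can then be turned into a plain-English sentence for each test.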
Report
The report should be consistent with the following:
You must use the Microsoft Word template provided on the Moodle page for your report and
are not allowed to change its font, font sizes or margins. [If you wish to be allowed to use
alternative word processing software then you must agree the details with me before
submission]. If the template has been changed, up to 4% of marks can be lost and I will
reformat the document to the template standard, to which the following point will apply
The report must not be longer than 2 pages (2 sides) of A4 paper, including plots, with text in
Arial 11pt font. I will only mark the first 2 pages of any report
The report must be capable of being read on its own: i.e. it should not refer to the R program
but just contain data / plots from the program’s output
Please save your report as a PDF file from Microsoft Word (FILE, Export, Create PDF/XPS)
It must be written in clear comprehensible English
Any plots should be readable and well labelled
Your report should be anonymous – i.e. there should be no mention of group members’
names anywhere in your submission
Your report is limited to two sides of A4 paper. This doesn’t mean that you should aim to fill
all the space available to you. Writing more text doesn’t necessarily get you more marks
R program
Your R program should:
Assume that the working directory has already been set to the location of the data file and to
where any plot files will be stored, i.e. there should be no setwd() command or reference to
directories
Import the data from the CSV file
Generate all the summary statistics and plots that you refer to in your report
Create an output file using the sink() function, containing only the statistics you use in
your report. Your program may investigate other things but the output file should contain all
the information you use in your report and should be created when I run your program using
the source() function in R. The output should be well laid out and contain appropriate
descriptions. The output file itself should not be included in your submission
Create a file for each plot (or set of plots) that you use in your report, and no others
Be well commented with both a description of the program at the start and suitable notes
throughout
Be clearly laid out so that it is easy for me to read
Split the imported data into two data objects – one for each of the two weather stations
Replace all the “-1” readings with “NA”
Be anonymous – i.e. there should be no mention of group members’ names
Your program may use non-standard packages and you should assume that they are installed on
my computer
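The program requirements above can be sketched roughly as follows. This is a minimal outline under stated assumptions, not a model answer. The file names example_wind.csv, example_output.txt and example_boxplot.png are hypothetical, as are the object names and the four made-up rows written out at the start so that the sketch runs on its own. A real program would read the group's own CSV file instead.

```r
# Hypothetical stand-in for the group's CSV file: a few made-up rows with the
# five columns described in this document. All file names are assumptions.
example <- data.frame(
  Year        = c(1950, 1950, 1950, 1950),
  Month       = c(3, 4, 3, 4),
  Site_No     = c(1, 1, 2, 2),
  No_Readings = c(31, 0, 31, 30),
  Ave         = c(7.2, -1, 5.9, 6.1)
)
write.csv(example, "example_wind.csv", row.names = FALSE)

# Import the data from the working directory (no setwd() call).
wind <- read.csv("example_wind.csv")

# Replace the missing-value code -1 with NA.
wind$Ave[wind$Ave == -1] <- NA

# Split the data into one object per weather station.
site_a <- wind[wind$Site_No == 1, ]
site_b <- wind[wind$Site_No == 2, ]

# Send only the statistics used in the report to an output file.
sink("example_output.txt")
cat("Mean monthly wind speed (m/s), months with no valid average excluded\n")
cat("Site 1:", mean(site_a$Ave, na.rm = TRUE), "\n")
cat("Site 2:", mean(site_b$Ave, na.rm = TRUE), "\n")
sink()  # stop diverting output to the file

# Save one plot to its own file.
png("example_boxplot.png")
boxplot(Ave ~ Site_No, data = wind,
        xlab = "Site number",
        ylab = "Monthly mean of daily maximum wind speed (m/s)")
dev.off()
```

Running the file with source() should then recreate the output file and the plot file in the working directory.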
Assessment criteria
85 marks will be allocated as follows:
Tasks: 40 marks for completing the tasks as listed above (20 marks for question 1 and 20 for
question 2). So, for instance, a thorough and perceptive description of the data will earn
more marks than just a list of a few summary statistics
Report: 15 marks for complying with the conditions listed above. If, for instance, the
meaning of part of the report is not clear, then marks will be lost
R program: 30 marks for meeting the R program conditions listed above. Thus, the program
should, for instance, output all data and plots that are used in the report when it is run using
the source() function in R
How to approach the assignment
The way that I would approach the assignment would be to:
1. Meet as a group and plan who will do what by when
2. Start an R program which imports the data and then works through all the commands that
are required to generate the data and plots, together with any extra code needed to complete
the programming as set out above. This is probably best done as a group sitting around a
computer together so that everyone can contribute their expertise
3. Draft a report in the format described above which carries out the required tasks. This may
suggest updates to your R program which might be useful
4. Complete the R program, making sure, in particular, that layout and comments have been
considered
5. Finalise the report satisfying yourselves that it is clear and readable
6. Leave a gap of a day or two and then return to the two files that are to be submitted,
ensuring that all the requirements listed above have been met
If at all possible, I would suggest that you complete as much of the project as you can before the
end of the second term
The data
Each group’s data consists of a .csv file with 5 columns:
Year. 1950 to 2003
Month. 1 to 12. The first month is March 1950 and the last, January 2003
Site_No. There are 13 sites numbered as in the following table. The first site began recording
data in 1950, but the final location did not start doing so until 1961
No Location
1 IJmuiden
2 Schiphol
3 De Bilt
4 Soesterberg
5 Leeuwarden
6 Deelen
7 Eelde
8 Vlissingen
9 Hoek van Holland
10 Zestienhoven
11 Gilze Rijen
12 Eindhoven
13 Beek
No_Readings. The number of days in the month for which readings exist. For the remainder of
the days in a month, the readings are invalid for whatever reason
Ave. Wind speed (in m/s). These data have been corrected for exposure changes etc and
reduced to estimates of potential wind at a height of 10m. The original data are hourly readings.
A daily maximum speed was recorded by taking the largest value at 06.00, 12.00, 18.00 or
24.00. The value shown here is the monthly average of those daily maxima. If no valid monthly
average exists, “-1” is shown
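To illustrate the file layout described above, the sketch below attaches location names to readings and builds a date column from Year and Month. The site numbers and names are copied from the table in this document. The two data rows, the object names `sites` and `wind`, and the added columns `Location` and `Date` are hypothetical and only there to make the example run.

```r
# Site numbers and names copied from the table in this document.
sites <- data.frame(
  Site_No  = 1:13,
  Location = c("IJmuiden", "Schiphol", "De Bilt", "Soesterberg", "Leeuwarden",
               "Deelen", "Eelde", "Vlissingen", "Hoek van Holland",
               "Zestienhoven", "Gilze Rijen", "Eindhoven", "Beek")
)

# Hypothetical rows in the same five-column layout as a group's file.
wind <- data.frame(
  Year = c(1961, 1961), Month = c(1, 2), Site_No = c(3, 13),
  No_Readings = c(31, 0), Ave = c(5.4, -1)
)

# Attach the location name to each reading by matching on Site_No.
wind$Location <- sites$Location[match(wind$Site_No, sites$Site_No)]

# Build a first-of-month date, which is convenient for time series plots.
wind$Date <- as.Date(sprintf("%d-%02d-01", wind$Year, wind$Month))

print(wind)
```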