ENGSCI 211 2022 S1 – Data Analysis Assignment
DUE: Friday 20 May at 11:59pm on Canvas
This assignment requires you to conduct statistical analyses on three data sets.
Preparation and Submission Instructions
Each task should be prepared as a separate document and converted to an HTML or PDF file, which
should be submitted to the appropriate Canvas dropbox prior to the due date. For each task, include
your R code and output, and then your reports.
Clear and succinct communication is an important part of Engineering, regardless of specialisation. We
expect that you will write clear and concise English detailing your understanding of the analysis you conducted.
In Executive Summaries, this means describing analysis in context, not using variable names, using
units when known, rounding sensibly and not using technical language (e.g. p-value).
Most of the marks in each task are allocated to the Methods and Assumption Checks and
Executive Summary. These must be consistent with your R output for credit.
For R code and output, please use a fixed-width font such as Courier New or Consolas.
You may wish to hand-write your Methods and Assumption Checks and Executive Summaries. This is
permitted as long as you merge your files so that only one file is submitted per task.
There will be penalties for not following instructions!
Late submissions will be penalised per the policy on Canvas.
Rmarkdown / R Notebooks
This is NOT compulsory.
You may use the method demonstrated in class / in recordings to publish your R Notebooks. Note that
Knit to PDF only works if you have a LaTeX distribution installed, so knitting to HTML, or knitting to Word
and then converting the result to PDF, will generally be the easiest method.
It is completely acceptable to produce your assignment by copying and pasting R code and output directly
into a word processor of your choice.
Academic Integrity
By submitting this assignment, you confirm that:
• you understand the University’s policies on cheating, plagiarism and group work.
• your submission is entirely your own work and reflects your own learning.
• you have not allowed any other person access to any part of the assignment.
We will be monitoring for academic misconduct and will not hesitate to investigate any suspected cases.
Substantial penalties will apply, and will likely result in a delay in the release of your final grade by up to
six months. This alone may negatively impact your internship prospects. If misconduct is confirmed, your
name will be recorded in the University’s Register of Academic Misconduct for 10 years.
In particular, do not send your files to ANYONE, not even to ‘compare answers’. Once a file leaves
your control it may be submitted by your ‘friend’ and leave you liable for misconduct. University procedures
consider both giving and receiving files to be academic misconduct, and both will be penalised, regardless of
intent. There is no flexibility on this. YOU HAVE BEEN WARNED!
Assistance available
Piazza is the best place to receive assistance from your peers and your lecturer.
Kevin will run office hours. Keep an eye out for Canvas announcements!
However, course staff will NOT answer questions in the 12 hours before the assignment is due.
Therefore, DO NOT LEAVE QUESTIONS TO THE LAST MINUTE.
Tasks
For each task, we expect to see the following, as done in the case studies and discussed in lectures:
• exploratory analysis, including brief comments below the relevant plot(s) and / or summaries
– this is not printed in your coursebook case studies, but is expected in your assignment!
• checking modelling assumptions via appropriate plots
• appropriate inference, including predictions where required
• reports: Methods and Assumption Checks and an Executive Summary
In your submission, you should include all your R code and output, including all plots produced by R.
Task 1: Tyre Wear (9 marks)
A tyre manufacturer has created a new material formulation that reduces tyre wear (and hence allows the
tyres to be used for longer). An experiment was conducted to measure the difference in tyre wear between
the new material formulation and the old one: one old and one new tyre were installed on the rear of each of
twenty cars, and the distance until each tyre wore out was recorded. We are interested in finding whether
there is a difference in the wear-out distance, and in quantifying that difference if there is one.
The file tyredistance.txt contains the following variables:
Car          identifier of car, 1, ..., 20
DistanceNew  wear-out distance for the new design tyre in a particular car, in thousands of km
DistanceOld  wear-out distance for the old design tyre in a particular car, in thousands of km
Hint: consider carefully whether this is a paired-sample analysis or a two-sample analysis.
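As a rough illustration of the hint, the sketch below runs both kinds of analysis in R on simulated numbers. The values, the seed and the object names are assumptions made up for this example and are not the contents of tyredistance.txt; only the column names follow the variable list above. Which analysis suits the real experiment is for you to justify.

```r
# Simulated stand-in for tyredistance.txt (all values are invented for illustration).
set.seed(1)
car_effect <- rnorm(20, mean = 0, sd = 5)   # car-to-car variation shared by both tyres
tyres <- data.frame(
  Car         = 1:20,
  DistanceOld = 50 + car_effect + rnorm(20, sd = 1),
  DistanceNew = 52 + car_effect + rnorm(20, sd = 1)
)
# With the real file, something like the following might be used instead
# (assumes the file has a header row):
# tyres <- read.table("tyredistance.txt", header = TRUE)

# Paired-sample analysis: based on the within-car differences.
tyres$Diff <- tyres$DistanceNew - tyres$DistanceOld
paired <- t.test(tyres$DistanceNew, tyres$DistanceOld, paired = TRUE)

# The same test written as a one-sample t-test on the differences.
one_sample <- t.test(tyres$Diff)

# Two-sample (Welch) analysis: treats the two columns as independent groups.
two_sample <- t.test(tyres$DistanceNew, tyres$DistanceOld)

# Compare the confidence intervals for the mean difference (thousands of km).
paired$conf.int
two_sample$conf.int
```

In this simulated setting the two intervals differ in width because the shared car-to-car variation cancels in the differences but not in the two-sample comparison.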
Task 2: Pavement Conditions (12 marks)
The quality of a road surface (pavement) deteriorates over time due to wear-and-tear and environmental
conditions. It is of interest to quantify how much a pavement deteriorates in a year, on average, in order to
inform plans on pavement resurfacing. It is also of interest to estimate the pavement condition index for an
individual pavement section that is 15 years old.
The file PavementConds.txt contains the following variables:
Age  age of the pavement section, in years
PCI  pavement condition index, a composite measure of surface deterioration;
     a higher value means the pavement is in better condition
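The two quantities of interest in this task (the average change per year, and a value for an individual 15-year-old section) could be obtained along the following lines. This is a minimal sketch on simulated data, assuming a straight-line model turns out to be adequate; the simulated values, the seed and the object names are assumptions, and only the column names Age and PCI come from the variable list above.

```r
# Simulated stand-in for PavementConds.txt (all values are invented for illustration).
set.seed(2)
pavement <- data.frame(Age = runif(30, min = 0, max = 25))
pavement$PCI <- 95 - 2.5 * pavement$Age + rnorm(30, sd = 4)

# Exploratory plot of the relationship.
plot(PCI ~ Age, data = pavement)

# Simple linear regression of condition index on age.
fit <- lm(PCI ~ Age, data = pavement)

# Assumption checks: residuals vs fitted values, and a normal Q-Q plot.
plot(fit, which = 1)
plot(fit, which = 2)

# 95% confidence interval for the average change in PCI per year of age.
slope_ci <- confint(fit)["Age", ]
slope_ci

# 95% prediction interval for an individual section that is 15 years old.
pred <- predict(fit, newdata = data.frame(Age = 15), interval = "prediction")
pred
```

Note the use of interval = "prediction" rather than "confidence": the former applies to a single new section, the latter to the mean of all sections of that age.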
Task 3: Netflix Movies (16 marks)
A business analyst at Netflix is interested in optimising the assignment of advertising to various TV shows
and movies. In a particular project, the analyst wants to determine if there are any differences in the lengths
of movies with different age ratings as determined by the Motion Picture Association of America (MPAA),
and to quantify any differences detected. The lengths of 20 randomly selected movies with each rating were
collected for this analysis.
The file movies.txt contains the following variables:
length  length of the movie, in minutes
rating  rating of the movie, either G, PG, PG-13 or R
More information on MPAA ratings (for interest only, no discussion on this is expected):
https://en.wikipedia.org/wiki/Motion_Picture_Association_film_rating_system
Hints:
• Don’t forget to convert the explanatory variable to a factor.
• A transformation is probably required. You should check for this.
• Only quantify effects when they are statistically significant.
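The hints above might translate into R roughly as follows. This is only a sketch on simulated data: the values, the seed and the object names are assumptions, the log transformation is one candidate that you would need to check against the residual plots, and TukeyHSD is a base-R option for pairwise comparisons that may differ from the function used in the course.

```r
# Simulated stand-in for movies.txt (all values are invented for illustration).
set.seed(3)
group_means <- log(c(90, 100, 110, 110))   # assumed log-scale centres for G, PG, PG-13, R
movies <- data.frame(
  rating = rep(c("G", "PG", "PG-13", "R"), each = 20),
  length = exp(rnorm(80, mean = rep(group_means, each = 20), sd = 0.15))
)

# Convert the explanatory variable to a factor.
movies$rating <- factor(movies$rating)

# Exploratory plot of movie length by rating.
boxplot(length ~ rating, data = movies)

# One-way model on the raw scale and on the log scale, to compare residual plots.
fit_raw <- lm(length ~ rating, data = movies)
fit_log <- lm(log(length) ~ rating, data = movies)
plot(fit_raw, which = 1)
plot(fit_log, which = 1)

# Overall F-test for any difference between ratings (log scale).
anova(fit_log)

# Pairwise comparisons on the log scale, back-transformed to multiplicative effects.
tukey <- TukeyHSD(aov(log(length) ~ rating, data = movies))
ratios <- exp(tukey$rating[, c("diff", "lwr", "upr")])
ratios
```

After back-transformation each row is a ratio rather than a difference in minutes, and under the usual assumptions of the log-scale model it is commonly interpreted as a ratio of median lengths. Following the last hint, only the pairs whose adjusted p-value indicates a significant difference would be quantified in a report.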