Term Project - Part 1 MATH 3560H - Wesley Burr
Problems
Your term project is intended to be a summary of all of the practical skills with regression and
modeling you’ve gained across the semester. The goal is to pick a data set which is of interest to you,
munge¹ the data into R, and then perform a full-fledged data analysis using the linear modeling
framework we’ve learned about.
This initial part of the project, Part 1, is due on February 27th, and simply requires that you pick your
data set, and perform your first, flailing, import of the data into R. It’s entirely fine if this import does
not work, or runs into problems: that’s part of the “fun”.
You will hand in a short R Markdown rendered PDF, of no more than 2 pages, which discusses
your data set of choice, its source and provenance, and the reason it is interesting to you. You should
also state one thing you’d like to explore or discover from this data: a hypothesis, if you will. As
mentioned above, you should then try to import the data into R and see if you can manage it. This
mini-report is worth 10% of your final term project grade, or 2% of your final grade.
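A first import attempt might look something like the sketch below. It assumes the data arrives as a CSV file; the file here is a tiny stand-in written to a temporary location so the example runs on its own, and the column names (`station`, `temp_c`, `pm25`) are made up for illustration. Wrapping the read in `tryCatch()` is one way to record a failed import rather than have the document stop rendering.

```r
# Stand-in for a downloaded file: write a tiny CSV to a temporary path.
# (Hypothetical columns; your real file replaces this step.)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("station,temp_c,pm25",
             "A,12.5,8.1",
             "B,NA,10.4",
             "C,15.0,7.7"),
           csv_path)

# Attempt the import; on error, keep the message instead of stopping.
raw <- tryCatch(
  read.csv(csv_path, stringsAsFactors = FALSE),
  error = function(e) {
    message("Import failed: ", conditionMessage(e))
    NULL
  }
)

# If it worked, take a first look at what R thinks the columns are.
if (!is.null(raw)) {
  str(raw)
  print(summary(raw))
}
```

If the import fails, the message it prints is worth including in the mini-report, since it shows where the data needs attention.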
The rules for the project are as follows:
1. Your data set must be at least 100 observations of at least 3 variables. You do not have to use
them all, but it must be at least that big when you begin.
2. All analyses should be done using R as the framework. You may use other tools as required,
but interfaced through, and analyzed by, R.
3. The final term project report will be delivered as a worked analysis in R Markdown, of no more
than 20 pages in length, and a minimum of 5 pages (realistically, you won’t be able to fit it in 5
pages; this is just to keep you sensible).
4. A rubric for evaluation of the final project will be posted this month.
5. All steps taken to clean up and organize your data must be documented and reproducible.
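Rules 1 and 5 can be checked and documented directly in the R Markdown source. The sketch below uses the built-in `airquality` data frame purely as a stand-in for your own data; the names `min_obs` and `min_vars` are illustrative, and dropping incomplete rows is only one possible cleaning step, not a recommendation.

```r
# Stand-in data set shipped with base R (package 'datasets').
dat <- airquality

# Rule 1: at least 100 observations of at least 3 variables.
min_obs  <- 100
min_vars <- 3
stopifnot(nrow(dat) >= min_obs, ncol(dat) >= min_vars)

# Rule 5: every cleaning step lives in code, so it can be re-run.
# Here: count missing values per column, then drop incomplete rows.
n_missing <- colSums(is.na(dat))
print(n_missing)

clean <- dat[complete.cases(dat), ]

# Record how many rows the step removed.
cat("Rows before:", nrow(dat), " after:", nrow(clean), "\n")
```

Keeping the original object (`dat`) untouched and writing the result to a new one (`clean`) makes it easy to show exactly what each step changed.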
Here is a list of suggested places where you can find data to start with:
1. Kaggle Competition Data Sets:
2. Canadian Government Open Data:
3. City of Toronto Open Data:
4. Environment and Climate Change Canada’s National Air Pollution Surveillance network data:
http://maps-cartes.ec.gc.ca/rnspa-naps/data.aspx. Air pollution of all sorts.
5. Environment and Climate Change Canada’s Climate Data (Meteorology):
6. The Bank for International Settlements (BIS): programmatic API access to historical data available
via an R Package,
BIS.html
¹ Data “munging” is the process of organizing, cleaning, and importing the data into a formatted, ready-to-be-analyzed
data set.
7. Old Textbook Data (not preferred, but ok as a final choice):
8. Data Sets available as part of an R Package (also not preferred, but ok as a final choice):
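For item 8, base R can list the data sets bundled with an installed package. The sketch below looks at the `datasets` package, which ships with R, and loads `iris` from it as an example; any other installed package name could be substituted in the same way.

```r
# List the data sets bundled with an installed package.
listing <- data(package = "datasets")$results

# 'results' is a character matrix; show item names and titles.
print(head(listing[, c("Item", "Title")]))

# Load one data set by name and check its size (rows, columns).
data("iris", package = "datasets")
print(dim(iris))
```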
Do not feel constrained by these; they’re just intended as a starting point and inspiration for you.
I strongly encourage you to find your own data set which you find interesting, and start from there. I
am available to assist you in searching if you have something in mind and need a starting point.