SDS192 Data Wrangling
Use R to wrangle a dataset.
Learning Goals
- To further sharpen data wrangling and data visualization skills
- To learn how to collaborate using modern workflows
Technical Skills
- dplyr
- tidyr
- GitHub
Readings
- Modern Data Science with R, Ch. 2-5
Mini-Project
You may work with a partner or two to analyze Federal Election Commission data contained in the sds192-mp2 repository (https://github.com/beanumber/sds192-mp2) and report your findings in a short writeup. The topic is up to you. The best projects will:
- discuss an interesting and well-motivated topic
- involve some non-trivial data wrangling (e.g., not just a bunch of mutate()s)
- provide a well thought-out, informative analysis
- convey some sort of insight
- be well-written
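As a rough illustration of wrangling that goes beyond a string of mutate() calls, the sketch below aggregates a small made-up table and then reshapes it from long to wide form with tidyr's pivot_wider() (tidyr 1.0.0 or later). The table toy_contribs and its columns (state, party, amount) are hypothetical stand-ins, not the actual columns of the FEC tables.

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

# Hypothetical long-format data: one row per contribution
toy_contribs <- tibble(
  state  = c("MA", "MA", "MA", "NY", "NY", "CT"),
  party  = c("DEM", "REP", "DEM", "DEM", "REP", "DEM"),
  amount = c(500, 250, 1000, 2000, 750, 300)
)

by_state <- toy_contribs %>%
  # total dollars for each state/party combination
  group_by(state, party) %>%
  summarize(total = sum(amount), .groups = "drop") %>%
  # one column per party; combinations with no rows become 0
  pivot_wider(names_from = party, values_from = total, values_fill = 0)

print(by_state)
```

The wide form puts each state on one row, which is often easier to read in a table, while the long form is usually the better input for ggplot2.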
GitHub
You will use GitHub to collaborate with your partner(s) on this assignment.
- Nominate one person to host your repository. Suppose this person’s GitHub username is superfun.
- Have superfun fork the sds192-mp2 repository from beanumber
- Have superfun add other teammates as “Collaborators” (under “Settings”)
- All teammates:
  - In RStudio, open a new project
  - Select Version Control
  - Select GitHub
  - Paste the “Clone or download” URL of the superfun/sds192-mp2 repository
You may have to install git (http://happygitwithr.com/install-git.html) on your computer!
Data
These data come from the Federal Election Commission (http://www.fec.gov/finance/disclosure/ftpdet.shtml#archive_link) and are based on the 2011-2012 federal election cycle. They were collected using the fec (https://github.com/beanumber/fec) package for R.
There are four tables. To load them, switch to the sds192-mp2 project in RStudio, pull the latest changes, and run the following:
# Load the four tables (run from the sds192-mp2 project so the relative paths resolve)
load("house_elections.rda")
load("candidates.rda")
load("committees.rda")
load("contributions.rda")
Be sure to read the supporting documentation for these data. It is your responsibility to know what you are looking at!
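load() restores the objects saved in an .rda file under their original names and invisibly returns those names, so capturing its return value is a quick way to check which data frames the four calls above created. The sketch below assumes nothing about the real files: it saves a tiny made-up data frame (toy_candidates) to a temporary file and loads it back.

```r
# Hypothetical stand-in for one of the .rda files, written to a temp
# directory so the example runs anywhere
toy_candidates <- data.frame(cand_id = c("A1", "B2"), party = c("DEM", "REP"))
path <- file.path(tempdir(), "toy_candidates.rda")
save(toy_candidates, file = path)
rm(toy_candidates)

# load() recreates the saved objects and returns their names invisibly
loaded <- load(path)
print(loaded)

# Look at the structure of each restored object before wrangling it
for (obj in loaded) {
  str(get(obj))
}
```

With the real files, the same pattern (for example, loaded <- load("candidates.rda")) shows the object names to look up in the supporting documentation.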
Grading Rubric
There are 13 possible points for this mini-project.
Baseline
- +1 for an .Rmd that compiles without errors
- +1 for including the code that wrangled the data
- +1 for using at least two of the five basic verbs (i.e., select(), filter(), mutate(), arrange(), and summarize())
- +1 for annotating your data wrangling pipeline (this can be a few sentences of text surrounding the R code chunks, or informative comments inside them)
- +1 for hiding unnecessary messages from R so they are not displayed in the HTML
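A minimal sketch of an annotated pipeline that uses several of the basic verbs (filter(), summarize() after group_by(), and arrange()) is below. The table toy_contribs and its columns (cmte_id, amount) are hypothetical. Wrapping library() in suppressPackageStartupMessages() is one way to keep package startup messages out of the knitted HTML; setting message = FALSE in the chunk options is another.

```r
# Keep dplyr's startup messages out of the knitted output
suppressPackageStartupMessages(library(dplyr))

# Hypothetical contributions table: one row per transaction
toy_contribs <- tibble(
  cmte_id = c("C1", "C1", "C2", "C2", "C3"),
  amount  = c(100, -50, 2500, 400, 75)
)

top_committees <- toy_contribs %>%
  # drop negative amounts (treated as refunds in this toy data)
  filter(amount > 0) %>%
  # one row per committee: total dollars and number of contributions
  group_by(cmte_id) %>%
  summarize(total = sum(amount), n = n(), .groups = "drop") %>%
  # largest totals first
  arrange(desc(total))

print(top_committees)
```

Setting .groups = "drop" explicitly also avoids the grouping message that some dplyr versions print after summarize().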
Average
- +1 for explaining in a single coherent sentence what we can learn from these data
- +1 for using at least one join (e.g., left_join(), inner_join(), etc.)
- +1 for blog post text that provides context or background useful in interpreting the graphic
- +1 for using GitHub for version control
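Joins connect tables through a shared key column. The sketch below attaches hypothetical candidate details to hypothetical contribution totals: left_join() keeps every row of the left table and fills unmatched rows with NA, while inner_join() keeps only rows whose key appears in both. The tables and the key cand_id are invented for illustration; the supporting documentation describes the real key columns.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical lookup table: one row per candidate
toy_candidates <- tibble(
  cand_id = c("H001", "H002", "H003"),
  name    = c("Smith", "Jones", "Lee"),
  party   = c("DEM", "REP", "DEM")
)

# Hypothetical totals; "H999" has no match in toy_candidates
toy_totals <- tibble(
  cand_id = c("H001", "H003", "H999"),
  total   = c(12000, 4500, 800)
)

# left_join(): all rows of toy_totals survive; H999 gets NA name and party
with_names <- toy_totals %>%
  left_join(toy_candidates, by = "cand_id")

# inner_join(): only H001 and H003 survive
matched_only <- toy_totals %>%
  inner_join(toy_candidates, by = "cand_id")

print(with_names)
print(matched_only)
```

Comparing nrow() before and after a join is a cheap check for unexpected unmatched or duplicated keys.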
Advanced
- +1 for writing a function to generalize your analysis rather than re-writing the same code multiple times
- +1 for using the Issues tab on GitHub to plan your project
- +0-2 WOW factor: awarded at the professors’ discretion for submissions that are exceptionally compelling
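Rather than copying the same pipeline once per state, candidate, or party, one option is to wrap it in a function whose argument is the piece that varies. The sketch below uses a hypothetical helper, state_summary(), on a made-up table with state and amount columns, then applies it to several states with base lapply() and stacks the results with dplyr::bind_rows().

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical contributions table
toy_contribs <- tibble(
  state  = c("MA", "MA", "NY", "NY", "NY", "CT"),
  amount = c(500, 1000, 2000, 750, 250, 300)
)

# Hypothetical helper: summarize contributions for a single state
state_summary <- function(data, st) {
  data %>%
    filter(state == st) %>%
    summarize(
      state = st,
      n = n(),
      total = sum(amount),
      median_amount = median(amount)
    )
}

# Run the same analysis for several states and stack the results
results <- bind_rows(lapply(c("MA", "NY", "CT"), state_summary, data = toy_contribs))
print(results)
```

Because data is passed by name, lapply() supplies each state code to the remaining argument, st.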