R Tutoring: SDS192 Data Wrangling (R Programming Walkthrough)


Use R to wrangle a data set.

Learning Goals

  • To further sharpen data wrangling and data visualization skills
  • To learn how to collaborate using modern workflows

Technical Skills

  • dplyr
  • tidyr
  • GitHub

Readings

  • Modern Data Science with R, Ch. 2-5

Mini-Project

You may work with a partner or two to analyze Federal Election Commission data contained in the sds192-mp2 repository (https://github.com/beanumber/sds192-mp2), and report your findings in a short writeup. The topic is up to you. The best projects will:

  • discuss an interesting and well-motivated topic
  • involve some non-trivial data wrangling (e.g., not just a bunch of mutate()s)
  • provide a well thought-out, informative analysis
  • convey some sort of insight
  • be well-written

GitHub

You will use GitHub to collaborate with your partner(s) on this assignment.

  1. Nominate one person to host your repository. Suppose this person’s GitHub username is superfun.
  2. Have superfun fork the sds192-mp2 repository from beanumber.
  3. Have superfun add the other teammates as “Collaborators” (under “Settings”).
  4. All teammates:
    • Open new project in RStudio
    • From Version Control
    • From GitHub
    • Paste the “Clone or download” URL from the superfun/sds192-mp2 repo

You may have to install git (http://happygitwithr.com/install-git.html) on your computer!

Data

These data come from the Federal Election Commission (http://www.fec.gov/finance/disclosure/ftpdet.shtml#archive_link), and are based on the 2011-2012 federal election cycle. These data were collected using the fec (https://github.com/beanumber/fec) package for R.

There are four tables present. To load them, switch to the sds192-mp2 project in RStudio, pull, and run the following:

load("house_elections.rda")
load("candidates.rda")
load("committees.rda")
load("contributions.rda")

Be sure to read the supporting documentation for these data. It is your responsibility to know what you are looking at!
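
As a first check on what each file contains, it can help to look at the objects that load() creates. The sketch below is only an illustration: it saves and reloads a small made-up table (toy_candidates, with hypothetical column names) so that it runs without the project files. The same inspection calls should work on the four real tables once they are loaded.

```r
# Toy stand-in for one of the project's .rda files.
# The table name and columns are hypothetical, not the real FEC schema.
toy_candidates <- data.frame(
  cand_id = c("C1", "C2", "C3"),
  party   = c("DEM", "REP", "DEM"),
  stringsAsFactors = FALSE
)
rda_path <- file.path(tempdir(), "toy_candidates.rda")
save(toy_candidates, file = rda_path)
rm(toy_candidates)

# load() restores objects under their saved names and
# invisibly returns those names as a character vector.
loaded_names <- load(rda_path)
print(loaded_names)

# Basic inspection of a freshly loaded table.
dim(toy_candidates)    # number of rows and columns
names(toy_candidates)  # column names
str(toy_candidates)    # column types and a preview of the values
```

Capturing the return value of load() is a quick way to learn which object names a file adds to your workspace.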

Grading Rubric

There are 13 possible points for this mini-project.

Baseline

  • +1 for an .Rmd that compiles without errors
  • +1 for including the code that wrangled the data
  • +1 for using at least two of the five basic verbs (i.e., select(), mutate(), etc.)
  • +1 for annotating your data wrangling pipeline (this can be in a few sentences in text surrounding the R code chunks, or some informative comments inside the R code chunks)
  • +1 for hiding unnecessary messages from R so that they are not displayed in the HTML
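
A minimal sketch of what an annotated pipeline meeting the baseline items might look like, assuming a made-up table (toy_contributions and its columns are hypothetical, not the real FEC schema). It uses filter(), mutate(), summarize(), and arrange(), with a comment on each step. Package startup messages are silenced here with suppressPackageStartupMessages(); in an .Rmd file, the knitr chunk options message = FALSE and warning = FALSE are another way to keep such output out of the HTML.

```r
# Load dplyr without printing its startup messages.
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data; the real contributions table has different columns.
toy_contributions <- tibble(
  cand_id = c("C1", "C1", "C2", "C3"),
  amount  = c(500, 1500, 250, 1000)
)

large_gifts <- toy_contributions %>%
  # keep only contributions of at least $500
  filter(amount >= 500) %>%
  # express each amount in thousands of dollars
  mutate(amount_k = amount / 1000) %>%
  # total the remaining contributions for each candidate
  group_by(cand_id) %>%
  summarize(total_k = sum(amount_k)) %>%
  # put the largest totals first
  arrange(desc(total_k))

print(large_gifts)
```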

Average

  • +1 for explaining in a single coherent sentence what we can learn from these data
  • +1 for using at least one join (e.g., left_join(), inner_join(), etc.)
  • +1 for blog post text that provides context or background useful in interpreting the graphic
  • +1 for using GitHub for version control
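
To illustrate the join item, here is a small sketch on two made-up tables. The table names, the cand_id key, and the other columns are assumptions for the example; check the supporting documentation for the actual keys in the FEC tables. It contrasts left_join(), which keeps every row of the left table, with inner_join(), which keeps only rows that have a match.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy tables sharing a cand_id key.
toy_candidates <- tibble(
  cand_id = c("C1", "C2", "C3"),
  party   = c("DEM", "REP", "DEM")
)
toy_contributions <- tibble(
  cand_id = c("C1", "C1", "C2", "C9"),  # "C9" has no matching candidate
  amount  = c(500, 1500, 250, 1000)
)

# left_join() keeps all four contributions; the unmatched one gets party = NA.
left <- toy_contributions %>%
  left_join(toy_candidates, by = "cand_id")

# inner_join() drops the contribution whose cand_id is not in toy_candidates.
inner <- toy_contributions %>%
  inner_join(toy_candidates, by = "cand_id")

# Once party is attached, contributions can be totaled by party.
party_totals <- inner %>%
  group_by(party) %>%
  summarize(total = sum(amount))

print(left)
print(party_totals)
```

Comparing the row counts before and after a join is a simple way to notice unmatched keys before they affect a summary.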

Advanced

  • +1 for writing a function to generalize your analysis rather than re-writing the same code multiple times
  • +1 for using the Issues tab on GitHub to plan your project
  • +0-2 WOW factor: awarded at the professors’ discretion for submissions that are exceptionally compelling
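
One way to approach the function item is to wrap a pipeline that you would otherwise copy and paste into a helper that takes the varying piece as an argument. The sketch below is an assumption-laden illustration: party_totals_for_state() is a hypothetical helper, and toy_contributions with its state, party, and amount columns is made-up data rather than the real FEC schema.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data.
toy_contributions <- tibble(
  state  = c("MA", "MA", "NY", "NY", "NY"),
  party  = c("DEM", "REP", "DEM", "DEM", "REP"),
  amount = c(100, 200, 300, 400, 500)
)

# Hypothetical helper: total contributions by party within one state.
party_totals_for_state <- function(data, state_abbr) {
  data %>%
    filter(state == state_abbr) %>%   # keep rows for the requested state
    group_by(party) %>%
    summarize(total = sum(amount)) %>%
    arrange(desc(total))              # largest total first
}

# Call it once for a single state...
ma <- party_totals_for_state(toy_contributions, "MA")

# ...or apply it to several states without rewriting the pipeline.
by_state <- lapply(c("MA", "NY"), party_totals_for_state,
                   data = toy_contributions)

print(ma)
print(by_state[[2]])
```

Because the state is an argument, extending the analysis to another state is a one-line change rather than another copy of the pipeline.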