BIST 515: Introduction to Statistical Software Homework 3
Due date: Wednesday, October 3
1. (27 total points) National Football League (NFL) players go through a number of evaluations
during the combine so that NFL teams can assess their ability.
The nfl.csv data file is available on Canvas and contains information on some of the players who
participated in 2014. The columns in the data file represent the following information:
Player: Name of player being evaluated
College: College that the player attended
Position: The position of the player, where DB = defensive back, LB = linebacker, OL = offensive
lineman, RB = running back, S = safety, TE = tight end, WO = wide receiver; players who
played other positions were excluded from the data file
OverallGrade: The overall grade of the player based on the evaluations
Height: Height in inches
ArmLength: Arm length in inches
Weight: Weight in pounds
Dash40: 40-yard dash time in seconds
BenchPress: Number of bench press repetitions of 225 pounds
VerticalJump: Vertical jump in inches
BroadJump: Broad jump in inches
Cone3Drill: 3-cone drill time in seconds
Shuttle20: 20-yard shuttle run in seconds
(a) (3 points) Construct side-by-side box plots of the 40-yard dash times for each position (y-axis is
40-yard dash). Use a yellow fill color for the boxes and appropriate labels (different from the default)
for axes. Note that a formula argument will be needed (rather than x) to construct the plot with the
current form of the data frame.
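The note about the formula argument in part (a) can be illustrated with a minimal sketch. It assumes a small made-up data frame, nfl_demo, standing in for nfl.csv; the player names and values are hypothetical, and the axis labels are only one possible choice.

```r
# Hypothetical stand-in for nfl.csv; all values are made up for illustration.
nfl_demo <- data.frame(
  Player   = c("Player A", "Player B", "Player C", "Player D", "Player E", "Player F"),
  Position = c("DB", "DB", "LB", "LB", "OL", "OL"),
  Dash40   = c(4.45, 4.52, 4.68, 4.74, 5.20, 5.31)
)

# The formula Dash40 ~ Position splits the long-format column by group,
# which a plain x argument cannot do for a data frame in this form.
bp <- boxplot(Dash40 ~ Position, data = nfl_demo,
              col = "yellow",
              xlab = "Position",
              ylab = "40-yard dash (seconds)")
```

boxplot() invisibly returns the summary statistics it drew, so bp$names can be inspected to confirm which groups were plotted.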
(b) (3 points) Construct side-by-side dot plots of the 40-yard dash times for each position (y-axis is
40-yard dash). Use red open circles for the plotting symbols and appropriate labels (different from the
default) for axes.
(c) (3 points) Combine the plots in 1(a) and 1(b) into one 1×2 grid. Make sure the y-axes on the two
plots are aligned, all x-axis labels appear, and y-axis labels are appropriately positioned. Include one
overall plot title that is appropriately positioned.
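One possible layout for the 1×2 grid in part (c) is sketched below. It assumes the same kind of made-up stand-in data (nfl_demo is hypothetical), treats "dot plot" as a vertical stripchart(), and uses a shared ylim plus an outer margin for the overall title; other approaches, such as layout(), would also work.

```r
# Hypothetical stand-in for nfl.csv; all values are made up for illustration.
nfl_demo <- data.frame(
  Position = c("DB", "DB", "LB", "LB", "OL", "OL"),
  Dash40   = c(4.45, 4.52, 4.68, 4.74, 5.20, 5.31)
)

# A common y-range keeps the two y-axes aligned.
y_range <- range(nfl_demo$Dash40)

# One row, two columns; reserve two lines of outer margin on top for the title.
old_par <- par(mfrow = c(1, 2), oma = c(0, 0, 2, 0))

boxplot(Dash40 ~ Position, data = nfl_demo, col = "yellow", ylim = y_range,
        xlab = "Position", ylab = "40-yard dash (seconds)")

stripchart(Dash40 ~ Position, data = nfl_demo, vertical = TRUE,
           pch = 1, col = "red", ylim = y_range,
           xlab = "Position", ylab = "40-yard dash (seconds)")

# outer = TRUE places the text in the outer margin, centered over both panels.
mtext("40-yard dash times by position", outer = TRUE, cex = 1.2, font = 2)

# Restore the previous graphics settings.
par(old_par)
```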
(d) (3 points) Overlay the plots in 1(a) and 1(b) onto one plot.
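For the overlay in part (d), one option is to draw the box plot first and then add the points to the same plot region. The sketch below assumes made-up data (nfl_demo is hypothetical) and uses the add = TRUE argument of stripchart().

```r
# Hypothetical stand-in for nfl.csv; all values are made up for illustration.
nfl_demo <- data.frame(
  Position = c("DB", "DB", "LB", "LB", "OL", "OL"),
  Dash40   = c(4.45, 4.52, 4.68, 4.74, 5.20, 5.31)
)

# Draw the boxes first so they set up the axes.
overlay_stats <- boxplot(Dash40 ~ Position, data = nfl_demo, col = "yellow",
                         xlab = "Position", ylab = "40-yard dash (seconds)")

# add = TRUE draws the points on the existing plot instead of starting a new one.
stripchart(Dash40 ~ Position, data = nfl_demo, vertical = TRUE,
           pch = 1, col = "red", add = TRUE)
```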
(e) (3 points) Why is the default ordering for the positions given as “DB”, “LB”, ..., “WO” in the plots?
How could this ordering be changed and still use the same position labels?
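The behavior behind part (e) can be explored with a short sketch. It uses a made-up character vector (pos is hypothetical) to show how factor() orders its levels by default and how the levels argument sets a different order while keeping the same labels.

```r
# Hypothetical position codes, deliberately not in alphabetical order.
pos <- c("WO", "DB", "TE", "LB")

# By default, factor() sorts the unique values to form the levels.
default_levels <- levels(factor(pos))

# Supplying levels explicitly changes the order but not the labels.
pos_reordered <- factor(pos, levels = c("WO", "TE", "LB", "DB"))
custom_levels <- levels(pos_reordered)

print(default_levels)
print(custom_levels)
```

Plotting functions that group by a factor follow the level order, so a reordered factor changes the left-to-right order of the groups.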
(f) (6 points) Construct a scatter plot of the 40-yard dash times vs. the bench press repetitions. In your
plot, include the following:
i. Vary the plotting symbols and their color by position, with the following color assignments:
DB = black, LB = red, OL = blue, RB = darkgreen, S = purple, TE =
orange, and WO = gray.
ii. Gridlines
iii. Y-axis and x-axis labels of “40-yard dash (seconds)” and “Bench press repetitions”, respectively.
iv. The name of the player with the largest bench press value next to its corresponding plotting point.
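The requirements in part (f) can be sketched as follows. The data frame nfl_demo and the symbol codes in pos_pch are hypothetical (the assignment specifies only the colors), and named lookup vectors are just one way to map positions to colors and symbols.

```r
# Hypothetical stand-in for nfl.csv; all values are made up for illustration.
nfl_demo <- data.frame(
  Player     = c("Player A", "Player B", "Player C", "Player D", "Player E", "Player F"),
  Position   = c("DB", "DB", "LB", "LB", "OL", "OL"),
  Dash40     = c(4.45, 4.52, 4.68, 4.74, 5.20, 5.31),
  BenchPress = c(14, 17, 22, 25, 30, 33)
)

# Colors as listed in the assignment; symbol codes are an arbitrary assumption.
pos_cols <- c(DB = "black", LB = "red", OL = "blue", RB = "darkgreen",
              S = "purple", TE = "orange", WO = "gray")
pos_pch  <- c(DB = 0, LB = 1, OL = 2, RB = 3, S = 4, TE = 5, WO = 6)

# as.character() guards against Position being stored as a factor.
pos <- as.character(nfl_demo$Position)

plot(nfl_demo$BenchPress, nfl_demo$Dash40,
     col = pos_cols[pos], pch = pos_pch[pos],
     xlab = "Bench press repetitions",
     ylab = "40-yard dash (seconds)")
grid()  # add gridlines at the axis tick positions

# Label the point with the largest bench press value; pos = 2 puts text to its left.
top <- which.max(nfl_demo$BenchPress)
text(nfl_demo$BenchPress[top], nfl_demo$Dash40[top],
     labels = nfl_demo$Player[top], pos = 2)
```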
(g) (6 points) Construct a function that produces a scatter plot like in part 1(f) again, but now for
any two numerical variables in the data set. The function should not need any changes to its code for
any plot. Below are further details regarding this function and its produced plot:
i. A call to the function should be of this form:
myplot(xvar, yvar, xlab, ylab)
where myplot is the name of the function, xvar is the variable for the x-axis, yvar is the variable for
the y-axis, xlab is the x-axis label, and ylab is the y-axis label.
ii. Make sure that appropriate title and axis labels are included on the plot (I used the paste() function
to merge the axis labels into a title).
iii. Remove any labeling of points by a corresponding player name.
iv. Run the function for Dash40 vs. BenchPress and Height vs. Weight.
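A minimal sketch of a function with the call form in part (g) is shown below. It rests on several assumptions: nfl_demo is made-up stand-in data, xvar and yvar are passed as numeric columns of that data frame (so rows line up with its Position column), the symbol codes are arbitrary, and returning the title invisibly is a convenience added here rather than a requirement.

```r
# Hypothetical stand-in for nfl.csv; all values are made up for illustration.
nfl_demo <- data.frame(
  Position   = c("DB", "DB", "LB", "LB", "OL", "OL"),
  Dash40     = c(4.45, 4.52, 4.68, 4.74, 5.20, 5.31),
  BenchPress = c(14, 17, 22, 25, 30, 33),
  Height     = c(71, 72, 73, 74, 77, 78),
  Weight     = c(195, 200, 240, 245, 310, 315)
)

pos_cols <- c(DB = "black", LB = "red", OL = "blue", RB = "darkgreen",
              S = "purple", TE = "orange", WO = "gray")
pos_pch  <- c(DB = 0, LB = 1, OL = 2, RB = 3, S = 4, TE = 5, WO = 6)  # assumed symbols

myplot <- function(xvar, yvar, xlab, ylab) {
  # Assumption: xvar and yvar are columns of nfl_demo, so Position matches row by row.
  pos <- as.character(nfl_demo$Position)
  plot_title <- paste(ylab, "vs.", xlab)  # merge the axis labels into a title
  plot(xvar, yvar, col = pos_cols[pos], pch = pos_pch[pos],
       xlab = xlab, ylab = ylab, main = plot_title)
  grid()
  invisible(plot_title)
}

title1 <- myplot(nfl_demo$BenchPress, nfl_demo$Dash40,
                 "Bench press repetitions", "40-yard dash (seconds)")
title2 <- myplot(nfl_demo$Weight, nfl_demo$Height,
                 "Weight (pounds)", "Height (inches)")
```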
2. (13 total points) Suppose Z ~ N(0, 1). Complete the following using R.
(a) (3 points) Find P(0 < Z < 1.96).
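An interval probability such as the one in part 2(a) is a difference of two cumulative probabilities, P(a < Z < b) = Φ(b) − Φ(a). A minimal sketch using pnorm(), whose defaults are mean 0 and standard deviation 1 (the variable name p_interval is arbitrary):

```r
# P(0 < Z < 1.96) as a difference of standard normal CDF values.
p_interval <- pnorm(1.96) - pnorm(0)
print(p_interval)  # approximately 0.475
```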
(b) (3 points) Plot the probability density function.
(c) (4 points) Shade in P(0 < Z < 1.96) on the plot from part 2(b). I recommend completing a web
search for the polygon() function to determine how to complete this problem.
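The polygon() approach mentioned in part 2(c) can be sketched as follows: the shaded region is a closed shape whose top edge follows the density curve and whose bottom edge runs along the horizontal axis. The grid sizes, fill color, and labels below are arbitrary choices.

```r
# Grid of z values for drawing the standard normal density.
z <- seq(-4, 4, length.out = 401)
plot(z, dnorm(z), type = "l",
     xlab = "z", ylab = "Density", main = "Standard normal density")

# Points along the curve between the interval endpoints.
zs <- seq(0, 1.96, length.out = 100)

# Start on the axis at 0, trace the curve, then drop back to the axis at 1.96.
polygon(x = c(0, zs, 1.96),
        y = c(0, dnorm(zs), 0),
        col = "lightblue", border = NA)

# Redraw the curve so the fill does not cover it.
lines(z, dnorm(z))
```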
(d) (3 points) Plot the cumulative distribution function.
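For part 2(d), the cumulative probabilities of the standard normal are given by pnorm(), so the curve can be drawn the same way as the density. A minimal sketch, with the plotting range and labels chosen arbitrarily:

```r
# Evaluate the standard normal cumulative probabilities on a grid.
z_grid <- seq(-4, 4, length.out = 401)
cdf_vals <- pnorm(z_grid)

plot(z_grid, cdf_vals, type = "l",
     xlab = "z", ylab = "P(Z <= z)",
     main = "Standard normal cumulative probabilities")
```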