STA 141A, Homework 1
Due April 16th 2019 (by 8 am)
1 Data structures in R
1.1 Air quality
1.2 Patient visits
2 Simulation with R
3 Graphs with R
Name:
Student ID:
Section:
Names of your study mates:
Please submit on Canvas, in a compiled R-markdown file (to pdf or html).
All code in this assignment should be cleanly written and well commented, with appropriate use of functions/arguments. Imagine you are sending this code to your colleagues or supervisors for review—which they can only do if they can understand it.
1 Data structures in R
For each of the following cases, describe the best possible data structure (e.g., array, data frame, list, table, etc.) for representing the data. Also, write appropriate R code to answer the questions that follow, treating the data as if they were given. (Hint: it may actually be easier if you simulate some data.)
1.1 Air quality
Data: A study of the health effects of air quality in 10 major cities of the world involves daily measurements of four variables: average temperature (temp), total precipitation (precip), maximum PM10 concentration (PM10), and number of deaths among the elderly population (death). Measurements are available for five years.
1. What is the average number of deaths for each city on days when the PM10 concentration is greater than 20?
2. What is the average PM10 concentration for each city on days with no precipitation and an average temperature above 80 degrees F?
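Following the hint in Section 1, one possible representation is a long-format data frame with one row per city per day. The sketch below simulates such a data frame and answers both questions with `aggregate()`. The city labels, the date range, and the distributions are made up for illustration and are not part of the assignment data; other structures (for example a list of per-city data frames, or a 3-dimensional array) could also be argued for.

```r
set.seed(1)  # arbitrary seed, only so the simulated values are reproducible

# Hypothetical labels and dates: 10 cities, five calendar years of daily records
cities <- paste0("city", 1:10)
dates  <- seq(as.Date("2014-01-01"), as.Date("2018-12-31"), by = "day")
n      <- length(cities) * length(dates)

# One row per city per day; all distributions below are arbitrary choices
air <- data.frame(
  city   = rep(cities, each = length(dates)),
  date   = rep(dates, times = length(cities)),
  temp   = rnorm(n, mean = 65, sd = 15),          # degrees F
  precip = ifelse(runif(n) < 0.7, 0, rexp(n)),    # roughly 70% dry days
  PM10   = rgamma(n, shape = 4, scale = 6),
  death  = rpois(n, lambda = 5),
  stringsAsFactors = FALSE
)

# Question 1: average deaths per city on days with PM10 > 20
high_pm <- air[air$PM10 > 20, ]
q1 <- aggregate(death ~ city, data = high_pm, FUN = mean)

# Question 2: average PM10 per city on dry days with temp > 80 degrees F
dry_hot <- air[air$precip == 0 & air$temp > 80, ]
q2 <- aggregate(PM10 ~ city, data = dry_hot, FUN = mean)

print(q1)
print(q2)
```

The long format keeps each variable in its own column, so every question reduces to "filter the rows, then average within city".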
1.2 Patient visits
Data: The data consist of records of patients’ visits to a clinic. The measurements for each patient are: date of visit (visit), age of patient in years (age), gender with values M or F (gender), weight in lb (weight), systolic blood pressure (BP.sys), diastolic blood pressure (BP.dia), and blood glucose level in mg/dl (glucose). Blood pressure is recorded in the standard unit as a numeric value between 0 and 600.
1. How many times did each patient visit the clinic?
2. What is the average systolic blood pressure for each patient whose maximum weight (during the study period) is greater than 180 lb?
3. What is the average blood glucose level for each patient who was at least 40 years old at the first visit?
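A minimal sketch for these questions, assuming a data frame with one row per visit. The data description lists no patient identifier, so the `id` column below is an assumption added for the simulation, as are the number of patients, the one-year date window, and all distributions. Age is held constant per patient for simplicity. The helper `mean_by_patient()` is a hypothetical name.

```r
set.seed(2)  # arbitrary seed for reproducibility

n_pat      <- 20                                   # hypothetical number of patients
visits_per <- sample(1:6, n_pat, replace = TRUE)   # visits per patient
id         <- rep(seq_len(n_pat), times = visits_per)
n          <- length(id)

# Patient-level quantities (arbitrary distributions)
age_pat    <- sample(20:80, n_pat, replace = TRUE)
gender_pat <- sample(c("M", "F"), n_pat, replace = TRUE)
base_wt    <- rnorm(n_pat, mean = 170, sd = 25)

# One row per visit; `id` is an assumed column, not in the data description
pat <- data.frame(
  id      = id,
  visit   = as.Date("2018-01-01") + sample(0:364, n, replace = TRUE),
  age     = age_pat[id],
  gender  = gender_pat[id],
  weight  = base_wt[id] + rnorm(n, sd = 3),
  BP.sys  = rnorm(n, mean = 125, sd = 15),
  BP.dia  = rnorm(n, mean = 80, sd = 10),
  glucose = rnorm(n, mean = 100, sd = 20),
  stringsAsFactors = FALSE
)
pat <- pat[order(pat$id, pat$visit), ]   # sort so each patient's first row is the first visit

# Average `values` within patient, restricted to the patients in `keep_ids`
mean_by_patient <- function(values, ids, keep_ids) {
  keep <- ids %in% keep_ids
  tapply(values[keep], ids[keep], mean)
}

# Question 1: number of visits per patient
q1 <- table(pat$id)

# Question 2: mean systolic BP for patients whose maximum weight exceeds 180 lb
max_wt    <- tapply(pat$weight, pat$id, max)
heavy_ids <- as.integer(names(max_wt)[max_wt > 180])
q2 <- mean_by_patient(pat$BP.sys, pat$id, heavy_ids)

# Question 3: mean glucose for patients aged at least 40 at their first visit
first_visit <- pat[!duplicated(pat$id), ]
older_ids   <- first_visit$id[first_visit$age >= 40]
q3 <- mean_by_patient(pat$glucose, pat$id, older_ids)

print(q1); print(q2); print(q3)
```

Questions 2 and 3 share a pattern: first compute a per-patient condition (maximum weight, age at first visit), then average over all visits of the patients who satisfy it.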
2 Simulation with R
Suppose you have
four types of animals: cat, dog, cow, squirrel;
four possible colors: white, black, brown, red;
five possible attributes: big, small, angry, cute, finicky.
Perform the following tasks with R.
1. Generate a random sample of size 100, with replacement, from each of the three lists above (animal types, colors, attributes). Name the resulting character vectors Animal, Color, and Attribute.
2. Write R code to combine the results into phrases (character strings) describing the animals, as in this example: big white dog.
3. Create a frequency distribution (contingency table) of the animal types together with colors and attributes, based on the sampled data.
4. Use the result in part 3 to obtain the frequency distributions of: (i) Animal vs. Color; (ii) Animal vs. Attribute; (iii) Animal.
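The four tasks above could be sketched as follows, using `sample()`, `paste()`, `table()`, and `margin.table()` from base R. This is one possible approach rather than the expected solution. The seed and the helper names `animals`, `colors`, and `attrs` are arbitrary choices.

```r
set.seed(3)  # arbitrary seed for reproducibility

animals <- c("cat", "dog", "cow", "squirrel")
colors  <- c("white", "black", "brown", "red")
attrs   <- c("big", "small", "angry", "cute", "finicky")

# Task 1: samples of size 100, drawn with replacement
Animal    <- sample(animals, 100, replace = TRUE)
Color     <- sample(colors,  100, replace = TRUE)
Attribute <- sample(attrs,   100, replace = TRUE)

# Task 2: phrases in the order attribute, color, animal (e.g. "big white dog")
phrases <- paste(Attribute, Color, Animal)

# Task 3: three-way contingency table
tab3 <- table(Animal, Color, Attribute)

# Task 4: marginal tables obtained by summing tab3 over the unused dimensions
tab_animal_color <- margin.table(tab3, c(1, 2))   # (i)   Animal vs. Color
tab_animal_attr  <- margin.table(tab3, c(1, 3))   # (ii)  Animal vs. Attribute
tab_animal       <- margin.table(tab3, 1)         # (iii) Animal

head(phrases)
tab_animal_color
tab_animal
```

Deriving the marginal tables from `tab3` rather than calling `table()` again on the raw vectors is what part 4 asks for; `apply(tab3, c(1, 2), sum)` would give the same counts for (i).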
3 Graphs with R
Give an informative graphical statistical summary of each of the following datasets (available with base R). In each case, write a very brief description (maximum of 100 words) highlighting the findings. You may use up to 2 plots to illustrate the features of each dataset.
1. AirPassengers: Monthly airline passenger numbers during 1949–1960.
2. EuStockMarkets: Daily closing prices of major European stock indices during 1991–1998.
3. trees: Girth, height and volume for Black Cherry trees.
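As a starting point, the base graphics calls below show one plausible set of plots for each dataset. The choice of plot types, titles, and axis labels is an assumption, not a prescribed answer, and the written descriptions are left to the reader.

```r
# AirPassengers: trend over time, then the seasonal pattern by calendar month
plot(AirPassengers,
     ylab = "Passengers (thousands)",
     main = "Monthly airline passengers, 1949-1960")
boxplot(AirPassengers ~ cycle(AirPassengers),
        names = month.abb,
        xlab = "Month", ylab = "Passengers (thousands)",
        main = "Distribution by month")

# EuStockMarkets: all four indices on one log-scaled axis for comparison
idx <- colnames(EuStockMarkets)
plot(EuStockMarkets, plot.type = "single", col = seq_along(idx), log = "y",
     ylab = "Closing price (log scale)",
     main = "European stock indices, 1991-1998")
legend("topleft", legend = idx, col = seq_along(idx), lty = 1)

# trees: pairwise scatterplots of the three measurements
pairs(trees, main = "Black cherry trees")
```

A log scale is used for the stock indices so that equal vertical distances correspond to equal relative changes, which makes indices at different price levels easier to compare.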