MATH2392编程辅导、data留学生程序讲解、SQL编程语言辅导

- 首页 >> Java编程
MATH2392 Practice of Analytics
Assignment 3
Due date: Sunday, 4 October, 2020
1. The file juul2.txt (located on Canvas -> Assignment 3) contains data from
Anders Juul on insulin-like growth factors:
• Age – age in years
• Height – height in cm
• Menarche – did the menarche occur? (1/2 = no/yes)
• Sex – 1 = Male, 2 = Female
• IGF1 – Serum IGF-I
• Tanner – Tanner’s puberty classification (1-5)
• Testvol – testis volume
• Weight – weight in kg
Numbers are separated by blanks and missing values are coded by a dot.
a. Find the correlation coefficient between “Weight” and each of the
following variables: “Sex”, “Age”, “Height”, “IGF1”, “Tanner”.
b. Restrict the data to the variables IGF1, Age, Sex, Tanner, Weight and
Height to the subjects that are younger than or exactly 20 years old.
Remove all subjects that have a missing value in any of the variables.
c. Consider body weight as response variable and perform a multiple
regression analysis using the following variables as predictors: “Sex”,
“Age”, “Height”, “IGF1”, “Tanner”. Extract the regression coefficients
with standard error, 95% confidence intervals, and p-values. (Hint: use
OUTEST = option in PROC REG statement with TABLEOUT
requirement).
d. Consider body weight as response variable and do a linear regression
model with age and gender as predictors.
e. Consider body weight as response variable and do a linear regression
model with age and gender and an interaction term for age and gender
as predictors. (Hint: Use PROC GLM procedure).
Copy and Paste the program on your answer sheet for each part.
Submit the answer sheet with your report via Canvas.
All reports should be created in RTF format.
(1 + 2 + 2 + 1 + 1 = 7 marks)
2. There are three datasets Inventory, Orders, and Transaction are located
on Canvas -> Assignment 3. This information has been extracted from a
clothing e-retailer’s database.
Use DATA step to read all three the datasets into SAS, then use PROC
SQL procedure answering the following questions.
a. Combine the three datasets as a single data set.
b. Identify the web_id which has the largest total order quantity.
c. Identify the web_id that has the lowest average cost.
Copy and Paste the program on your answer sheet for each part.
Submit the answer sheet with your report via Canvas.
(4 + 2 + 2 = 8 marks)

站长地图