ISYE 4140 – STATISTICAL ANALYSIS
Summer 19 – Final Exam
By my signature below, I attest that I completed this exam in accordance with the rules. I did not have
the assistance of another person (student or other), use a solution aid prepared by another such as a
spreadsheet template, copy the solution from an answer key or from homework completed by another,
or offer assistance to others in the class. __________________________ (signature)
Question 1. (20 pts)
a) Use the bootstrap to calculate a 94% CI for the mean of a Lognormal random variable with a
mean of 3.2 and a variance of 4. Use a sample of size 11 and 5,000 replications. Use a
seed equal to the last 3 digits of your RIN. (8)
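A minimal R sketch of part a), under stated assumptions: the lognormal parameters are recovered from the given mean and variance by moment matching, the interval is the percentile bootstrap interval (the exam does not say which bootstrap interval is intended), and the seed 123 is only a placeholder for the last 3 digits of a RIN.

```r
# Placeholder seed: the exam asks for the last 3 digits of your RIN.
set.seed(123)

# Moment matching: for X ~ Lognormal(meanlog, sdlog),
# E[X] = exp(meanlog + sdlog^2 / 2) and Var[X] = (exp(sdlog^2) - 1) * E[X]^2.
m <- 3.2
v <- 4
sdlog <- sqrt(log(1 + v / m^2))
meanlog <- log(m) - sdlog^2 / 2

# One observed sample of size 11.
x <- rlnorm(11, meanlog = meanlog, sdlog = sdlog)

# 5,000 bootstrap resamples of the sample mean.
B <- 5000
boot_means <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))

# 94% percentile interval: cut 3% from each tail.
ci <- quantile(boot_means, probs = c(0.03, 0.97))
print(ci)
```

With only 11 skewed observations, the percentile interval can be noticeably asymmetric around the sample mean.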
b) Carry out a simulation experiment using R with 10,000 replications to study the
following scenario. A unit is made by attaching 3 parts as follows:
Base ~ N(20, 0.2^2), a Right ~ N(8, 0.3^2), and a Left ~ N(4, 0.4^2)
The gap has to be between 7.8 and 8.2 to be acceptable.
i- Using a seed equal to the middle 3 digits of your RIN, estimate the percentage of
these connections that will be acceptable. (6)
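A minimal sketch of the simulation in part i. It assumes the three dimensions are independent, that the second parameter of each distribution is a variance (so the standard deviations are 0.2, 0.3, and 0.4), and that the gap is Base - Right - Left; the exam text does not define the gap, so that formula is an assumption. The seed 456 is a placeholder for the middle 3 digits of a RIN.

```r
# Placeholder seed: the exam asks for the middle 3 digits of your RIN.
set.seed(456)
n <- 10000

# Independent draws for each part (rnorm takes a standard deviation).
base  <- rnorm(n, mean = 20, sd = 0.2)
right <- rnorm(n, mean = 8,  sd = 0.3)
left  <- rnorm(n, mean = 4,  sd = 0.4)

# Assumed definition of the gap: what is left of the base after both parts.
gap <- base - right - left

# Estimated percentage of units whose gap falls inside (7.8, 8.2).
pct_ok <- 100 * mean(gap > 7.8 & gap < 8.2)
print(pct_ok)
```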
ii- Now, using probability theory, calculate the true percentage. (6)
Question 2: (10 points)
Read in the R data set “mtcars”, which contains the fuel consumption (in mpg) of many car models
along with 10 other design aspects.
a) Construct a 95% CI on the ratio of the variance of the 6-cylinder cars to that of the 8-cylinder cars.
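One way to sketch part a) in R is the F-based interval returned by `var.test`. This assumes the variance in question is that of `mpg` (the question does not name the variable) and that mpg is roughly normal within each cylinder group.

```r
data(mtcars)

# mpg values for the two engine types.
mpg6 <- mtcars$mpg[mtcars$cyl == 6]
mpg8 <- mtcars$mpg[mtcars$cyl == 8]

# F-based 95% CI for var(mpg6) / var(mpg8).
vt <- var.test(mpg6, mpg8, conf.level = 0.95)
ratio_ci <- vt$conf.int

print(vt$estimate)  # point estimate of the variance ratio
print(ratio_ci)     # lower and upper confidence limits
```

If the interval contains 1, the data do not contradict equal variances at this confidence level, which bears on the choice between a pooled and a Welch t-test.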
b) Carry out a test of hypothesis at the .05 level of significance to check whether the mean mpg for these two
engine types is the same.
Question 3. (45 points)
a) The pull strength of a wire bond is an important characteristic. The data in file
strength.txt give information on pull strength y, die height x1, post height x2, loop height
x3, wire length x4, bond width on the die x5, and bond width on the post x6. Fit a linear
regression model between the response and the six independent variables and comment
on the results (20)
i. List and discuss the model assumptions (2)
ii. Calculate the correlation coefficients and comment on multi-collinearity (3)
iii. Use partial sums of squares to find the contribution of each variable and test its
significance at the .05 level of significance (4)
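A hedged sketch of item iii using base R's `drop1`, which reports the extra sum of squares for each term given all the others, together with its F-test. The helper name `partial_ss` and the column layout (response `y`, every other column a predictor) are assumptions for illustration; the exam does not describe the layout of strength.txt.

```r
# Hypothetical helper: df must contain the response in a column named y,
# with every other column treated as a predictor.
partial_ss <- function(df) {
  fit <- lm(y ~ ., data = df)
  # drop1() removes one term at a time from the full model; the "Sum of Sq"
  # column is the partial sum of squares, and "Pr(>F)" is its F-test p-value.
  drop1(fit, test = "F")
}

# Hypothetical usage (file layout assumed, not confirmed by the exam):
# bond <- read.table("strength.txt", header = TRUE)
# partial_ss(bond)
```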
iv. Using the step function, determine the value of the best model’s AIC. (3)
v. Check the equality of variance assumption and calculate the variance inflation
factor for each independent variable. What are your conclusions? (2)
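The variance inflation factor is computed per predictor as VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The base-R sketch below avoids any add-on package; the helper name `vif_manual` and the column names in the commented usage are assumptions.

```r
# Hypothetical helper: predictors is a data frame holding only the
# predictor columns (no response).
vif_manual <- function(predictors) {
  sapply(names(predictors), function(p) {
    others <- setdiff(names(predictors), p)
    # Regress predictor p on all the other predictors.
    aux <- lm(reformulate(others, response = p), data = predictors)
    1 / (1 - summary(aux)$r.squared)
  })
}

# Hypothetical usage (column names y, x1, ..., x6 are assumed):
# bond <- read.table("strength.txt", header = TRUE)
# vif_manual(bond[, paste0("x", 1:6)])
# fit <- lm(y ~ ., data = bond)
# plot(fitted(fit), resid(fit))  # residuals vs fitted, to eyeball constant variance
```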
vi. Design the best model to use, showing the corresponding standard deviation, R-squared, and adjusted R-squared. (3)
vii. Construct a 99% confidence interval for the slopes of all the significant
independent variables (3)
b) Read in the file “website.csv”, which contains data about different websites. The file
contains 551 rows and 11 columns. You are required to: (25)
i) Use test sets: divide the data into two groups, train and test, using a seed of
1776 and assigning to the train set the rows whose generated random numbers fall between .25 and
.75. Build a model that uses “entertain”, “inspire”, and “trust” to estimate “sum”, and check
its validity by reporting the MSE and the MSPR. (10)
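A minimal sketch of the hold-out check in part i. It assumes the random numbers are uniform draws from `runif`, takes MSE as the residual mean square on the training rows (SSE divided by the residual degrees of freedom), and takes MSPR as the mean squared prediction error on the held-out rows. The helper name `split_and_validate` is hypothetical.

```r
# Hypothetical helper: web must contain columns sum, entertain, inspire, trust.
split_and_validate <- function(web, seed = 1776) {
  set.seed(seed)
  u <- runif(nrow(web))               # one uniform draw per row
  in_train <- u > 0.25 & u < 0.75     # rows with draws in (.25, .75) train
  train <- web[in_train, ]
  test  <- web[!in_train, ]

  fit <- lm(sum ~ entertain + inspire + trust, data = train)

  # MSE: residual mean square on the training data.
  mse <- sum(resid(fit)^2) / df.residual(fit)
  # MSPR: mean squared prediction error on the held-out data.
  mspr <- mean((test$sum - predict(fit, newdata = test))^2)

  c(MSE = mse, MSPR = mspr)
}

# Hypothetical usage:
# web <- read.csv("website.csv")
# split_and_validate(web)
```

An MSPR close to the MSE suggests the fitted model predicts new rows about as well as it fits the training rows; an MSPR much larger than the MSE points to overfitting.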
ii) Use k-fold cross-validation with a seed of 1991 to check the validity of a model that uses
“timeout” and “social” to estimate “sum”. Report the MSE and MSPR. (15)
Question 4. (25 points)
a) A study on the amount of dye needed to get the best color for a certain type of fabric was
reported. The three amounts of dye, 1/3 %, 1%, and 3% (weight of fabric) were each
administered at two different plants.
The color density of the fabric was then observed four times for each level of dye at each
plant.
The data is found in file fabric.txt. (10)
i. Select the appropriate test and perform an analysis of variance to test the hypothesis, at
the 0.05 level of significance, that there is no difference in the color density of the
fabric for the three levels of dye. Report a p-value and state your conclusion. (5)
ii. Perform a Tukey test at the .05 level of significance and discuss your findings (5)
b) Corrosion fatigue in metals has been defined as the simultaneous action of cyclic stress
and chemical attack on a metal structure. A widely used technique for minimizing
corrosion fatigue damage in aluminum involves the application of a protective coating. A
study conducted by the Department of Mechanical Engineering at Virginia Tech used 3
different levels of humidity
Low: 20–25% relative humidity
Medium: 55–60% relative humidity
High: 86–91% relative humidity, and
3 types of surface coatings
Uncoated: no coating
Anodized: sulfuric acid anodic oxide coating
Conversion: chromate chemical conversion coating
The corrosion fatigue data, expressed in thousands of cycles to failure, are stored in file
fatigue.txt (15)
i) Perform an analysis of variance with α = 0.05 to test for significant main and
interaction effects. (5)
ii) Use Tukey’s test at the 0.05 level of significance to determine which humidity
levels result in different corrosion fatigue damage (5)
iii) Use an interaction plot and comment on your findings (5)
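The three parts of b) can be sketched together in base R. The column names `cycles`, `humidity`, and `coating`, the file layout, and the helper name `analyze_fatigue` are assumptions; the exam does not describe the contents of fatigue.txt.

```r
# Hypothetical helper: fatigue must contain columns cycles, humidity, coating.
analyze_fatigue <- function(fatigue) {
  fatigue$humidity <- factor(fatigue$humidity)
  fatigue$coating  <- factor(fatigue$coating)

  # Two-way ANOVA with both main effects and their interaction.
  fit <- aov(cycles ~ humidity * coating, data = fatigue)

  list(
    anova = summary(fit),
    # Pairwise humidity comparisons at the 0.05 level (95% family-wise).
    tukey_humidity = TukeyHSD(fit, which = "humidity", conf.level = 0.95)
  )
}

# Hypothetical usage (file layout assumed):
# fatigue <- read.table("fatigue.txt", header = TRUE)
# out <- analyze_fatigue(fatigue)
# out$anova
# out$tukey_humidity
# with(fatigue, interaction.plot(humidity, coating, cycles))
```

Roughly parallel lines in the interaction plot suggest little interaction, while crossing or diverging lines suggest the coating effect depends on the humidity level.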