Hand in electronically via CANVAS

First a bit about handing in your assignment. You need to submit both your R Markdown

document and a pdf file containing the document it generates. To create a pdf you should start

your R Markdown document with the following lines (having made the appropriate changes):

---

title: "STATS 762 Assignment 1"

author: "Your Name, ID 1234567"

date: "Due: 27 March 2017"

output: pdf_document

---

If you are using Windows, you may find that you cannot generate a pdf file directly. In this

case replace output: pdf_document with output: word_document. When you click the **Knit**

button a Word document will be produced which you can then open and save as a pdf file. Submit

the pdf file and not the Word file.

The data for this assignment comes from the UCI Machine Learning Repository:

The original source for this data is: M. Elter, R. Schulz-Wendtland and T. Wittenberg (2007)

“The prediction of breast cancer biopsy outcomes using two CAD approaches that both emphasize

an intelligible decision process.” Medical Physics 34(11), pp. 4164-4172.

In addition to the data file, an information file has been posted on CANVAS which contains the

background for this dataset that was given on the Machine Learning Repository webpage. Read

this file carefully as it contains background information that will help you understand the context

of this data.

1. Create a data frame named birad.df in R. Make sure that the variables have the proper

designations (numeric, factor, ...). Also make sure that there are no obvious mistakes in the

data. In R missing values are designated by NA, so you may need to modify your data frame

to conform to this protocol.

2. The BI-RADS (Breast Imaging Reporting and Data System) assessment score evaluates the

severity of a lesion based on its observed characteristics during a mammogram.

(a) Use a mosaic plot to explore the relationship between the BI-RADS assessment and the

probability that a lesion is malignant as opposed to benign. Comment on what your plot

indicates about this relationship.

(b) Fit a logistic regression model that relates the BI-RADS assessment to the probability

that a lesion is malignant. Check for over-dispersion and comment on what you find.

Use this model to get a 95% confidence interval for the probability of malignancy for

each level of BI-RADS assessment.

3. A patient's age is also believed to be important in predicting whether a lesion is malignant or

not.

(a) Fit the logistic regression model that uses both the BI-RADS assessment and age as

regressors. Does including age in the model improve its ability to predict the probability

that a lesion is malignant? Support your answer.

(b) Medical diagnostic tests are often assessed by their sensitivity (the probability the test is

positive when the condition exists) and specificity (the probability the test is negative

when the condition does not exist). For the two logistic regression models that have been

fitted assume that the diagnosis of a malignant lesion is positive when the estimated

probability is ≥ 0.5. Estimate the sensitivity and specificity for each of the two logistic

regression models and comment on the results.

4. Often it is useful to create a categorical variable from a numeric variable such as age. For this

data set create a categorical variable that divides Age into the following categories: under 30,

30-39, 40-49, ...

(a) Fit the logistic regression model that uses both the BI-RADS assessment and the age

group as regressors. Does this new model fit better than the model from part 3?

(b) Estimate the sensitivity and specificity for this model and compare them with your findings in

3(b).

5. The BI-RADS assessment is based on a number of characteristics including the shape, margin

and density of the lesion. Does the BI-RADS assessment capture all of the useful information

(with respect to predicting the probability a lesion is malignant) from these three characteristics?

Provide evidence to support your answer.
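
For question 1, the following is a minimal sketch of one way to build birad.df with missing values coded as NA. It rests on several assumptions that should be checked against the information file: that the raw file is comma-separated with no header, that missing values appear as "?", that the columns come in the order shown, and that valid BI-RADS values lie between 0 and 6. The column names and the six rows are made up for illustration and are not the assignment data.

```r
# Made-up rows in the assumed layout:
# BI-RADS, age, shape, margin, density, severity (0 = benign, 1 = malignant).
raw <- c("5,67,3,5,3,1",
         "4,43,1,1,?,1",
         "5,58,4,5,3,1",
         "4,28,1,1,3,0",
         "55,46,4,3,3,1",   # 55 is outside the assumed BI-RADS range
         "4,?,2,1,3,0")

# na.strings = "?" turns the assumed missing-value code into NA on import.
# The column names are hypothetical.
birad.df <- read.csv(text = raw, header = FALSE, na.strings = "?",
                     col.names = c("BIRADS", "Age", "Shape",
                                   "Margin", "Density", "Severity"))

# Flag values outside the assumed valid range instead of guessing a correction.
bad <- !is.na(birad.df$BIRADS) & !(birad.df$BIRADS %in% 0:6)
birad.df$BIRADS[bad] <- NA

# Give each variable its designation: age stays numeric, the rest become factors.
birad.df$BIRADS   <- factor(birad.df$BIRADS)
birad.df$Shape    <- factor(birad.df$Shape)
birad.df$Margin   <- factor(birad.df$Margin)
birad.df$Density  <- factor(birad.df$Density, ordered = TRUE)
birad.df$Severity <- factor(birad.df$Severity, levels = c(0, 1),
                            labels = c("benign", "malignant"))

str(birad.df)
colSums(is.na(birad.df))   # number of missing values per variable
```

With a real file, the `text = raw` argument would be replaced by the path to the data file. Whether an out-of-range value should become NA or be corrected is a judgment to justify in the write-up.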
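
For question 2, the mosaic plot and the confidence intervals can be sketched as below. The data are simulated stand-ins, not the assignment data, and the names sim.df, BIRADS and Malignant are hypothetical. The interval is built on the logit scale and then back-transformed with plogis(), which keeps both limits between 0 and 1. The sketch does not cover the over-dispersion check.

```r
set.seed(762)

# Simulated stand-in data: 100 lesions at each of four BI-RADS levels.
birads <- factor(rep(2:5, each = 100))
p_true <- c("2" = 0.15, "3" = 0.30, "4" = 0.50, "5" = 0.85)[as.character(birads)]
sim.df <- data.frame(BIRADS = birads,
                     Malignant = rbinom(length(birads), 1, p_true))

# (a) Mosaic plot of outcome by BI-RADS level.
mosaicplot(table(sim.df$BIRADS, sim.df$Malignant),
           xlab = "BI-RADS assessment",
           ylab = "Malignant (0 = no, 1 = yes)",
           main = "Simulated data")

# (b) Logistic regression with BI-RADS treated as a factor.
fit1 <- glm(Malignant ~ BIRADS, family = binomial, data = sim.df)

# Predict on the logit (link) scale, form the 95% interval there,
# then map the estimate and both limits back to probabilities.
new <- data.frame(BIRADS = factor(levels(sim.df$BIRADS),
                                  levels = levels(sim.df$BIRADS)))
pr  <- predict(fit1, newdata = new, type = "link", se.fit = TRUE)
z   <- qnorm(0.975)
ci  <- data.frame(BIRADS   = new$BIRADS,
                  estimate = plogis(pr$fit),
                  lower    = plogis(pr$fit - z * pr$se.fit),
                  upper    = plogis(pr$fit + z * pr$se.fit))
print(ci)
```

Because BI-RADS enters as a factor with one parameter per level, the estimated probability for each level equals the observed proportion of malignant lesions at that level.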
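
For question 3, one possible approach is a likelihood-ratio test for part (a), since the model without age is nested in the model with age, followed by a small helper for part (b). The helper sens_spec() is hypothetical, not a function from any package, and the data are simulated stand-ins rather than the assignment data. It classifies a lesion as positive when the fitted probability is at least 0.5, as the question specifies.

```r
set.seed(762)
n <- 600

# Simulated stand-in data; variable names are hypothetical.
sim.df <- data.frame(BIRADS = factor(rep(3:5, each = n / 3)),
                     Age    = round(runif(n, 20, 90)))
eta <- -5 + 1.5 * as.integer(sim.df$BIRADS) + 0.04 * sim.df$Age
sim.df$Malignant <- rbinom(n, 1, plogis(eta))

fit1 <- glm(Malignant ~ BIRADS,       family = binomial, data = sim.df)
fit2 <- glm(Malignant ~ BIRADS + Age, family = binomial, data = sim.df)

# (a) Nested models, so the change in deviance can be compared
#     with a chi-squared distribution.
print(anova(fit1, fit2, test = "Chisq"))

# (b) Sensitivity and specificity at a given probability cut-off.
#     fit$y holds the 0/1 responses for the rows actually used in the fit,
#     so it stays aligned with fitted(fit) even when rows with NA were dropped.
sens_spec <- function(fit, cutoff = 0.5) {
  truth <- fit$y
  pred  <- as.integer(fitted(fit) >= cutoff)
  c(sensitivity = sum(pred == 1 & truth == 1) / sum(truth == 1),
    specificity = sum(pred == 0 & truth == 0) / sum(truth == 0))
}

print(rbind(birads_only = sens_spec(fit1),
            birads_age  = sens_spec(fit2)))
```

These are in-sample estimates: the same observations are used to fit the model and to judge its classifications, which tends to make both figures look optimistic.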
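
For question 4, cut() is one way to build the age groups. The sketch below uses simulated stand-in data with hypothetical names. The question leaves the upper categories open, so the choice of "80 and over" as the top group is an assumption made only for this illustration. Because the model with numeric age and the model with age groups are not nested, the sketch compares them by AIC rather than by a deviance test.

```r
set.seed(762)
n <- 600

# Simulated stand-in data; variable names are hypothetical.
sim.df <- data.frame(BIRADS = factor(rep(3:5, each = n / 3)),
                     Age    = round(runif(n, 18, 89)))
eta <- -4.5 + 1.5 * as.integer(sim.df$BIRADS) + 0.03 * sim.df$Age
sim.df$Malignant <- rbinom(n, 1, plogis(eta))

# right = FALSE closes each interval on the left, so [30, 40) holds ages 30-39.
# The top category is an assumption; the question does not name it.
breaks <- c(-Inf, seq(30, 80, by = 10), Inf)
labels <- c("under 30", "30-39", "40-49", "50-59",
            "60-69", "70-79", "80 and over")
sim.df$AgeGroup <- cut(sim.df$Age, breaks = breaks,
                       labels = labels, right = FALSE)
print(table(sim.df$AgeGroup))

fit_age   <- glm(Malignant ~ BIRADS + Age,      family = binomial, data = sim.df)
fit_group <- glm(Malignant ~ BIRADS + AgeGroup, family = binomial, data = sim.df)

# Non-nested models: a smaller AIC indicates the better trade-off
# between fit and number of parameters.
print(AIC(fit_age, fit_group))
```

It is worth checking the table of group counts before fitting, since very small groups can produce unstable coefficient estimates.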
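
For question 5, one way to gather evidence is to test whether shape, margin and density add anything once the BI-RADS assessment is already in the model. The sketch below does this with a likelihood-ratio test on simulated stand-in data. The variable names are hypothetical, and treating the three characteristics as unordered factors is an assumption. Both models must be fitted to the same rows, so incomplete rows are dropped first.

```r
set.seed(762)
n <- 800

# Simulated stand-in data; variable names and codings are hypothetical.
sim.df <- data.frame(BIRADS  = factor(sample(3:5, n, replace = TRUE)),
                     Shape   = factor(sample(1:4, n, replace = TRUE)),
                     Margin  = factor(sample(1:5, n, replace = TRUE)),
                     Density = factor(sample(1:4, n, replace = TRUE)))
eta <- -3 + 1.2 * as.integer(sim.df$BIRADS) + 0.4 * as.integer(sim.df$Shape)
sim.df$Malignant <- rbinom(n, 1, plogis(eta))

# Drop rows with any NA so that both models use the same observations.
# (The simulated data have none, but real data may.)
complete.df <- na.omit(sim.df)

fit_small <- glm(Malignant ~ BIRADS, family = binomial, data = complete.df)
fit_big   <- update(fit_small, . ~ . + Shape + Margin + Density)

# Likelihood-ratio test of the three characteristics taken together.
lrt <- anova(fit_small, fit_big, test = "Chisq")
print(lrt)
```

A small p-value here would suggest that the three characteristics carry information about malignancy beyond what the BI-RADS assessment captures. A large one would be consistent with the assessment already summarising them.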

