
2019.01.26

Homework 2

your name

Note: some of the formatted text below will not print properly if you select HTML (or Word)

as your output, but it works fine for PDF.

Commands From Your Homework 1

1. Every one of our “code chunks”, as R Markdown calls them, will begin with three

backticks (left-apostrophes), followed by the terms in braces (also called curly brackets).

The code chunk ends when three backticks appear alone on a separate line. Every line

in between these two types of command is interpreted as an R command; if you were working

in “plain R”, these lines would be all you would type, but you would not produce a PDF for

your output automatically, and you would not be able to type additional lines like these.

Finally, each code chunk has this format: “{r name, include=TRUE}”. The name you

pick for a code chunk must be unique within each file.

The lines below are from Homework 1. I have added comments to indicate what each line

accomplishes. Feel free to include your own comment lines in anything you add to homework

files.

# the line below calls a library that extends R's capabilities

# by allowing it to read data from other program formats, such as

# Stata's .dta format

library(foreign)

# create an object that resembles a spreadsheet, by reading the data

# from EAWE21.dta, using the read.dta command

mydata = read.dta("/home/are106/EAWE21.dta")

# "attach" links variable names in the object "mydata" to our program, so that

# R can refer to the variable names directly

attach(mydata)

# this is our first model from Chapter 1, using linear regression to predict

# earnings using years of schooling

lm(EARNINGS~S)

## Call:

## lm(formula = EARNINGS ~ S)

## Coefficients:

## (Intercept) S

## 0.7647 1.2657

# this command tests the hypothesis that the mean value of earnings is 15

# (dollars per hour)

t.test(EARNINGS,mu=15)

## One Sample t-test

## data: EARNINGS

## t = 8.6279, df = 499, p-value < 2.2e-16

## alternative hypothesis: true mean is not equal to 15

## 95 percent confidence interval:

## 18.53764 20.62388

## sample estimates:

## mean of x

## 19.58076

For most of this homework, we are going to try to learn more about EARNINGS.

First, verify by typing out the terms in the appropriate formulas for the t-statistic and the

confidence limits that you and R agree about how these should be calculated. For instance,

your R code should be structured the way the Excel commands for confidence limits in

Lecture 2 were structured. You should be able to obtain the same three numbers that R

printed above.
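The check described above can be sketched as follows. Because the EAWE21 data are not reproduced here, the vector x is a simulated, hypothetical stand-in for EARNINGS, and the names x, se, t_stat, t_crit, and ci are illustrative only. The sketch assumes the usual one-sample formulas: the t-statistic is (sample mean - hypothesized mean) / (s / sqrt(n)), and the 95% limits are the sample mean plus or minus the critical t value times s / sqrt(n).

```r
# Simulated stand-in for EARNINGS (hypothetical data, not EAWE21)
set.seed(106)
x <- rexp(500, rate = 1/20)

mu0 <- 15                      # hypothesized mean, as in t.test(EARNINGS, mu=15)
n <- length(x)
se <- sd(x) / sqrt(n)          # estimated standard error of the sample mean

t_stat <- (mean(x) - mu0) / se          # t-statistic
t_crit <- qt(0.975, df = n - 1)         # two-sided 5% critical value
ci <- mean(x) + c(-1, 1) * t_crit * se  # 95% confidence limits

# R's built-in answer, for comparison
r_test <- t.test(x, mu = mu0)
print(c(t = t_stat, lower = ci[1], upper = ci[2]))
print(c(unname(r_test$statistic), r_test$conf.int))
```

With the real data, replacing x by EARNINGS should give the same three numbers that t.test printed.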

2. Next, display a histogram for EARNINGS and then for the natural log of EARNINGS;

you can simply replace “EARNINGS” with “log(EARNINGS)” to do this.

Which of the two more closely resembles the normal distribution? How would you interpret

this finding?
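A minimal sketch of this comparison, using a simulated right-skewed variable named earn as a hypothetical stand-in for EARNINGS (the real data are not reproduced here). The helper skew is an assumption added for illustration: it computes sample skewness, which is near zero for a symmetric distribution such as the normal, and gives a numeric companion to the two histograms.

```r
# Simulated right-skewed stand-in for EARNINGS (hypothetical, not EAWE21)
set.seed(106)
earn <- rlnorm(500, meanlog = 2.8, sdlog = 0.6)

# Sample skewness: roughly 0 for a symmetric (e.g., normal) distribution
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

par(mfrow = c(1, 2))           # two plots side by side
hist(earn, main = "earn")
hist(log(earn), main = "log(earn)")

skew_raw <- skew(earn)
skew_log <- skew(log(earn))
print(c(raw = skew_raw, log = skew_log))
```

With the homework data, the same two hist calls apply with EARNINGS in place of earn.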

2. Next, we would like to look for differences in EARNINGS by various groups.

The data set includes 250 men (MALE==1 or FEMALE==0) and 250 women (MALE==0

or FEMALE==1).

Test the hypothesis that the population mean earnings are the same for men and women,

using this sample. Use the assumption that the variance of EARNINGS is the same for men

and women.

Interpret the result of your test.

Does it seem reasonable to assume that the two variances are equal?
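One possible way to set up such a test is sketched below. The variables earn and male are simulated, hypothetical stand-ins for EARNINGS and MALE, and the names x1, x0, sp2, and t_manual are illustrative. The sketch assumes the standard pooled-variance t-test, which is what t.test computes when var.equal = TRUE, and uses var.test as one (normality-sensitive) way to look at the equal-variance assumption.

```r
# Simulated stand-ins (hypothetical, not EAWE21): 250 men, 250 women
set.seed(106)
male <- rep(c(1, 0), each = 250)
earn <- rlnorm(500, meanlog = 2.8 + 0.2 * male, sdlog = 0.6)

x1 <- earn[male == 1]          # earnings for men
x0 <- earn[male == 0]          # earnings for women

# Two-sample t-test assuming equal variances (pooled variance)
pooled <- t.test(x1, x0, var.equal = TRUE)
print(pooled)

# The same statistic from the pooled-variance formula
n1 <- length(x1)
n0 <- length(x0)
sp2 <- ((n1 - 1) * var(x1) + (n0 - 1) * var(x0)) / (n1 + n0 - 2)
t_manual <- (mean(x1) - mean(x0)) / sqrt(sp2 * (1 / n1 + 1 / n0))

# An F test of the variance ratio, as one check on the assumption
f_check <- var.test(x1, x0)
print(f_check$p.value)
```

The same pattern applies to any two-group comparison in the data set, with the grouping variable swapped in for male.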

3. It is common to observe that married people earn more than those who are not married;

there are competing explanations as to why.

Test the hypothesis that the mean value of earnings is not affected by marital status. Again,

make the assumption that the two groups’ variances are the same.

Interpret your results. Does the variance assumption seem valid?

4. Now we return to the red and green dice from Homework 1. Use the lines below to

simulate the random variables from Dougherty’s Chapter 1 problems, and then use R

to estimate the answers to each of the problems below.

Generating the Random Variables for Pages 13 and 14

red = sample(1:6, replace=TRUE, size=10000)

green = sample(1:6, replace=TRUE, size=10000)

sum = red+green

max = ifelse(red > green, red, green)

diff = abs(red-green)

table(sum)

## sum

## 2 3 4 5 6 7 8 9 10 11 12

## 284 586 855 1110 1389 1655 1346 1047 885 544 299

mean(sum)

## [1] 6.9828

mean(sum^2)

## [1] 54.727

var(sum)

## [1] 5.968101

sd(sum)

## [1] 2.44297

sum( (sum-mean(sum))^2 )/9999

## [1] 5.968101

mean(sum^2)-mean(sum)^2

## [1] 5.967504

Why doesn’t the last expression equal the other variances? (Because mean divides by n and

not n - 1.)
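The size of that gap can be checked directly. The sketch below regenerates two dice under a hypothetical name, total (chosen so that it does not mask R's built-in sum function), and shows that rescaling the divide-by-n variance by n / (n - 1) recovers what var reports.

```r
set.seed(106)
n <- 10000
total <- sample(1:6, size = n, replace = TRUE) +
         sample(1:6, size = n, replace = TRUE)   # sum of two dice

v_n  <- mean(total^2) - mean(total)^2   # divides by n
v_n1 <- var(total)                      # divides by n - 1

# Rescaling by n / (n - 1) reconciles the two
print(c(v_n = v_n, v_n1 = v_n1, rescaled = v_n * n / (n - 1)))
```

With n = 10000 the factor n / (n - 1) is about 1.0001, which is why the two printed variances agree to roughly three decimal places.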

hist(sum)

[Figure: histogram of sum, with sum on the horizontal axis and Frequency on the vertical axis]

Preview of Homework 1

R.1 The probability distribution for diff

R.2 The probability distribution for max

R.3 Expected value of diff

R.4 Expected value of max

R.6 Expected diff squared

R.7 Expected max squared

R.9 Variance of diff

R.10 Variance of max

Histogram for max

Histogram for diff
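The items listed above can be estimated from the simulated dice along the following lines. This is a sketch, not the only approach: d and m are hypothetical names standing in for diff and max (so that R's built-in functions of those names are not masked), and moments is an illustrative helper that uses the divide-by-n form of the variance.

```r
set.seed(106)
n <- 10000
red   <- sample(1:6, size = n, replace = TRUE)
green <- sample(1:6, size = n, replace = TRUE)
d <- abs(red - green)    # plays the role of diff
m <- pmax(red, green)    # plays the role of max

# Estimated probability distributions (relative frequencies)
print(prop.table(table(d)))
print(prop.table(table(m)))

# Estimated moments: E[X], E[X^2], and the variance (dividing by n)
moments <- function(v) c(mean = mean(v), mean_sq = mean(v^2),
                         var = mean(v^2) - mean(v)^2)
est_d <- moments(d)
est_m <- moments(m)
print(rbind(diff = est_d, max = est_m))

# Histograms for the two variables
hist(m, main = "max")
hist(d, main = "diff")
```

For fair dice the exact expected values are 35/18 (about 1.94) for diff and 161/36 (about 4.47) for max, so the estimated means should land close to these.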

5. The code chunk below repeats the simulations from Lecture 2.

Use this simulation to answer the questions below.

Sampling from the normal distribution

Excel Problem

# set.seed(431.01092019)

# create space to save a large number of N(0,1) random variables

mu = 10

mu0 = 10

sigma = 6

n = 25

nreps=10000

ys = matrix(-9999,n,nreps)

zs = matrix(-9999,nreps,1)

# create space to save various summary statistics

means = matrix(-9999,nreps,1)

vars = matrix(-9999,nreps,1)

lowers = matrix(-9999,nreps,1)

uppers = matrix(-9999,nreps,1)

# a loop does our main work for this simulation

# indenting is not necessary

for (i in 1:nreps) {

ys[,i] = mu+sigma*rnorm(n)

means[i] = mean(ys[,i])

vars[i] = var(ys[,i])

lowers[i] = mean(ys[,i])-1.96*(sigma/sqrt(n))

uppers[i] = mean(ys[,i])+1.96*(sigma/sqrt(n))

zs[i] = (mean(ys[,i])-mu0)/(sigma/sqrt(n)) # test stat uses 10

}

mean(means) # find the average of 10,000 sample means

## [1] 10.02221

Interpret this result.

var(means) # find the variance of these 10,000 means

## [,1]

## [1,] 1.436783

Interpret this result.

mean(vars) # find the average of 10,000 sample variances

## [1] 36.28198

Interpret this result.

var(vars) # find the variance of these 10,000 variance estimates

## [,1]

## [1,] 112.7709

Interpret this result.
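When interpreting the four results above, it may help to compare them with textbook benchmarks for repeated sampling from a normal population. The sketch below computes those benchmarks from the simulation settings. The vector name theory is illustrative, and the last formula, 2 * sigma^4 / (n - 1), assumes the data are normally distributed.

```r
mu <- 10; sigma <- 6; n <- 25   # settings from the simulation above

theory <- c(
  mean_of_means = mu,                     # E[ybar] = mu
  var_of_means  = sigma^2 / n,            # Var[ybar] = sigma^2 / n
  mean_of_vars  = sigma^2,                # E[s^2] = sigma^2 (unbiased)
  var_of_vars   = 2 * sigma^4 / (n - 1)   # Var[s^2] under normality
)
print(theory)
```

These benchmarks (10, 1.44, 36, and 108) are close to the simulated values 10.02221, 1.436783, 36.28198, and 112.7709 reported above.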

# does the confidence interval contain the hypothesized value for mu?

check = lowers<mu0 & uppers >mu0

mean(check)

## [1] 0.9507

Interpret this result.

Let’s illustrate further by examining the contents of lowers, uppers, and zs for the first

20 simulated samples.

lowers[1:20]

## [1] 9.418368 6.727158 9.134867 7.043106 8.444637 7.281416 6.796267

## [8] 6.871959 8.269464 8.086432 10.121577 6.391702 8.384637 7.111669

## [15] 8.384823 7.497853 6.908756 8.171247 9.439866 7.371906

uppers[1:20]

## [1] 14.12237 11.43116 13.83887 11.74711 13.14864 11.98542 11.50027

## [8] 11.57596 12.97346 12.79043 14.82558 11.09570 13.08864 11.81567

## [15] 13.08882 12.20185 11.61276 12.87525 14.14387 12.07591

zs[1:20]

## [1] 1.4753066 -0.7673684 1.2390558 -0.5040782 0.6638640 -0.3054869

## [7] -0.7097772 -0.6467008 0.5178869 0.3653603 2.0613140 -1.0469146

## [13] 0.6138643 -0.4469423 0.6140196 -0.1251228 -0.6160363 0.4360388

## [19] 1.4932217 -0.2300782

# how often did we end up in each tail?

checkL = lowers>mu0

checkL[1:20]

## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE

## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

checkU = uppers<mu0

checkU[1:20]

## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

mean(checkL)

## [1] 0.0261

mean(checkU)

## [1] 0.0232

Interpret these two results.

which(checkL==1)

## [1] 11 101 102 133 194 197 332 371 403 500 517 522 532 576

## [15] 620 642 680 695 702 712 748 836 841 924 999 1052 1075 1103

## [29] 1123 1143 1175 1239 1358 1363 1383 1392 1489 1514 1578 1602 1608 1643

## [43] 1661 1705 1714 1736 1752 1771 1813 1833 1844 1871 1879 1889 1921 1946

## [57] 1947 2068 2076 2115 2138 2197 2286 2323 2392 2448 2472 2495 2591 2634

## [71] 2654 2673 2697 2710 2715 2757 2827 2852 2967 3027 3033 3218 3232 3240

## [85] 3305 3341 3362 3421 3434 3452 3489 3499 3546 3568 3571 3579 3596 3606

## [99] 3699 3737 3741 3804 3877 3945 3950 4013 4019 4033 4045 4098 4165 4188

## [113] 4209 4226 4242 4268 4288 4291 4292 4294 4316 4352 4462 4469 4478 4542

## [127] 4547 4555 4574 4597 4599 4605 4624 4645 4657 4658 4774 4777 4799 4835

## [141] 4869 4881 4920 4961 5030 5049 5093 5101 5118 5138 5142 5233 5238 5239

## [155] 5303 5394 5447 5457 5474 5479 5556 5578 5604 5611 5674 5693 5703 5790

## [169] 5896 5909 5913 5922 5977 6134 6179 6232 6294 6299 6307 6435 6468 6470

## [183] 6472 6486 6549 6556 6611 6621 6622 6675 6751 6792 6797 6826 6834 6877

## [197] 6887 6906 6997 7145 7199 7280 7317 7351 7383 7412 7414 7427 7514 7595

## [211] 7616 7679 7689 7720 7744 7745 7757 7784 7847 7940 8031 8084 8188 8291

## [225] 8388 8526 8543 8604 8608 8748 8771 8792 8843 8891 9001 9017 9056 9072

## [239] 9079 9087 9194 9257 9261 9293 9309 9319 9335 9514 9579 9614 9629 9692

## [253] 9705 9784 9815 9864 9884 9933 9960 9964 9982

which(checkU==1)

## [1] 34 66 185 199 213 217 251 279 283 361 439 458 489 579

## [15] 585 651 677 729 735 798 812 843 914 927 953 1056 1126 1135

## [29] 1138 1207 1208 1274 1407 1431 1472 1495 1552 1574 1710 1732 1772 1777

## [43] 1785 1861 1958 1990 2083 2123 2125 2201 2211 2216 2418 2514 2599 2604

## [57] 2617 2627 2640 2659 2675 2805 2970 3048 3112 3180 3200 3238 3263 3371

## [71] 3399 3498 3511 3518 3595 3607 3660 3735 3746 3913 3916 4054 4074 4080

## [85] 4163 4255 4256 4269 4270 4286 4326 4419 4480 4527 4544 4575 4576 4581

## [99] 4604 4607 4626 4710 4717 4719 4791 4807 4814 4824 4889 4949 5008 5036

## [113] 5040 5077 5111 5155 5171 5307 5324 5464 5471 5524 5546 5587 5608 5624

## [127] 5633 5686 5724 5796 5912 5974 6028 6064 6089 6110 6137 6149 6188 6250

## [141] 6297 6301 6308 6456 6518 6557 6570 6573 6605 6614 6647 6684 6695 6721

## [155] 6770 6794 6888 6951 6975 7008 7091 7117 7123 7144 7147 7148 7219 7226

## [169] 7233 7262 7268 7273 7297 7304 7365 7518 7530 7535 7590 7642 7693 7759

## [183] 7802 7807 7816 7838 7919 7974 8026 8046 8135 8145 8167 8183 8189 8198

## [197] 8217 8247 8267 8288 8298 8315 8353 8377 8414 8548 8571 8577 8655 8671

## [211] 8777 8943 8992 9006 9031 9145 9204 9273 9313 9323 9459 9470 9471 9526

## [225] 9684 9727 9761 9779 9796 9813 9932 9937

Pick one of the outcomes for which the variable checkL equals 1, and explain

this outcome using the data for that particular outcome.

Do the same for one outcome where checkU equals 1, and then for one outcome

where checkL=checkU=0.

6. Suppose you were going to run all of the code in Problem 5 over again,

making just one change: the original value for μ (mu in the program)

is now set to 9.

Leave mu0 and every other setting unchanged. Which results from the simulations

do you expect to change; how do you expect the results to change, and why?

7. Returning the value of μ to 10, now you are considering running Problem

5 over again but with a new value for σ (sigma in the program), setting it to 4 instead of 6.

Which results from the simulations do you expect to change; how do you

expect the results to change, and why?

8. Now run the simulations that are appropriate for problems 6 and 7 and

verify your predictions.

9. Returning to the original μ = 10 and σ = 6, you change mu0 to 12. What

happens to your results? Interpret your findings.
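Problems 6 through 9 each rerun the Problem 5 simulation with a single setting changed, so one convenient approach is to wrap the simulation in a function. The sketch below does this. The function name run_sim and the names of its outputs are hypothetical, and the loop is replaced by a vectorized equivalent: one sample per matrix column, with the same known-sigma 1.96 interval as in Problem 5.

```r
# Hypothetical helper: reruns the Problem 5 simulation for given settings
run_sim <- function(mu = 10, mu0 = 10, sigma = 6, n = 25, nreps = 10000) {
  ys <- matrix(mu + sigma * rnorm(n * nreps), n, nreps)  # one sample per column
  means <- colMeans(ys)
  vars <- apply(ys, 2, var)
  half <- 1.96 * sigma / sqrt(n)        # known-sigma interval half-width
  lowers <- means - half
  uppers <- means + half
  c(mean_of_means = mean(means),
    var_of_means  = var(means),
    mean_of_vars  = mean(vars),
    coverage      = mean(lowers < mu0 & uppers > mu0),  # interval contains mu0
    lower_tail    = mean(lowers > mu0),                 # interval entirely above mu0
    upper_tail    = mean(uppers < mu0))                 # interval entirely below mu0
}

set.seed(106)
results <- rbind(
  baseline  = run_sim(),
  problem_6 = run_sim(mu = 9),
  problem_7 = run_sim(sigma = 4),
  problem_9 = run_sim(mu0 = 12)
)
print(round(results, 4))
```

Putting the runs side by side in one table makes it easier to see which summaries respond to each change and which stay put.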