Economics 142讲解、讲解programming language、辅导R程序语言、辅导R

- 首页 >> 其他


Economics 142 Problem Set #1

Due Thursday January 31

Note: It is recommended that you use the R programming language for assignments in

this class. You are welcome to use other programming languages (e.g., Python) or

software packages (e.g., Stata). However, the problem sets will contain instructions

assuming you are using R.

For problem sets that require coding (probably all of them), you are required to submit

your reproducible code separately along with a write up with answers to the questions.

We will discuss the online submission procedure in section this week.

1.Set up an R program to conduct a simulation of drawing a sample size of n from a

Bernoulli distribution with mean p.In each "replication" you will draw a sample,

construct the estimated mean from that replication (which we will denote as Y r ), and

calculate the 95% confidence interval (Y s n ,Y + s n ) nr r r r 1/ 2 1/ 2 1.96 / 1.96 / where is

the estimated standard deviation in that replication, ( (1 )) . 1 / 2 r r sr In each

replication, record the length of the confidence interval, and whether or not the true

mean is inside the interval.

For given (n, p), conduct 1,000 replications and report the following statistics:

a) the mean estimate of p

b) the mean estimate of the true standard deviation

c) the fraction of time that the confidence interval contains the true p.This is called the

"coverage rate"

Conduct the analysis for the cases n=30 using p=0.05 and p=0.25, and again for n=60

using p=0.05 and p=0.25??(a total of 4 cases).It is often claimed that n of 30 or larger is

enough to ensure that asymptotic confidence intervals work well. Do you agree or not?

2. Go to the course bcourses page and download the data set "ps1.csv" (Files‐> ps1).

This is a simulated data set based very closely on real data, giving a new born baby’s

weight (in grams) and his/her mother’s weight before pregnancy (in pounds) and her

height (in inches).It has 48,871 rows (one for each baby) and 3 columns.We are going

to develop some descriptive graphs using these data.

a) Imagine you want to convey to a non‐statistical reader the relationship between

mother’s weight and baby’s weight.Develop the following graphs

(i) find the deciles of mother’s weight. Then put each mother in a decile, and get the

mean birthweight of babies for mothers in each decile.Plot the mean baby weight

against the mean mother birthweight in the decile. This is called a “binscatter” plot in

economics.

(ii) in addition to the mean baby’s weight for each decile, construct the 10, 25, 50, 75

and 90 percentiles.Plot all these against mean mother’s weight for mothers in each

decile.Do you see any interesting pattern?

(iii) instead of using 10 deciles of mother’s weight, redo the analysis in (i) using the

individual (1‐pound) values of mother’s weight.How does this compare to the decile

(bin‐scatter) plot?

Problem Set 1, continued

2 b)Divide mothers into 5‐pound buckets, starting at 95 pounds. Calculate baby

weights separately for 4 groups of mother’s within each bucket:

momheight≤60 inches

momheight between 61 and 63 inches

momheight between 64 and 66 inches

momheight≥67 inches

Now plot the mean baby weights for the four groups against the mid‐point weight in the

bucket.(So if your first bucket is 95‐99 points, the midpoint is 96.5). What can you

conclude about how mother’s weight and height affect baby’s weight?

2c)Think about how to construct a 3‐dimensional bin‐scatter.For example, you can put

mother’s weight and height into 25 bins (5 x 5) and plot mean baby’s weight on a 3‐d

graph.Try to be creative!