Applications of Econometrics

Assessed Group Project

Spring 2024

Due: 4pm, Thursday 21 March 2024

This project has two parts. In the ﬁrst part we study the problem of forecasting average wages from a time series perspective. The wage level in an economy is an important macroeconomic variable because it is informative on the cost of labour and factors into ﬁrms’ decision making on investment and capital allocation. In the second part we study labour supply elasticities from a panel perspective. Labour supply elasticities are a key concept in economics and e.g. used to predict the impact of public policies. As you will see below we will approach these topics with a focus purely on applying the empirical methods we cover in this course. You are not expected to read up on the state-of-the-art methods to e.g. estimate labour supply elasticities.

In the ﬁrst part of the project we use FRED data. In the second part we use the Survey of Income and Program Participation (SIPP). To prepare the datasets for analysis please see the Sections ’Preparing the Time Series Data’ and ’Preparing the Panel Data’ below.

• Groups have to submit a word/pdf ﬁle that has answers to the questions below along with a doﬁle that has all the commands in it that the group used.

• Both the word/pdf document and the dofile have to be submitted before the deadline. Projects submitted without a dofile will incur the default penalty of a late submission.

• Answers to questions should be limited to 3 pages per question (1-2 pages is likely enough). Question 3 consists of 3 subquestions so 9 pages total. The entire project paper should not exceed 21 pages. This is a maximum, not a guideline. Font size between 10pt and 12pt is ok. Page margins, line spacing etc are up to you.

• The doﬁle should be written in such a way that anyone with access to the raw data ﬁles can replicate the analysis.

• Stata outputs (tables/ﬁgures) have to be included in the document. It is not enough to refer to outputs that are only included in the Stata log/doﬁle. If a result isn’t shown in the pdf/word document it doesn’t count.

• R is allowed (replace the word ’doﬁle’ with ’R script’) and we have tutors who know R, but in general it will probably be easier in Stata because all the lab materials are in Stata and that’s what most of the teaching staﬀ use. So feel free to use R but don’t expect equal levels of support. You can import Stata ﬁles into R using the ’foreign’ package.

• Wherever possible try to convert raw Stata/R output into a nice looking table/figure. Regression output can be converted to a table using e.g. outreg2. You will have to install such programs first, e.g. by doing ’ssc install outreg2’, and then you can get help on how to use the command with ’help outreg2’.

• Before submission groups have to declare that the project is their own work. There is no separate form to complete; it can be done directly on Learn.

• Make sure that you are aware of the requirements for appropriate citation of references and data sources. Read the guidance on plagiarism in Section 4.4.1 of the Economics Honours Handbook and/or the general University guidance. If you include anything from another source it must be properly acknowledged, whether it’s a ﬁgure/table or a text passage or anything else.

• You are welcome to ask questions on piazza or come to helpdesks. We will try to help as much as possible with data preparation and Stata commands and are of course happy to clarify where things are unclear. We will generally not answer questions along the lines of ’is it correct/enough if I do x’ or ’how do I do x’ unless it is a speciﬁc technical question. We aim to be fair to all students.

Time Series Questions

For this part we use two main time series covering the U.S.: wages (hourly earnings) and labour turnover. These are available from FRED at the monthly level from 2006 to 2023. The basic goal will be to forecast wages using turnover data. This is a standard problem for forecasters since both workers and ﬁrms are very interested in knowing how wages will grow. Labour turnover is one important variable to make these forecasts. It is one of the key variables capturing the dynamics of the labour market. We recommend using levels (not logs) of both variables for simplicity (logs lead to complications when forecasting).

(1) Plot the time series for wages and turnover over time. Make sure you label the axes correctly. Test whether trends and seasonality are present and discuss your ﬁndings both in terms of what is visible in the ﬁgures and what you ﬁnd through your tests. [10 points]

Hint: You can either plot the two series separately or combine them into one ﬁgure. If you combine them make sure to have two separate y-axes. Note that the FRED data are adjusted for seasonality, so our test here is mostly a test whether their adjustment worked.
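As a rough illustration of the hint, the sketch below draws both series in one figure with two y-axes and then tests for a linear trend and for monthly seasonality by regressing each series on a trend and month dummies. It assumes the dataset part1_timeseries.dta from the Section ’Preparing the Time Series Data’ and the global $datapath defined there. The variable names t and month are introduced here for illustration only, and this is one possible approach rather than the expected one.

```stata
// Load the prepared time series data (assumes $datapath is set as in the preparation section)
use "$datapath/part1_timeseries", clear

// Plot both series in one figure with two separate y-axes
twoway (tsline wage, yaxis(1)) (tsline turnover, yaxis(2)), ///
    ytitle("Average hourly earnings", axis(1)) ///
    ytitle("Turnover (hires + separations)", axis(2)) ///
    xtitle("Month")

// Linear time trend and calendar-month indicator (illustrative names)
generate t = _n
generate month = month(dofm(monthly_date))

// Regress each series on the trend and a full set of month dummies
regress wage t i.month
testparm i.month        // joint F-test: are the month dummies jointly zero?

regress turnover t i.month
testparm i.month
```

The coefficient on t speaks to the trend and the joint test on the month dummies speaks to any remaining seasonality. If a series turns out to have a unit root, the usual t- and F-statistics from such a levels regression should be read with caution.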

(2) Investigate whether wages and turnover likely have a unit root or not. Discuss your ﬁndings. In particular, explain what can be done if we want to use them in regressions if they are not stationary (don’t forget to incorporate your ﬁndings from (1)). [10 points]
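One common way to look into unit roots in Stata is the augmented Dickey-Fuller test, sketched below under the assumption that part1_timeseries.dta is loaded and declared with tsset. The lag length of 3 is an arbitrary choice made for illustration, and whether to include a trend term should follow from your findings in (1).

```stata
// Augmented Dickey-Fuller tests in levels, allowing for a linear trend
// (null hypothesis: the series has a unit root)
dfuller wage, trend lags(3)
dfuller turnover, trend lags(3)

// The same tests on first differences
dfuller D.wage, lags(3)
dfuller D.turnover, lags(3)
```

Failing to reject in levels while rejecting in first differences would suggest working with differenced series in the regressions.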

(3) Try to build a model that can be used to forecast wages incorporating your ﬁndings from (1) and (2). A starting point might be

$$\text{wage}_t = \beta_0 + \beta_1 \text{turnover}_{t-1} + \beta_2 \text{wage}_{t-1} + u_t$$

but this potentially has to be adjusted for trends and unit roots depending on your findings. [30 points]

Hint: Note that we don’t include turnover in t in this regression because then we can’t easily make a forecast without first forecasting turnover. You can also do this, there’s no need to separately forecast turnover in this question (it’s fine to use one-step-ahead forecasts). This also means we are not interested in a VAR, we only want to forecast wages. You can ignore serial correlation in the error term. Also note that we don’t expect you to write a dissertation on this question. It is ok to keep it simple. E.g. testing for 3-5 lags is fine.

(a) Use in-sample criteria (e.g. R-squared, adjusted R-squared) to decide which is the ’best’ model (e.g. how many lags). Explain your results.

(b) Use out-of-sample criteria (e.g. RMSE, MAE) to decide which is the ’best’ model (e.g. how many lags). Explain your results.

Hint: To do this you have to decide which part of the sample you want to use to estimate the parameters of the model, and which part to use for evaluating the forecasts. One way is to use everything except the last year for estimation, e.g. by adding ’if year(dofm(monthly_date)) < 2023’ to your regression commands. You can then calculate the forecast errors for all observations in 2023 and summarise them using RMSE or MAE. For example, say your predictions (one-step-ahead forecasts) for 2023 are stored in the variable ’f’. Then the forecast errors can be obtained with ’generate e = wage - f if year(dofm(monthly_date)) == 2023’. To get the RMSE, square the errors, take the average, then take the square root of the average.

(c) Decide which model (a or b) you think is best for forecasting and brieﬂy explain why. Using this model calculate the point forecast for the wage in the ﬁrst month after the sample period (January 2024 in our data) as well as the 95% forecast interval. Discuss the sources of uncertainty in this forecast.
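Steps (a) to (c) could be sketched in Stata roughly as follows. This is a minimal sketch using the starting-point model in levels. It assumes part1_timeseries.dta and the global $datapath from the Section ’Preparing the Time Series Data’, and it would need adjusting if your findings in (1) and (2) call for trends or differences. All generated variable names (f1, e1, esq1, eabs1, f, se_f, lower, upper and so on) are illustrative. The interval in the last step relies on the usual assumption of approximately normal, homoskedastic errors.

```stata
use "$datapath/part1_timeseries", clear

// (a) and (b): compare lag lengths on a common estimation sample
forvalues p = 1/4 {
    // estimate on pre-2023 data; requiring 4 lags keeps the sample identical across models
    quietly regress wage L.turnover L(1/`p').wage ///
        if year(dofm(monthly_date)) < 2023 & !missing(L4.wage)
    local r2a = e(r2_a)

    // one-step-ahead forecasts and forecast errors for 2023
    quietly predict f`p', xb
    quietly generate e`p' = wage - f`p' if year(dofm(monthly_date)) == 2023
    quietly generate esq`p' = e`p'^2
    quietly generate eabs`p' = abs(e`p')

    // RMSE: square root of the mean squared error; MAE: mean absolute error
    quietly summarize esq`p'
    local rmse = sqrt(r(mean))
    quietly summarize eabs`p'
    local mae = r(mean)

    display "lags = `p'   adj. R2 = " %6.4f `r2a' "   RMSE = " %6.4f `rmse' "   MAE = " %6.4f `mae'
}

// (c): point forecast and 95% forecast interval for January 2024,
// shown here with one lag and the full sample purely for illustration
tsappend, add(1)                      // adds an empty observation for 2024m1
regress wage L.turnover L.wage
predict f, xb                         // point forecast
predict se_f, stdf                    // standard error of the forecast
scalar crit = invttail(e(df_r), 0.025)
generate lower = f - crit*se_f
generate upper = f + crit*se_f
list monthly_date f lower upper if monthly_date == ym(2024,1)
```

The stdf option combines the sampling uncertainty in the estimated coefficients with the variance of the error term, which are the two sources of uncertainty question (c) asks you to discuss.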

Panel Questions

For this part we use the SIPP panel dataset at the individual level. We want to study labour supply elasticities, i.e. the effect of a change in wages on hours worked. Time is measured in months and the panel entity is an individual respondent. You can find our prepared dataset ’part2_panel.dta’ on Learn. Throughout we focus on ’prime-age’ individuals, i.e. ages 25-54. To prepare the data yourself see the Section ’Preparing the Panel Data’ below. Going over this is necessary if you want to add additional variables and could be helpful to understand how the variables are constructed and what they measure.

(4) Provide some descriptive statistics for your sample, such as the mean, minimum, and maximum of key variables (wages, hours, age, etc). Make sure you provide clear indications of what you are reporting. This means do not include the raw variable names in the table. Instead, use a descriptive label like ’hourly wages in $’. Then estimate the labour supply elasticities for women by pooled OLS (POLS) and interpret your results. We usually do this by regressing log hours on log wages. Run this regression once without controls, once with controls and compare them. [10 points]

Hint: Include your own choice of control variables. Some suggestions: time trends, seasonality, age, education, marital status, whether there are children in the household. We provide simple to use education variables in the prepared data. They are called ’edu_lessthanhs’, ’edu_hs’ and so on. It also makes sense to account for seasonality here. The SIPP variables are not already de-seasonalised. Just in terms of terminology: ’Regressing log hours on log wages’ means log hours is the dependent variable and log wages is a regressor.
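A minimal Stata sketch of the POLS step is given below. It rests on several assumptions that you should check against the prepared data: that part2_panel.dta is in the working directory, that it contains the variables wage, ftpt_hours, tage, esex, monthcode, refyear, edu_hs and a person identifier id as constructed in the Section ’Preparing the Panel Data’, and that esex == 2 identifies women. The names lhours and lwage are hypothetical, and the set of controls is only an example.

```stata
use "part2_panel", clear   // assumes the file is in the current working directory

// Descriptive statistics for key variables (relabel these for the final table)
summarize wage ftpt_hours tage

// Log variables (hypothetical names); capture skips the command if they already exist.
// Logs of zero or negative values are missing, so those observations drop out.
capture generate lhours = ln(ftpt_hours)
capture generate lwage  = ln(wage)

// POLS for women without controls (assumes esex == 2 codes women)
regress lhours lwage if esex == 2, vce(cluster id)

// POLS for women with an illustrative set of controls:
// age, an education dummy, month dummies for seasonality, year dummies
regress lhours lwage tage edu_hs i.monthcode i.refyear if esex == 2, vce(cluster id)
```

Because both variables are in logs, the coefficient on lwage can be read directly as an elasticity. Clustering the standard errors by person is a design choice made here, not something the question prescribes.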

(5) Estimate the labour supply elasticities for women using ﬁrst diﬀerences and ﬁxed eﬀects and compare your estimates to the POLS results in (4), explaining why they might be diﬀerent. For this comparison to be meaningful it makes sense to include the same controls as far as possible. Discuss which estimates we likely trust most. [10 points]

Hint: To be able to comment on which estimates we trust most it makes sense to check for serial correlation in the error term to be able to say something about efficiency (and not just bias/consistency).
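The first-difference and fixed effects estimators, together with a simple serial correlation check, could be sketched in Stata as follows. The sketch assumes part2_panel.dta is loaded and contains id, monthly_date, wage, ftpt_hours, tage, married, esex and monthcode as constructed in the Section ’Preparing the Panel Data’, and that esex == 2 identifies women. The names lhours, lwage and du are hypothetical, and the controls are only an example.

```stata
// Declare the panel structure
xtset id monthly_date

// Log variables (hypothetical names); capture skips the command if they already exist
capture generate lhours = ln(ftpt_hours)
capture generate lwage  = ln(wage)

// Fixed effects (within) estimator for women
xtreg lhours lwage tage married i.monthcode if esex == 2, fe vce(cluster id)

// First-difference estimator for women, using time series operators
regress D.lhours D.lwage D.married i.monthcode if esex == 2, vce(cluster id)

// Serial correlation check: regress the FD residuals on their own lag
predict du if e(sample), residuals
regress du L.du
```

If the errors in levels are serially uncorrelated, the differenced errors should show a first-order autocorrelation of about -0.5, which favours fixed effects on efficiency grounds. A coefficient close to zero points towards first differences instead.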

Preparing the Time Series Data

In this section we provide basic instructions how to download the datasets and make them ready for analysis. First you have to download the FRED data on wages and turnover. You need these three series:

• Average Hourly Earnings of All Employees, Total Private https://fred.stlouisfed.org/series/CES0500000003

• Hires: Total Nonfarm https://fred.stlouisfed.org/series/JTSHIR

• Total Separations: Total Nonfarm https://fred.stlouisfed.org/series/JTSTSR

We recommend downloading them in CSV format. You can then import them into Stata using e.g. code like this:

// Set path where Stata dataset will be stored

global datapath "C:\Desktop\AofE"

// Change to the folder where you downloaded the CSV data

cd "C:\Users\AofE\Downloads"

// import csv of wage data into Stata

clear

import delimited CES0500000003.csv

rename ces wage

label var wage "Average Hourly Earnings of All Employees, Total Private"

// save the dataset

compress

save "$datapath/wages", replace

// import csv of hires data into Stata

clear

import delimited JTSHIR.csv

rename jtshir hires

label var hires "Hires: Total Nonfarm"

// save the dataset

compress

save "$datapath/hires", replace

// import csv of separations data into Stata

clear

import delimited JTSTSR.csv

rename jtstsr separations

label var separations "Total Separations: Total Nonfarm"

// save the dataset

compress

save "$datapath/separations", replace

This gives us three Stata datasets containing the three FRED series. We can then merge them together and create our turnover variable as the sum of hires and separations using code like this:

// merge the FRED data together

use "$datapath/wages", clear

merge 1:1 date using "$datapath/hires", nogen keep(match)

merge 1:1 date using "$datapath/separations", nogen keep(match)

// create turnover variable

g turnover = separations + hires

// create time indicator

g monthly_date = mofd(date(date,"YMD"))

format %tm monthly_date

sort monthly_date

// declare time series data

tsset monthly_date

// keep up to December 2023

keep if monthly_date <= ym(2023,12)

compress

save "$datapath/part1_timeseries", replace

This gives us a suitable dataset (part1_timeseries.dta) to conduct the time series analysis. Because we declared it as a time series dataset we can now use time series operators to create diﬀerences and lags, see help tsvarlist.
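As a brief illustration of these operators, the sketch below assumes part1_timeseries.dta is loaded and declared with tsset. The variable names wage_lag1 and d_wage are introduced here for illustration only.

```stata
// Lag and first difference created explicitly as new variables
generate wage_lag1 = L.wage
generate d_wage    = D.wage

// Operators can also be used directly inside commands, without creating variables
regress D.wage L.D.wage L.D.turnover     // model in first differences
regress wage L(1/3).wage L.turnover      // three lags of wage, one lag of turnover
```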

Preparing the Panel Data

You can use our prepared dataset on Learn (part2_panel.dta) and skip this section. But if you’re interested in creating your own extract, for example to add additional control variables, or if you want to understand how we created our wage, hours and education variables then this is for you.

The SIPP is a household panel dataset with detailed information for a sample of U.S. households. It is representative for the U.S. population and has been used in many applied research projects. You can ﬁnd all the raw datasets at https://www.census.gov/programs-surveys/sipp/data/datasets.html. These datasets can be very big so you might have to use some tricks to be able to even open them in Stata (for example by specifying the variables you want to import when using ’use’). Each ﬁle (wave) contains 12 months of a year, so we have the same person roughly 12 times per wave.

In a ﬁrst step we simply open the dataset, convert all variable names to lower case, and keep only the variables we want. Then we generate a variable that contains the year covered by the survey wave (the wave released in 2022 covers questions asked about 2021). Then we compress and save. Here’s the example for the ﬁrst wave in 2018:

// Set your own working directory

cd "/home/data"

// Type in path/folder where you downloaded the dataset to

global datapath "US-SIPP"

//====================================

// Load SIPP waves

//====================================

// Prepare 2018

use "$datapath/pu2018", clear

rename *, lower

// if you want to add more variables (e.g. to add other controls) then add them here

keep eplaydif eddelay tjb1_mwkhrs tjb1_msum esex ems erp spanel ssuid erace rmesr ///
    tage eeduc edisabl ehltstat emd_scrnr emc_scrnr epr_scrnr efree_lunch edaycare tutils tosavval pnum ///
    tjb1_occ tjb1_ind ejb1_scrnr eafnow monthcode tpearn tmwkhrs rwksperm rmwkwjb ///
    twkhrs1-twkhrs5 rpubmth rpubtype2 rpritype1 wpfinwgt rsnap_mnyn ems_ehc rprimth

// Generate reference year

// the survey released in year x covers the observation period x-1

gen refyear = 2017

lab var refyear "Year which the wave refers to"

// keep only prime-age workers

keep if tage >= 25 & tage < 55

// compress to save space

compress

save "$datapath/pu2018_prime", replace

If you want to add additional variables a helpful command is lookfor. This searches through the labels to ﬁnd a search term. For example, you could ﬁnd all variables that have ’children’ in the label by using ’lookfor children ’.

Once we have imported all years we have to assemble them into one dataset. Check out our doﬁle ’prepare_panelpart.do’ to see how this is done. We save the assembled dataset as ’part2_panel.dta’. With this we can start creating our own variables. For example, here’s how we create the wage variable:

// generate a wage variable based on total earnings and hours of work

// Due to measurement error we usually don’t use the reported hours but just look

// at full-/part-time when we’re interested in labour supply elasticities

g ftpt_hours = .

replace ftpt_hours = 0 if tmwkhrs == 0

replace ftpt_hours = 20 if tmwkhrs > 0 & tmwkhrs <= 25

replace ftpt_hours = 40 if tmwkhrs > 25 & tmwkhrs < .

// Divide total monthly labour earnings by weeks worked times normed hours

g wage = tpearn / (ftpt_hours*4*rmwkwjb/rwksperm)

Check out our ’prepare_panelpart.do’ code for how we created other variables.

To work with the panel data we need to create a unique person id that lets Stata know what the panel unit is. You could do this as follows.

egen id = group(ssuid pnum)

g monthly_date = ym(refyear,monthcode)

xtset id monthly_date

Finally, adding additional variables or determining what the codes correspond to can be a bit tricky. We show you an example for how to generate a dummy for ’married’ here. First we need to find any variable that has ’married’ in the label:

lookfor married

>               storage   display    value
> variable name   type    format     label      variable label
> ---------------------------------------------------------------------------
> ems             byte    %12.0g                Is ... currently married, ...

tab ems

>     Is ... |
>   or never |
>   married? |      Freq.     Percent        Cum.
> -----------+-----------------------------------
>          1 |    502,104       54.07       54.07
>          2 |     17,628        1.90       55.96
>          3 |     11,412        1.23       57.19
>          4 |    113,676       12.24       69.43
>          5 |     24,576        2.65       72.08
>          6 |    259,308       27.92      100.00
> -----------+-----------------------------------
>      Total |    928,704      100.00

Then we need to find out what ’1’, ’2’ etc correspond to. You can find this in the SIPP Codebook available on the Census Bureau SIPP homepage. Here’s the entry for ’ems’:

Now we are ready to label the ems values and create a dummy for ’married’.

label define ems 1 "1. Married spouse present" 2 "2. Married spouse absent" 3 "3. Widowed" ///
    4 "4. Divorced" 5 "5. Separated" 6 "6. Never married"

label values ems ems

tab ems

> Is ... currently married, |
>        widowed, divorced, |
>       separated, or never |
>                  married? |      Freq.     Percent        Cum.
> --------------------------+-----------------------------------
> 1. Married spouse present |    502,104       54.07       54.07
>  2. Married spouse absent |     17,628        1.90       55.96
>                3. Widowed |     11,412        1.23       57.19
>               4. Divorced |    113,676       12.24       69.43
>              5. Separated |     24,576        2.65       72.08
>          6. Never married |    259,308       27.92      100.00
> --------------------------+-----------------------------------
>                     Total |    928,704      100.00

g married = ems == 1 | ems == 2