Quantitative Research Methods in Finance
Final Project
Due: 2pm 29th of April, 2024
Submission via LEARN and Turnitin
(the maximum word count is given for each question)
For this assignment you will be given a dataset covering publicly listed firms in the US for the period 2006 – 2019. The data excludes firms from the financial sector, regulated utilities, conglomerates and tobacco firms. The data sources are CRSP, Compustat, and Refinitiv. Each student has been assigned 3 industries, based on GIC industry classification, as specified in the Appendix to these instructions.
The goal is to perform statistical analysis of the relationship between ESG (Environmental, Social and Governance) scores and firm risk measured by Systematic Risk, Idiosyncratic Risk or Total Risk (see Appendix to see which one to use). You will need to download monthly data from CRSP via WRDS, construct the measure of risk assigned to you (see Appendix B) and merge this with the dataset provided. A recent review article of the existing literature examining this topic is Gillan, Koch and Starks (2021). Students are expected to study the article and find further reference articles examined therein to substantiate their empirical design choices.
The execution of the project requires a literature review, justification of empirical design, data management, statistical analysis, and discussion of results. Tables and figures need to be self-explanatory (readers must understand these just by looking at the table/graph and referring to the associated notes beneath, without needing to refer to the body of the text).
You are required to submit two files: (1) a final report (in pdf format) and (2) a do file containing your Stata code.
Name the do file as “ExamID_QRMF_Code.do” and the pdf file as “ExamID_QRMF_Report.pdf”. For instance, if your exam ID is B12345678, the do file should be named as “B12345678_QRMF_Code.do” and the report file as “B12345678_QRMF_Report.pdf”. The do file should be self-explanatory as to which lines display the outputs for a particular question.
An example of code in a do file is as follows (note that a line beginning with “*” is a comment in Stata):
***************************
*Question 1 – Summary Statistics
***************************
Code here
****************************
*Question 2 – Correlation Analysis
****************************
Code here
Marking Criteria:
Students are required to use the statistical software package covered in the course, i.e. Stata. Projects using methods not covered in the course will not be given marks. Answers exceeding the maximum word count provided after each question will only be marked up to the respective word limit. Variables must be reported and referred to with consistent names and labels throughout the project.
This is an individual assignment. Similarities between projects will be severely penalized. Originality and creativity will be rewarded. Each student will work with a unique subset of data and therefore their statistical results will differ. The discussion questions refer to the unique numerical results of each student and the answers should point explicitly to the corresponding analysis, numbers and tables.
You will estimate the following baseline regression model with variations in methodology and control variables in the different tasks below:
$$y = \gamma_0 + \beta_0 x_0 + \delta_0 z + \delta_1 x_0 \times z + \sum_j \beta_j x_j + \sum_l \beta_l x_l + u$$
where $y$ is one of the dependent variables Systematic Risk, Idiosyncratic Risk or Total Risk;
$x_0$ is a continuous explanatory variable of interest capturing an aspect of ESG (choose only one of the three aspects of ESG);
$z$ is an indicator variable denoting the three industries assigned to you;
$x_j$ are control variables appropriate for the regression equation, to be decided by you (the minimum number is 2; based on your literature review and data availability you will decide how many control variables to include); and
$x_l$ are year dummies.
When carrying out the project, clearly write down the number of the question/task you are answering/performing and follow a numerical order. If numbers are not provided for a question, zero marks will be allocated to that question.
Study the review article by Gillan, Koch and Starks (2021), choose your ESG variable of interest (E, S or G) and select between one and three other articles cited therein to be your guiding references for this project. List them in your reference section. Note that a number is not attached to the list of references.
Do not show how much you know; just answer the questions directly.
1. Data Section
1.1. Write down the research question (RQ) you will study. For example, if the continuous explanatory variable of interest were the overall ESG score of a company, then the RQ would be: Is firm risk explained by overall ESG scores?
Only one research question and one hypothesis. [5 marks, max 20 words]
1.2. Write down the testable hypothesis corresponding to the RQ and the regression equation above. Include both the null and the alternative hypothesis. Carefully justify whether the alternative hypothesis is single- or double-sided. This justification requires a literature review.
You will need to use existing theory and published work that explains the relationship, and the mechanism by which certain sources of ESG may be driving firm risk. This justification will show how thoroughly you have performed the literature review.
1.3. Construct a table of variable definitions that you will use in your regression analysis “Table 1: Variable Definitions”. An example table of variable definitions is given in the Appendix. Add one more column in which you will specify the reference article(s), which provides guidance for each variable that belongs in the regression analysis and its definition. Only provide the table. Characters in the table do not count towards the maximum number of words.
[10 marks, max 0 words]
1.4. Produce a table of summary statistics of the raw variables that you will use to construct the final variables that will be used in your regression. Name this table “Table 2: Summary statistics of raw variables; Panel A Full sample” and include the following information: mean, standard deviation, minimum, median, maximum and number of non-missing observations of each variable. (Year dummies should not be included in Table 2). Now add 3 panels (B, C and D) to this table where you will produce the same descriptive statistics as in Panel A separately for each of your assigned industries. Report the number of unique firms in each Panel, in addition to the non-missing yearly observations for each variable. Copy-paste the lines of code in Stata you used. Only provide the table and command. Characters in the table do not count towards the maximum number of words.
To decide which raw variables belong in this table, you first need to know which variables will go into your regression. Decide what your control variables will be and what you need to construct them; the raw data required for that construction is what belongs here. The table should contain only the raw data you will use later.
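The panel layout described in task 1.4 can be sketched in Stata as follows. This is a minimal illustration on simulated data, not the assignment's dataset: the variable names (`firm_id`, `year`, `industry`, `at`, `dltt`) and the three industry codes are assumptions made up for the example.

```stata
* Toy panel with hypothetical variable names; the real dataset will differ.
clear
set seed 12345
set obs 300
generate firm_id  = ceil(_n/10)              // 30 firms, 10 rows each
bysort firm_id: generate year = 2005 + _n    // years 2006-2015
generate industry = mod(firm_id, 3) + 1      // three made-up industry codes
generate at   = exp(rnormal(7, 1))           // simulated total assets
generate dltt = at * runiform(0, 0.5)        // simulated long-term debt
replace  dltt = . if runiform() < 0.05       // inject some missing values

* Panel A: full sample
tabstat at dltt, statistics(mean sd min p50 max n) columns(statistics)

* Panels B-D: the same statistics separately for each industry
tabstat at dltt, by(industry) statistics(mean sd min p50 max n) ///
    columns(statistics) nototal

* Unique firms: tag one row per firm, then count overall and per industry
egen firm_tag = tag(firm_id)
count if firm_tag
tabulate industry if firm_tag
```

The `n` statistic gives the non-missing observations per variable, while the tagged count gives unique firms, which are the two different counts the task asks for.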
1.5. Examine the data for outliers and potential errors. Perform appropriate cleaning and data management and briefly explain why and how you performed these steps. Complete the following table and include in the Appendix any graphs or tables that you found helpful in this task. Characters in the table do not count towards the maximum number of words.
| Sample Requirements | Unique Firms | Firm-years |
| Initial sample | | |
| Less observations: | | |
| No ESG information | | |
| Missing control variable X1 | | |
| Missing control variable X2 | | |
| …. | | |
| Missing control variable Xk | | |
| Other (e.g. deleted observations) | | |
| Final Sample | | |
[10 marks, max 200 words]
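One way to track the sample attrition for the table above is to recount firm-years and unique firms after each cleaning step. The sketch below uses simulated data with hypothetical names (`firm_id`, `esg`, `lev`). Winsorising at the 1st and 99th percentiles is shown only as one common treatment of outliers; the assignment does not prescribe it, and the appropriate choice should follow from your own inspection of the data.

```stata
* Simulated data; variable names and cutoffs are assumptions for illustration.
clear
set seed 2024
set obs 300
generate firm_id = ceil(_n/10)
bysort firm_id: generate year = 2005 + _n
generate esg = rnormal(50, 15)
replace  esg = . if runiform() < 0.10        // some firm-years lack ESG data
generate lev = exp(rnormal(-1.5, 1))         // right-skewed, so a long tail

* Initial sample: firm-years, then unique firms
count
egen tag_initial = tag(firm_id)
count if tag_initial

* Step 1: drop firm-years with no ESG information and recount
drop if missing(esg)
count
egen tag_esg = tag(firm_id)
count if tag_esg

* Step 2: winsorise lev at the 1st and 99th percentiles
summarize lev, detail
local lo = r(p1)
local hi = r(p99)
replace lev = `lo' if lev < `lo' & !missing(lev)
replace lev = `hi' if lev > `hi' & !missing(lev)
summarize lev
```

Storing the percentiles in locals before the `replace` commands keeps the cutoffs fixed while the variable is being modified.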
1.6. Based on the variable definitions in task 1.3 construct your regression variables using the final sample in Question 1.5. Report a table (“Table 3: Summary statistics of regression variables”) of summary statistics of the constructed/transformed variables that will be used in your regressions. Report the same information as in Table 2 for the entire sample and for each industry (using different panels). Characters in the table do not count towards the maximum number of words.
[10 marks, max 0 words]
1.7. Report four correlation matrices among the regression variables (Table 4. Correlation matrices) – one for the entire sample and three more for each industry. Adjust the formatting accordingly to make the matrices clearly readable. Note any correlations that differ substantially among the industries, offer potential explanations. Characters in the table do not count towards the maximum number of words.
These explanations have to do with the nature of the business of your industries and have to make sense.
For example, you must explain why leverage is lower in one industry than in another, or why tangible assets are lower or higher in one industry than in another, and the explanation should make sense. One industry might be talent-based, while another might be oil, mining or transportation.
[10 marks, max 100 words]
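The four correlation matrices in task 1.7 can be sketched with `pwcorr`, once for the full sample and once per industry. The data and variable names (`risk`, `esg`, `size`, `industry`) below are simulated assumptions, not the assignment's variables.

```stata
* Simulated data with hypothetical variable names.
clear
set seed 7
set obs 300
generate industry = mod(_n, 3) + 1
generate esg  = rnormal(50, 15)
generate size = rnormal(7, 1)
generate risk = 0.3 - 0.001*esg - 0.01*size + rnormal(0, 0.05)

* Full-sample matrix with p-values and the number of observations
pwcorr risk esg size, sig obs

* One matrix per industry
bysort industry: pwcorr risk esg size, sig
```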
2. Regression Analysis Section
2.1. Report the following four regression specifications in a single table (Table 5. Regression analysis). Perform regression analysis with clustering by firm. Show the Stata commands that produce them.
Column (1) no interaction
$$y = \gamma_0 + \beta_0 x_0 + \delta_0 z + \sum_j \beta_j x_j + \sum_l \beta_l x_l + u$$
Column (2) with interaction
$$y = \gamma_0 + \beta_0 x_0 + \delta_0 z + \delta_1 x_0 \times z + \sum_j \beta_j x_j + \sum_l \beta_l x_l + u$$
Column (3) no interaction and firm fixed effects – estimate the specification in column (1) using the FE estimator.
Column (4) with interaction and no clustering – estimate the specification in column (2) but do not cluster.
Characters in the table do not count towards the maximum number of words.
[10 marks, max 100 words]
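The four columns can be sketched in Stata as below. This is an illustration on simulated data under stated assumptions: the names `risk`, `esg`, `size`, `lev`, `industry` and `firm_id` are hypothetical, the industry indicator is coded as a three-level factor variable (one possible reading of the instructions), and the two controls are placeholders rather than recommended choices.

```stata
* Simulated panel; all names and the data-generating process are assumptions.
clear
set seed 42
set obs 300
generate firm_id  = ceil(_n/10)
bysort firm_id: generate year = 2005 + _n
generate industry = mod(firm_id, 3) + 1
generate esg  = rnormal(50, 15)
generate size = rnormal(7, 1)
generate lev  = runiform(0, 0.6)
generate risk = 0.3 - 0.001*esg - 0.01*size + 0.05*lev + rnormal(0, 0.05)

* (1) no interaction, standard errors clustered by firm
regress risk esg i.industry size lev i.year, vce(cluster firm_id)
estimates store m1

* (2) with interaction; ## adds the main effects and the interaction terms
regress risk c.esg##i.industry size lev i.year, vce(cluster firm_id)
estimates store m2

* (3) column (1) with firm fixed effects; the time-invariant industry
*     dummies are absorbed by the firm effects and reported as omitted
xtset firm_id year
xtreg risk esg i.industry size lev i.year, fe vce(cluster firm_id)
estimates store m3

* (4) column (2) without clustering
regress risk c.esg##i.industry size lev i.year
estimates store m4

* Side-by-side table using built-in commands only
estimates table m1 m2 m3 m4, b(%9.4f) se(%9.4f) stats(N)
```

The stored estimates can also be passed to `esttab`, which is part of the user-written `estout` package and has to be installed separately.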
2.2. Provide statistical and economic interpretation of the coefficient corresponding to your testable hypothesis in task 1.2. How do these interpretations change across columns (1) through (4) of Table 5? A statistical interpretation states whether the null hypothesis is rejected or not at a given significance level. Refer to the p-value and clearly state the significance level that you are using (e.g. the null hypothesis is rejected at 1, 5 and 10%; p-value is 0.001). An economic interpretation shows the magnitude of the coefficient relative to an informative summary statistic and makes a judgement of the relative size of the effect. Organize your answer in the table below, corresponding to the 4 columns of Table 5 and the two required interpretations. Report “same as in column (X)” in case the interpretation does not change. You do not need to write anything outside this table. Characters in the table DO count towards the maximum number of words.
[10 marks, max 200 words]
| | (1) | (2) | (3) | (4) |
| Statistical interpretation | | | | |
| Economic interpretation | | | | |
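One common convention for an economic interpretation, offered here as an illustration rather than the required method, is to scale the coefficient by a one-standard-deviation change in the explanatory variable and compare the result with the standard deviation (or the mean) of the dependent variable. The numbers in the second line are hypothetical and chosen only to show the arithmetic.

```latex
\[
\text{Effect} = \hat{\beta}_0 \times SD(x_0),
\qquad
\text{Relative effect} = \frac{\hat{\beta}_0 \times SD(x_0)}{SD(y)}
\]
% Hypothetical values: \hat{\beta}_0 = -0.002, SD(x_0) = 15, SD(y) = 0.10
\[
-0.002 \times 15 = -0.03,
\qquad
\frac{-0.03}{0.10} = -0.30
\]
```

In this made-up case, a one-standard-deviation increase in the ESG variable would be associated with a decrease in risk equal to 30% of the standard deviation of risk.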
2.3. Provide two examples of plausible omitted variables in the regression analysis. One of the two must be unobservable. Assume away all control variables and deduce the potential sign of any misestimation based on Table 3.2 in Wooldridge. Organize your answer in a table corresponding to the two examples as follows. You do not need to write outside this table. Characters in the table do not count towards the maximum number of words.
| | Omitted Variable w | Corr($x_0$, w) | Sign of $\beta_0$ | Direction of bias |
| Ex 1 | | | | |
| Ex 2 | | | | |
[10 marks, max 0 words]
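The sign deduction in task 2.3 rests on the standard omitted variable bias result for a regression with a single included regressor. The notation below is an assumption made for this sketch: $\theta$ denotes the coefficient on the omitted variable $w$ in the true model, and $\tilde{\delta}$ is the slope from regressing $w$ on $x_0$.

```latex
% True model and the short regression that omits w
\[
y = \gamma_0 + \beta_0 x_0 + \theta w + u,
\qquad
y = \tilde{\gamma}_0 + \tilde{\beta}_0 x_0 + v
\]
% Expected value of the short-regression slope
\[
E(\tilde{\beta}_0) = \beta_0 + \theta\,\tilde{\delta},
\qquad
\tilde{\delta} = \frac{\widehat{Cov}(x_0, w)}{\widehat{Var}(x_0)}
\]
% Sign of the bias
\[
\operatorname{sign}\big(E(\tilde{\beta}_0) - \beta_0\big)
= \operatorname{sign}(\theta) \times \operatorname{sign}\big(Corr(x_0, w)\big)
\]
```

The bias is therefore positive when $\theta$ and $Corr(x_0, w)$ have the same sign and negative when their signs differ.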
2.4. Of the two examples in task 2.3, which one is less likely to be a problem in the FE estimation in column (3)? Why?
The answer depends entirely on the two examples you picked in the previous question. Whichever examples you chose, one of them will be more or less of a problem once you run a fixed effects regression; which one it is depends on what your two examples are.
[10 marks, max 200 words]
Notes:
Except for Table 1, all tables need to be produced in Stata. Hint: use the esttab command. Images of the tables are acceptable.
If the instructions only ask you to include a table, you do not need to provide any further discussion for that task.
You should use Times New Roman 11 point font on A4 pages with 2.5 margins on each side. The report has an absolute maximum of 1700 words (+/- 10%), not including tables, graphs, references or the appendix. Make sure to include the word count at the beginning of your write-up.
Make sure that the names you give to your variables are consistent across the report. For example, if you refer to Log(Assets) as SIZE, then your regressions should show the coefficient for SIZE and not for Log(Assets).
You will need to submit the do file(s) that replicate your Tables 2 through 5. These files are not part of the word count.