Transport Sociology and Psychology (LV 240834759)
Take-Home Exam for Part ‘Transport Sociology’
Department of Civil, Geo and Environmental Engineering at the Technical University Munich
Release Date: 9 February 2021
Due Date: 17 March 2021, end of day (CET)
Introduction
The following tasks shall be answered in a written report. The recommended software
package to calculate answers in Task 1 and Task 2 is ‘The R Project for Statistical
Computing’. You may, however, use other software packages that you might be more
familiar with (such as Matlab or Biogeme), or write your own code in any language
of your choice. While there is no word limit, answers should be rather short with one to
two paragraphs per question.
Submit your report as a PDF document. Please add your name and matriculation
number (03xxxxxx) at the beginning of your report and specify which software was
used to answer Tasks 1 and 2. Do not provide the script or code you wrote. Once you
are done, upload your PDF report to Moodle.
You are allowed to work in teams to solve these tasks. Note, however, that each
student needs to submit an individual solution. The final estimation results are likely
to be different for every student, as there are thousands of right answers for many
tasks. Provide your own solution. Also, all text needs to be written in your own words;
using copy-and-paste will result in failing the exam and a report to the examination
board. To acknowledge these rules at TUM, you were asked to sign the “Pledge
against Plagiarism” that you find on Moodle. Some of you have handed in a signed
copy already. If you have not done so already (or if you are unsure whether you already
did), please sign this document and upload it with your take-home exam report.
Task 1 [35 points]
You have been provided with a household travel survey that provides information on
household characteristics and the number of trips reported. The Excel file
(householdTravelSurvey.xlsx) provides a description of the available variables. The
CSV file contains the same data and was provided to be read in R. In this task, you
shall identify the most important socio-demographic attributes that explain the number
of auto trips.
a) Read the data with R (or your language of choice). To understand the range of
data, provide min, max and mean values for each variable of this dataset in your
report. [2 points]
b) Create a histogram for number of auto trips and a boxplot for income. Copy the
two graphics into your report. Describe the two graphics in two to three
sentences in your report. [2 points]
c) Estimate a multiple regression, where you try to explain the number of trips by
car with all other socio-demographic attributes available in this survey. Provide
the estimation results in the report*. [6 points]
Describe the estimation results in the report:
• Which independent variables are statistically significant with a
confidence level of at least 90%?
• Are estimated coefficients (called ‘Estimate’ in R) reasonable? Or did you
find coefficients that do not make sense to you? Name coefficients that
seem unlikely and explain why you think they don’t seem right from a
theoretical point of view.
d) A possible reason for unreasonable coefficients is multicollinearity. Use R (or
your preferred software) to plot the correlation between all variables. Add this
plot to your report and identify the three pairs of independent variables that are
most correlated. [5 points]
e) Create another multiple regression with auto trips as the dependent variable.
This time, select independent variables that lead to an estimate where all
coefficients:
• are statistically significant (here defined as 90% confidence or more),
• have signs (+ or –) that make sense to you, and
• no two independent variables correlate with more than |R| = 0.6.
This will require some trial and error. Provide and briefly describe the final
estimate in your report*. Explain for each independent variable in your final
estimation why it makes sense to you (i.e., explain why every + and – sign is
reasonable). [10 points]
f) Attempt to improve the estimation result further by removing the intercept [-1],
by using a quadratic transformation [I(variable^2)], by using a logarithmic
transformation [log(variable)] and by testing interactions [variableA*variableB]
for selected independent variables. This may require dropping additional
independent variables to ensure that all estimated coefficients are statistically
significant. The same three rules listed under the bullet points of subtask (e)
shall apply. In your report, provide the final estimate* that provides the best
model fit that you can find. [10 points]
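For the correlation rule used in subtasks (d) to (f), the following is a minimal sketch of how pairs of independent variables above |R| = 0.6 could be screened. It is written in plain Python (standard library only) so that it is self-contained. The column names and values are invented for illustration and are not taken from the survey; in R, cor() returns the same coefficients as a matrix.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairs_above(columns, threshold=0.6):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Made-up illustration data; the column names are hypothetical.
columns = {
    "hh_size": [1, 2, 3, 4, 5, 2],
    "workers": [1, 1, 2, 2, 3, 1],
    "income": [30, 80, 20, 60, 40, 90],
}

flagged = pairs_above(columns)
for name_a, name_b, r in flagged:
    print(f"{name_a} ~ {name_b}: r = {r:.2f}")
```

If a pair is flagged, only one of the two variables should remain in the regression; which one to keep is a modelling decision that the sketch does not make.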
Task 2 [20 points]
You were provided with another dataset on mode choice
for long-distance travel (file modeChoiceData.csv, see
xlsx file for definition of variables). The survey data was
collected for long-distance trips between Sydney,
Canberra and Melbourne in Australia. Travelers had the
choice between auto, bus, train and air.
[Data Source: Greene, W.H. and D. Hensher: Multinomial logit and discrete choice models. In Greene,
W. H. (1997) LIMDEP version 7.0 user’s manual revised. Plainview, New York. Note that data were
modified for this exam.]
a) Read the data in R (or the software of your choice). To understand the range of
data, provide min, max and mean values of in-vehicle travel times for each mode
in your report. [2 points]
b) Estimate a multinomial logit model, where mode is the dependent variable
and all other variables serve as independent variables. Provide the estimation
result in your report† and briefly describe whether these estimates make sense
to you (refer to statistical significance and describe whether + and – signs are
reasonable). [8 points]
c) The estimation under (2b) provides for WaitTime, InVehCosts, InVehTime and
GenCosts one coefficient each across all modes. Modify your estimation to
provide mode-specific InVehTime (i.e., estimate a different coefficient for
InVehTime for every mode). Provide the estimation result in your report† and
briefly assess how using coefficients by mode has improved this estimation
(provide two reasons why estimation (2c) is better than estimation (2b)).
[4 points]
d) Try to further improve the estimation result from task (2b)
• by removing the intercept [-1] or
• by raising a variable to the power of two [I(variable^2)] for selected
independent variables or
• by using a logarithmic transformation [log(variable)] for selected
independent variables or
• by estimating mode-specific coefficients for InVehCosts, InVehTime or
GenCosts.
To ensure that all variables are statistically significant, you may have to drop
some independent variables. In your report, provide the final estimate† that
provides the best model fit that you can find. Make sure that your best model
estimation only includes independent variables that (i) have the expected sign
[+ or –] and (ii) have a 90% significance level or more. This will require some
trial and error. It is ok to include constants that do not reach this significance
level. [6 points]
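The difference between one generic InVehTime coefficient (2b) and mode-specific InVehTime coefficients (2c) can be sketched as follows. This is an illustration in plain Python, not an estimation: all constants, coefficients and travel times are made up, and an estimation routine would search for the coefficients that maximize the log-likelihood computed here.

```python
from math import exp, log

MODES = ["auto", "bus", "train", "air"]


def mnl_probabilities(utilities):
    """Multinomial logit: P(j) = exp(V_j) / sum_i exp(V_i)."""
    v_max = max(utilities.values())  # shift utilities for numerical stability
    expv = {mode: exp(v - v_max) for mode, v in utilities.items()}
    total = sum(expv.values())
    return {mode: e / total for mode, e in expv.items()}


def utilities(in_veh_time, constants, beta_time):
    """Utility per mode; beta_time is a float (generic) or a dict (mode-specific)."""
    result = {}
    for mode in MODES:
        beta = beta_time[mode] if isinstance(beta_time, dict) else beta_time
        result[mode] = constants[mode] + beta * in_veh_time[mode]
    return result


def log_likelihood(observations, constants, beta_time):
    """Sum of the log probabilities of the modes that were actually chosen."""
    total = 0.0
    for in_veh_time, chosen in observations:
        probs = mnl_probabilities(utilities(in_veh_time, constants, beta_time))
        total += log(probs[chosen])
    return total


# Hypothetical observations: in-vehicle time per mode (hours) and the chosen mode.
observations = [
    ({"auto": 10.0, "bus": 12.0, "train": 9.0, "air": 2.0}, "air"),
    ({"auto": 4.0, "bus": 5.0, "train": 4.5, "air": 1.5}, "auto"),
]
# Hypothetical constants, with auto as the reference mode.
constants = {"auto": 0.0, "bus": -1.0, "train": -0.5, "air": 0.5}

generic = -0.2  # one coefficient shared by all modes
specific = {"auto": -0.15, "bus": -0.25, "train": -0.2, "air": -0.4}

print("LL generic:      ", round(log_likelihood(observations, constants, generic), 3))
print("LL mode-specific:", round(log_likelihood(observations, constants, specific), 3))
```

The generic model is the special case in which all mode-specific coefficients are equal, so at its optimum the mode-specific model can never have a lower log-likelihood than the generic one.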
Task 3 [15 points]
In Task 2, you were asked to estimate a multinomial logit
model. Here, we explore a nested logit model instead.
a) Describe the reasons why nested mode choice
models sometimes work better than multinomial logit
models. There is no need to estimate a model. A
written description of the potential benefits of nested
logit models is sufficient. [7 points]
b) Create a nesting structure for the modes conventional
car, autonomous car, tolled road, non-tolled road,
walk, bike, e-bike, e-scooter, bus, tram and commuter rail. Use as many nesting
layers as make sense to you. Draw a nesting diagram (as shown in the diagram
above), label the boxes with modes and provide it in your report. Explain in one
paragraph your chosen nesting structure. There are many different solutions
that are plausible. While your nesting structure will not be evaluated, your
reasoning for your chosen nesting structure will be evaluated. [8 points]
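The mechanics of a two-level nested logit model can be sketched as follows. This is a hedged illustration only: it uses the common textbook form in which a nesting parameter of 1 reproduces the multinomial logit, and the nests chosen for the four modes of Task 2, the nesting parameters and the utilities are arbitrary assumptions rather than a recommended structure.

```python
from math import exp, log


def nested_logit_probabilities(utilities, nests, scales):
    """Two-level nested logit probabilities.

    utilities: {alternative: V}
    nests:     {nest: [alternatives]}
    scales:    {nest: nesting parameter lambda, with 0 < lambda <= 1}
    """
    # Inclusive value (logsum) of each nest.
    inclusive = {}
    for nest, members in nests.items():
        lam = scales[nest]
        inclusive[nest] = log(sum(exp(utilities[a] / lam) for a in members))

    denom = sum(exp(scales[n] * inclusive[n]) for n in nests)

    probs = {}
    for nest, members in nests.items():
        lam = scales[nest]
        p_nest = exp(lam * inclusive[nest]) / denom  # probability of the nest
        within = sum(exp(utilities[a] / lam) for a in members)
        for a in members:
            # nest probability times conditional probability within the nest
            probs[a] = p_nest * exp(utilities[a] / lam) / within
    return probs


# Hypothetical nesting of the Task 2 modes, for illustration only.
nests = {"private": ["auto"], "public": ["bus", "train", "air"]}
scales = {"private": 1.0, "public": 0.5}

before = {"auto": 0.0, "bus": 0.0, "train": 0.0, "air": 0.0}
after = dict(before, bus=1.0)  # the bus becomes more attractive

for label, v in (("before", before), ("after", after)):
    p = nested_logit_probabilities(v, nests, scales)
    print(label, {mode: round(share, 3) for mode, share in p.items()})
```

Comparing the ratio of the train share to the auto share before and after the change, once with the nesting parameters above and once with all parameters set to 1, shows how the two model types react differently to the same improvement.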
Task 4 [20 points]
Task 1 explored multiple regression, and Tasks 2 and 3 explored discrete choice models. In
Task 4, we look at the differences between the two.
a) We apply multiple regression and discrete choice models for different problem
sets. Explain when to use which one. [4 points]
b) Could you have solved Task 1 with a discrete choice model? Why? [8 points]
c) Could you have solved Task 2 with a multiple regression? Why? [8 points]
Task 5 [10 points]
To explore travel behavior, both household travel surveys (e.g., MiD in Germany) and
panel surveys (e.g., MOP in Germany) have been conducted.
a) Explain the difference between a household travel survey and a panel survey in
terms of selection of participants and common sample sizes. [4 points]
b) For each of the following questions, select a survey (MiD or MOP) that is likely
to be most useful. Explain your choices in two or three sentences. [6 points]
• Explain mode choice behavior for shopping trips of high-income
households with 0 workers and 0 cars.
• Explore if people who travel less on weekdays travel more on weekends.
• Explain how household relocation to the suburbs has affected the
likelihood to buy a car.
I appreciate any feedback you would like to give on clarity, length and difficulty of this exam.
Also, it would be helpful if you could give an estimate of the number of hours it took you
to complete this exam. Your answer is optional and will not affect your grade. Thanks!
* Please provide your estimation results including at least: Variable names, estimated
coefficients, statistical significance of each variable and R² of the estimate.
† Please provide your estimation results including at least: Variable names, estimated
coefficients, statistical significance of each variable, as well as log-likelihood and R² of
the estimate.
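The two fit measures named in the footnotes can be computed as sketched below. Reading the R² of a logit model as McFadden's pseudo-R² with an equal-shares null model is an assumption, since the footnote does not say which variant is meant; all numbers are illustrative and not taken from the exam data.

```python
from math import log


def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mcfadden_r_squared(ll_model, ll_null):
    """McFadden's pseudo-R²: 1 - LL(model) / LL(null)."""
    return 1.0 - ll_model / ll_null


# Regression example with made-up observed and fitted trip counts.
observed = [2.0, 4.0, 6.0, 8.0]
fitted = [2.5, 3.5, 6.5, 7.5]
print("R²:", round(r_squared(observed, fitted), 3))

# Logit example: 100 hypothetical choices among 4 modes.
# Null model (assumption): every mode is equally likely for every choice.
ll_null = 100 * log(1 / 4)
ll_model = -100.0  # hypothetical log-likelihood of an estimated model
print("McFadden pseudo-R²:", round(mcfadden_r_squared(ll_model, ll_null), 3))
```

Note that R computes the R² of a regression without an intercept against a total sum of squares around zero instead of around the mean, so R² values of models with and without an intercept are not directly comparable.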