Homework 5: Pareto and Kuznets on the Grand Tour
We continue working with the World Top Incomes Database [https://wid.world], and the Pareto distribution,
as in the lab. We also continue to practice working with data frames, manipulating data from one format to
another, and writing functions to automate repetitive tasks.
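We saw in the lab that if the upper tail of the income distribution followed a perfect Pareto distribution, then
$$\left(\frac{P99}{P99.9}\right)^{-a+1} = 10 \tag{1}$$
$$\left(\frac{P99.5}{P99.9}\right)^{-a+1} = 5 \tag{2}$$
$$\left(\frac{P99}{P99.5}\right)^{-a+1} = 2 \tag{3}$$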
We could estimate the Pareto exponent by solving any one of these equations for a; in lab we used
$$a = 1 - \frac{\log 10}{\log\left(P99/P99.9\right)} \tag{4}$$
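As a quick check of (4), here is a minimal sketch in Python. The assignment text does not fix a language at this point, and the function name `exponent_single_ratio_est` and its argument names are hypothetical, chosen only for illustration.

```python
import math


def exponent_single_ratio_est(p99, p999):
    """Estimate the Pareto exponent a from equation (4).

    Hypothetical helper name. p99 and p999 stand for the 99th and
    99.9th income percentiles and must satisfy 0 < p99 < p999.
    """
    return 1 - math.log(10) / math.log(p99 / p999)


# With P99 = 1e6 and P99.9 = 1e7 the ratio is 0.1, so the estimate
# should be (up to rounding) 1 - log(10)/log(0.1) = 2.
a_hat = exponent_single_ratio_est(1e6, 1e7)
print(a_hat)
```

Because only two percentiles enter this formula, any noise in either one passes straight into the estimate.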
Because of measurement error and sampling noise, we can’t find one value of a which will work for
all three equations (1)–(3). Generally, trying to make all three equations come close to balancing gives a
better estimate of a than just solving one of them. (This is analogous to finding the slope and intercept of a
regression line by trying to come close to all the points in a scatterplot, and not just running a line through
two of them.)
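To see why a single equation can mislead, the sketch below solves each tail relation separately. It assumes the three relations are (P99/P99.9)^(1-a) = 10, (P99.5/P99.9)^(1-a) = 5 and (P99/P99.5)^(1-a) = 2. The function name is hypothetical, and the perturbed percentiles are made-up illustrative numbers, not data from the database.

```python
import math


def single_equation_estimates(p99, p995, p999):
    """Solve each of the three assumed tail relations on its own for a.

    Returns a tuple of three estimates, one per percentile ratio.
    """
    return (
        1 - math.log(10) / math.log(p99 / p999),
        1 - math.log(5) / math.log(p995 / p999),
        1 - math.log(2) / math.log(p99 / p995),
    )


# Percentiles consistent with an exact Pareto tail (a = 2):
# the three estimates agree up to rounding.
print(single_equation_estimates(1e6, 2e6, 1e7))

# Made-up percentiles with P99.5 nudged upward: the estimates disagree,
# so no single a balances all three equations.
print(single_equation_estimates(1.0e6, 2.1e6, 1.0e7))
```

When the three values differ, picking any one of them ignores the information in the other two ratios, which is the motivation for fitting all three at once.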
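1. We estimate a by minimizing
$$\left(\left(\frac{P99}{P99.9}\right)^{-a+1} - 10\right)^2 + \left(\left(\frac{P99.5}{P99.9}\right)^{-a+1} - 5\right)^2 + \left(\left(\frac{P99}{P99.5}\right)^{-a+1} - 2\right)^2$$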
Write a function, percentile_ratio_discrepancies, which takes as inputs P99, P99.5, P99.9 and a,
and returns the value of the expression above. Check that when P99=1e6, P99.5=2e6, P99.9=1e7 and
a=2, your function returns 0.
2. Write a function, exponent.multi_ratios_est, which takes as inputs P99, P99.5, P99.9, and estimates
a. It should minimize your percentile_ratio_discrepancies function. The starting value for the
minimization should come from (4). Check that when P99=1e6, P99.5=2e6 and P99.9=1e7, your
function returns an a of 2.
3. Write a function which uses exponent.multi_ratios_est to estimate a for the US for every year from
1913 to 2012. (There are many ways you could do this, including loops.) Plot the estimates; make sure
the labels of the plot are appropriate.
4. Use (4) to estimate a for the US for every year. Make a scatter-plot of these estimates against those
from problem 3. If they are identical or completely independent, something is wrong with at least one
part of your code. Otherwise, can you say anything about how the two estimates compare?
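One possible shape for problems 1 to 3 is sketched below in Python, under several stated assumptions. The objective is taken to be the sum of squared discrepancies of the three percentile-ratio equations. `exponent_multi_ratios_est` uses an underscore because a dot is not valid in a Python identifier. The minimizer is a plain golden-section search on a bracket centered on the value from (4), which is a stand-in for whatever optimizer the course intends and assumes the objective has a single minimum inside that bracket. The helper `estimate_by_year`, its `(year, P99, P99.5, P99.9)` row layout, and the second example row are hypothetical.

```python
import math


def percentile_ratio_discrepancies(p99, p995, p999, a):
    """Sum of squared discrepancies of the three ratio equations (assumed form)."""
    e = 1 - a
    return (((p99 / p999) ** e - 10) ** 2
            + ((p995 / p999) ** e - 5) ** 2
            + ((p99 / p995) ** e - 2) ** 2)


def _golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize f on [lo, hi], assuming f is unimodal there."""
    inv_phi = (math.sqrt(5) - 1) / 2
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            # Minimum lies in [lo, x2]; the old x1 becomes the new x2.
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = f(x1)
        else:
            # Minimum lies in [x1, hi]; the old x2 becomes the new x1.
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = f(x2)
    return (lo + hi) / 2


def exponent_multi_ratios_est(p99, p995, p999, half_width=1.0):
    """Estimate a by minimizing the discrepancies near the value from (4)."""
    a_start = 1 - math.log(10) / math.log(p99 / p999)  # equation (4)

    def objective(a):
        return percentile_ratio_discrepancies(p99, p995, p999, a)

    return _golden_section_min(objective, a_start - half_width, a_start + half_width)


def estimate_by_year(rows):
    """Map hypothetical (year, p99, p995, p999) rows to {year: estimate of a}."""
    return {year: exponent_multi_ratios_est(p99, p995, p999)
            for year, p99, p995, p999 in rows}


print(percentile_ratio_discrepancies(1e6, 2e6, 1e7, 2))
print(exponent_multi_ratios_est(1e6, 2e6, 1e7))
# Made-up rows, for illustration only; real rows would come from the data set.
print(estimate_by_year([(1913, 1e6, 2e6, 1e7), (1914, 1.0e6, 2.1e6, 1.0e7)]))
```

The bracket half-width of 1 is an arbitrary choice. If the per-equation estimates for some year differ by more than that, the bracket would need widening, or a general-purpose optimizer started at (4) could be used instead.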