3.3 Lorenz Curve (svylorenz)

Though not an inequality measure in itself, the Lorenz curve is a classic instrument of distribution analysis. It is a function that maps each cumulative share of the population, ordered from poorest to richest, to the share of total income that group holds. In mathematical terms,

\[ L(p) = \frac{\int_{-\infty}^{Q_p}yf(y)dy}{\int_{-\infty}^{+\infty}yf(y)dy} \]

where \(Q_p\) is the \(p\)-th quantile of the income distribution and \(f\) is its density function.
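For a finite, unweighted sample, the definition above reduces to adding up the incomes of the poorest share \(p\) of units and dividing by total income. The following is a minimal sketch of that idea in base R; `lorenz_point` and the income vector are hypothetical names and made-up data used only for illustration, not part of convey.

```r
# Hypothetical helper (not part of convey): share of total income held by
# the poorest fraction p of an unweighted sample.
lorenz_point <- function(y, p) {
  y <- sort(y)                                # order incomes from poorest to richest
  # number of units in the bottom share p; the small tolerance guards
  # against floating-point error in p * length(y)
  n_poorest <- floor(p * length(y) + 1e-9)
  sum(y[seq_len(n_poorest)]) / sum(y)         # their income over total income
}

incomes <- c(10, 20, 30, 40, 100)             # made-up incomes, total = 200
lorenz_point(incomes, 0.4)                    # (10 + 20) / 200 = 0.15
lorenz_point(incomes, 0.8)                    # (10 + 20 + 30 + 40) / 200 = 0.5
```

In this made-up sample the poorest 80% of units hold only half of total income, so the curve lies well below the 45-degree line.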

The two extreme distributive cases are:

  • Perfect equality:
    • Every individual has the same income;
    • Every share of the population has the same share of the income;
    • Therefore, the reference curve is \[L(p) = p \text{ } \forall p \in [0,1] \text{.}\]
  • Perfect inequality:
    • One individual concentrates all of society’s income, while the other individuals have zero income;
    • Therefore, the reference curve is

\[ L(p)= \begin{cases} 0, &\forall p < 1 \\ 1, &\text{if } p = 1 \text{.} \end{cases} \]
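The two reference curves can be checked numerically on toy data. The sketch below uses a hypothetical helper, `lorenz_shares`, and made-up income vectors; it evaluates the cumulative income share at the population shares \(1/n, 2/n, \dots, 1\) of an unweighted sample.

```r
# Hypothetical helper (not part of convey): cumulative income shares of an
# unweighted sample, evaluated at the population shares 1/n, 2/n, ..., 1.
lorenz_shares <- function(y) cumsum(sort(y)) / sum(y)

equal_incomes   <- rep(100, 5)          # everyone earns the same
unequal_incomes <- c(0, 0, 0, 0, 500)   # one person holds everything

lorenz_shares(equal_incomes)    # 0.2 0.4 0.6 0.8 1.0, i.e. L(p) = p
lorenz_shares(unequal_incomes)  # 0 0 0 0 1, i.e. L(p) = 0 until p = 1
```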

To evaluate the degree of inequality in a society, the analyst compares the estimated curve with these two reference curves: the closer it lies to the line of perfect equality, the less unequal the distribution.

The estimator of this function was derived by Kovacevic and Binder (1997) [Milorad Kovacevic and David Binder. 1997. “Variance Estimation for Measures of Income Inequality and Polarization - the Estimating Equations Approach.” Journal of Official Statistics 13 (1): 41–58. http://www.jos.nu/Articles/abstract.asp?article=13141]:

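\[ \widehat{L}(p) = \frac{ \sum_{i \in S} w_i \cdot y_i \cdot \delta \{ y_i \le \widehat{Q}_p \}}{\widehat{Y}}, \text{ } 0 \le p \le 1, \]

where \(S\) is the sample, \(w_i\) is the sampling weight of unit \(i\), \(\delta\{\cdot\}\) is the indicator function, \(\widehat{Q}_p\) is the estimated \(p\)-th quantile, and \(\widehat{Y} = \sum_{i \in S} w_i y_i\) is the estimated total income.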
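The point estimate in the formula above can be sketched in a few lines of base R. This is an illustration only, not the convey implementation: `lorenz_hat` is a hypothetical helper, the data are made up, and the quantile rule (the smallest income whose cumulative weight share reaches \(p\)) is an assumption that may differ from the rule used inside `svylorenz`. It also computes no standard errors.

```r
# Hypothetical helper (not part of convey): point estimate of L(p) from
# incomes y and sampling weights w.
lorenz_hat <- function(y, w, p) {
  ord <- order(y)                       # sort units from poorest to richest
  y <- y[ord]
  w <- w[ord]
  cum_w <- cumsum(w) / sum(w)           # cumulative weighted population share
  # ASSUMED quantile rule: smallest income whose cumulative weight share
  # reaches p (the tolerance guards against floating-point error)
  q_p <- y[which(cum_w >= p - 1e-9)[1]]
  # weighted income of units at or below the quantile, over the weighted total
  sum(w * y * (y <= q_p)) / sum(w * y)
}

y <- c(10, 20, 30, 40, 100)             # made-up incomes
w <- c(2, 1, 1, 1, 1)                   # made-up sampling weights
lorenz_hat(y, w, 0.5)                   # (2*10 + 1*20) / 210
```

With these weights the two poorest units represent half of the population, so the estimated quantile is 20 and the numerator is the weighted income of those two units.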

This formula, however, is used only to compute specific points of the curve and their respective standard errors (SEs). The formula used to plot an approximation of the continuous empirical curve comes from Lerman and Yitzhaki (1989) [Robert Lerman and Shlomo Yitzhaki. 1989. “Improving the Accuracy of Estimates of Gini Coefficients.” Journal of Econometrics 42 (1): 43–47. http://EconPapers.repec.org/RePEc:eee:econom:v:42:y:1989:i:1:p:43-47].


A replication example

In October 2016, Jann (2016) [Ben Jann. 2016. “Estimating Lorenz and concentration curves in Stata.” University of Bern Social Sciences Working Papers 15. University of Bern, Department of Social Sciences. https://ideas.repec.org/p/bss/wpaper/15.html] released a pre-publication working paper on estimating Lorenz and concentration curves using Stata. The example below reproduces the statistics presented in Section 4.1 of that paper.

# load the convey package
library(convey)

# load the survey library
library(survey)

# load the stata-style webuse library
library(webuse)

# load the NLSW 1988 data
webuse("nlsw88")

# coerce that `tbl_df` to a standard R `data.frame`
nlsw88 <- data.frame( nlsw88 )

# initiate a linearized survey design object
des_nlsw88 <- svydesign( ids = ~1 , data = nlsw88 )
## Warning in svydesign.default(ids = ~1, data = nlsw88): No weights or
## probabilities supplied, assuming equal probability
# immediately run the `convey_prep` function on the survey design
des_nlsw88 <- convey_prep(des_nlsw88)

# estimates lorenz curve
result.lin <- svylorenz( ~wage, des_nlsw88, quantiles = seq( 0, 1, .05 ), na.rm = TRUE )

# note: most survey commands in R use Inf degrees of freedom by default,
# whereas Stata generally uses the degrees of freedom of the survey design.
# the extended syntax below is needed only to replicate the Stata results exactly;
# it is generally not necessary.
section_four_one <-
    data.frame( 
        estimate = coef( result.lin ) , 
        standard_error = SE( result.lin ) , 
        ci_lower_bound = 
            coef( result.lin ) + 
            SE( result.lin ) * 
            qt( 0.025 , degf( subset( des_nlsw88 , !is.na( wage ) ) ) ) ,
        ci_upper_bound = 
            coef( result.lin ) + 
            SE( result.lin ) * 
            qt( 0.975 , degf( subset( des_nlsw88 , !is.na( wage ) ) ) )
    )

# print the replicated table
section_four_one
      estimate standard_error ci_lower_bound ci_upper_bound
0    0.0000000      0.0000000      0.0000000      0.0000000
0.05 0.0151060      0.0004159      0.0142904      0.0159216
0.1  0.0342651      0.0007021      0.0328882      0.0356420
0.15 0.0558635      0.0010096      0.0538836      0.0578434
0.2  0.0801846      0.0014032      0.0774329      0.0829363
0.25 0.1067687      0.0017315      0.1033732      0.1101642
0.3  0.1356307      0.0021301      0.1314535      0.1398078
0.35 0.1670287      0.0025182      0.1620903      0.1719670
0.4  0.2005501      0.0029161      0.1948315      0.2062687
0.45 0.2369209      0.0033267      0.2303971      0.2434447
0.5  0.2759734      0.0037423      0.2686347      0.2833121
0.55 0.3180215      0.0041626      0.3098585      0.3261844
0.6  0.3633071      0.0045833      0.3543192      0.3722950
0.65 0.4125183      0.0050056      0.4027021      0.4223345
0.7  0.4657641      0.0054137      0.4551478      0.4763804
0.75 0.5241784      0.0058003      0.5128039      0.5355529
0.8  0.5880894      0.0062464      0.5758401      0.6003388
0.85 0.6577051      0.0066148      0.6447333      0.6706769
0.9  0.7346412      0.0068289      0.7212497      0.7480328
0.95 0.8265786      0.0062686      0.8142857      0.8388715
1    1.0000000      0.0000000      1.0000000      1.0000000
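The degrees-of-freedom remark in the code comments can be illustrated with base R alone. The sketch below takes the estimate and standard error from the \(p = 0.5\) row of the table and builds the 95% interval twice: once with a finite number of degrees of freedom and once with `Inf`. The value `df_design <- 2245` is an assumption chosen for illustration; in the example above the actual value comes from `degf()` applied to the survey design.

```r
# values copied from the p = 0.5 row of the replication table
estimate <- 0.2759734
standard_error <- 0.0037423

# ASSUMPTION for illustration only: a design with 2245 degrees of freedom;
# in practice this number comes from degf() on the survey design
df_design <- 2245

# Stata-style interval: Student t quantiles with the design degrees of freedom
ci_t <- estimate + standard_error * qt(c(0.025, 0.975), df_design)

# interval with Inf degrees of freedom: t quantiles equal the normal quantiles
ci_z <- estimate + standard_error * qt(c(0.025, 0.975), Inf)

round(ci_t, 7)
round(ci_z, 7)
```

The t-based interval is slightly wider than the normal-based one. With this many degrees of freedom the gap is small, which is consistent with the comment that the extended syntax is rarely needed outside of an exact replication.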

For additional usage examples of svylorenz, type ?convey::svylorenz in the R console.