4.11 Generalized Entropy and Decomposition (svygei, svygeidec)

✔️ flexible inequality-aversion parameter -- varying its epsilon parameter can highlight the effect of inequality in different parts of the income distribution
✔️ can be group-decomposed into within-inequality and between-inequality
✔️ this parameter can also be (somewhat) tuned to be less affected by outliers
❌ does not handle zero or negative incomes
❌ hard to interpret
❌ can be very sensitive to outliers

Using a generalization of the information function, now defined as:

\[ g(f) = \frac{1}{\alpha-1} [ 1 - f^{\alpha - 1} ] \]

the \(\alpha\)-class entropy is:

\[ H^{(\alpha)} (f) = \frac{1}{\alpha - 1} \bigg[ 1 - \int_{-\infty}^{\infty} f(y)^{ \alpha - 1} f(y) dy \bigg] \text{.} \]

This relates to a class of inequality measures, the generalized entropy indices, defined as:

\[ GE^{(\alpha)} = \frac{1}{\alpha^2 - \alpha} \int_{0}^\infty \bigg[ \bigg( \frac{y}{\mu} \bigg)^\alpha - 1 \bigg]dF(y) = - \frac{H^{(\alpha)}(s) }{ \alpha } \text{.} \]

The parameter \(\alpha\) also has an economic interpretation: as \(\alpha\) increases, the index becomes more sensitive to high incomes. For particular values of \(\alpha\), the measure takes well-known forms, such as the mean log deviation (\(\alpha \rightarrow 0\)) and the aforementioned Theil-T index (\(\alpha \rightarrow 1\)). In svygei, \(\alpha\) is passed as the epsilon argument.
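
For reference, the limiting and special cases mentioned above can be written out explicitly. This is a sketch of the standard results, using the same \(\mu\) and \(F\) as the definition of \(GE^{(\alpha)}\); the symbol \(\sigma^2\) for the variance of income is introduced here only for illustration:

\[ \begin{aligned} GE^{(0)} &= \lim_{\alpha \rightarrow 0} GE^{(\alpha)} = \int_{0}^\infty \log \bigg( \frac{\mu}{y} \bigg) dF(y) && \text{(mean log deviation)} \\ GE^{(1)} &= \lim_{\alpha \rightarrow 1} GE^{(\alpha)} = \int_{0}^\infty \frac{y}{\mu} \log \bigg( \frac{y}{\mu} \bigg) dF(y) && \text{(Theil-T)} \\ GE^{(2)} &= \frac{1}{2} \int_{0}^\infty \bigg[ \bigg( \frac{y}{\mu} \bigg)^2 - 1 \bigg] dF(y) = \frac{\sigma^2}{2 \mu^2} && \text{(half the squared coefficient of variation)} \end{aligned} \]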

Biewen and Jenkins (2003) [Biewen, Martin, and Stephen Jenkins. 2003. “Estimation of Generalized Entropy and Atkinson Inequality Indices from Complex Survey Data.” Discussion Papers of DIW Berlin 345. DIW Berlin, German Institute for Economic Research. http://EconPapers.repec.org/RePEc:diw:diwwpp:dp345] use the following finite-population formula as the basis for a plugin estimator:

\[ GE^{(\alpha)} = \begin{cases} ( \alpha^2 - \alpha)^{-1} \big[ U_0^{\alpha - 1} U_1^{-\alpha} U_\alpha -1 \big], & \text{if } \alpha \in \mathbb{R} \setminus \{0,1\} \\ - T_0 U_0^{-1} + \log ( U_1 / U_0 ), &\text{if } \alpha \rightarrow 0 \\ T_1 U_1^{-1} - \log ( U_1 / U_0 ), & \text{if } \alpha \rightarrow 1 \end{cases} \]

where \(U_\gamma = \sum_{i \in U} y_i^\gamma\) and \(T_\gamma = \sum_{i \in U} y_i^\gamma \log y_i\). Since those are all functions of totals, the linearization of the indices is easily achieved using the theorems described in Deville (1999) [Deville, Jean-Claude. 1999. “Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques.” Survey Methodology 25 (2): 193–203. http://www.statcan.gc.ca/pub/12-001-x/1999002/article/4882-eng.pdf].
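
To make the three cases concrete, here is a minimal base-R sketch of the plugin point estimate. It assumes the population totals \(U_\gamma\) and \(T_\gamma\) are estimated by weighted sample sums; the function name `ge_plugin` and the toy data are hypothetical and are not part of convey, and no variance estimation is attempted.

```r
# plugin GE point estimate built from weighted totals
# (illustrative helper, not convey's internal code)
ge_plugin <- function(y , w = rep(1 , length(y)) , alpha = 1) {

  # the totals involve log(y) and negative powers of y,
  # so zero or negative incomes cannot be used
  stopifnot(all(y > 0))

  # weighted estimates of U_gamma and T_gamma
  U <- function(gamma) sum(w * y ^ gamma)
  Tt <- function(gamma) sum(w * y ^ gamma * log(y))

  if (alpha == 0) {
    # mean log deviation
    -Tt(0) / U(0) + log(U(1) / U(0))
  } else if (alpha == 1) {
    # Theil-T
    Tt(1) / U(1) - log(U(1) / U(0))
  } else {
    (U(0) ^ (alpha - 1) * U(1) ^ (-alpha) * U(alpha) - 1) / (alpha ^ 2 - alpha)
  }
}

# toy incomes and sampling weights (made up for illustration)
y <- c(10 , 20 , 30 , 40 , 100)
w <- c(2 , 1 , 1 , 1 , 1)

# estimates for several values of alpha
sapply(c(-1 , 0 , 1 , 2) , function(a) ge_plugin(y , w , a))
```

The `stopifnot()` guard mirrors the limitation listed at the top of this section: observations with zero or negative incomes have to be removed before estimation.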

This class of inequality measures also has several desirable properties, such as additive decomposition. Additive decomposition allows researchers to compare the effects of inequality within and between population groups on the population’s level of inequality. Put simply, taking \(G\) groups, an additively decomposable index allows for:

\[ \begin{aligned} I ( \mathbf{y} ) &= I_{Within} + I_{Between} \\ \end{aligned} \]

where \(I_{Within} = \sum_{g \in G} W_g I( \mathbf{y}_g )\), with \(W_g\) being measure-specific group weights; and \(I_{Between}\) is a function of the group means and population sizes.
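
The decomposition can be checked numerically with a small base-R sketch. It assumes the usual generalized entropy group weights, \(W_g = p_g ( \mu_g / \mu )^\alpha\), where \(p_g\) is the population share and \(\mu_g\) the mean of group \(g\), and it takes the between component to be the index of a distribution in which everyone receives their group mean. The function names and toy data are hypothetical, and this is only a point-estimate illustration, not how svygeidec is implemented.

```r
# weighted GE index for a vector of positive incomes (illustrative helper)
ge <- function(y , w , alpha) {
  mu <- sum(w * y) / sum(w)
  r <- y / mu
  if (alpha == 0) {
    -sum(w * log(r)) / sum(w)
  } else if (alpha == 1) {
    sum(w * r * log(r)) / sum(w)
  } else {
    (sum(w * r ^ alpha) / sum(w) - 1) / (alpha ^ 2 - alpha)
  }
}

# split the index into within-group and between-group components
ge_decompose <- function(y , w , g , alpha) {
  mu <- sum(w * y) / sum(w)
  groups <- split(data.frame(y = y , w = w) , g)

  # population shares, group means, and subgroup indices
  p_g <- sapply(groups , function(d) sum(d$w)) / sum(w)
  mu_g <- sapply(groups , function(d) sum(d$w * d$y) / sum(d$w))
  ge_g <- sapply(groups , function(d) ge(d$y , d$w , alpha))

  # within: weighted sum of subgroup indices
  within <- sum(p_g * (mu_g / mu) ^ alpha * ge_g)

  # between: index of the distribution of group means
  between <- ge(mu_g , p_g , alpha)

  c(total = ge(y , w , alpha) , within = within , between = between)
}

# toy data (made up for illustration)
y <- c(10 , 20 , 30 , 40 , 100 , 60)
w <- rep(1 , 6)
g <- c("a" , "a" , "b" , "b" , "c" , "c")

ge_decompose(y , w , g , alpha = 0)
ge_decompose(y , w , g , alpha = 2)
```

For any \(\alpha\), the `total` element should equal `within + between` up to floating-point error.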


4.11.1 Replication Example

In July 2006, Jenkins (2008) [Jenkins, Stephen. 2008. “Estimation and Interpretation of Measures of Inequality, Poverty, and Social Welfare Using Stata.” North American Stata Users' Group Meetings 2006. Stata Users Group. http://EconPapers.repec.org/RePEc:boc:asug06:16] presented the Stata Generalized Entropy index command at the North American Stata Users’ Group Meetings. The example below reproduces those statistics.

Load and prepare the same data set:

# load the convey package
library(convey)

# load the survey library
library(survey)

# load the foreign library
library(foreign)

# create a temporary file on the local disk
tf <- tempfile()

# store the location of the presentation file
presentation_zip <-
  "https://web.archive.org/web/20150928053959/http://repec.org/nasug2006/nasug2006_jenkins.zip"

# download jenkins' presentation to the temporary file
download.file(presentation_zip , tf , mode = 'wb')

# unzip the contents of the archive
presentation_files <- unzip(tf , exdir = tempdir())

# load the institute for fiscal studies' 1981, 1985, and 1991 data.frame objects
x81 <-
  read.dta(grep("ifs81" , presentation_files , value = TRUE))
x85 <-
  read.dta(grep("ifs85" , presentation_files , value = TRUE))
x91 <-
  read.dta(grep("ifs91" , presentation_files , value = TRUE))

# stack each of these three years of data into a single data.frame
x <- rbind(x81 , x85 , x91)

Replicate the author’s survey design statement from Stata code..

. * account for clustering within HHs 
. version 8: svyset [pweight = wgt], psu(hrn)
pweight is wgt
psu is hrn

.. into R code:

# initiate a linearized survey design object
y <- svydesign( ~ hrn , data = x , weights = ~ wgt)

# immediately run the `convey_prep` function on the survey design
z <- convey_prep(y)

Replicate the author’s subset statement and each of his svygei results..

. svygei x if year == 1981
 
Warning: x has 20 values = 0. Not used in calculations

Complex survey estimates of Generalized Entropy inequality indices
 
pweight: wgt                                   Number of obs    = 9752
Strata: <one>                                  Number of strata = 1
PSU: hrn                                       Number of PSUs   = 7459
                                               Population size  = 54766261
---------------------------------------------------------------------------
Index    |  Estimate   Std. Err.      z      P>|z|     [95% Conf. Interval]
---------+-----------------------------------------------------------------
GE(-1)   |  .1902062   .02474921     7.69    0.000      .1416987   .2387138
MLD      |  .1142851   .00275138    41.54    0.000      .1088925   .1196777
Theil    |  .1116923   .00226489    49.31    0.000      .1072532   .1161314
GE(2)    |   .128793   .00330774    38.94    0.000      .1223099    .135276
GE(3)    |  .1739994   .00662015    26.28    0.000      .1610242   .1869747
---------------------------------------------------------------------------

..using R code:

z81 <- subset(z , year == 1981)

svygei( ~ eybhc0 , subset(z81 , eybhc0 > 0) , epsilon = -1)
##            gei     SE
## eybhc0 0.19021 0.0247
svygei( ~ eybhc0 , subset(z81 , eybhc0 > 0) , epsilon = 0)
##            gei     SE
## eybhc0 0.11429 0.0028
svygei( ~ eybhc0 , subset(z81 , eybhc0 > 0))
##            gei     SE
## eybhc0 0.11169 0.0023
svygei( ~ eybhc0 , subset(z81 , eybhc0 > 0) , epsilon = 2)
##            gei     SE
## eybhc0 0.12879 0.0033
svygei( ~ eybhc0 , subset(z81 , eybhc0 > 0) , epsilon = 3)
##          gei     SE
## eybhc0 0.174 0.0066
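
The repeated calls above can also be collected into a single table. The following is a sketch only: to stay self-contained it uses the `api` example data shipped with the survey package as a stand-in for the IFS files, and the object names `des`, `epsilons`, and `gei_table` are made up for illustration.

```r
library(survey)
library(convey)

# stand-in data: the `api` example from the survey package,
# NOT the IFS data used in the replication above
data(api)

des <-
  convey_prep(svydesign(
    id = ~ dnum ,
    weights = ~ pw ,
    data = apiclus1 ,
    fpc = ~ fpc
  ))

# values of the epsilon argument to loop over
epsilons <- c(-1 , 0 , 1 , 2 , 3)

# one row per epsilon, holding the estimate and its standard error
gei_table <-
  do.call(rbind , lapply(epsilons , function(eps) {
    res <- svygei( ~ api00 , des , epsilon = eps)
    data.frame(
      epsilon = eps ,
      gei = as.numeric(coef(res)) ,
      se = as.numeric(SE(res))
    )
  }))

gei_table
```

With the replication objects in memory, the same loop could be pointed at `subset(z81 , eybhc0 > 0)` and `~ eybhc0` instead of the stand-in design.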

Confirm this replication applies for subsetted objects as well. Compare Stata output..

. svygei x if year == 1985 & x >= 1

Complex survey estimates of Generalized Entropy inequality indices
 
pweight: wgt                                   Number of obs    = 8969
Strata: <one>                                  Number of strata = 1
PSU: hrn                                       Number of PSUs   = 6950
                                               Population size  = 55042871
---------------------------------------------------------------------------
Index    |  Estimate   Std. Err.      z      P>|z|     [95% Conf. Interval]
---------+-----------------------------------------------------------------
GE(-1)   |  .1602358   .00936931    17.10    0.000      .1418723   .1785993
MLD      |   .127616   .00332187    38.42    0.000      .1211052   .1341267
Theil    |  .1337177   .00406302    32.91    0.000      .1257543    .141681
GE(2)    |  .1676393   .00730057    22.96    0.000      .1533304   .1819481
GE(3)    |  .2609507   .01850689    14.10    0.000      .2246779   .2972235
---------------------------------------------------------------------------

..to R code:

z85 <- subset(z , year == 1985)

svygei( ~ eybhc0 , subset(z85 , eybhc0 > 1) , epsilon = -1)
##            gei     SE
## eybhc0 0.16024 0.0094
svygei( ~ eybhc0 , subset(z85 , eybhc0 > 1) , epsilon = 0)
##            gei     SE
## eybhc0 0.12762 0.0033
svygei( ~ eybhc0 , subset(z85 , eybhc0 > 1))
##            gei     SE
## eybhc0 0.13372 0.0041
svygei( ~ eybhc0 , subset(z85 , eybhc0 > 1) , epsilon = 2)
##            gei     SE
## eybhc0 0.16764 0.0073
svygei( ~ eybhc0 , subset(z85 , eybhc0 > 1) , epsilon = 3)
##            gei     SE
## eybhc0 0.26095 0.0185

Replicate the author’s decomposition by population subgroup (work status) shown on PDF page 57..

# define work status (PDF page 22)
z <-
  update(z , wkstatus = c(1 , 1 , 1 , 1 , 2 , 3 , 2 , 2)[as.numeric(esbu)])
z <-
  update(z , wkstatus = factor(wkstatus , labels = c("1+ ft working" , "no ft working" , "elderly")))

# subset to 1991 and remove records with zero income
z91 <- subset(z , year == 1991 & eybhc0 > 0)

# population share
svymean( ~ wkstatus, z91)
##                          mean     SE
## wkstatus1+ ft working 0.61724 0.0067
## wkstatusno ft working 0.20607 0.0059
## wkstatuselderly       0.17669 0.0046
# mean
svyby( ~ eybhc0, ~ wkstatus, z91, svymean)
##                    wkstatus   eybhc0       se
## 1+ ft working 1+ ft working 278.8040 3.703790
## no ft working no ft working 151.6317 3.153968
## elderly             elderly 176.6045 4.661740
# subgroup indices: ge_k
svyby( ~ eybhc0 , ~ wkstatus , z91 , svygei , epsilon = -1)
##                    wkstatus     eybhc0          se
## 1+ ft working 1+ ft working  0.2300708  0.02853959
## no ft working no ft working 10.9231761 10.65482557
## elderly             elderly  0.1932164  0.02571991
svyby( ~ eybhc0 , ~ wkstatus , z91 , svygei , epsilon = 0)
##                    wkstatus    eybhc0          se
## 1+ ft working 1+ ft working 0.1536921 0.006955506
## no ft working no ft working 0.1836835 0.014740510
## elderly             elderly 0.1653658 0.016409770
svyby( ~ eybhc0 , ~ wkstatus , z91 , svygei , epsilon = 1)
##                    wkstatus    eybhc0          se
## 1+ ft working 1+ ft working 0.1598558 0.008327994
## no ft working no ft working 0.1889909 0.016766120
## elderly             elderly 0.2023862 0.027787224
svyby( ~ eybhc0 , ~ wkstatus , z91 , svygei , epsilon = 2)
##                    wkstatus    eybhc0         se
## 1+ ft working 1+ ft working 0.2130664 0.01546521
## no ft working no ft working 0.2846345 0.06016394
## elderly             elderly 0.3465088 0.07362898
# GE decomposition
svygeidec( ~ eybhc0, ~ wkstatus, z91, epsilon = -1)
##         gei decomposition     SE
## total            3.682893 3.3999
## within           3.646572 3.3998
## between          0.036321 0.0028
svygeidec( ~ eybhc0, ~ wkstatus, z91, epsilon = 0)
##         gei decomposition     SE
## total            0.195236 0.0065
## within           0.161935 0.0061
## between          0.033301 0.0025
svygeidec( ~ eybhc0, ~ wkstatus, z91, epsilon = 1)
##         gei decomposition     SE
## total            0.200390 0.0079
## within           0.169396 0.0076
## between          0.030994 0.0022
svygeidec( ~ eybhc0, ~ wkstatus, z91, epsilon = 2)
##         gei decomposition     SE
## total            0.274325 0.0167
## within           0.245067 0.0164
## between          0.029258 0.0021
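
As a quick arithmetic check on the output above, the within and between components should add up to the total for every value of epsilon. The sketch below simply retypes the printed estimates into a data.frame; the object name `decomp` and the derived columns are illustrative.

```r
# estimates copied from the svygeidec output above
decomp <- data.frame(
  epsilon = c(-1 , 0 , 1 , 2) ,
  total   = c(3.682893 , 0.195236 , 0.200390 , 0.274325) ,
  within  = c(3.646572 , 0.161935 , 0.169396 , 0.245067) ,
  between = c(0.036321 , 0.033301 , 0.030994 , 0.029258)
)

# difference between the total and the sum of its components
decomp$gap <- decomp$total - (decomp$within + decomp$between)

# share of total inequality attributed to differences between groups
decomp$between_share <- decomp$between / decomp$total

decomp
```

The `between_share` column gives a compact way to compare how much of the measured inequality is attributed to work status as epsilon varies.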

For additional usage examples of svygei or svygeidec, type ?convey::svygei or ?convey::svygeidec in the R console.

4.11.2 Real World Examples

This section displays example results using nationally-representative surveys from both the United States and Brazil. We present a variety of surveys, levels of analysis, and subpopulation breakouts to provide users with points of reference for the range of plausible values of the svygei function.

To understand the construction of each survey design object and respective variables of interest, please refer to section 1.4 for CPS-ASEC, section 1.5 for PNAD Contínua, and section 1.6 for SCF.

4.11.2.1 CPS-ASEC Household Income

svygei(
  ~ htotval ,
  subset(cps_household_design , htotval > 0)
)
##            gei     SE
## htotval 0.4252 0.0052
svyby(
  ~ htotval ,
  ~ sex ,
  subset(cps_household_design , htotval > 0) ,
  svygei
)
##           sex   htotval  se.htotval
## male     male 0.3972009 0.007639779
## female female 0.4491281 0.006953494

4.11.2.2 CPS-ASEC Family Income

svygei(
  ~ ftotval ,
  subset(cps_family_design , ftotval > 0)
)
##             gei     SE
## ftotval 0.37484 0.0055
svyby(
  ~ ftotval ,
  ~ sex ,
  subset(cps_family_design , ftotval > 0) ,
  svygei
)
##           sex   ftotval  se.ftotval
## male     male 0.3494651 0.008638860
## female female 0.3990923 0.006595221

4.11.2.3 CPS-ASEC Worker Earnings

svygei(
  ~ pearnval ,
  subset(cps_ftfy_worker_design , pearnval > 0)
)
##              gei     SE
## pearnval 0.34162 0.0062
svyby(
  ~ pearnval ,
  ~ sex ,
  subset(cps_ftfy_worker_design , pearnval > 0) ,
  svygei
)
##           sex  pearnval se.pearnval
## male     male 0.3456834 0.007121393
## female female 0.3178379 0.011904099

4.11.2.4 PNAD Contínua Per Capita Income

svygei(
  ~ deflated_per_capita_income ,
  subset(pnadc_design , deflated_per_capita_income > 0),
  na.rm = TRUE
)
##                                gei     SE
## deflated_per_capita_income 0.52363 0.0107
svyby(
  ~ deflated_per_capita_income ,
  ~ sex ,
  subset(pnadc_design , deflated_per_capita_income > 0),
  svygei ,
  na.rm = TRUE
)
##           sex deflated_per_capita_income se.deflated_per_capita_income
## male     male                  0.5304924                    0.01124340
## female female                  0.5163178                    0.01081883

4.11.2.5 PNAD Contínua Worker Earnings

svygei(
  ~ deflated_labor_income ,
  subset(pnadc_design , deflated_labor_income > 0) ,
  na.rm = TRUE
)
##                           gei     SE
## deflated_labor_income 0.49544 0.0119
svyby(
  ~ deflated_labor_income ,
  ~ sex ,
  subset(pnadc_design , deflated_labor_income > 0) ,
  svygei ,
  na.rm = TRUE
)
##           sex deflated_labor_income se.deflated_labor_income
## male     male             0.5106575               0.01436090
## female female             0.4510399               0.01024883

4.11.2.6 SCF Family Net Worth

scf_MIcombine(with(subset(scf_design , networth > 0) , svygei(~ networth)))
## Warning in subset.svyimputationList(scf_design, networth > 0): subset differed
## between imputations
## Multiple imputation results:
##       m <- length(results)
##       scf_MIcombine(with(subset(scf_design, networth > 0), svygei(~networth)))
##           results         se
## networth 1.834597 0.05022745
scf_MIcombine(with(
  subset(scf_design , networth > 0) ,
  svyby(~ networth, ~ hhsex , svygei)
))
## Warning in subset.svyimputationList(scf_design, networth > 0): subset differed
## between imputations
## Multiple imputation results:
##       m <- length(results)
##       scf_MIcombine(with(subset(scf_design, networth > 0), svyby(~networth, 
##     ~hhsex, svygei)))
##         results         se
## male   1.770600 0.05227645
## female 1.563828 0.20502736

4.11.2.7 SCF Family Income

scf_MIcombine(with(subset(scf_design , income > 0) , svygei(~ income)))
## Warning in subset.svyimputationList(scf_design, income > 0): subset differed
## between imputations
## Multiple imputation results:
##       m <- length(results)
##       scf_MIcombine(with(subset(scf_design, income > 0), svygei(~income)))
##          results         se
## income 0.9948022 0.07542657
scf_MIcombine(with(
  subset(scf_design , income > 0) ,
  svyby(~ income, ~ hhsex , svygei)
))
## Warning in subset.svyimputationList(scf_design, income > 0): subset differed
## between imputations
## Multiple imputation results:
##       m <- length(results)
##       scf_MIcombine(with(subset(scf_design, income > 0), svyby(~income, 
##     ~hhsex, svygei)))
##          results         se
## male   0.9715715 0.07988096
## female 0.4653207 0.05566310