3.8 Generalized Entropy and Decomposition (svygei, svygeidec)

Using a generalization of the information function, now defined as \(g(f) = \frac{1}{\alpha-1} [ 1 - f^{\alpha - 1} ]\), the \(\alpha\)-class entropy is \[ H_\alpha(f) = \frac{1}{\alpha - 1} \bigg[ 1 - \int_{-\infty}^{\infty} f(y)^{ \alpha - 1} f(y) dy \bigg] \text{.} \]

This relates to a class of inequality measures, the generalized entropy indices, defined as:

\[ GE_\alpha = \frac{1}{\alpha^2 - \alpha} \int_{0}^\infty \bigg[ \bigg( \frac{y}{\mu} \bigg)^\alpha - 1 \bigg]dF(y) = - \frac{H_\alpha(s) }{ \alpha } \text{.} \]

The parameter \(\alpha\) also has an economic interpretation: as \(\alpha\) increases, the influence of top incomes upon the index increases. In the limiting cases \(\alpha \rightarrow 0\) and \(\alpha \rightarrow 1\), the measure becomes the mean log deviation and the aforementioned Theil index, respectively.
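
For reference, the special cases mentioned above can be written out explicitly. These are the standard limits of \(GE_\alpha\), stated here as a reading aid rather than quoted from the cited papers; \(\sigma^2\) (the variance of income) is a symbol introduced only for this illustration:

\[ GE_0 = \lim_{\alpha \rightarrow 0} GE_\alpha = - \int_{0}^\infty \log \bigg( \frac{y}{\mu} \bigg) dF(y) \quad \text{(mean log deviation)} \]

\[ GE_1 = \lim_{\alpha \rightarrow 1} GE_\alpha = \int_{0}^\infty \frac{y}{\mu} \log \bigg( \frac{y}{\mu} \bigg) dF(y) \quad \text{(Theil index)} \]

\[ GE_2 = \frac{1}{2} \int_{0}^\infty \bigg[ \bigg( \frac{y}{\mu} \bigg)^2 - 1 \bigg] dF(y) = \frac{\sigma^2}{2 \mu^2} \quad \text{(half the squared coefficient of variation)} \]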

To estimate it, Biewen and Jenkins (2003) [Biewen, Martin, and Stephen Jenkins. 2003. “Estimation of Generalized Entropy and Atkinson Inequality Indices from Complex Survey Data.” Discussion Papers of DIW Berlin 345. DIW Berlin, German Institute for Economic Research. http://EconPapers.repec.org/RePEc:diw:diwwpp:dp345.] proposed the following:

\[ GE_\alpha = \begin{cases} ( \alpha^2 - \alpha)^{-1} \big[ U_0^{\alpha - 1} U_1^{-\alpha} U_\alpha -1 \big], & \text{if } \alpha \in \mathbb{R} \setminus \{0,1\} \\ - T_0 U_0^{-1} + \log ( U_1 / U_0 ), &\text{if } \alpha \rightarrow 0 \\ T_1 U_1^{-1} - \log ( U_1 / U_0 ), & \text{if } \alpha \rightarrow 1 \end{cases} \]

where \(U_\gamma = \sum_{i \in S} w_i \cdot y_i^\gamma\) and \(T_\gamma = \sum_{i \in S} w_i \cdot y_i^\gamma \cdot \log y_i\). Since those are all functions of totals, the linearization of the indices is easily achieved using the theorems described in Deville (1999) [Deville, Jean-Claude. 1999. “Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques.” Survey Methodology 25 (2): 193–203. http://www.statcan.gc.ca/pub/12-001-x/1999002/article/4882-eng.pdf.].
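
The point estimator above can be sketched in a few lines of base R. This is only an illustration of the formula on made-up data: `ge_point_estimate`, `toy_income` and `toy_weight` are hypothetical names, the sketch is not the code used inside convey, and it produces no variance estimate.

```r
# point estimator of GE(alpha) built from the weighted totals U and T
# (hypothetical helper for illustration, not a convey function)
ge_point_estimate <- function( income , weight , alpha ) {

  # the formula requires strictly positive incomes
  stopifnot( all( income > 0 ) , length( income ) == length( weight ) )

  # U_gamma = sum of w_i * y_i^gamma
  U_gamma <- function( gamma ) sum( weight * income^gamma )

  # T_gamma = sum of w_i * y_i^gamma * log( y_i )
  T_gamma <- function( gamma ) sum( weight * income^gamma * log( income ) )

  if ( alpha == 0 ) {
    # mean log deviation
    -T_gamma( 0 ) / U_gamma( 0 ) + log( U_gamma( 1 ) / U_gamma( 0 ) )
  } else if ( alpha == 1 ) {
    # Theil index
    T_gamma( 1 ) / U_gamma( 1 ) - log( U_gamma( 1 ) / U_gamma( 0 ) )
  } else {
    # general case
    ( U_gamma( 0 )^( alpha - 1 ) * U_gamma( 1 )^( -alpha ) * U_gamma( alpha ) - 1 ) /
      ( alpha^2 - alpha )
  }
}

# made-up incomes and sampling weights
toy_income <- c( 10 , 20 , 30 , 40 , 100 )
toy_weight <- c( 2 , 1 , 1 , 3 , 1 )

# estimates for alpha = -1, 0, 1 and 2
sapply( c( -1 , 0 , 1 , 2 ) , function( a ) ge_point_estimate( toy_income , toy_weight , a ) )
```

Because every branch is a smooth function of the totals \(U_\gamma\) and \(T_\gamma\), the same structure is what makes the linearization approach applicable.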

This class also has several desirable properties, such as additive decomposition. The additive decomposition allows one to compare the effects of inequality within and between population groups on the population inequality. Put simply, an additively decomposable index allows for:

\[ I_{Total} = I_{Between} + I_{Within} \text{.} \]
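
A minimal base R sketch of this identity on made-up data follows. It assumes the standard subgroup decomposition of the generalized entropy class, in which the within component weights each subgroup index by (population share)^(1 - alpha) times (income share)^alpha, and the between component is the index of a distribution where every income is replaced by its subgroup mean. The helpers `ge_toy` and `ge_decompose_toy` and all `toy_*` objects are hypothetical and are not how `svygeidec` is implemented.

```r
# compact point estimator of GE(alpha) (hypothetical helper, not part of convey)
ge_toy <- function( income , weight , alpha ) {
  relative <- income / ( sum( weight * income ) / sum( weight ) )
  if ( alpha == 0 ) return( -sum( weight * log( relative ) ) / sum( weight ) )
  if ( alpha == 1 ) return( sum( weight * relative * log( relative ) ) / sum( weight ) )
  ( sum( weight * relative^alpha ) / sum( weight ) - 1 ) / ( alpha^2 - alpha )
}

# total, within and between components (hypothetical helper)
ge_decompose_toy <- function( income , weight , group , alpha ) {

  rows_by_group <- split( seq_along( income ) , group )

  # population and income shares of each subgroup
  pop_share <- sapply( rows_by_group , function( i ) sum( weight[ i ] ) ) / sum( weight )
  inc_share <- sapply( rows_by_group , function( i ) sum( weight[ i ] * income[ i ] ) ) /
    sum( weight * income )

  # subgroup indices
  ge_group <- sapply( rows_by_group , function( i ) ge_toy( income[ i ] , weight[ i ] , alpha ) )

  # within: weighted sum of subgroup indices
  within <- sum( pop_share^( 1 - alpha ) * inc_share^alpha * ge_group )

  # between: every income replaced by its weighted subgroup mean
  group_mean <- ave( weight * income , group , FUN = sum ) / ave( weight , group , FUN = sum )
  between <- ge_toy( group_mean , weight , alpha )

  c( total = ge_toy( income , weight , alpha ) , within = within , between = between )
}

# made-up incomes, weights and group labels
toy_income <- c( 10 , 20 , 30 , 40 , 100 , 60 )
toy_weight <- c( 2 , 1 , 1 , 3 , 1 , 2 )
toy_group <- c( "a" , "a" , "b" , "b" , "c" , "c" )

# one column per alpha; in each column, total equals within plus between
toy_decomposition <-
  sapply( c( -1 , 0 , 1 , 2 ) , function( a ) ge_decompose_toy( toy_income , toy_weight , toy_group , a ) )

toy_decomposition
```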


A replication example

In July 2006, Jenkins (2008) [Jenkins, Stephen. 2008. “Estimation and Interpretation of Measures of Inequality, Poverty, and Social Welfare Using Stata.” North American Stata Users’ Group Meetings 2006. Stata Users Group. http://EconPapers.repec.org/RePEc:boc:asug06:16.] presented at the North American Stata Users’ Group Meetings on the Stata Generalized Entropy Index command. The example below reproduces those statistics.

Load and prepare the same data set:

# load the convey package
library(convey)

# load the survey library
library(survey)

# load the foreign library
library(foreign)

# create a temporary file on the local disk
tf <- tempfile()

# store the location of the presentation file
presentation_zip <- "http://repec.org/nasug2006/nasug2006_jenkins.zip"

# download jenkins' presentation to the temporary file
download.file( presentation_zip , tf , mode = 'wb' )

# unzip the contents of the archive
presentation_files <- unzip( tf , exdir = tempdir() )

# load the institute for fiscal studies' 1981, 1985, and 1991 data.frame objects
x81 <- read.dta( grep( "ifs81" , presentation_files , value = TRUE ) )
x85 <- read.dta( grep( "ifs85" , presentation_files , value = TRUE ) )
x91 <- read.dta( grep( "ifs91" , presentation_files , value = TRUE ) )

# stack each of these three years of data into a single data.frame
x <- rbind( x81 , x85 , x91 )

Replicate the author’s survey design statement from Stata code..

. * account for clustering within HHs 
. version 8: svyset [pweight = wgt], psu(hrn)
pweight is wgt
psu is hrn

.. into R code:

# initiate a linearized survey design object
y <- svydesign( ~ hrn , data = x , weights = ~ wgt )

# immediately run the `convey_prep` function on the survey design
z <- convey_prep( y )

Replicate the author’s subset statement and each of his svygei results..

. svygei x if year == 1981
 
Warning: x has 20 values = 0. Not used in calculations

Complex survey estimates of Generalized Entropy inequality indices
 
pweight: wgt                                   Number of obs    = 9752
Strata: <one>                                  Number of strata = 1
PSU: hrn                                       Number of PSUs   = 7459
                                               Population size  = 54766261
---------------------------------------------------------------------------
Index    |  Estimate   Std. Err.      z      P>|z|     [95% Conf. Interval]
---------+-----------------------------------------------------------------
GE(-1)   |  .1902062   .02474921     7.69    0.000      .1416987   .2387138
MLD      |  .1142851   .00275138    41.54    0.000      .1088925   .1196777
Theil    |  .1116923   .00226489    49.31    0.000      .1072532   .1161314
GE(2)    |   .128793   .00330774    38.94    0.000      .1223099    .135276
GE(3)    |  .1739994   .00662015    26.28    0.000      .1610242   .1869747
---------------------------------------------------------------------------

..using R code:

z81 <- subset( z , year == 1981 )

svygei( ~ eybhc0 , subset( z81 , eybhc0 > 0 ) , epsilon = -1 )
##            gei     SE
## eybhc0 0.19021 0.0247
svygei( ~ eybhc0 , subset( z81 , eybhc0 > 0 ) , epsilon = 0 )
##            gei     SE
## eybhc0 0.11429 0.0028
svygei( ~ eybhc0 , subset( z81 , eybhc0 > 0 ) )
##            gei     SE
## eybhc0 0.11169 0.0023
svygei( ~ eybhc0 , subset( z81 , eybhc0 > 0 ) , epsilon = 2 )
##            gei     SE
## eybhc0 0.12879 0.0033
svygei( ~ eybhc0 , subset( z81 , eybhc0 > 0 ) , epsilon = 3 )
##          gei     SE
## eybhc0 0.174 0.0066
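
The R output reports only the estimate and its standard error, while the Stata table also shows a z statistic and a 95% confidence interval. Assuming the usual normal approximation (an assumption about how those Stata columns are built, not something stated in the presentation), both can be recovered from the first two columns. The sketch below uses the GE(-1) row copied from the Stata table; the variable names are illustrative.

```r
# GE(-1) estimate and standard error for 1981, copied from the Stata table
ge_estimate <- 0.1902062
ge_std_err <- 0.02474921

# z statistic: estimate divided by its standard error
z_stat <- ge_estimate / ge_std_err

# normal-approximation 95% confidence interval
conf_int <- ge_estimate + c( -1 , 1 ) * qnorm( 0.975 ) * ge_std_err

round( z_stat , 2 )
conf_int
```

The resulting bounds agree with the printed Stata interval to about six decimal places; the remaining difference comes from the rounding of the displayed standard error.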

Confirm this replication applies for subsetted objects as well. Compare Stata output..

. svygei x if year == 1985 & x >= 1

Complex survey estimates of Generalized Entropy inequality indices
 
pweight: wgt                                   Number of obs    = 8969
Strata: <one>                                  Number of strata = 1
PSU: hrn                                       Number of PSUs   = 6950
                                               Population size  = 55042871
---------------------------------------------------------------------------
Index    |  Estimate   Std. Err.      z      P>|z|     [95% Conf. Interval]
---------+-----------------------------------------------------------------
GE(-1)   |  .1602358   .00936931    17.10    0.000      .1418723   .1785993
MLD      |   .127616   .00332187    38.42    0.000      .1211052   .1341267
Theil    |  .1337177   .00406302    32.91    0.000      .1257543    .141681
GE(2)    |  .1676393   .00730057    22.96    0.000      .1533304   .1819481
GE(3)    |  .2609507   .01850689    14.10    0.000      .2246779   .2972235
---------------------------------------------------------------------------

..to R code:

z85 <- subset( z , year == 1985 )

svygei( ~ eybhc0 , subset( z85 , eybhc0 >= 1 ) , epsilon = -1 )
##            gei     SE
## eybhc0 0.16024 0.0094
svygei( ~ eybhc0 , subset( z85 , eybhc0 >= 1 ) , epsilon = 0 )
##            gei     SE
## eybhc0 0.12762 0.0033
svygei( ~ eybhc0 , subset( z85 , eybhc0 >= 1 ) )
##            gei     SE
## eybhc0 0.13372 0.0041
svygei( ~ eybhc0 , subset( z85 , eybhc0 >= 1 ) , epsilon = 2 )
##            gei     SE
## eybhc0 0.16764 0.0073
svygei( ~ eybhc0 , subset( z85 , eybhc0 >= 1 ) , epsilon = 3 )
##            gei     SE
## eybhc0 0.26095 0.0185

Replicate the author’s decomposition by population subgroup (work status) shown on PDF page 57..

# define work status (PDF page 22)
# codes shown in the output below: 1 = "1+ ft working", 2 = "no ft working", 3 = "elderly"
z <- update( z , wkstatus = c( 1 , 1 , 1 , 1 , 2 , 3 , 2 , 2 )[ as.numeric( esbu ) ] )
z <- update( z , factor( wkstatus , labels = c( "1+ ft working" , "no ft working" , "elderly" ) ) )

# subset to 1991 and remove records with zero income
z91 <- subset( z , year == 1991 & eybhc0 > 0 )

# population share
svymean( ~wkstatus, z91 )
##            mean     SE
## wkstatus 1.5594 0.0099
# mean
svyby( ~eybhc0, ~wkstatus, z91, svymean )
##   wkstatus   eybhc0       se
## 1        1 278.8040 3.703790
## 2        2 151.6317 3.153968
## 3        3 176.6045 4.661740
# subgroup indices: ge_k
svyby( ~ eybhc0 , ~wkstatus , z91 , svygei , epsilon = -1 )
##   wkstatus     eybhc0          se
## 1        1  0.2300708  0.02853959
## 2        2 10.9231761 10.65482557
## 3        3  0.1932164  0.02571991
svyby( ~ eybhc0 , ~wkstatus , z91 , svygei , epsilon = 0 )
##   wkstatus    eybhc0          se
## 1        1 0.1536921 0.006955506
## 2        2 0.1836835 0.014740510
## 3        3 0.1653658 0.016409770
svyby( ~ eybhc0 , ~wkstatus , z91 , svygei , epsilon = 1 )
##   wkstatus    eybhc0          se
## 1        1 0.1598558 0.008327994
## 2        2 0.1889909 0.016766120
## 3        3 0.2023862 0.027787224
svyby( ~ eybhc0 , ~wkstatus , z91 , svygei , epsilon = 2 )
##   wkstatus    eybhc0         se
## 1        1 0.2130664 0.01546521
## 2        2 0.2846345 0.06016394
## 3        3 0.3465088 0.07362898
# GE decomposition
svygeidec( ~eybhc0, ~wkstatus, z91, epsilon = -1 )
##       total within between
## coef 3.6829 3.6466  0.0363
## SE   3.3999 3.3993  0.0541
svygeidec( ~eybhc0, ~wkstatus, z91, epsilon = 0 )
##          total    within between
## coef 0.1952363 0.1619352  0.0333
## SE   0.0064615 0.0062209  0.0027
svygeidec( ~eybhc0, ~wkstatus, z91, epsilon = 1 )
##          total    within between
## coef 0.2003897 0.1693958  0.0310
## SE   0.0079299 0.0044132  0.0073
svygeidec( ~eybhc0, ~wkstatus, z91, epsilon = 2 )
##         total   within between
## coef 0.274325 0.245067  0.0293
## SE   0.016694 0.017831  0.0038
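
As a quick arithmetic check of the additive decomposition, the printed coefficients can be compared directly. The sketch below copies the `coef` rows from the `svygeidec` output above into a small matrix (the object names are illustrative) and confirms that the total equals within plus between, up to the rounding of the printed between component.

```r
# coefficients copied from the svygeidec output above
geidec_coef <- rbind(
  "epsilon = -1" = c( total = 3.6829 , within = 3.6466 , between = 0.0363 ) ,
  "epsilon = 0" = c( total = 0.1952363 , within = 0.1619352 , between = 0.0333 ) ,
  "epsilon = 1" = c( total = 0.2003897 , within = 0.1693958 , between = 0.0310 ) ,
  "epsilon = 2" = c( total = 0.274325 , within = 0.245067 , between = 0.0293 )
)

# gap between the total and the sum of its two components
gap <- geidec_coef[ , "total" ] - ( geidec_coef[ , "within" ] + geidec_coef[ , "between" ] )

# the between component is printed with four decimals, so allow for that rounding
all( abs( gap ) < 1e-4 )
```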

For additional usage examples of svygei or svygeidec, type ?convey::svygei or ?convey::svygeidec in the R console.