4.6 Gini index (svygini)

✔️ direct relationship with the Lorenz curve
✔️ most used inequality measure
✔️ [0,1]-bounded, easy to compare
✔️ allows for zero incomes
❌ hard to interpret without a graphical device
❌ more sensitive to outliers (when compared to `svyzenga`)
❌ https://pubmed.ncbi.nlm.nih.gov/26292521/ and https://www.jstor.org/stable/2142862

The Gini index (or Gini coefficient) is one way to condense the inequality shown by the Lorenz curve into a single number. In essence, it is twice the area between the line of perfect equality and the observed Lorenz curve:

\[ \begin{aligned} G &= 2 \bigg( \int_{0}^{1} pdp - \int_{0}^{1} L(p)dp \bigg) \\ \therefore G &= 1 - 2 \int_{0}^{1} L(p)dp \end{aligned} \]

where \(G=0\) in case of perfect equality and \(G = 1\) in the case of perfect inequality.
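
As a quick numerical check of the formula above, the sketch below evaluates \(G = 1 - 2 \int_{0}^{1} L(p)dp\) for a hypothetical Lorenz curve \(L(p) = p^2\), chosen here only for illustration (it is not taken from any survey in this chapter), and for the line of perfect equality \(L(p) = p\).

```r
# hypothetical Lorenz curve, assumed for illustration: L(p) = p^2
lorenz <- function(p) p^2

# area under the Lorenz curve on [0, 1], computed numerically
area_under_lorenz <- integrate(lorenz, lower = 0, upper = 1)$value

# Gini index: G = 1 - 2 * (area under the Lorenz curve)
gini_from_lorenz <- 1 - 2 * area_under_lorenz

# under perfect equality the Lorenz curve is L(p) = p
area_under_equality <-
  integrate(function(p) p, lower = 0, upper = 1)$value
gini_equality <- 1 - 2 * area_under_equality

print(c(gini_from_lorenz, gini_equality))
```

Since \(\int_{0}^{1} p^2 dp = 1/3\), the first value is \(1 - 2/3 = 1/3\), and the line of perfect equality gives \(G = 0\), up to numerical integration error.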

The estimator proposed by Osier (2009) [Osier, Guillaume. 2009. “Variance Estimation for Complex Indicators of Poverty and Inequality.” Journal of the European Survey Research Association 3 (3): 167–95. http://ojs.ub.uni-konstanz.de/srm/article/view/369] is defined as:

\[ \widehat{G} = \frac{ 2 \sum_{i \in S} w_i r_i y_i - \sum_{i \in S} w_i y_i }{ \hat{Y} } \]

The linearized formula of \(\widehat{G}\) is used to calculate the SE.
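
To make the point estimator concrete, here is a minimal base R sketch of the formula above. The text does not define \(r_i\), so this sketch assumes it is the weighted mid-rank of unit \(i\) scaled to the unit interval, with \(w_i\) the sampling weight, \(y_i\) the income, and \(\hat{Y} = \sum_{i \in S} w_i y_i\). The function name `weighted_gini` is hypothetical, and this is not the code used inside `convey::svygini`, so small differences from that function are possible.

```r
# weighted Gini estimate from incomes y and sampling weights w
# assumption: r_i is the weighted mid-rank of unit i, scaled to (0, 1)
weighted_gini <- function(y, w) {
  # sort incomes (and their weights) in ascending order
  o <- order(y)
  y <- y[o]
  w <- w[o]

  # estimated population size and estimated income total
  N_hat <- sum(w)
  Y_hat <- sum(w * y)

  # mid-rank: cumulative weight up to unit i, minus half of its own weight
  r <- (cumsum(w) - w / 2) / N_hat

  # numerator and denominator follow the estimator shown above
  (2 * sum(w * r * y) - sum(w * y)) / Y_hat
}

# three units with equal weights and incomes 1, 2, 3
weighted_gini(y = c(1, 2, 3), w = c(1, 1, 1))

# rescaling every weight leaves the estimate unchanged
weighted_gini(y = c(3, 1, 2), w = c(10, 10, 10))

# identical incomes give an estimate of zero
weighted_gini(y = c(5, 5, 5), w = c(2, 1, 3))
```

Under the mid-rank assumption, the first two calls return 2/9, which agrees with the mean-absolute-difference definition of the Gini index for incomes 1, 2, 3.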


4.6.1 Replication Example

The R vardpoor package (Breidaks, Liberts, and Ivanova 2016) [Breidaks, Juris, Martins Liberts, and Santa Ivanova. 2016. “Vardpoor: Estimation of Indicators on Social Exclusion and Poverty and Its Linearization, Variance Estimation.” Riga, Latvia: CSB], created by researchers at the Central Statistical Bureau of Latvia, includes a Gini coefficient calculation using the ultimate cluster method. The example below reproduces those statistics.

Load and prepare the same data set:

# load the convey package
library(convey)

# load the survey library
library(survey)

# load the vardpoor library
library(vardpoor)

# load the laeken library
library(laeken)

# load the synthetic EU statistics on income & living conditions
data(eusilc)

# make all column names lowercase
names(eusilc) <- tolower(names(eusilc))

# add a column with the row number
dati <- data.table::data.table(IDd = 1:nrow(eusilc), eusilc)

# calculate the gini coefficient
# using the R vardpoor library
varpoord_gini_calculation <-
  varpoord(
    # analysis variable
    Y = "eqincome",
    
    # weights variable
    w_final = "rb050",
    
    # row number variable
    ID_level1 = "IDd",
    
    # row number variable
    ID_level2 = "IDd",
    
    # strata variable
    H = "db040",
    
    N_h = NULL ,
    
    # clustering variable
    PSU = "rb030",
    
    # data.table
    dataset = dati,
    
    # gini coefficient function
    type = "lingini",
    
    # get linearized variable
    outp_lin = TRUE
    
  )



# construct a survey.design
# using our recommended setup
des_eusilc <-
  svydesign(
    ids = ~ rb030 ,
    strata = ~ db040 ,
    weights = ~ rb050 ,
    data = eusilc
  )

# immediately run the convey_prep function on it
des_eusilc <- convey_prep(des_eusilc)

# coefficients do match
varpoord_gini_calculation$all_result$value
## [1] 26.49652
coef(svygini( ~ eqincome , des_eusilc)) * 100
## eqincome 
## 26.49652
# linearized variables do match
# varpoord
lin_gini_varpoord <- varpoord_gini_calculation$lin_out$lin_gini
# convey
lin_gini_convey <-
  attr(svygini( ~ eqincome , des_eusilc , linearized = TRUE) , "linearized")

# check equality
all.equal(lin_gini_varpoord , (100 * as.numeric(lin_gini_convey)))
## [1] TRUE
# variances do not match exactly
attr(svygini( ~ eqincome , des_eusilc) , 'var') * 10000
##            eqincome
## eqincome 0.03790739
varpoord_gini_calculation$all_result$var
## [1] 0.03783931
# standard errors do not match exactly
varpoord_gini_calculation$all_result$se
## [1] 0.1945233
SE(svygini( ~ eqincome , des_eusilc)) * 100
##           eqincome
## eqincome 0.1946982

The variance estimator and the linearized variable \(z\) are both defined in the Linearization-Based Variance Estimation section. The functions convey::svygini and vardpoor::lingini produce the same linearized variable \(z\).

However, the measures of uncertainty do not line up, because library(vardpoor) defaults to an ultimate cluster method that can be replicated with an alternative setup of the survey.design object.

# within each stratum, sum up the weights
cluster_sums <-
  aggregate(eusilc$rb050 , list(eusilc$db040) , sum)

# name the within-strata sums of weights the `cluster_sum`
names(cluster_sums) <- c("db040" , "cluster_sum")

# merge this column back onto the data.frame
eusilc <- merge(eusilc , cluster_sums)

# construct a survey.design
# with the fpc using the cluster sum
des_eusilc_ultimate_cluster <-
  svydesign(
    ids = ~ rb030 ,
    strata = ~ db040 ,
    weights = ~ rb050 ,
    data = eusilc ,
    fpc = ~ cluster_sum
  )

# again, immediately run the convey_prep function on the `survey.design`
des_eusilc_ultimate_cluster <-
  convey_prep(des_eusilc_ultimate_cluster)

# matches
attr(svygini( ~ eqincome , des_eusilc_ultimate_cluster) , 'var') * 10000
##            eqincome
## eqincome 0.03783931
varpoord_gini_calculation$all_result$var
## [1] 0.03783931
# matches
varpoord_gini_calculation$all_result$se
## [1] 0.1945233
SE(svygini( ~ eqincome , des_eusilc_ultimate_cluster)) * 100
##           eqincome
## eqincome 0.1945233
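
One way to read the small gap between the two designs is as a finite population correction applied within each stratum. The sketch below uses hypothetical toy data (the PSU totals `z_total` and the stratum sizes `N_h` are made up, and `uc_variance` is an illustrative helper, not a function from survey, convey, or vardpoor). It applies the textbook ultimate cluster variance formula for a total, with and without the factor \(1 - n_h / N_h\), and should not be taken as the exact code either package runs.

```r
# hypothetical PSU-level totals of a weighted linearized variable z
psu <- data.frame(
  stratum = c("a", "a", "a", "b", "b", "b", "b"),
  z_total = c(1.2, -0.4, 0.3, 2.0, -1.1, 0.5, -0.2)
)

# hypothetical stratum population sizes
# (the example above passes the summed weights in this role)
N_h <- c(a = 30, b = 80)

# ultimate cluster variance of a total, optionally with an fpc
uc_variance <- function(psu, N_h = NULL) {
  # between-PSU variability within each stratum
  per_stratum <- sapply(split(psu$z_total, psu$stratum), function(z) {
    n_h <- length(z)
    n_h / (n_h - 1) * sum((z - mean(z))^2)
  })

  # without population sizes, no correction is applied
  if (is.null(N_h)) return(sum(per_stratum))

  # sampling fraction of PSUs in each stratum
  n_h <- table(psu$stratum)
  f_h <- as.numeric(n_h[names(per_stratum)]) / N_h[names(per_stratum)]

  # shrink each stratum's contribution by (1 - n_h / N_h)
  sum((1 - f_h) * per_stratum)
}

v_without_fpc <- uc_variance(psu)
v_with_fpc <- uc_variance(psu, N_h)

print(c(v_without_fpc, v_with_fpc))
```

The corrected variance is always the smaller of the two, and the gap shrinks as the sampling fractions approach zero. That is consistent with the two variances above differing only in the third significant digit.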

For additional usage examples of svygini, type ?convey::svygini in the R console.

4.6.2 Real World Examples

This section displays example results using nationally-representative surveys from both the United States and Brazil. We present a variety of surveys, levels of analysis, and subpopulation breakouts to provide users with points of reference for the range of plausible values of the svygini function.

To understand the construction of each survey design object and respective variables of interest, please refer to section 1.4 for CPS-ASEC, section 1.5 for PNAD Contínua, and section 1.6 for SCF.

4.6.2.1 CPS-ASEC Household Income

svygini(~ htotval , cps_household_design)
##            gini    SE
## htotval 0.48846 0.002
svyby(~ htotval , ~ sex , cps_household_design , svygini)
##           sex   htotval  se.htotval
## male     male 0.4723461 0.003034223
## female female 0.5008580 0.002499741

4.6.2.2 CPS-ASEC Family Income

svygini(~ ftotval , cps_family_design)
##            gini     SE
## ftotval 0.45816 0.0023
svyby(~ ftotval , ~ sex , cps_family_design , svygini)
##           sex   ftotval  se.ftotval
## male     male 0.4400040 0.003677892
## female female 0.4745831 0.002801257

4.6.2.3 CPS-ASEC Worker Earnings

svygini(~ pearnval , cps_ftfy_worker_design)
##           gini     SE
## pearnval 0.412 0.0026
svyby(~ pearnval , ~ sex , cps_ftfy_worker_design , svygini)
##           sex  pearnval se.pearnval
## male     male 0.4185334 0.003179673
## female female 0.3914812 0.004673079

4.6.2.4 PNAD Contínua Per Capita Income

svygini( ~ deflated_per_capita_income , pnadc_design , na.rm = TRUE)
##                               gini     SE
## deflated_per_capita_income 0.51845 0.0032
svyby(~ deflated_per_capita_income ,
      ~ sex ,
      pnadc_design ,
      svygini ,
      na.rm = TRUE)
##           sex deflated_per_capita_income se.deflated_per_capita_income
## male     male                  0.5202218                   0.003394589
## female female                  0.5165412                   0.003210982

4.6.2.5 PNAD Contínua Worker Earnings

svygini( ~ deflated_labor_income , pnadc_design , na.rm = TRUE)
##                          gini     SE
## deflated_labor_income 0.48606 0.0036
svyby( ~ deflated_labor_income , ~ sex , pnadc_design , svygini , na.rm = TRUE)
##           sex deflated_labor_income se.deflated_labor_income
## male     male             0.4906702              0.004082281
## female female             0.4711193              0.003826630

4.6.2.6 SCF Family Net Worth

scf_MIcombine(with(scf_design , svygini( ~ networth)))
## Multiple imputation results:
##       m <- length(results)
##       scf_MIcombine(with(scf_design, svygini(~networth)))
##            results          se
## networth 0.8299712 0.003921153
scf_MIcombine(with(scf_design , svyby( ~ networth, ~ hhsex , svygini)))
## Multiple imputation results:
##       m <- length(results)
##       scf_MIcombine(with(scf_design, svyby(~networth, ~hhsex, svygini)))
##          results          se
## male   0.8160695 0.005037685
## female 0.8258288 0.012458833

4.6.2.7 SCF Family Income

scf_MIcombine(with(scf_design , svygini( ~ income)))
## Multiple imputation results:
##       m <- length(results)
##       scf_MIcombine(with(scf_design, svygini(~income)))
##          results         se
## income 0.6070385 0.01059348
scf_MIcombine(with(scf_design , svyby( ~ income, ~ hhsex , svygini)))
## Multiple imputation results:
##       m <- length(results)
##       scf_MIcombine(with(scf_design, svyby(~income, ~hhsex, svygini)))
##          results         se
## male   0.5987009 0.01197205
## female 0.4633805 0.01269733