## 3.4 Gini index (svygini)

The Gini index summarizes the inequality depicted by the Lorenz curve as a single number. It is twice the area between the line of perfect equality and the observed Lorenz curve $$L(p)$$:

$\begin{aligned} G &= 2 \bigg( \int_{0}^{1} p \, dp - \int_{0}^{1} L(p) \, dp \bigg) \\ \therefore G &= 1 - 2 \int_{0}^{1} L(p) \, dp \end{aligned}$

where $$G = 0$$ in the case of perfect equality and $$G = 1$$ in the case of perfect inequality.
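As a rough numerical illustration of the identity above, the sketch below approximates the area under an empirical Lorenz curve with the trapezoidal rule and plugs it into $$G = 1 - 2 \int_{0}^{1} L(p) \, dp$$. The income values and the variable names are hypothetical, and the code uses only base R rather than any convey function.

```r
# toy incomes, sorted in ascending order (hypothetical values)
y <- sort( c( 1 , 2 , 3 , 4 ) )
n <- length( y )

# population shares p = 0, 1/n, ..., 1 and the matching Lorenz ordinates L(p)
p <- seq( 0 , 1 , length.out = n + 1 )
L <- c( 0 , cumsum( y ) / sum( y ) )

# trapezoidal approximation of the area under the Lorenz curve
area_under_lorenz <- sum( diff( p ) * ( head( L , -1 ) + tail( L , -1 ) ) / 2 )

# G = 1 - 2 * area under the Lorenz curve
gini_from_lorenz <- 1 - 2 * area_under_lorenz
gini_from_lorenz
## [1] 0.25
```

With identical incomes the Lorenz curve coincides with the equality line, the area is one half, and the same computation returns zero.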

The estimator below was proposed by Osier (2009): Osier, Guillaume. 2009. “Variance Estimation for Complex Indicators of Poverty and Inequality.” Journal of the European Survey Research Association 3 (3): 167–95. http://ojs.ub.uni-konstanz.de/srm/article/view/369. It is defined as:

$\widehat{G} = \frac{ 2 \sum_{i \in S} w_i r_i y_i - \sum_{i \in S} w_i y_i }{ \hat{Y} }$

The standard error (SE) is computed from the linearized form of $$\widehat{G}$$.
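The point estimator can be sketched in a few lines of base R. This section does not define $$r_i$$, so the sketch assumes it is the weighted mid-rank of unit $$i$$ (the share of the weighted population below it, counting half of its own weight) and that $$\hat{Y}$$ is the weighted total of $$y$$. The function name gini_plugin is hypothetical, and the internal computation in convey::svygini may differ in small finite-sample terms.

```r
# plug-in Gini estimator under the assumptions stated above
gini_plugin <- function( y , w = rep( 1 , length( y ) ) ) {

    # sort incomes and weights by income
    o <- order( y )
    y <- y[ o ]
    w <- w[ o ]

    # estimated total of y
    Y_hat <- sum( w * y )

    # assumed definition of r: weighted mid-rank on the (0,1) scale
    r <- ( cumsum( w ) - w / 2 ) / sum( w )

    ( 2 * sum( w * r * y ) - sum( w * y ) ) / Y_hat
}

# unweighted toy example
gini_plugin( c( 1 , 2 , 3 , 4 ) )
## [1] 0.25

# a weight of 2 acts like a duplicated observation
gini_plugin( c( 1 , 2 , 4 ) , w = c( 2 , 1 , 1 ) )
## [1] 0.3125
```

Under the mid-rank assumption, the result agrees with the trapezoidal area under the empirical Lorenz curve, which is why the toy values above are easy to check by hand.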

### A replication example

The R vardpoor package (Breidaks, Liberts, and Ivanova 2016), created by researchers at the Central Statistical Bureau of Latvia, includes a Gini coefficient calculation using the ultimate cluster method. The example below reproduces those statistics. Full citation: Breidaks, Juris, Martins Liberts, and Santa Ivanova. 2016. “Vardpoor: Estimation of Indicators on Social Exclusion and Poverty and Its Linearization, Variance Estimation.” Riga, Latvia: CSB.

Load and prepare the same data set:

# load the convey package
library(convey)

library(survey)

library(vardpoor)

library(laeken)

# load the synthetic EU statistics on income & living conditions
data(eusilc)

# make all column names lowercase
names( eusilc ) <- tolower( names( eusilc ) )

# add a column with the row number
dati <- data.table::data.table(IDd = 1 : nrow(eusilc), eusilc)

# calculate the gini coefficient
# using the R vardpoor library
varpoord_gini_calculation <-
    varpoord(

        # analysis variable
        Y = "eqincome",

        # weights variable
        w_final = "rb050",

        # row number variable
        ID_level1 = "IDd",

        # row number variable
        ID_level2 = "IDd",

        # strata variable
        H = "db040",

        # no stratum population sizes supplied
        N_h = NULL ,

        # clustering variable
        PSU = "rb030",

        # data.table
        dataset = dati,

        # gini coefficient function
        type = "lingini",

        # quantile order, in percent (50 = median)
        order_quant = 50L ,

        # get linearized variable
        outp_lin = TRUE

    )

# construct a survey.design
# using our recommended setup
des_eusilc <-
    svydesign(
        ids = ~ rb030 ,
        strata = ~ db040 ,
        weights = ~ rb050 ,
        data = eusilc
    )

# immediately run the convey_prep function on it
des_eusilc <- convey_prep( des_eusilc )

# coefficients do match
varpoord_gini_calculation$all_result$value
## [1] 26.49652
coef( svygini( ~ eqincome , des_eusilc ) ) * 100
## eqincome
## 26.49652
# linearized variables do match
# varpoord
lin_gini_varpoord <- varpoord_gini_calculation$lin_out$lin_gini
# convey
lin_gini_convey <- attr( svygini( ~ eqincome , des_eusilc ) , "lin" )

# check equality
all.equal( lin_gini_varpoord , 100 * lin_gini_convey )
## [1] TRUE
# variances do not match exactly
attr( svygini( ~ eqincome , des_eusilc ) , 'var' ) * 10000
##            eqincome
## eqincome 0.03790739
varpoord_gini_calculation$all_result$var
## [1] 0.03783931
# standard errors do not match exactly
varpoord_gini_calculation$all_result$se
## [1] 0.1945233
SE( svygini( ~ eqincome , des_eusilc ) ) * 100
##           eqincome
## eqincome 0.1946982

The variance estimate is computed by using the approximation defined in (1.1), where the linearized variable $$z$$ is defined by (1.2). The functions convey::svygini and vardpoor::lingini produce the same linearized variable $$z$$.

However, the variance and standard error estimates differ slightly, because library(vardpoor) defaults to an ultimate cluster method. That method can be replicated with an alternative setup of the survey.design object:

# within each stratum, sum up the weights
cluster_sums <- aggregate( eusilc$rb050 , list( eusilc$db040 ) , sum )

# name the within-stratum sum of weights cluster_sum
names( cluster_sums ) <- c( "db040" , "cluster_sum" )

# merge this column back onto the data.frame
eusilc <- merge( eusilc , cluster_sums )

# construct a survey.design
# with the fpc using the cluster sum
des_eusilc_ultimate_cluster <-
    svydesign(
        ids = ~ rb030 ,
        strata = ~ db040 ,
        weights = ~ rb050 ,
        data = eusilc ,
        fpc = ~ cluster_sum
    )

# again, immediately run the convey_prep function on the survey.design
des_eusilc_ultimate_cluster <- convey_prep( des_eusilc_ultimate_cluster )

# matches
attr( svygini( ~ eqincome , des_eusilc_ultimate_cluster ) , 'var' ) * 10000
##            eqincome
## eqincome 0.03783931
varpoord_gini_calculation$all_result$var
## [1] 0.03783931
# matches
varpoord_gini_calculation$all_result$se
## [1] 0.1945233
SE( svygini( ~ eqincome , des_eusilc_ultimate_cluster ) ) * 100
##           eqincome
## eqincome 0.1945233
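
The only difference between the two designs above is the fpc argument, so the gap between the two variance estimates comes from a finite population correction. The base R sketch below shows the mechanics on toy data. It assumes the textbook stratified ultimate cluster formula for the variance of a weighted total of a linearized variable $$z$$, with a correction factor of one minus the number of sampled clusters divided by the within-stratum sum of weights. The data, column names, and the function ultimate_cluster_var are hypothetical, and this is not the internal code of the survey or vardpoor packages.

```r
# hypothetical toy data: two strata with three PSUs each
toy <- data.frame(
    stratum = c( "a" , "a" , "a" , "a" , "b" , "b" , "b" , "b" ) ,
    psu = c( 1 , 1 , 2 , 3 , 4 , 5 , 5 , 6 ) ,
    w = c( 10 , 10 , 12 , 8 , 20 , 15 , 15 , 25 ) ,
    z = c( 0.2 , -0.1 , 0.4 , -0.3 , 0.1 , 0.0 , -0.2 , 0.3 )
)

ultimate_cluster_var <- function( df , N_h = NULL ) {

    # weighted PSU totals of the linearized variable
    df$wz <- df$w * df$z
    psu_totals <- aggregate( wz ~ stratum + psu , data = df , FUN = sum )
    per_stratum <- split( psu_totals$wz , psu_totals$stratum )

    # stratum contribution: n_h / ( n_h - 1 ) * sum of squared deviations,
    # optionally shrunk by the assumed correction 1 - n_h / N_h
    contributions <- vapply( names( per_stratum ) , function( h ) {
        t_h <- per_stratum[[ h ]]
        n_h <- length( t_h )
        fpc <- if ( is.null( N_h ) ) 1 else 1 - n_h / N_h[[ h ]]
        fpc * n_h * var( t_h )
    } , numeric( 1 ) )

    sum( contributions )
}

# without a finite population correction
v_without_fpc <- ultimate_cluster_var( toy )

# with the within-stratum sum of weights standing in for the population size
N_h <- vapply( split( toy$w , toy$stratum ) , sum , numeric( 1 ) )
v_with_fpc <- ultimate_cluster_var( toy , N_h = N_h )

# the correction can only shrink the variance estimate
v_with_fpc < v_without_fpc
## [1] TRUE
```

When the sum of weights in a stratum is large relative to the number of sampled clusters, the correction factor is close to one, so the two variance estimates differ only slightly.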