4.3 Quintile Share Ratio (svyqsr)

✔️ Easy to understand
✔️ Can be adapted into other more commonly used variants, like the Palma ratio
✔️ Can be interpreted using a Lorenz curve
❌ Fails the Pigou-Dalton Principle (depending on the donor and recipient position in the income distribution)
❌ Focuses on specific parts of the income distribution, not the entire distribution

Unlike the previous measure, the Quintile Share Ratio (QSR) is an inequality measure in itself: it depends only on the income distribution to evaluate the degree of inequality. It is defined as the ratio between the income share held by the richest 20% of the population and the income share held by the poorest 20%. In plain terms, it expresses how many times richer the top quintile is than the bottom quintile. For instance, \(QSR = 4\) implies that the richest 20% take home 4 times as much of the total income as the poorest 20%.
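The definition can be illustrated with a minimal, unweighted sketch. The helper `toy_qsr` below is hypothetical and written only for illustration: it is not the estimator used by convey::svyqsr, it ignores sampling weights, and it assumes the number of observations times `alpha` is a whole number. The example also shows why the QSR can fail the Pigou-Dalton Principle: a progressive transfer between two people in the middle 60% leaves the ratio unchanged, while the same transfer from the richest to the poorest person lowers it.

```r
# toy, unweighted quintile share ratio (illustration only, not convey's estimator)
# assumes length(y) * alpha is a whole number
toy_qsr <- function(y, alpha = 0.2) {
  y <- sort(y)
  n <- length(y)
  k <- round(n * alpha)            # number of observations in each tail
  bottom <- sum(y[seq_len(k)])     # income held by the poorest share
  top <- sum(y[(n - k + 1):n])     # income held by the richest share
  top / bottom
}

incomes <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
qsr_before <- toy_qsr(incomes)     # (90 + 100) / (10 + 20)

# progressive transfer of 10 from the person with 70 to the person with 40:
# both remain in the middle 60%, so neither tail changes
after_middle <- incomes
after_middle[7] <- after_middle[7] - 10
after_middle[4] <- after_middle[4] + 10
qsr_middle <- toy_qsr(after_middle)

# the same transfer, but from the richest person to the poorest person
after_tails <- incomes
after_tails[10] <- after_tails[10] - 10
after_tails[1] <- after_tails[1] + 10
qsr_tails <- toy_qsr(after_tails)

c(before = qsr_before, middle = qsr_middle, tails = qsr_tails)
```

Here `qsr_middle` equals `qsr_before` even though income moved from a richer to a poorer person, whereas `qsr_tails` is smaller.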

The QSR can be generalized to a ratio of arbitrary percentile shares. For instance, Cobham, Schlogl, and Sumner (2015) [Cobham, Alex, Luke Schlogl, and Andy Sumner. 2015. “Inequality and the Tails: The Palma Proposition and Ratio Revisited.” Working Papers 143. United Nations, Department of Economic and Social Affairs. http://www.un.org/esa/desa/papers/2015/wp143_2015.pdf.] argue for using the Palma index, defined as the ratio between the income share of the richest 10% and the share held by the poorest 40%. There are two ways to compute the Palma ratio with the convey package: one uses convey::svyqsr and the other uses convey::svylorenz. The two results do not match exactly, since they are based on different estimators for the quantile share. Since the Palma ratio is the share of the top 10% divided by the share of the bottom 40%, in convey::svyqsr this can be achieved using alpha1 = .40 and alpha2 = .90. Note that the vardpoor::linqsr function only accepts a single alpha parameter (defaulting to 0.2 and 1 - 0.2), meaning the Palma index cannot presently be computed using that function.
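As a sketch of the two routes described above, the block below computes a Palma ratio on the synthetic `eusilc` data from the laeken package, first with convey::svyqsr and then from two points of the Lorenz curve returned by convey::svylorenz. The object names (`palma_qsr`, `lorenz_points`, `palma_lorenz`) are chosen here for illustration, and the Lorenz-based calculation is only a point estimate with no standard error attached.

```r
library(survey)
library(convey)
library(laeken)

# synthetic EU statistics on income & living conditions
data(eusilc)
names(eusilc) <- tolower(names(eusilc))

des_eusilc <-
  svydesign(
    ids = ~ rb030 ,
    strata = ~ db040 ,
    weights = ~ rb050 ,
    data = eusilc
  )
des_eusilc <- convey_prep(des_eusilc)

# route 1: svyqsr with the bottom 40% and the top 10%
palma_qsr <-
  svyqsr( ~ eqincome , des_eusilc , alpha1 = .40 , alpha2 = .90)

# route 2: Lorenz curve ordinates L(0.4) and L(0.9),
# where L(p) is the cumulative share held by the poorest p
lorenz_points <-
  svylorenz( ~ eqincome , des_eusilc , quantiles = c(.4 , .9) , plot = FALSE)
palma_lorenz <-
  (1 - coef(lorenz_points)[[2]]) / coef(lorenz_points)[[1]]

# the two point estimates should be close but need not be identical
coef(palma_qsr)
palma_lorenz
```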

The details of the linearization of the QSR are discussed by Deville (1999) [Deville, Jean-Claude. 1999. “Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques.” Survey Methodology 25 (2): 193–203. http://www.statcan.gc.ca/pub/12-001-x/1999002/article/4882-eng.pdf.] and Osier (2009) [Osier, Guillaume. 2009. “Variance Estimation for Complex Indicators of Poverty and Inequality.” Journal of the European Survey Research Association 3 (3): 167–95. http://ojs.ub.uni-konstanz.de/srm/article/view/369.].


4.3.1 Replication Example

The R vardpoor package (Breidaks, Liberts, and Ivanova 2016) [Breidaks, Juris, Martins Liberts, and Santa Ivanova. 2016. “Vardpoor: Estimation of Indicators on Social Exclusion and Poverty and Its Linearization, Variance Estimation.” Riga, Latvia: CSB.], created by researchers at the Central Statistical Bureau of Latvia, includes a QSR coefficient calculation using the ultimate cluster method. The example below reproduces those statistics.

Load and prepare the same data set:

# load the convey package
library(convey)

# load the survey library
library(survey)

# load the vardpoor library
library(vardpoor)

# load the laeken library
library(laeken)

# load the synthetic EU statistics on income & living conditions
data(eusilc)

# make all column names lowercase
names(eusilc) <- tolower(names(eusilc))

# add a column with the row number
dati <- data.table::data.table(IDd = 1:nrow(eusilc), eusilc)

# calculate the qsr coefficient
# using the R vardpoor library
varpoord_qsr_calculation <-
  varpoord(
    # analysis variable
    Y = "eqincome",
    
    # weights variable
    w_final = "rb050",
    
    # row number variable
    ID_level1 = "IDd",
    
    # row number variable
    ID_level2 = "IDd",
    
    # strata variable
    H = "db040",
    
    N_h = NULL ,
    
    # clustering variable
    PSU = "rb030",
    
    # data.table
    dataset = dati,
    
    # qsr coefficient function
    type = "linqsr",
    
    # order of the quantile share ratio, in percent (20 = quintiles)
    alpha = 20 ,
    
    # get linearized variable
    outp_lin = TRUE
    
  )



# construct a survey.design
# using our recommended setup
des_eusilc <-
  svydesign(
    ids = ~ rb030 ,
    strata = ~ db040 ,
    weights = ~ rb050 ,
    data = eusilc
  )

# immediately run the convey_prep function on it
des_eusilc <- convey_prep(des_eusilc)

# coefficients do match
varpoord_qsr_calculation$all_result$value
## [1] 3.970004
coef(svyqsr( ~ eqincome , des_eusilc))
## eqincome 
## 3.970004
# linearized variables do match
# vardpoor
lin_qsr_varpoord <- varpoord_qsr_calculation$lin_out$lin_qsr
# convey
lin_qsr_convey <-
  as.numeric(attr(svyqsr( ~ eqincome ,
                          des_eusilc ,
                          linearized = TRUE) ,
                  "linearized"))

# check equality
all.equal(lin_qsr_varpoord, lin_qsr_convey)
## [1] TRUE
# variances do not match exactly
attr(svyqsr( ~ eqincome , des_eusilc) , 'var')
##             eqincome
## eqincome 0.001810537
varpoord_qsr_calculation$all_result$var
## [1] 0.001807323
# standard errors do not match exactly
varpoord_qsr_calculation$all_result$se
## [1] 0.04251263
SE(svyqsr( ~ eqincome , des_eusilc))
##            eqincome
## eqincome 0.04255041

The variance estimator and the linearized variable \(z\) are both defined in Linearization-Based Variance Estimation. The functions convey::svyqsr and vardpoor::linqsr produce the same linearized variable \(z\).
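In broad strokes, and as a sketch only, linearization approximates the variance of the QSR estimator by the design-based variance of the weighted total of \(z\). The notation \(w_i\) for the sampling weight of unit \(i\) and \(s\) for the sample is assumed here for illustration; Linearization-Based Variance Estimation gives the exact definitions.

\[
\widehat{Var}\left( \widehat{QSR} \right) \approx \widehat{Var}\left( \sum_{i \in s} w_i z_i \right)
\]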

However, the measures of uncertainty do not line up, because library(vardpoor) defaults to an ultimate cluster method that can be replicated with an alternative setup of the survey.design object.

# within each stratum, sum up the weights
cluster_sums <-
  aggregate(eusilc$rb050 , list(eusilc$db040) , sum)

# name the within-strata sums of weights the `cluster_sum`
names(cluster_sums) <- c("db040" , "cluster_sum")

# merge this column back onto the data.frame
eusilc <- merge(eusilc , cluster_sums)

# construct a survey.design
# with the fpc using the cluster sum
des_eusilc_ultimate_cluster <-
  svydesign(
    ids = ~ rb030 ,
    strata = ~ db040 ,
    weights = ~ rb050 ,
    data = eusilc ,
    fpc = ~ cluster_sum
  )

# again, immediately run the convey_prep function on the `survey.design`
des_eusilc_ultimate_cluster <-
  convey_prep(des_eusilc_ultimate_cluster)

# matches
attr(svyqsr( ~ eqincome , des_eusilc_ultimate_cluster) , 'var')
##             eqincome
## eqincome 0.001807323
varpoord_qsr_calculation$all_result$var
## [1] 0.001807323
# matches
varpoord_qsr_calculation$all_result$se
## [1] 0.04251263
SE(svyqsr( ~ eqincome , des_eusilc_ultimate_cluster))
##            eqincome
## eqincome 0.04251263

For additional usage examples of svyqsr, type ?convey::svyqsr in the R console.

4.3.2 Real World Examples

This section displays example results using nationally-representative surveys from both the United States and Brazil. We present a variety of surveys, levels of analysis, and subpopulation breakouts to provide users with points of reference for the range of plausible values of the svyqsr function.

To understand the construction of each survey design object and respective variables of interest, please refer to section 1.4 for CPS-ASEC, section 1.5 for PNAD Contínua, and section 1.6 for SCF.

4.3.2.1 CPS-ASEC Household Income

svyqsr(~ htotval , cps_household_design)
##            qsr     SE
## htotval 17.209 0.1977
svyby(~ htotval , ~ sex , cps_household_design , svyqsr)
##           sex  htotval se.htotval
## male     male 15.38253  0.2893779
## female female 18.62292  0.3249165

4.3.2.2 CPS-ASEC Family Income

svyqsr(~ ftotval , cps_family_design)
##           qsr     SE
## ftotval 13.38 0.1755
svyby(~ ftotval , ~ sex , cps_family_design , svyqsr)
##           sex  ftotval se.ftotval
## male     male 11.16757  0.3691317
## female female 15.33070  0.4216303

4.3.2.3 CPS-ASEC Worker Earnings

svyqsr(~ pearnval , cps_ftfy_worker_design)
##             qsr     SE
## pearnval 6.8394 0.1304
svyby(~ pearnval , ~ sex , cps_ftfy_worker_design , svyqsr)
##           sex pearnval se.pearnval
## male     male 7.965207   0.1673114
## female female 6.883781   0.4940515

4.3.2.4 PNAD Contínua Per Capita Income

svyqsr( ~ deflated_per_capita_income , pnadc_design , na.rm = TRUE)
##                               qsr     SE
## deflated_per_capita_income 16.543 0.2652
svyby(~ deflated_per_capita_income ,
      ~ sex ,
      pnadc_design ,
      svyqsr ,
      na.rm = TRUE)
##           sex deflated_per_capita_income se.deflated_per_capita_income
## male     male                   16.60765                     0.2855086
## female female                   16.46269                     0.2721942

4.3.2.5 PNAD Contínua Worker Earnings

svyqsr( ~ deflated_labor_income , pnadc_design , na.rm = TRUE)
##                          qsr     SE
## deflated_labor_income 11.635 0.2021
svyby( ~ deflated_labor_income , ~ sex , pnadc_design , svyqsr , na.rm = TRUE)
##           sex deflated_labor_income se.deflated_labor_income
## male     male              10.96647                0.2373027
## female female              12.03254                0.2482242

4.3.2.6 SCF Family Net Worth

scf_MIcombine(with(scf_design , svyqsr( ~ networth)))
## Multiple imputation results:
##       m <- length(results)
##       scf_MIcombine(with(scf_design, svyqsr(~networth)))
##            results       se
## networth -379.1143 61.77597
scf_MIcombine(with(scf_design , svyby( ~ networth, ~ hhsex , svyqsr)))
## Multiple imputation results:
##       m <- length(results)
##       scf_MIcombine(with(scf_design, svyby(~networth, ~hhsex, svyqsr)))
##            results           se
## male   -2904.36529 4.598640e+05
## female   -51.58258 8.249331e+00

Note that a highly skewed variable such as net worth, wealth, or assets (as opposed to income and earnings) can make the quintile share ratio look quite different from the other real world examples shown in this section.

First, this function generates a negative result because the sum of net worth of the bottom 20% of households is negative. In other words, the poorest 20% of households have more debt than assets in aggregate. We can use the Lorenz curve to get the cumulative share of net worth held by the poorest \(p\)% of households:

networth_shares <-
  scf_MIcombine(with(scf_design , svylorenz(
    ~ networth, plot = FALSE , quantiles = seq(0, 1, .2)
  )))
coef(networth_shares)[2] # share owned by the bottom 20%
##       L(0.2) 
## -0.002255374

Second, the heavy skew of the net worth distribution (compared to income or earnings) in the United States leads to a quintile share ratio much larger in magnitude than the same calculation on an income variable. Compared to the other svyqsr results in this section, the concentration of so much net worth among the richest 20% of households, divided by a bottom 20% share that is close to zero, produces the extreme svyqsr result:

# share owned by the richest 20%:
1 - coef(networth_shares)[5] # 1 minus the share owned by the bottom 80%
##    L(0.8) 
## 0.8535274
# approximate qsr
(1 - coef(networth_shares)[[5]]) / coef(networth_shares)[[2]]
## [1] -378.4417

4.3.2.7 SCF Family Income

scf_MIcombine(with(scf_design , svyqsr( ~ income)))
## Multiple imputation results:
##       m <- length(results)
##       scf_MIcombine(with(scf_design, svyqsr(~income)))
##         results       se
## income 23.16144 1.375722
scf_MIcombine(with(scf_design , svyby( ~ income, ~ hhsex , svyqsr)))
## Multiple imputation results:
##       m <- length(results)
##       scf_MIcombine(with(scf_design, svyby(~income, ~hhsex, svyqsr)))
##         results        se
## male   21.27596 1.2528214
## female 10.51288 0.9816677