## 2.4 Relative Median Poverty Gap (svyrmpg)

The relative median poverty gap (rmpg) is the difference between the at-risk-of-poverty threshold (arpt) and the median income of people with income below the arpt, relative to the arpt itself:

$$rmpg = \frac{arpt - median\{y_i : y_i < arpt\}}{arpt}$$

Because the median of the incomes below the arpt is smaller than the arpt, the rmpg is positive (18.93% in the example below). The details of the linearization of the rmpg are discussed by Deville (1999).

Deville, Jean-Claude. 1999. "Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques." Survey Methodology 25 (2): 193–203. http://www.statcan.gc.ca/pub/12-001-x/1999002/article/4882-eng.pdf.
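The formula can be traced by hand with a minimal R sketch. It assumes equal weights and an arpt equal to 60% of the median income, which we take to be the convey default. It ignores survey weights and convey's own weighted-quantile rules, so it will not reproduce `svyrmpg` exactly. `rmpg_unweighted` is a hypothetical helper, not part of convey or vardpoor.

```r
# Hypothetical helper: unweighted rmpg, assuming arpt = 60% of the median income
rmpg_unweighted <- function(y, percent = 0.6) {
  arpt <- percent * median(y)          # at-risk-of-poverty threshold
  poor_median <- median(y[y < arpt])   # median income of people below the arpt
  (arpt - poor_median) / arpt          # gap relative to the arpt
}

# toy incomes (hypothetical data): median = 25, so arpt = 15
y <- c(2, 4, 10, 20, 20, 30, 40, 50, 60, 70)

# incomes below 15 are 2, 4, 10, with median 4, so rmpg = (15 - 4) / 15
rmpg_unweighted(y)  # returns 0.7333333
```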

### A replication example

The R vardpoor package (Breidaks, Liberts, and Ivanova 2016) was created by researchers at the Central Statistical Bureau of Latvia. It includes an rmpg coefficient calculation that uses the ultimate cluster method. The example below reproduces those statistics.

Breidaks, Juris, Martins Liberts, and Santa Ivanova. 2016. "Vardpoor: Estimation of Indicators on Social Exclusion and Poverty and Its Linearization, Variance Estimation." Riga, Latvia: CSB.

Load and prepare the same data set:

```r
# load the convey package
library(convey)

# load the survey, vardpoor, and laeken packages
library(survey)

library(vardpoor)

library(laeken)

# load the synthetic EU statistics on income & living conditions
data(eusilc)

# make all column names lowercase
names( eusilc ) <- tolower( names( eusilc ) )

# add a column with the row number
dati <- data.table::data.table(IDd = 1 : nrow(eusilc), eusilc)

# calculate the rmpg coefficient
# using the R vardpoor library
varpoord_rmpg_calculation <-
  varpoord(

    # analysis variable
    Y = "eqincome",

    # weights variable
    w_final = "rb050",

    # level-1 id: the row number
    ID_level1 = "IDd",

    # level-2 id: also the row number
    ID_level2 = "IDd",

    # strata variable
    H = "db040",

    # stratum population sizes: none supplied
    N_h = NULL ,

    # clustering variable
    PSU = "rb030",

    # data.table
    dataset = dati,

    # rmpg coefficient function
    type = "linrmpg",

    # quantile order used for the poverty threshold (50 = median)
    order_quant = 50L ,

    # return the linearized variable
    outp_lin = TRUE

  )

# construct a survey.design
# using our recommended setup
des_eusilc <-
  svydesign(
    ids = ~ rb030 ,
    strata = ~ db040 ,
    weights = ~ rb050 ,
    data = eusilc
  )

# immediately run the convey_prep function on it
des_eusilc <- convey_prep( des_eusilc )

# the coefficients match
varpoord_rmpg_calculation$all_result$value
## [1] 18.9286
coef( svyrmpg( ~ eqincome , des_eusilc ) ) * 100
## eqincome
##  18.9286
# the linearized variables match
# vardpoor
lin_rmpg_varpoord <- varpoord_rmpg_calculation$lin_out$lin_rmpg
# convey
lin_rmpg_convey <- attr(svyrmpg( ~ eqincome , des_eusilc ),"lin")

# check equality
all.equal(lin_rmpg_varpoord, 100*lin_rmpg_convey[,1] )
## [1] TRUE
# the variances do not match exactly
attr( svyrmpg( ~ eqincome , des_eusilc ) , 'var' ) * 10000
##          eqincome
## eqincome 0.332234
varpoord_rmpg_calculation$all_result$var
## [1] 0.3316454
# the standard errors do not match exactly
varpoord_rmpg_calculation$all_result$se
## [1] 0.5758866
SE( svyrmpg( ~ eqincome , des_eusilc ) ) * 100
##           eqincome
## eqincome 0.5763974
```

The variance estimate is computed using the approximation defined in (1.1), where the linearized variable $z$ is defined by (1.2). The functions `convey::svyrmpg` and `vardpoor::linrmpg` produce the same linearized variable $z$, as the `all.equal` check above confirms once the convey values are rescaled by 100.
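Once $z$ is available, the variance is obtained from the design-based variance of its weighted total. The sketch below is a minimal base-R version of the with-replacement ("ultimate cluster") estimator, assuming that (1.1) approximates the variance of the rmpg by the variance of the estimated total of $z$. As we understand it, this is what the survey package computes for a stratified cluster design without an `fpc`. `uc_var_total` is a hypothetical helper, not a convey or vardpoor function.

```r
# Hypothetical helper (not part of convey or vardpoor): with-replacement
# "ultimate cluster" variance of the weighted total sum(w * z)
uc_var_total <- function(z, w, strata, psu) {
  v <- 0
  for (h in unique(strata)) {
    in_h <- strata == h
    # weighted total of the linearized variable within each sampled PSU
    psu_tot <- tapply(w[in_h] * z[in_h], as.character(psu[in_h]), sum)
    n_h <- length(psu_tot)
    # strata with a single PSU contribute nothing in this sketch
    if (n_h > 1) {
      v <- v + n_h / (n_h - 1) * sum((psu_tot - mean(psu_tot))^2)
    }
  }
  v
}

# toy data (hypothetical): two strata, two PSUs each, two people per PSU
z      <- c(1, 1, 2, 2, 3, 3, 5, 5)
w      <- rep(1, 8)
strata <- c(1, 1, 1, 1, 2, 2, 2, 2)
psu    <- c(1, 1, 2, 2, 3, 3, 4, 4)

# stratum 1: PSU totals 2, 4 -> 2 * 2 = 4; stratum 2: totals 6, 10 -> 2 * 8 = 16
uc_var_total(z, w, strata, psu)  # returns 20
```

Only PSU totals enter the formula, which is why the method is called "ultimate cluster": variation below the PSU level is never computed explicitly.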

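The variances and standard errors differ slightly, however. vardpoor defaults to an ultimate cluster method that applies a finite population correction based on each stratum's sum of weights. This can be replicated by passing that within-stratum sum of weights as the `fpc` of the `survey.design` object: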

```r
# within each stratum, sum up the weights
cluster_sums <- aggregate( eusilc$rb050 , list( eusilc$db040 ) , sum )

# name the within-strata sums of weights the cluster_sum
names( cluster_sums ) <- c( "db040" , "cluster_sum" )

# merge this column back onto the data.frame
eusilc <- merge( eusilc , cluster_sums )

# construct a survey.design
# with the fpc using the cluster sum
des_eusilc_ultimate_cluster <-
  svydesign(
    ids = ~ rb030 ,
    strata = ~ db040 ,
    weights = ~ rb050 ,
    data = eusilc ,
    fpc = ~ cluster_sum
  )

# again, immediately run the convey_prep function on the survey.design
des_eusilc_ultimate_cluster <- convey_prep( des_eusilc_ultimate_cluster )

# the variances now match
attr( svyrmpg( ~ eqincome , des_eusilc_ultimate_cluster ) , 'var' ) * 10000
##           eqincome
## eqincome 0.3316454
varpoord_rmpg_calculation$all_result$var
## [1] 0.3316454
# the standard errors now match
varpoord_rmpg_calculation$all_result$se
## [1] 0.5758866
SE( svyrmpg( ~ eqincome , des_eusilc_ultimate_cluster ) ) * 100
##           eqincome
## eqincome 0.5758866
```
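The small size of the change can be illustrated with a sketch. As we understand the survey package, supplying `fpc` scales each stratum's with-replacement variance contribution by $1 - n_h/N_h$. Here $n_h$ is the number of sampled PSUs in stratum $h$, and $N_h$ is the supplied value, in this case the `cluster_sum` computed in the code above. The toy data below are hypothetical; they only reuse the EU-SILC column names for readability.

```r
# toy design (hypothetical): stratum, PSU id, and weight for each person
toy <- data.frame(
  db040 = c("a", "a", "a", "b", "b"),
  rb030 = c(1, 2, 3, 4, 5),
  rb050 = c(100, 150, 250, 400, 600)
)

# N_h: within-stratum sum of weights (the role played by cluster_sum)
N_h <- tapply(toy$rb050, toy$db040, sum)

# n_h: number of distinct sampled PSUs in each stratum
n_h <- tapply(toy$rb030, toy$db040, function(x) length(unique(x)))

# factor applied to each stratum's with-replacement variance contribution
fpc_factor <- 1 - n_h / N_h
fpc_factor  # a: 1 - 3/500 = 0.994, b: 1 - 2/1000 = 0.998
```

In EU-SILC, the number of sampled PSUs is tiny relative to the weight totals, so the factors stay close to 1. This is consistent with the small reduction from 0.332234 to 0.3316454 shown above.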

For additional usage examples of `svyrmpg`, type `?convey::svyrmpg` in the R console.