## 2.1 At Risk of Poverty Threshold (svyarpt)

The at-risk-of-poverty threshold (ARPT) identifies people whose incomes imply a low standard of living relative to the general living standards of the population. Even people who are not below the effective poverty line can be considered “almost deprived” if their income falls below the ARPT.

This measure is defined as $$0.6$$ times the median income for the entire population:

$arpt = 0.6 \times median(y),$

where $$y$$ is the income variable and the median is estimated for the whole population. The details of the linearization of the ARPT are discussed by Deville (1999).

Deville, Jean-Claude. 1999. “Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques.” Survey Methodology 25 (2): 193–203. http://www.statcan.gc.ca/pub/12-001-x/1999002/article/4882-eng.pdf.
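As a rough numerical illustration of the definition, the base R sketch below computes a weighted median and the resulting threshold on a made-up sample. The data, the `weighted_median` helper, and its simple quantile rule are assumptions for illustration only; `convey::svyarpt` may use a different quantile definition, so results on real data need not agree exactly.

```r
# Hypothetical toy data: five incomes with survey weights (not from eusilc)
income <- c(8000, 12000, 15000, 20000, 40000)
weight <- c(3, 2, 1, 1, 1)

# Weighted median, taken here as the smallest income at which the
# cumulative share of weights reaches 50% (an assumed, simple rule)
weighted_median <- function(y, w) {
  ord <- order(y)
  y <- y[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  y[which(cum_share >= 0.5)[1]]
}

# ARPT = 0.6 times the (weighted) median income
med <- weighted_median(income, weight)
arpt <- 0.6 * med

# people with incomes below the threshold are flagged as at risk
at_risk <- income < arpt
```

With these weights the weighted median (12000) sits below the unweighted one (15000), which shows why the survey weights have to enter the median before the 0.6 factor is applied.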

A replication example

The R vardpoor package (Breidaks, Liberts, and Ivanova 2016), created by researchers at the Central Statistical Bureau of Latvia, includes an ARPT calculation using the ultimate cluster method. The example below reproduces those statistics.

Breidaks, Juris, Martins Liberts, and Santa Ivanova. 2016. “Vardpoor: Estimation of Indicators on Social Exclusion and Poverty and Its Linearization, Variance Estimation.” Riga, Latvia: CSB.

Load and prepare the same data set:

```r
# load the packages used in this example
library(convey)
library(survey)
library(vardpoor)
library(laeken)

# load the synthetic EU statistics on income & living conditions
data(eusilc)

# make all column names lowercase
names( eusilc ) <- tolower( names( eusilc ) )

# add a column with the row number
dati <- data.table::data.table(IDd = 1 : nrow(eusilc), eusilc)

# calculate the arpt coefficient
# using the R vardpoor library
varpoord_arpt_calculation <-
    varpoord(

        # analysis variable
        Y = "eqincome",

        # weights variable
        w_final = "rb050",

        # row number variable
        ID_level1 = "IDd",

        # row number variable
        ID_level2 = "IDd",

        # strata variable
        H = "db040",

        N_h = NULL ,

        # clustering variable
        PSU = "rb030",

        # data.table
        dataset = dati,

        # arpt coefficient function
        type = "linarpt",

        # quantile order used for the threshold (50 = median)
        order_quant = 50L ,

        # get linearized variable
        outp_lin = TRUE
    )
```

```r
# construct a survey.design
# using our recommended setup
des_eusilc <-
    svydesign(
        ids = ~ rb030 ,
        strata = ~ db040 ,
        weights = ~ rb050 ,
        data = eusilc
    )

# immediately run the convey_prep function on it
des_eusilc <- convey_prep( des_eusilc )

# coefficients do match
varpoord_arpt_calculation$all_result$value
##  10859.24
coef( svyarpt( ~ eqincome , des_eusilc ) )
## eqincome
## 10859.24

# linearized variables do match
# vardpoor
lin_arpt_varpoord <- varpoord_arpt_calculation$lin_out$lin_arpt
# convey
lin_arpt_convey <- attr( svyarpt( ~ eqincome , des_eusilc ) , "lin" )

# check equality
all.equal( lin_arpt_varpoord , lin_arpt_convey )
##  TRUE

# variances do not match exactly
attr( svyarpt( ~ eqincome , des_eusilc ) , 'var' )
##          eqincome
## eqincome 2564.027
varpoord_arpt_calculation$all_result$var
##  2559.442

# standard errors do not match exactly
varpoord_arpt_calculation$all_result$se
##  50.59093
SE( svyarpt( ~ eqincome , des_eusilc ) )
##          eqincome
## eqincome 50.63622
```

The variance estimate is computed by using the approximation defined in (1.1), where the linearized variable $$z$$ is defined by (1.2). The functions convey::svyarpt and vardpoor::linarpt produce the same linearized variable $$z$$.

However, the measures of uncertainty do not line up, because vardpoor defaults to an ultimate cluster method. That method can be replicated with an alternative setup of the survey.design object:

```r
# within each stratum, sum up the weights
cluster_sums <- aggregate( eusilc$rb050 , list( eusilc$db040 ) , sum )

# name the within-stratum sums of weights the cluster_sum
names( cluster_sums ) <- c( "db040" , "cluster_sum" )

# merge this column back onto the data.frame
eusilc <- merge( eusilc , cluster_sums )

# construct a survey.design
# with the fpc using the cluster sum
des_eusilc_ultimate_cluster <-
    svydesign(
        ids = ~ rb030 ,
        strata = ~ db040 ,
        weights = ~ rb050 ,
        data = eusilc ,
        fpc = ~ cluster_sum
    )

# again, immediately run the convey_prep function on the survey.design
des_eusilc_ultimate_cluster <- convey_prep( des_eusilc_ultimate_cluster )

# matches
attr( svyarpt( ~ eqincome , des_eusilc_ultimate_cluster ) , 'var' )
##          eqincome
## eqincome 2559.442
varpoord_arpt_calculation$all_result$var
##  2559.442

# matches
varpoord_arpt_calculation$all_result$se
##  50.59093
SE( svyarpt( ~ eqincome , des_eusilc_ultimate_cluster ) )
##          eqincome
## eqincome 50.59093
```
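The two designs above differ only in the fpc argument. As a rough illustration of why that can shrink the variance slightly, the base R sketch below applies the textbook stratified between-PSU variance formula to the total of a linearized variable, with and without a finite population correction. The toy data, the `N_h` values, and the `stratified_variance` helper are all hypothetical, and the sketch does not reproduce the internals of either the survey or the vardpoor package.

```r
# Hypothetical toy data: PSU-level weighted totals of a linearized
# variable z, grouped into two strata (values are made up)
psu <- data.frame(
  stratum = c("a", "a", "a", "b", "b", "b", "b"),
  total_z = c(10, 14, 12, 30, 26, 34, 30)
)

# Assumed population sizes per stratum (hypothetical)
N_h <- c(a = 30, b = 40)

# Textbook stratified variance of a total:
#   sum over strata of fpc_h * n_h / (n_h - 1) * sum((t_hi - mean(t_h))^2)
# where fpc_h = 1 - n_h / N_h, or 1 when no population sizes are supplied
stratified_variance <- function(psu, N_h = NULL) {
  strata <- split(psu$total_z, psu$stratum)
  contributions <- vapply(names(strata), function(h) {
    t_h <- strata[[h]]
    n_h <- length(t_h)
    fpc <- if (is.null(N_h)) 1 else 1 - n_h / N_h[[h]]
    fpc * n_h / (n_h - 1) * sum((t_h - mean(t_h))^2)
  }, numeric(1))
  sum(contributions)
}

# without and with the finite population correction
v_no_fpc <- stratified_variance(psu)
v_fpc    <- stratified_variance(psu, N_h)
```

In this sketch the corrected variance is smaller by the factor `1 - n_h / N_h` in each stratum, which is the same direction as the gap between 2564.027 and 2559.442 in the comparison above.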

For additional usage examples of svyarpt, type ?convey::svyarpt in the R console.