2.2 Replication-Based Variance Estimation

All major functions in the convey library have S3 methods for the classes: survey.design, svyrep.design and DBIdesign. When the argument design is a survey design object with replicate weights created by the survey library, convey uses the method svrepdesign.

Considering the remarks in Wolter (1985, 163Wolter, K. M. 1985. Introduction to Variance Estimation. New York: Springer-Verlag.), concerning the deficiency of the Jackknife method in estimating the variance of quantiles, we suggest using bootstrap methods. This is specified in the survey::svrepdesign or survey::as.svrepdesign functions, as we show in the next example. For comparison, the function bootVar from the laeken library (Alfons and Templ 2013Alfons, Andreas, and Matthias Templ. 2013. “Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken.” Journal of Statistical Software 54 (15): 1–25. http://www.jstatsoft.org/v54/i15/.), also uses the bootstrap method to estimate variances.

It is worth pointing out tha bootstrap methods rely on randomly resampling the dataset. Therefore, users are likely to have slightly different variance estimates by chance. For reproducibility, the user can set a random seed using set.seed() before running survey::as.svrepdesign.

2.2.1 Replication Design Example

# load libraries
library( survey )
library( convey )
library( laeken ) # for the dataset

# get laeken eusilc data
data( eusilc ) ; names( eusilc ) <- tolower( names( eusilc ) )

# survey design object for TSL/ifluence function variance estimation
des_eusilc <- svydesign( ids = ~rb030 , strata = ~db040 ,  weights = ~rb050 , data = eusilc )
des_eusilc <- convey_prep( des_eusilc )

# influence function SE estimate for the gini index
convey:::svygini.survey.design( ~eqincome , design = des_eusilc )
##             gini     SE
## eqincome 0.26497 0.0019
# create survey design object for replicate-based variance estimation
des_eusilc_rep <- as.svrepdesign( des_eusilc , type = "bootstrap" )
des_eusilc_rep <- convey_prep( des_eusilc_rep )

# replicate-based (bootstrao) SE estimate for the gini index
# with option to keep replicates
( gini.repstat <- convey:::svygini.svyrep.design( ~eqincome , design = des_eusilc_rep , return.replicates = TRUE ) )
##             gini     SE
## eqincome 0.26497 0.0022

To understand how that variance is estimated, we can look at the replicates:

# collect gini bootstrap replicates
( gini.reps <- gini.repstat$replicates )
##  [1] 0.2628816 0.2653672 0.2663951 0.2646311 0.2660021 0.2648396 0.2633737
##  [8] 0.2611656 0.2613233 0.2647174 0.2638256 0.2621616 0.2646830 0.2587950
## [15] 0.2642947 0.2651559 0.2663231 0.2673018 0.2687169 0.2671058 0.2654078
## [22] 0.2632859 0.2636763 0.2643964 0.2659183 0.2676759 0.2637824 0.2661465
## [29] 0.2652060 0.2635074 0.2678686 0.2655230 0.2614917 0.2636580 0.2670249
## [36] 0.2671674 0.2629817 0.2612323 0.2618645 0.2641601 0.2646633 0.2637810
## [43] 0.2631755 0.2639137 0.2618225 0.2598284 0.2670922 0.2672737 0.2625786
## [50] 0.2677954
## attr(,"scale")
## [1] 0.02042526
## attr(,"rscales")
##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [39] 1 1 1 1 1 1 1 1 1 1 1 1
## attr(,"mse")
## [1] FALSE

These are resampling (bootstrap) replicates. With them, we can look at the variance of these replicates to get an estimate of the Gini index estimator’s variance:

# variance estimate
des.scale <- des_eusilc_rep$scale
meantheta <- mean( gini.reps )[[1]]
v <- sum( ( gini.reps - meantheta )^2 ) * des.scale

# SE estimate
( gini.se <- ( sqrt( v ) ) )
## [1] 0.002226009
# compare estimates
identical( gini.se , SE( gini.repstat )[[1]] )
## [1] TRUE