1.2 Complex surveys and statistical inference

In this book, we demonstrate how to estimate poverty and inequality measures in a population using microdata collected from a complex survey sample. Most surveys administered by government agencies or large research organizations use a sampling design that departs from simple random sampling (SRS) in one or more of the following ways:

  1. Unequal selection probabilities across units;
  2. Clustering of units;
  3. Stratification of clusters;
  4. Reweighting to compensate for missing values and other adjustments.

Therefore, basic unweighted R commands such as mean() or glm() do not account for the survey weights, and the measures of uncertainty they report (such as sampling variance estimates and confidence intervals) do not reflect the sample design. For some examples of publicly available complex survey data sets, see http://asdfree.com.
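As a minimal sketch of the difference, the code below compares an unweighted mean with a design-based estimate. It assumes the survey package is installed and uses the apistrat example data shipped with that package (a stratified sample of California schools), which is an illustrative choice and not a dataset discussed in this book.

```r
library(survey)

# Example data bundled with the survey package (assumption: used here
# only for illustration; it is not a poverty or income survey).
data(api)

# Unweighted mean: treats the stratified sample as if it were an SRS.
unweighted <- mean(apistrat$api00)

# Declare the design: stratified by school type, with sampling weights
# and a finite population correction.
dstrat <- svydesign(
  id = ~1,
  strata = ~stype,
  weights = ~pw,
  fpc = ~fpc,
  data = apistrat
)

# Design-based mean with a standard error that reflects the design.
weighted <- svymean(~api00, dstrat)

print(unweighted)
print(weighted)
print(confint(weighted))
```

The two point estimates differ because the strata were sampled at different rates, and only the second call returns a standard error that is tied to the sampling design.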

Unlike some other software, the R convey package does not require the user to respecify the sample design parameters at each step of the analysis. So long as the svydesign object or svrepdesign object has been constructed properly at the outset of the analysis, the convey package will incorporate the survey design automatically and produce statistics and variances that take the complex sample into account.
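The workflow described above might look like the following sketch. It assumes the survey, convey, and laeken packages are installed and borrows the eusilc example data from laeken; the design variables chosen here (rb030 as the unit identifier, db040 as the stratum, rb050 as the weight) follow the usual illustrative setup for that dataset and would differ for any real survey.

```r
library(survey)
library(convey)
library(laeken)

# Synthetic EU-SILC style data shipped with laeken (assumption: used
# here for illustration only).
data(eusilc)
names(eusilc) <- tolower(names(eusilc))

# Declare the design once, at the outset of the analysis.
des_eusilc <- svydesign(
  ids = ~rb030,
  strata = ~db040,
  weights = ~rb050,
  data = eusilc
)

# Prepare the design object for use with convey functions.
des_eusilc <- convey_prep(des_eusilc)

# No design parameters are repeated here: the estimate and its
# standard error are computed from the stored design.
gini <- svygini(~eqincome, design = des_eusilc)
print(gini)
```

Every later convey call can reuse des_eusilc in the same way, so the design is specified in exactly one place.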

Survey analysts familiar with the R dplyr syntax, as implemented for the survey library by the srvyr wrapper package, might be interested in implementing specific convey functions by following the svygini() example published by srvyr author Greg Freedman Ellis. Note that the full design stored by convey_prep() may in some cases complicate this extension.