1.2 Complex surveys and statistical inference

In this book, we demonstrate how to measure poverty and income concentration in a population using microdata collected from a complex survey sample. Most surveys administered by government agencies or larger research organizations use a sampling design that departs from simple random sampling (SRS) through features such as:

  1. Unequal selection probabilities across units;
  2. Clustering of units;
  3. Stratification of clusters;
  4. Reweighting to compensate for missing values, along with other weight adjustments.

Therefore, basic unweighted R commands such as mean() or glm() will not properly account for the weighting, nor will they produce correct measures of uncertainty (such as standard errors and confidence intervals) under the sampling design. For some examples of publicly available complex survey data sets, see http://asdfree.com.
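As a small illustration of the weighting problem, consider a hypothetical toy sample in base R in which higher-income units were undersampled, so each of them carries a larger weight. The incomes and weights below are invented for illustration; the point is only that the unweighted mean() and the weighted estimate can differ substantially.

```r
# Hypothetical toy sample: 8 low-income units and 2 high-income units.
income <- c(rep(100, 8), rep(500, 2))

# Hypothetical weights: each low-income unit represents 5 people,
# each (undersampled) high-income unit represents 40 people.
w <- c(rep(5, 8), rep(40, 2))

# Unweighted mean treats every sampled unit as equally representative.
unweighted <- mean(income)              # (8 * 100 + 2 * 500) / 10 = 180

# Weighted mean uses the weights to represent the population.
weighted <- weighted.mean(income, w)    # (8 * 5 * 100 + 2 * 40 * 500) / 120

print(c(unweighted = unweighted, weighted = weighted))
```

Note that weighted.mean() only fixes the point estimate. Its result carries no standard error, and a correct variance would also depend on the clustering and stratification of the design, which is what the survey package's design objects are meant to capture.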

Unlike some other software, the R convey package does not require the user to respecify the design features (weights, clusters, and strata) with every command. As long as the survey design object, created with the survey package's svydesign() or svrepdesign() function, has been constructed properly at the outset of the analysis, the convey package will incorporate the survey design automatically and produce point estimates and variances that take the complex sample into account.
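A minimal sketch of this workflow follows, assuming the survey and convey packages are installed. The data frame and its column names (strata, psu, wt, income) are hypothetical toy data invented for illustration. The design is declared once with svydesign(), prepared with convey_prep(), and then passed to svygini() without restating the weights, clusters, or strata.

```r
library(survey)
library(convey)

# Hypothetical toy data: 2 strata, 10 clusters (PSUs) per stratum, 5 units per PSU.
set.seed(1)
toy <- data.frame(
  strata = rep(c("a", "b"), each = 50),
  psu    = rep(1:20, each = 5),
  wt     = rep(c(10, 40), each = 50),           # hypothetical sampling weights
  income = rexp(100, rate = 1 / 1000)           # simulated positive incomes
)

# Declare the complex design once: clusters, strata, and weights.
des <- svydesign(ids = ~psu, strata = ~strata, weights = ~wt,
                 data = toy, nest = TRUE)

# Prepare the linearization design for use with convey functions.
des <- convey_prep(des)

# The Gini estimate and its standard error reflect the full design.
g <- svygini(~income, des)
print(g)
```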