# Chapter 6 Covariance Matrix

The current convey software does not account for the covariance across `svyby` groups. The example below describes this calculation and shows that, in this case, its impact on the final error terms is limited. We acknowledge that, under ideal circumstances, the covariance would be properly accounted for in the software; modifying convey to do so remains a future goal.

Influence functions and “resampling” replicates can be used to improve the inference about differences and changes between estimates. Put simply, the variance of the difference between two estimates $$\widehat{T}_1$$ and $$\widehat{T}_2$$ is given by

$$Var \big( \widehat{T}_1 - \widehat{T}_2 \big) = Var \big( \widehat{T}_1 \big) + Var \big( \widehat{T}_2 \big) - 2 Cov \big( \widehat{T}_1 , \widehat{T}_2 \big)$$

where $$Var \big( \widehat{T}_1 \big)$$ and $$Var \big( \widehat{T}_2 \big)$$ are the variances of the estimators $$\widehat{T}_1$$ and $$\widehat{T}_2$$, and $$Cov \big( \widehat{T}_1 , \widehat{T}_2 \big)$$ is the covariance of these estimators. Usually, the covariance term is ignored. However, especially for estimates of change, the covariance tends to be positive, so ignoring it yields a conservative (upward-biased) variance estimator for the difference.
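Since the sample variance and covariance satisfy this identity exactly, it can be checked numerically in base R. The sketch below uses simulated, artificially correlated values (illustrative data, not survey estimates) purely to demonstrate the formula:

```r
# sketch: numerically verify Var(T1 - T2) = Var(T1) + Var(T2) - 2 Cov(T1, T2)
# using simulated correlated values (illustrative only, not survey data)
set.seed( 42 )
t1 <- rnorm( 1e5 )
t2 <- 0.6 * t1 + rnorm( 1e5 , sd = 0.8 )

# left-hand side: variance of the difference
lhs <- var( t1 - t2 )

# right-hand side: sum of variances minus twice the covariance
rhs <- var( t1 ) + var( t2 ) - 2 * cov( t1 , t2 )

all.equal( lhs , rhs )
## [1] TRUE
```

Because the identity holds exactly for sample variances and covariances, the two quantities agree up to floating-point precision for any pair of vectors.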

How, then, can we estimate the covariance? The survey package already provides an approach through the `svyby` function. Based on the examples in `?survey::svyby`, we have:

```r
# load the survey library
library(survey)

# load the data set
data( api )

# declare the sampling design
dclus1 <- svydesign( id = ~dnum , weights = ~pw , data = apiclus1 , fpc = ~fpc )

# estimate means by school type, requesting the covariance matrix
mns <- svyby( ~api99 , ~stype , dclus1 , svymean , covmat = TRUE )

# collect the variance-covariance matrix of the estimates
( m <- vcov( mns ) )
##          E         H         M
## E 520.5973  573.0404  684.6562
## H 573.0404 1744.2317  747.1989
## M 684.6562  747.1989 1060.1954

# compute the variance terms
var.naive <- sum( diag( m[ c( 1 , 3 ) , c( 1 , 3 ) ] ) )
cov.term <- sum( diag( m[ c( 1 , 3 ) , c( 3 , 1 ) ] ) )

# "naive" SE of the difference, ignoring the covariance
sqrt( var.naive )
## [1] 39.75918

# SE of the difference, accounting for the covariance
sqrt( var.naive - cov.term )
## [1] 14.54236

# ... or, equivalently, using svycontrast
svycontrast( mns , c( E = 1 , M = -1 ) )
##          contrast     SE
## contrast -0.80833 14.542
```
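The same covariance-aware workflow extends to replicate-weight designs, which is one way the "resampling" replicates mentioned earlier can be used in practice. The sketch below converts the linearized design to a jackknife replicate design with `as.svrepdesign` and repeats the contrast; the point estimate should match the linearized result, while the replicate-based standard error will generally differ slightly:

```r
# load the survey library
library(survey)

# load the data set
data( api )

# declare the linearized sampling design
dclus1 <- svydesign( id = ~dnum , weights = ~pw , data = apiclus1 , fpc = ~fpc )

# convert to a replicate-weight design (jackknife by default)
rclus1 <- as.svrepdesign( dclus1 )

# estimate means by school type, keeping the full covariance matrix
mns.rep <- svyby( ~api99 , ~stype , rclus1 , svymean , covmat = TRUE )

# SE of the E-minus-M difference, computed from the replicates
svycontrast( mns.rep , c( E = 1 , M = -1 ) )
```

This route avoids influence functions entirely: the covariance between group estimates falls out of the replicate estimates themselves.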

Notice that, because the covariance terms are positive, the actual variance of the difference is smaller than the "naive" variance that ignores them.

A similar idea can be applied to other estimators, such as inequality and poverty measures. However, this has not yet been implemented for linearization/influence-function methods. In the next section, we show an example using the Gini index.