1.4 Underlying Calculations

In what follows, we often use the linearization method to approximate the variance of an estimator. Given the linearized variable \(z\) of an estimator \(T\), expression (1.1) yields an estimate of the variance of \(T\).

If \(T\) can be expressed as a function of the population totals, \(T = g(Y_1, Y_2, \ldots, Y_n)\), and \(g\) is linear, estimating the variance of \(T\) is straightforward. If \(g\) is not linear but is a ‘smooth’ function, the variance of \(g(Y_1, Y_2, \ldots, Y_n)\) can be approximated by the variance of its first-order Taylor expansion. For example, Taylor expansion can be used to linearize the ratio of two totals. However, there are situations where Taylor linearization is not directly applicable, either because \(T\) cannot be expressed as a function of the population totals or because \(g\) is not a smooth function. An example is the case where \(T\) is a quantile.
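As a worked illustration of the ratio case mentioned above, take \(T = g(Y_1, Y_2) = Y_1 / Y_2\). The unit-level notation \(y_{1k}\), \(y_{2k}\) (the values of the two variables for unit \(k\)) and the hats for estimated totals are introduced here only for this sketch. A first-order Taylor expansion around the population totals gives

\[ \widehat{T} \approx T + \frac{1}{Y_2}\left(\widehat{Y}_1 - Y_1\right) - \frac{Y_1}{Y_2^2}\left(\widehat{Y}_2 - Y_2\right) = T + \frac{1}{Y_2}\left(\widehat{Y}_1 - T\,\widehat{Y}_2\right) \]

where the second equality uses \(Y_1 - T\,Y_2 = 0\). The right-hand side is linear in the estimated totals, so the linearized variable of the ratio is

\[ z_k = \frac{y_{1k} - T\,y_{2k}}{Y_2} \]

and the variance of \(\widehat{T}\) is approximated by the variance of the estimated total of \(z_k\).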

In these cases, an alternative form of linearization of \(T\) may work: linearization by the influence function, as defined in (1.2), proposed by Deville (1999) [Deville, Jean-Claude. 1999. “Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques.” Survey Methodology 25 (2): 193–203. http://www.statcan.gc.ca/pub/12-001-x/1999002/article/4882-eng.pdf]. Replication methods such as the bootstrap and the jackknife could also be used.

In the convey library, there are some basic functions that produce the linearized variables needed to measure income concentration and poverty. For example, looking at the income variable in some complex survey dataset, the quantile of that income variable can be linearized by the function convey::svyiqalpha and the sum total below any quantile of the variable is linearized by the function convey::svyisq.

From the linearized variables of these basic estimates, the influence function of more complex estimates can be derived using rules of composition that are valid for influence functions. By definition, the influence function is a Gateaux derivative, so the rules of composition valid for Gateaux derivatives also hold for influence functions.

The following property of Gateaux derivatives is used often in the convey library. Let \(g\) be a differentiable function of \(m\) variables. Suppose we want to compute the influence function of the estimator \(g(T_1, T_2,\ldots, T_m)\), knowing the influence functions of the estimators \(T_i, i=1,\ldots, m\). Then the following holds:

\[ I(g(T_1, T_2,\ldots, T_m)) = \sum_{i=1}^m \frac{\partial g}{\partial T_i}I(T_i) \]

In the library convey this rule is implemented by the function contrastinf which uses the R function deriv to compute the formal partial derivatives \(\frac{\partial g}{\partial T_i}\).
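The composition rule above can be sketched numerically. The following is a minimal, language-neutral illustration in Python, not the code of contrastinf: the function name `compose_influence`, its arguments, and the toy data are hypothetical. It also swaps in central finite differences for the partial derivatives, whereas contrastinf obtains them symbolically through R's deriv.

```python
from typing import Callable, List, Sequence


def compose_influence(
    g: Callable[..., float],
    estimates: Sequence[float],
    influences: Sequence[Sequence[float]],
    rel_step: float = 1e-6,
) -> List[float]:
    """Influence values of g(T_1, ..., T_m) via the chain rule.

    estimates  : point estimates T_1, ..., T_m
    influences : influences[i][k] is the influence value of T_i for unit k
    """
    m = len(estimates)
    if len(influences) != m:
        raise ValueError("need one influence vector per estimate")

    # Partial derivatives dg/dT_i by central differences.
    # Assumption of this sketch; contrastinf uses symbolic derivatives.
    grads = []
    for i in range(m):
        h = rel_step * max(abs(estimates[i]), 1.0)
        up = list(estimates)
        down = list(estimates)
        up[i] += h
        down[i] -= h
        grads.append((g(*up) - g(*down)) / (2.0 * h))

    # I(g(T_1, ..., T_m)) = sum_i dg/dT_i * I(T_i), evaluated unit by unit.
    n_units = len(influences[0])
    return [
        sum(grads[i] * influences[i][k] for i in range(m))
        for k in range(n_units)
    ]


# Toy example (hypothetical data): ratio of two totals, where the
# influence value of a total for unit k is the unit's own value.
y1 = [2.0, 3.0, 7.0]
y2 = [1.0, 2.0, 3.0]
z_ratio = compose_influence(lambda t1, t2: t1 / t2, [sum(y1), sum(y2)], [y1, y2])
```

For the ratio, the result should agree, up to numerical error, with the Taylor-linearized variable \((y_{1k} - T\,y_{2k})/Y_2\), which shows that the two routes coincide for smooth functions of totals.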

For example, suppose we want to linearize the relative median poverty gap (rmpg), defined as the difference between the at-risk-of-poverty threshold (arpt) and the median of incomes less than the arpt, relative to the arpt:

\[ rmpg= \frac{arpt-medpoor} {arpt} \]

where medpoor is the median of incomes less than arpt.

If we know how to linearize arpt and medpoor, then applying the function contrastinf with \[ g(T_1,T_2)= \frac{(T_1 - T_2)}{T_1} \] linearizes the rmpg.
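Written out by hand, with \(T_1 = arpt\) and \(T_2 = medpoor\), the partial derivatives of \(g\) are

\[ \frac{\partial g}{\partial T_1} = \frac{T_2}{T_1^2}, \qquad \frac{\partial g}{\partial T_2} = -\frac{1}{T_1} \]

so the composition rule gives

\[ I(rmpg) = \frac{medpoor}{arpt^2}\, I(arpt) - \frac{1}{arpt}\, I(medpoor) \]

This is the expression that the rule yields for this \(g\); it is shown here only as a check on the calculation, not as a description of the library's internals.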