
This section explains the calculations used for survey designs created with the `svydesign()` function within the `survey` package.

In what follows, we often use the linearization method to approximate the variance of an estimator. Given the linearized variable \(z\) of an estimator \(T\), the variance of \(T\) is estimated by the variance of the weighted total of \(z\).

If \(T\) can be expressed as a function of the population totals, \(T = g(Y_1, Y_2, \ldots, Y_n)\), and if \(g\) is linear, the estimation of the variance of \(T\) is straightforward. If \(g\) is not linear but is a ‘smooth’ function, then it is possible to approximate the variance of \(g(Y_1, Y_2, \ldots, Y_n)\) by the variance of its first-order Taylor expansion. For example, we can use a Taylor expansion to linearize the ratio of two totals. However, there are situations where Taylor linearization cannot be immediately computed, either because \(T\) cannot be expressed as a function of the population totals, or because \(g\) is not a ‘smooth’ function. A common example is the case where \(T\) is a quantile.

In these cases, an alternative form of linearization of \(T\) might suffice: the `Influence Function`, proposed in Deville (1999; “Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques,” *Survey Methodology* 25 (2): 193–203, http://www.statcan.gc.ca/pub/12-001-x/1999002/article/4882-eng.pdf). Separately, replication methods such as `bootstrap` and `jackknife` can also be used.

In the `convey` library, there are some basic functions that produce the linearized variables needed to measure income concentration and poverty. For example, looking at the income variable in some complex survey dataset, the `quantile` of that income variable is linearized inside the function `convey::svyiqalpha` and the sum total below any quantile of the variable is linearized inside the function `convey::svyisq`.

From the linearized variables of these basic estimates, the rules of composition that hold for influence functions make it possible to derive the influence function of more complex estimates. By definition, the influence function is a Gateaux derivative, so the rules of composition valid for Gateaux derivatives also hold for influence functions.

The following property of Gateaux derivatives is commonly used in the `convey` library: Let \(g\) be a differentiable function of \(m\) variables. Suppose we want to compute the influence function of the estimator \(g(T_1, T_2,\ldots, T_m)\), knowing the influence function of the estimators \(T_i, i=1,\ldots, m\). Then the following holds:

\[ I(g(T_1, T_2,\ldots, T_m)) = \sum_{i=1}^m \frac{\partial g}{\partial T_i}I(T_i) \]

In the `convey` library, this rule is implemented by the function `contrastinf`, which uses the base R function `deriv` to compute the formal partial derivatives \(\frac{\partial g}{\partial T_i}\).

For example, suppose we want to linearize the relative median poverty gap (RMPG), defined as the difference between the at-risk-of-poverty threshold (ARPT) and the median of incomes below the ARPT, relative to the ARPT itself. Writing POORMED for this median of incomes below the ARPT:

\[ rmpg= \frac{(arpt-poormed)} {arpt} \]

If we know how to linearize ARPT and POORMED, then by applying the function `contrastinf` with
\[
g(T_1,T_2)= \frac{(T_1 - T_2)}{T_1}
\]
we are also able to linearize the RMPG.
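The composition rule for this \(g\) can be sketched with base R's `deriv`. This is only an illustration of the rule, not the internals of `contrastinf`: the point estimates and the linearized variables below are hypothetical numbers chosen for the example, and the object names are assumptions.

```r
# g(T1, T2) = (T1 - T2) / T1, with formal partial derivatives from base R deriv()
g <- deriv(~ (T1 - T2) / T1, c("T1", "T2"))

# Hypothetical point estimates (illustrative numbers only)
arpt_hat    <- 1000  # plays the role of T1
poormed_hat <- 800   # plays the role of T2

# Evaluate g and its gradient at the point estimates
g_val <- eval(g, list(T1 = arpt_hat, T2 = poormed_hat))
grad  <- attr(g_val, "gradient")  # 1 x 2 matrix with columns "T1" and "T2"

# Hypothetical linearized variables of ARPT and POORMED for three sample units
z_arpt    <- c(0.5, -0.2, 0.1)
z_poormed <- c(0.3, 0.4, -0.6)

# Composition rule: I(g) = dg/dT1 * I(T1) + dg/dT2 * I(T2)
z_rmpg <- grad[1, "T1"] * z_arpt + grad[1, "T2"] * z_poormed

rmpg_hat <- as.numeric(g_val)
```

Here the partial derivatives are \(T_2/T_1^2\) and \(-1/T_1\), so each unit's linearized RMPG is a fixed linear combination of its two input linearized variables.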

Using the notation in Osier (2009; “Variance Estimation for Complex Indicators of Poverty and Inequality,” *Journal of the European Survey Research Association* 3 (3): 167–95, http://ojs.ub.uni-konstanz.de/srm/article/view/369), the variance of the estimator \(T(\hat{M})\) can be approximated by:

\[\begin{equation} Var\left[T(\hat{M})\right]\cong var\left[\sum_s w_i z_i\right] \end{equation}\]

The `linearized` variable \(z\) is given by the derivative of the functional:

\[\begin{equation} z_k=\lim_{t\rightarrow0}\frac{T(M+t\delta_k)-T(M)}{t}=IT_k(M) \end{equation}\]

where \(\delta_k\) is the Dirac measure at \(k\): \(\delta_k(i)=1\) if and only if \(i=k\).

This **derivative** is called the **Influence Function** and was introduced in the area of **Robust Statistics**.
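The limit in this definition can be checked numerically. The sketch below, on a hypothetical five-unit population with made-up values, perturbs the measure by a small mass at unit \(k\) and compares the difference quotient with the known closed forms for a total (\(y_k\)) and for a ratio of two totals (\((y_k - R x_k)/X\)). The function names are assumptions introduced for the example.

```r
# Hypothetical finite population of five units (illustrative values only)
y <- c(10, 20, 30, 40, 100)
x <- c(1, 2, 3, 4, 5)
M <- rep(1, length(y))  # the measure M puts mass one on every unit of U

# Two functionals of a measure: a total and a ratio of two totals
total_fn <- function(mass) sum(mass * y)
ratio_fn <- function(mass) sum(mass * y) / sum(mass * x)

# Difference quotient [T(M + t * delta_k) - T(M)] / t for a small t
gateaux <- function(fn, k, t = 1e-6) {
  delta_k <- as.numeric(seq_along(M) == k)  # Dirac measure at unit k
  (fn(M + t * delta_k) - fn(M)) / t
}

z_total <- sapply(seq_along(y), function(k) gateaux(total_fn, k))
z_ratio <- sapply(seq_along(y), function(k) gateaux(ratio_fn, k))

# Closed forms for comparison
R <- sum(y) / sum(x)
closed_ratio <- (y - R * x) / sum(x)
```

Up to the error of the finite step `t`, `z_total` reproduces `y` and `z_ratio` reproduces `closed_ratio`.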

Some measures of poverty and income concentration are defined by non-differentiable functions, so that it is not always possible to use Taylor Series Linearization (TSL) to estimate variances. An alternative is to use **influence functions** as described in Deville (1999) and Osier (2009). The `convey` library implements this methodology to work with `survey.design` objects. Influence functions can also be estimated with `svyrep.design` objects, but they are not used for variance estimation in these cases.

Some examples of these measures are:

At-risk-of-poverty threshold: \(arpt=.60q_{.50}\) where \(q_{.50}\) is the median income;

At-risk-of-poverty rate: \(arpr=\frac{\sum_U 1(y_i \leq arpt)}{N}\times 100\);

Quintile share ratio: \(qsr=\frac{\sum_U y_i 1(y_i>q_{.80})}{\sum_U y_i 1(y_i\leq q_{.20})}\), the total income of the richest fifth over that of the poorest fifth;

Gini coefficient: \(1+G=\frac{2\sum_U (r_i-1)y_i}{N\sum_Uy_i}\) where \(r_i\) is the rank of \(y_i\).

Note that it is not possible to use TSL for these measures because they rely on quantiles or, in the case of the Gini coefficient, a function of ranks. Therefore, we instead follow the approach proposed by Deville (1999) based upon influence functions.
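To make the dependence on quantiles concrete, here is a minimal base R sketch of the at-risk-of-poverty threshold and rate on a hypothetical weighted sample. The data are made up, and `wtd_quantile` is a helper introduced for this example: it uses the simplest quantile convention (no interpolation), which may differ from the one used by the `survey` or `convey` packages.

```r
# Hypothetical weighted sample (illustrative values only)
income <- c(10, 20, 30, 40, 100)
w      <- c(1, 1, 1, 1, 1)

# Weighted quantile: smallest income whose cumulative weight share reaches p.
# This is one simple convention; other quantile definitions interpolate.
wtd_quantile <- function(y, w, p) {
  ord <- order(y)
  cum_share <- cumsum(w[ord]) / sum(w)
  y[ord][which(cum_share >= p)[1]]
}

med  <- wtd_quantile(income, w, 0.5)               # median income
arpt <- 0.6 * med                                  # 60% of the median
arpr <- 100 * sum(w * (income <= arpt)) / sum(w)   # percent at or below arpt
```

Because `arpt` is itself a function of a quantile, `arpr` is not a smooth function of totals, which is why its variance calls for influence functions rather than TSL.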

Let \(U\) be a population of size \(N\) and \(M\) be a measure that allocates mass one to the set composed by one unit, that is \(M(i)=M_i= 1\) if \(i\in U\) and \(M(i)=0\) if \(i\notin U\).

Now, a population parameter \(\theta\) can be expressed as a functional of \(M\): \(\theta=T(M)\).

Examples of such parameters are:

Total: \(Y=\sum_Uy_i=\sum_U y_iM_i=\int ydM=T(M)\)

Ratio of two totals: \(R=\frac{Y}{X}=\frac{\int y dM}{\int x dM}=T(M)\)

Cumulative distribution function: \(F(x)=\frac{\sum_U 1(y_i\leq x)}{N}=\frac{\int 1(y\leq x)dM}{\int{dM}}=T(M)\)

To estimate these parameters from the sample, we replace the measure \(M\) by the estimated measure \(\hat{M}\) defined by: \(\hat{M}(i)=\hat{M}_i= w_i\) if \(i\in s\) and \(\hat{M}(i)=0\) if \(i\notin s\).

The estimators of the population parameters can then be expressed as functionals of the measure \(\hat{M}\).

Total: \(\hat{Y}=T(\hat{M})=\int yd\hat{M}=\sum_s w_iy_i\)

Ratio of totals: \(\hat{R}=T(\hat{M})=\frac{\int y d\hat{M}}{\int x d\hat{M}}=\frac{\sum_s w_iy_i}{\sum_s w_ix_i}\)

Cumulative distribution function: \(\hat{F}(x)=T(\hat{M})=\frac{\int 1(y\leq x)d\hat{M}}{\int{d\hat{M}}}=\frac{\sum_s w_i 1(y_i\leq x)}{\sum_s w_i}\)
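These three estimators translate directly into weighted sums. The following base R sketch uses a hypothetical four-unit sample with made-up weights; the variable names are assumptions for the example.

```r
# Hypothetical sample of four units with sampling weights (illustrative values)
y <- c(10, 20, 30, 40)
x <- c(1, 2, 3, 4)
w <- c(2, 2, 3, 3)  # the estimated measure puts mass w_i on sampled unit i

# Total: integral of y with respect to the estimated measure
Y_hat <- sum(w * y)

# Ratio of totals
R_hat <- sum(w * y) / sum(w * x)

# Cumulative distribution function evaluated at a point x0
F_hat <- function(x0) sum(w * (y <= x0)) / sum(w)
```

For instance, `F_hat(25)` is the weighted share of units with `y` at or below 25.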

The influence functions of the total and of the ratio of two totals are obtained as follows.

Total: \[ \begin{aligned} IT_k(M)&=\lim_{t\rightarrow 0}\frac{T(M+t\delta_k)-T(M)}{t}\\ &=\lim_{t\rightarrow 0}\frac{\int y\,d(M+t\delta_k)-\int y\,dM}{t}\\ &=\lim_{t\rightarrow 0}\frac{\int y\,d(t\delta_k)}{t}=y_k \end{aligned} \]

Ratio of two totals: \[ \begin{aligned} IR_k(M)&=I\left(\frac{U}{V}\right)_k(M)=\frac{V(M)\times IU_k(M)-U(M)\times IV_k(M)}{V(M)^2}\\ &=\frac{X y_k-Y x_k}{X^2}=\frac{1}{X}(y_k-Rx_k) \end{aligned} \]
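As a sketch of how the linearized variable feeds into a variance estimate, the base R code below applies the ratio result to a hypothetical sample. It assumes a single-stage design with equal weights and units drawn with replacement, so that the variance of the weighted total of \(z\) takes the simple form shown in the comments; other designs need the corresponding design-based variance of a total. All data values are made up.

```r
# Hypothetical sample (illustrative values only), assumed to be a single-stage
# design with equal weights and units drawn with replacement
y <- c(12, 18, 25, 31, 44, 52)
x <- c(2, 3, 4, 5, 7, 8)
w <- rep(50, length(y))
n <- length(y)

# Point estimates of the totals and of the ratio
Y_hat <- sum(w * y)
X_hat <- sum(w * x)
R_hat <- Y_hat / X_hat

# Linearized variable of the ratio: z_k = (y_k - R * x_k) / X
z <- (y - R_hat * x) / X_hat

# Variance of the estimated total of z under the assumed design:
# n / (n - 1) * sum of squared deviations of w_k * z_k from their mean
wz <- w * z
var_R <- n / (n - 1) * sum((wz - mean(wz))^2)
se_R  <- sqrt(var_R)
```

Note that the weighted total of `z` is zero by construction for a ratio, so only its variability across units contributes to `var_R`.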

- At-risk-of-poverty threshold: \[ arpt = 0.6\times m \] where \(m\) is the median income.

\[ z_k= -\frac{0.6}{f(m)}\times\frac{1}{N}\times\left[I(y_k\leq m)-0.5 \right] \]

- At-risk-of-poverty rate:

\[ arpr=\frac{\sum_U I(y_i \leq t)}{N}\times 100 \]

\[ z_k=\frac{1}{N}\left[I(y_k\leq t)-F(t)\right]-\frac{0.6}{N}\times\frac{f(t)}{f(m)}\left[I(y_k\leq m)-0.5\right] \]

where:

\(N\) - population size;

\(t\) - at-risk-of-poverty threshold;

\(y_k\) - income of person \(k\);

\(m\) - median income;

\(f\) - income density function;

\(F\) - income cumulative distribution function, so that \(F(t)=arpr/100\).
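A rough base R sketch of the linearized variable of the at-risk-of-poverty threshold follows. It is not the `convey` implementation: the data are hypothetical, the density \(f(m)\) is estimated with a weighted Gaussian kernel, and the bandwidth rule (`bw.nrd0`) is an assumption made only for this example.

```r
# Hypothetical weighted sample (illustrative values only)
income <- c(8, 12, 15, 21, 26, 30, 35, 48, 60, 95)
w      <- rep(10, length(income))
N_hat  <- sum(w)  # estimated population size

# Weighted median: smallest income whose cumulative weight share reaches 0.5
ord <- order(income)
m <- income[ord][which(cumsum(w[ord]) / N_hat >= 0.5)[1]]

# Weighted Gaussian kernel estimate of the income density at m.
# The bandwidth rule (bw.nrd0) is an assumption made for this sketch.
h   <- bw.nrd0(income)
f_m <- sum(w * dnorm((m - income) / h)) / (N_hat * h)

# z_k = -0.6 / f(m) * (1 / N) * [1(y_k <= m) - 0.5]
z_arpt <- -(0.6 / f_m) * (1 / N_hat) * ((income <= m) - 0.5)
```

Units at or below the median receive a negative value and units above it a positive one, with a common magnitude that grows as the estimated density at the median shrinks.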