1.6 Influence Functions

Some measures of poverty and income concentration are defined by non-differentiable functions, so Taylor linearization cannot be used to estimate their variances. An alternative is to use influence functions, as described in Deville (1999) and Osier (2009). The convey library implements this methodology to work with survey.design objects and also with svyrep.design objects.

Deville, Jean-Claude. 1999. “Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques.” Survey Methodology 25 (2): 193–203. http://www.statcan.gc.ca/pub/12-001-x/1999002/article/4882-eng.pdf.

Osier, Guillaume. 2009. “Variance Estimation for Complex Indicators of Poverty and Inequality.” Journal of the European Survey Research Association 3 (3): 167–95. http://ojs.ub.uni-konstanz.de/srm/article/view/369.

Some examples of these measures are:

• At-risk-of-poverty threshold: $$arpt=.60q_{.50}$$ where $$q_{.50}$$ is the income median;

• At-risk-of-poverty rate: $$arpr=\frac{\sum_U 1(y_i \leq arpt)}{N}\times 100$$;

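• Quintile share ratio: $$qsr=\frac{\sum_U 1(y_i>q_{.80})y_i}{\sum_U 1(y_i\leq q_{.20})y_i}$$ where $$q_{.20}$$ and $$q_{.80}$$ are the 20th and 80th income percentiles, so that the ratio compares the total income of the richest fifth with that of the poorest fifth;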

• Gini coefficient: $$1+G=\frac{2\sum_U (r_i-1)y_i}{N\sum_Uy_i}$$ where $$r_i$$ is the rank of $$y_i$$.
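As a rough illustration of the four measures listed above, the following Python sketch evaluates them on a small made-up population. It is not the convey implementation: the function names, the income values, and the quantile rule (the smallest income whose cumulative population share reaches $$p$$) are assumptions chosen for simplicity, and the quintile share ratio is computed in its usual form as a ratio of income totals.

```python
def quantile(y, p):
    """Smallest income whose cumulative population share is at least p (assumed rule)."""
    ys = sorted(y)
    n = len(ys)
    for k, v in enumerate(ys, start=1):
        if k / n >= p:
            return v
    return ys[-1]

def arpt(y):
    """At-risk-of-poverty threshold: 60% of the median income."""
    return 0.60 * quantile(y, 0.50)

def arpr(y):
    """At-risk-of-poverty rate: percentage of units at or below the threshold."""
    threshold = arpt(y)
    return 100.0 * sum(1 for v in y if v <= threshold) / len(y)

def qsr(y):
    """Quintile share ratio: income total above q80 over income total at or below q20."""
    q20, q80 = quantile(y, 0.20), quantile(y, 0.80)
    top = sum(v for v in y if v > q80)
    bottom = sum(v for v in y if v <= q20)
    return top / bottom

def gini(y):
    """Gini coefficient from 1 + G = 2 * sum((r_i - 1) * y_i) / (N * sum(y_i))."""
    ys = sorted(y)  # after sorting, the rank r_i is the 1-based position
    weighted = sum((r - 1) * v for r, v in enumerate(ys, start=1))
    return 2.0 * weighted / (len(ys) * sum(ys)) - 1.0

# Hypothetical incomes for a population of N = 10 units (illustrative only).
incomes = [10, 20, 25, 40, 50, 60, 70, 80, 90, 100]

results = {
    "arpt": arpt(incomes),
    "arpr": arpr(incomes),
    "qsr": qsr(incomes),
    "gini": gini(incomes),
}
print(results)
```

Every function here relies either on `quantile` or on sorting the incomes to obtain ranks, which is precisely the dependence that prevents these measures from being written as smooth functions of totals.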

Note that Taylor linearization cannot be applied directly to these measures, because they depend on quantiles and, in the case of the Gini coefficient, on ranks. Their variances can instead be estimated with the approach proposed by Deville (1999), which is based on influence functions.

Let $$U$$ be a population of size $$N$$ and let $$M$$ be a measure that allocates unit mass to each population unit, that is, $$M(i)=M_i= 1$$ if $$i\in U$$ and $$M(i)=0$$ if $$i\notin U$$.

A population parameter $$\theta$$ can then be expressed as a functional of $$M$$: $$\theta=T(M)$$.

Examples of such parameters are:

• Total: $$Y=\sum_Uy_i=\sum_U y_iM_i=\int ydM=T(M)$$

• Ratio of two totals: $$R=\frac{Y}{X}=\frac{\int y dM}{\int x dM}=T(M)$$

• Cumulative distribution function: $$F(x)=\frac{\sum_U 1(y_i\leq x)}{N}=\frac{\int 1(y\leq x)dM}{\int{dM}}=T(M)$$
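The three functionals above can be sketched in Python by treating the measure as a list of (unit, mass) pairs and the integral as a mass-weighted sum. The population values and the helper name `integral` are hypothetical and serve only to make the notation concrete.

```python
def integral(f, measure):
    """Integrate f with respect to a discrete measure: sum of f(unit) * mass."""
    return sum(f(unit) * mass for unit, mass in measure)

# Hypothetical population of N = 5 units; each unit is a (y, x) pair.
population = [(2.0, 1.0), (4.0, 2.0), (6.0, 2.0), (8.0, 4.0), (10.0, 1.0)]

# The measure M puts mass 1 on every population unit.
M = [(unit, 1.0) for unit in population]

def total_y(measure):
    """Total: integral of y dM."""
    return integral(lambda u: u[0], measure)

def ratio(measure):
    """Ratio of two totals: integral of y dM over integral of x dM."""
    return integral(lambda u: u[0], measure) / integral(lambda u: u[1], measure)

def cdf(measure, x0):
    """Distribution function at x0: integral of 1(y <= x0) dM over integral of dM."""
    below = integral(lambda u: 1.0 if u[0] <= x0 else 0.0, measure)
    return below / integral(lambda u: 1.0, measure)

Y = total_y(M)
R = ratio(M)
F6 = cdf(M, 6.0)
print(Y, R, F6)
```

Because each parameter is written as a function of the measure alone, changing the masses is all that is needed to evaluate the same functional on a different measure.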

To estimate these parameters from the sample, we replace the measure $$M$$ by the estimated measure $$\hat{M}$$ defined by: $$\hat{M}(i)=\hat{M}_i= w_i$$ if $$i\in s$$ and $$\hat{M}(i)=0$$ if $$i\notin s$$.

The estimators of the population parameters can then be expressed as functionals of the measure $$\hat{M}$$:

• Total: $$\hat{Y}=T(\hat{M})=\int yd\hat{M}=\sum_s w_iy_i$$

• Ratio of totals: $$\hat{R}=T(\hat{M})=\frac{\int y d\hat{M}}{\int x d\hat{M}}=\frac{\sum_s w_iy_i}{\sum_s w_ix_i}$$

• Cumulative distribution function: $$\hat{F}(x)=T(\hat{M})=\frac{\int 1(y\leq x)d\hat{M}}{\int{d\hat{M}}}=\frac{\sum_s w_i 1(y_i\leq x)}{\sum_s w_i}$$
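A minimal Python sketch of these weighted estimators follows. The sample values and weights are invented for illustration, and the function names are hypothetical; this is plain arithmetic, not a call into the survey or convey packages.

```python
def total_hat(values, weights):
    """Estimated total: sum over the sample of w_i * value_i."""
    return sum(w * v for v, w in zip(values, weights))

def ratio_hat(y, x, weights):
    """Estimated ratio of totals: weighted total of y over weighted total of x."""
    return total_hat(y, weights) / total_hat(x, weights)

def cdf_hat(y, weights, x0):
    """Estimated distribution function at x0: weighted share of units with y_i <= x0."""
    below = sum(w for v, w in zip(y, weights) if v <= x0)
    return below / sum(weights)

# Hypothetical sample of 3 units with sampling weights w_i (illustrative only).
y_s = [2.0, 6.0, 10.0]
x_s = [1.0, 2.0, 1.0]
w_s = [2.0, 1.5, 1.5]  # the estimated measure puts mass w_i on sampled unit i

Y_hat = total_hat(y_s, w_s)
R_hat = ratio_hat(y_s, x_s, w_s)
F_hat_6 = cdf_hat(y_s, w_s, 6.0)
print(Y_hat, R_hat, F_hat_6)
```

Units outside the sample simply do not appear in the sums, which corresponds to $$\hat{M}(i)=0$$ for $$i\notin s$$.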