Although the \(GPG\) is not an inequality measure in the usual sense, it can still be a useful instrument for evaluating pay discrimination between men and women. Put simply, it expresses the difference between the average hourly earnings of men and women as a fraction of men's average hourly earnings.

In mathematical terms, this index can be written as

\[ GPG = \frac{ \bar{y}_{male} - \bar{y}_{female} }{ \bar{y}_{male} } , \]

which is precisely the estimator used in the package. As the formula shows, if men and women have the same average hourly earnings, \(GPG = 0\). If \(GPG > 0\), women’s average hourly earnings are \(100 \times GPG\) percent lower than men’s. If \(GPG < 0\), women’s average hourly earnings are \(100 \times |GPG|\) percent higher than men’s. In other words, the larger the \(GPG\), the larger the shortfall of women’s hourly earnings.
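To make the formula concrete, here is a minimal base R sketch that computes the point estimate by hand, assuming the group averages are weighted means. The toy earnings, weights, and variable names are hypothetical and are not taken from any real data set; `svygpg` additionally accounts for the survey design when estimating the variance.

```r
# hypothetical toy data: hourly earnings, sampling weights, and sex
earnings <- c(20, 30, 25, 18, 24, 21)
weights  <- c(1, 2, 1, 1, 2, 1)
sex      <- c("male", "male", "male", "female", "female", "female")

# weighted mean of hourly earnings within each group
is_male     <- sex == "male"
mean_male   <- weighted.mean(earnings[is_male], weights[is_male])
mean_female <- weighted.mean(earnings[!is_male], weights[!is_male])

# gender pay gap as a fraction of men's average hourly earnings
gpg <- (mean_male - mean_female) / mean_male

# express the result as a percentage
gpg_percent <- 100 * gpg
print(gpg_percent)
```

With these toy values, men average 26.25 and women 21.75 per hour, so the gap is 4.5 / 26.25, or roughly 17 percent of men's average.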

There is also a more direct reading: as long as the \(GPG\) stays constant, every $1 increase in men’s average hourly earnings corresponds to an increase of $\((1-GPG)\) in women’s average hourly earnings. For instance, with \(GPG = 0.8\), each $1.00 increase in men’s average hourly earnings would correspond to an increase of only $0.20 in women’s.
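This reading follows from rearranging the definition of the \(GPG\). As a short sketch, treating the \(GPG\) as fixed while men's average changes by \(\Delta \bar{y}_{male}\) (the \(\Delta\) notation is introduced here for illustration only):

\[ \bar{y}_{female} = (1 - GPG) \, \bar{y}_{male} \quad \Longrightarrow \quad \Delta \bar{y}_{female} = (1 - GPG) \, \Delta \bar{y}_{male} . \]

Plugging in \(GPG = 0.8\) and \(\Delta \bar{y}_{male} = 1\) gives \(\Delta \bar{y}_{female} = 0.2\).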

The details of the linearization of the `GPG` are discussed by Deville (1999) and Osier (2009).

Deville, Jean-Claude. 1999. “Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques.” *Survey Methodology* 25 (2): 193–203. http://www.statcan.gc.ca/pub/12-001-x/1999002/article/4882-eng.pdf.

Osier, Guillaume. 2009. “Variance Estimation for Complex Indicators of Poverty and Inequality.” *Journal of the European Survey Research Association* 3 (3): 167–95. http://ojs.ub.uni-konstanz.de/srm/article/view/369.
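As a rough orientation, and not necessarily in the notation of the cited papers, the standard delta-method linearization of a ratio of two group means leads to the form below. The symbols are introduced here for illustration: \(w_k\) is the sampling weight of unit \(k\), \(\delta_k^{g}\) equals 1 if unit \(k\) belongs to group \(g\) and 0 otherwise, and \(\hat{N}_{g} = \sum_k w_k \delta_k^{g}\) is the estimated size of group \(g\).

\[ u_k^{g} = \frac{ \delta_k^{g} \left( y_k - \bar{y}_{g} \right) }{ \hat{N}_{g} } , \qquad z_k = \frac{1}{ \bar{y}_{male} } \left( \frac{ \bar{y}_{female} }{ \bar{y}_{male} } \, u_k^{male} - u_k^{female} \right) \]

Here \(u_k^{g}\) is the linearized variable of the group mean \(\bar{y}_{g}\). Because \(GPG = 1 - \bar{y}_{female} / \bar{y}_{male}\), the variable \(z_k\) is the negative of the linearized variable of the ratio \(\bar{y}_{female} / \bar{y}_{male}\).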

**A replication example**

The R `vardpoor` package (Breidaks, Liberts, and Ivanova 2016), created by researchers at the Central Statistical Bureau of Latvia, includes a GPG coefficient calculation using the ultimate cluster method. The example below reproduces those statistics.

Breidaks, Juris, Martins Liberts, and Santa Ivanova. 2016. “Vardpoor: Estimation of Indicators on Social Exclusion and Poverty and Its Linearization, Variance Estimation.” Riga, Latvia: CSB.

Load and prepare the same data set:

```
# load the convey package
library(convey)
# load the survey library
library(survey)
# load the vardpoor library
library(vardpoor)
# load data.table, which provides the data.table() constructor used below
library(data.table)
# load the laeken library, which ships the eusilc data set
library(laeken)
# load the synthetic european union statistics on income & living conditions
data(eusilc)
# make all column names lowercase
names( eusilc ) <- tolower( names( eusilc ) )
# coerce the gender variable to numeric 1 (male) or 2 (female)
eusilc$one_two <- as.numeric( eusilc$rb090 == "female" ) + 1
# add a column with the row number
dati <- data.table( IDd = 1 : nrow( eusilc ) , eusilc )
# calculate the gpg coefficient
# using the R vardpoor library
varpoord_gpg_calculation <-
  varpoord(
    # analysis variable
    Y = "eqincome" ,
    # weights variable
    w_final = "rb050" ,
    # row number variable
    ID_level1 = "IDd" ,
    # row number variable
    ID_level2 = "IDd" ,
    # strata variable
    H = "db040" ,
    N_h = NULL ,
    # clustering variable
    PSU = "rb030" ,
    # data.table
    dataset = dati ,
    # gpg coefficient function
    type = "lingpg" ,
    # gender variable
    gender = "one_two" ,
    # quantile order, in percent
    order_quant = 50L ,
    # get linearized variable
    outp_lin = TRUE
  )
# construct a survey.design
# using our recommended setup
des_eusilc <-
  svydesign(
    ids = ~ rb030 ,
    strata = ~ db040 ,
    weights = ~ rb050 ,
    data = eusilc
  )
# immediately run the convey_prep function on it
des_eusilc <- convey_prep( des_eusilc )
# coefficients do match
varpoord_gpg_calculation$all_result$value
```

`## [1] 7.645389`

`coef( svygpg( ~ eqincome , des_eusilc , sex = ~ rb090 ) ) * 100`

```
## eqincome
## 7.645389
```

```
# linearized variables do match
# vardpoor
lin_gpg_varpoord <- varpoord_gpg_calculation$lin_out$lin_gpg
# convey
lin_gpg_convey <- attr( svygpg( ~ eqincome , des_eusilc , sex = ~ rb090 ) , "lin" )
# check equality
all.equal( lin_gpg_varpoord , 100 * lin_gpg_convey[ , 1 ] )
```

`## [1] TRUE`

```
# variances do not match exactly
attr( svygpg( ~ eqincome , des_eusilc , sex = ~ rb090 ) , 'var' ) * 10000
```

```
## eqincome
## eqincome 0.6493911
```

`varpoord_gpg_calculation$all_result$var`

`## [1] 0.6482346`

```
# standard errors do not match exactly
varpoord_gpg_calculation$all_result$se
```

`## [1] 0.8051301`

`SE( svygpg( ~ eqincome , des_eusilc , sex = ~ rb090 ) ) * 100`

```
## eqincome
## eqincome 0.8058481
```

The variance estimate is computed using the approximation defined in (1.1), where the linearized variable \(z\) is defined by (1.2). The functions `convey::svygpg` and `vardpoor::lingpg` produce the same linearized variable \(z\).

However, the measures of uncertainty do not match, because `library(vardpoor)` defaults to an ultimate cluster method that can be replicated with an alternative setup of the `survey.design` object.

```
# within each stratum, sum up the weights
cluster_sums <- aggregate( eusilc$rb050 , list( eusilc$db040 ) , sum )
# name the within-stratum sums of weights `cluster_sum`
names( cluster_sums ) <- c( "db040" , "cluster_sum" )
# merge this column back onto the data.frame
eusilc <- merge( eusilc , cluster_sums )
# construct a survey.design
# with the fpc using the cluster sum
des_eusilc_ultimate_cluster <-
  svydesign(
    ids = ~ rb030 ,
    strata = ~ db040 ,
    weights = ~ rb050 ,
    data = eusilc ,
    fpc = ~ cluster_sum
  )
# again, immediately run the convey_prep function on the `survey.design`
des_eusilc_ultimate_cluster <- convey_prep( des_eusilc_ultimate_cluster )
# matches
attr( svygpg( ~ eqincome , des_eusilc_ultimate_cluster , sex = ~ rb090 ) , 'var' ) * 10000
```

```
## eqincome
## eqincome 0.6482346
```

`varpoord_gpg_calculation$all_result$var`

`## [1] 0.6482346`

```
# matches
varpoord_gpg_calculation$all_result$se
```

`## [1] 0.8051301`

`SE( svygpg( ~ eqincome , des_eusilc_ultimate_cluster , sex = ~ rb090 ) ) * 100`

```
## eqincome
## eqincome 0.8051301
```
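The only change between the two designs is the `fpc` argument, set to the within-stratum sum of weights. One way to see why this shifts the variance slightly is the base R sketch below. It applies the textbook variance formula for a stratified design with one stage of clusters to weighted totals of a linearized variable, with and without a finite population correction. The function name `linearized_variance` and the toy data are hypothetical, and the sketch is not the actual code of either `survey` or `vardpoor`.

```r
# hypothetical helper: stratified cluster variance of a linearized variable
linearized_variance <- function(stratum, psu, w, z, use_fpc = FALSE) {
  wz <- w * z
  total <- 0
  for (h in unique(stratum)) {
    in_h <- stratum == h
    # weighted totals of z within each cluster (PSU) of stratum h
    psu_totals <- tapply(wz[in_h], as.character(psu[in_h]), sum)
    n_h <- length(psu_totals)
    # finite population correction, taking the sum of weights
    # in the stratum as its population size (an assumption of this sketch)
    fpc_h <- if (use_fpc) 1 - n_h / sum(w[in_h]) else 1
    # between-cluster variability of the totals within stratum h
    ss_h <- sum((psu_totals - mean(psu_totals))^2)
    total <- total + fpc_h * n_h / (n_h - 1) * ss_h
  }
  total
}

# hypothetical toy data: two strata with three single-unit clusters each
stratum <- c("a", "a", "a", "b", "b", "b")
psu     <- 1:6
w       <- c(10, 10, 10, 20, 20, 20)
z       <- c(0.02, -0.01, 0.03, 0.00, 0.01, -0.02)

v_without_fpc <- linearized_variance(stratum, psu, w, z, use_fpc = FALSE)
v_with_fpc    <- linearized_variance(stratum, psu, w, z, use_fpc = TRUE)
print(c(without_fpc = v_without_fpc, with_fpc = v_with_fpc))
```

In this sketch the correction multiplies each stratum's contribution by a factor just below one, so the variance with the correction is slightly smaller. That is the same direction as the gap between 0.6493911 and 0.6482346 above.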

For additional usage examples of `svygpg`, type `?convey::svygpg` in the R console.