Out-of-school children in Brazil: an exercise in data visualization
PNAD and the importance of household surveys
The most common source of information about living standards in Brazil is the PNAD [1]. Usually, when an institution says something about poverty, inequality, education, housing, etc. in Brazil, it relies on the amazing publications produced by the Brazilian Institute of Geography and Statistics (IBGE).
But that is only the usual route. IBGE also produces amazing microdata, available to everyone. Using it, researchers can focus on specific questions about any topic investigated in the survey. The R workflow for the analysis of those data sets is presented here at length, even showing how to match official statistics. This post aims to show how to visualize some results produced using that framework.
Yet it is extremely important to calculate the variance of these estimates. Many practitioners and researchers overlook this step: some don’t even try, while others try but fail to account for the sample design. In this post, our challenge is to show why calculating the variance matters and how it can be used in practice.
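To make "accounting for the sample design" concrete, here is a minimal Python sketch, not the R workflow used for the results in this post. It assumes a stratified, clustered sample and estimates the variance of a weighted proportion by Taylor linearization with the usual with-replacement approximation. The function name, the record layout and the toy data are all hypothetical and are not taken from the PNAD files.

```python
from collections import defaultdict
from math import sqrt


def design_based_proportion(records):
    """Weighted proportion and its design-based standard error.

    records: iterable of (stratum, psu, weight, y) tuples, with y in {0, 1}.
    Assumes a stratified cluster design and treats the primary sampling
    units (PSUs) as sampled with replacement within each stratum.
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    p_hat = sum(w * y for _, _, w, y in records) / total_w

    # Linearized variable of the ratio estimator, summed to PSU level.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - p_hat) / total_w

    variance = 0.0
    for psus in psu_totals.values():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        mean_z = sum(psus.values()) / n_h
        # Between-PSU variability within the stratum.
        variance += n_h / (n_h - 1) * sum((z - mean_z) ** 2 for z in psus.values())
    return p_hat, sqrt(variance)


# Hypothetical toy data: (stratum, PSU, sampling weight, attends school?)
records = [
    ("A", 1, 100, 1), ("A", 1, 100, 1), ("A", 1, 100, 1),
    ("A", 2, 100, 0), ("A", 2, 100, 0), ("A", 2, 100, 1),
    ("B", 3, 200, 1), ("B", 3, 200, 1), ("B", 3, 200, 1),
    ("B", 4, 200, 1), ("B", 4, 200, 0), ("B", 4, 200, 1),
]

p_hat, design_se = design_based_proportion(records)
# Naive standard error that ignores weights, strata and clusters.
naive_se = sqrt(p_hat * (1 - p_hat) / len(records))
print(f"estimate={p_hat:.4f} design SE={design_se:.4f} naive SE={naive_se:.4f}")
```

The two standard errors generally differ, which is the point: the naive formula treats every child as an independent draw, while the design-based one measures variability between sampled clusters within each stratum.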
The right to education: an evidence-based analysis
Let’s analyse a set of results produced in R using data from 2005 to 2015. The plot below shows school attendance rates by region, with some convergence across the regions over time. This is somewhat expected: increasing school access becomes harder the closer we get to 100% inclusion.
Well, that is a good start. A careless analyst might say:
In 2015, the school attendance rate in the South was higher than in the Center-west region.
Is it really?
As you should have learned in your statistics lectures, this is not how you’re supposed to do it. Since this information comes from a sample, you have to account for sampling error. To be on safer ground with remarks like this one, you should calculate and plot the confidence intervals along with the estimates. Let’s do this, then:
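As a rough illustration of what such an interval is, the Python sketch below builds a confidence interval from an estimate and its standard error using the normal approximation. This is an assumption made for simplicity: survey software often uses a t distribution with design-based degrees of freedom, or a method tailored to proportions. The function name and the input numbers are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.95):
    """Normal-approximation confidence interval: estimate +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return estimate - z * std_error, estimate + z * std_error


# Hypothetical attendance rate (in %) and design-based standard error.
lower, upper = confidence_interval(93.0, 0.5)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```

The standard error passed in should be the design-based one; feeding a naive standard error into this formula would produce intervals that misstate the uncertainty.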
With this graph, we see that we can’t say that. The lower bound of the South’s confidence interval (CI) is 92%, while the upper bound of the Center-west’s is 92.9%, so the two CIs overlap. Put simply, we cannot claim that the two rates are significantly different.
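The comparison above boils down to an interval-overlap check, sketched here in Python. Only the South’s lower bound (92%) and the Center-west’s upper bound (92.9%) come from the text; the other two bounds are made-up placeholders so the example can run.

```python
def intervals_overlap(a, b):
    """Return True if the closed intervals a=(lo, hi) and b=(lo, hi) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


south = (92.0, 94.0)        # lower bound from the text; upper bound hypothetical
center_west = (91.0, 92.9)  # upper bound from the text; lower bound hypothetical

if intervals_overlap(south, center_west):
    print("CIs overlap: no basis for ranking the two regions")
else:
    print("CIs do not overlap: the difference looks significant")
```

Checking for overlap is a conservative rule of thumb. A more direct approach is to test the difference between the two estimates using its own standard error.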
Let’s try another plot. The next graph shows the series of attendance rates by household income per capita, grouped in multiples of the minimum wage.
Cool, right? This one shows you that the differences in access to education across different income levels are diminishing over time. Yet, considering the entire country, children living in richer households are still more likely to be in school than those living in more deprived households, as the difference is still significant.
One question remains: has educational inequality fallen? This is an important discussion, but it is a topic for another post…
[1] The Pesquisa Nacional por Amostra de Domicílios (PNAD) is a survey of Brazilian households, conducted every year.