Using Big Data for Development Economics

March 18, 2024

Development economics analyzes the economic aspects of the development process (‘how countries become richer’) in low- and middle-income countries. For many reasons (administrative capacity, resources, conflict, or priority ranking), these countries often suffer from large data gaps, where basic administrative data is out of date, not collected at all, or of poor quality. In such an environment, big data such as web, phone, or social media data offers alternative sources of data that do not depend on government priorities and resource allocation but still allow us to answer questions relevant to the development process of these countries.

This blog post is heavily inspired by INFO 288 (Big Data and Development) taught this semester by Joshua Blumenstock, Emily Aiken, and Zoe Kahn.

Why the need for big data?

As new technologies have emerged and expanded, including in low- and middle-income countries, so has our ability to use the data they generate to do social science. Social media data such as tweets and Twitter metadata have been used to quantify economic uncertainty, predict unemployment, and even detect earthquakes and other natural disasters. Such approaches are creative in the context of high-income countries such as the United States, where a trove of other data exists that would allow us to quantify such phenomena even in the absence of social media.

Quantifying such phenomena might prove more difficult, however, in low- and middle-income countries, where there may be long delays between rounds of administrative surveys or disputes around the reliability of such surveys. Countries such as Afghanistan or the Democratic Republic of Congo have not had population censuses since the 1980s, and in countries like Nigeria, the last census was done in 2006 but led to tensions and disputes around the credibility of the figures. Such clear data gaps call for new solutions, both to improve the statistical capacity of the countries concerned and to provide stopgaps for measuring the key economic indicators that are needed for policy or for understanding the economic situation of many regions of the world.

What data? What questions?

The rapid rise in mobile phone, internet, and digital finance penetration over the last fifteen years has allowed for a proliferation of new data sources that can be used for economic analysis. Big data as varied as mobile phone metadata, social media data, web data, and financial transactions data have been used to answer “big” questions in development economics, compensating for the lack of rich administrative data. Mobile phone metadata, combined with machine learning algorithms, have for example been used to predict migration and poverty. In recent years, in the aftermath of the COVID-19 pandemic, mobile phone metadata and machine learning have been used to help governments with limited resources identify the poorest among the population in order to target their social protection programs.

Remote sensing is an array of techniques or tools for detecting objects from satellites, based on radiation emitted by or reflected off the earth. It has many applications, well beyond its use in development economics. Within development economics, remote sensing has for example helped farmers and governments in low-income countries improve predictions of crop yields or crop types for their staple crops at a fraction of the cost of more classic agricultural censuses. Remote sensing has also been used in combination with or in lieu of survey data to evaluate the impact of public policies in periods or regions that the surveys do not cover, answering questions such as the impacts of urban investments (informal settlement upgrading or public housing), cash transfers, agricultural extension programs, etc.

During health epidemics or emergencies such as Ebola or COVID-19, big data was used to track population movements, and thus disease transmission, which might have allowed for better-targeted and more appropriate health policies and responses. Big data has also been used extensively in the context of humanitarian assistance to track population movements following conflicts or natural disasters.

Benefits

One key benefit of leveraging big data to support evidence-based policymaking is that it can achieve high prediction accuracy at a fraction of the cost of collecting administrative data or running household surveys. Sources as varied as phone metadata, Facebook ads, and Google Street View images have been used along with fancy machine-learning techniques to predict poverty, wealth, and consumption with high accuracy. Machine learning predictions can also be extrapolated to people who were not in the original surveys or datasets, yielding predictions at a finer temporal or spatial resolution than is usually available in traditional surveys. It should be noted that for ML predictions to work, traditional surveys are still required to serve as ground truth for the models.
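To make the workflow above concrete, here is a minimal sketch of the general idea: fit a model on the small surveyed subset, check it on held-out survey responses, then extrapolate to everyone else. It is not taken from any of the papers referenced in this post. The data is entirely synthetic, and the single phone-derived feature, the linear model, the sample sizes, and all function and variable names are assumptions made for illustration; published work relies on far richer features and models.

```python
import random


def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical phone-derived feature (e.g. an airtime top-up index),
# available for every subscriber in the synthetic population.
population_feature = [rng.uniform(0, 10) for _ in range(1000)]

# Only the first 200 subscribers were "surveyed": for them we also have a
# ground-truth consumption value (simulated here, assumed relationship).
survey_x = population_feature[:200]
survey_y = [20 + 5 * x + rng.gauss(0, 3) for x in survey_x]

# Train on part of the survey, hold out the rest to check accuracy.
train_x, test_x = survey_x[:150], survey_x[150:]
train_y, test_y = survey_y[:150], survey_y[150:]

intercept, slope = fit_ols(train_x, train_y)


def predict(x):
    """Predicted consumption for a subscriber with feature value x."""
    return intercept + slope * x


holdout_r2 = r_squared(test_y, [predict(x) for x in test_x])

# Extrapolate to the 800 subscribers who were never surveyed.
unsurveyed_predictions = [predict(x) for x in population_feature[200:]]

print(f"hold-out R^2: {holdout_r2:.2f}")
print(f"predictions for unsurveyed subscribers: {len(unsurveyed_predictions)}")
```

The hold-out step is the point of the sketch: without survey responses set aside as ground truth, there is no way to tell how far the extrapolated predictions can be trusted.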

In that way, big data can be used as a more reliable source of data in many places where reliable and exhaustive (full-coverage) statistics are hard to come by or where statistical capacity might be lacking. It helps close some of the data gaps and can serve as a reliable measurement to inform policymaking in between rounds of “traditional” surveys. However, the uses of big data are not without their challenges, and it is up to individual researchers to ensure that basic principles and values are respected in the process of collecting, generating, and processing this type of data.

Challenges

A lot of the concerns typically raised with big data research (privacy, transparency, consent, fairness, etc.) already show up in all aspects of social science research, no matter the country considered. But as technologies emerge quickly and the amount of data they make it possible to generate and compile grows exponentially, there is a need for careful consideration of the risks and potential harms caused by research using big data, especially in contexts with weak legal frameworks and protections or where public discussion of big data and its uses has not yet taken place.

For big data such as phone metadata, there are for example clear consent and privacy risks associated with their use: often, using this type of data does not require informed consent from the mobile phone subscribers. Furthermore, many low- and middle-income countries do not have adequate regulatory or legal frameworks ensuring data privacy protection, leaving such concerns up to individual telecom companies. Similarly, concerns around re-identification or insufficient anonymization of subscribers could also emerge depending on the way the data is compiled and shared with researchers. This also raises the issue of potential surveillance, especially in countries with authoritarian or repressive governments, where mobility data from social media, phones, or satellites could be used to track specific groups or individuals.

This type of research is often undertaken by researchers from Western countries who apply a Western lens to norms and values around what constitutes sensitive information and which uses of the data are acceptable. They might also hold control over large troves of data from specific countries or groups without explicit consent or local understanding of the methods and of the risks populations are exposed to. In contexts where phone metadata has been used for humanitarian aid delivery, there were clear concerns about the specific vulnerability of the populations that were tracked, and there are ongoing discussions on how to weigh the privacy, dignity, and transparency owed to local populations against the supposed benefits of the research.

References

Papers using ML along with big data to predict poverty, migration, natural disasters, etc.:

  1. Jean, N. et al. (2016). Combining satellite imagery and machine learning to predict poverty. Science 353, 790–794.

  2. Maas, P., Almquist, Z., Giraudy, E., & Schneider, J. W. (2020). Using social media to measure demographic responses to natural disaster: Insights from a large-scale Facebook survey following the 2019 Australia Bushfires. arXiv preprint arXiv:2008.03665.

  3. Blumenstock, J. E. (2012). Inferring patterns of internal migration from mobile phone call records: evidence from Rwanda. Information Technology for Development, 18(2), 107-125.

  4. Baker, S. R., Bloom, N., Davis, S., & Renault, T. (2021). Twitter-derived measures of economic uncertainty.

  5. Proserpio, D., Counts, S., & Jain, A. (2016, May). The psychology of job loss: using social media data to characterize and predict unemployment. In Proceedings of the 8th ACM Conference on Web Science (pp. 223-232).

  6. Aiken, E., Bellue, S., Karlan, D., Udry, C., & Blumenstock, J. E. (2022). Machine learning and phone data can improve targeting of humanitarian aid. Nature, 603(7903), 864-870.

  7. Michaels, G., Nigmatulina, D., Rauch, F., Regan, T., Baruah, N., & Dahlstrand, A. (2021). Planning ahead for better neighborhoods: Long-run evidence from Tanzania. Journal of Political Economy, 129(7), 2112-2156.

Dangers and misuses of big data:

  1. Jerven, M. (2014). Poor numbers and what to do about them. The Lancet, 383, 594-595.
  2. Taylor, L., & Schroeder, R. (2015). Is bigger better? The emergence of big data as a tool for international development policy. GeoJournal, 80, 503-518.
  3. Sambasivan, N., & Holbrook, J. (2018). Toward responsible AI for the next billion users. ACM Interactions, 26, 68-71.
  4. Maxmen, A. (2019). Can tracking people through phone-call data improve lives? Nature 569, 614.