Blog post

Propensity Score Matching for Causal Inference: Creating Data Visualizations to Assess Covariate Balance in R

June 10, 2024
by Sharon Green. Although randomized experiments are often considered the gold standard for causal inference, in many cases it would be highly unethical to assign individuals to harmful exposures in order to measure their effects. Modern causal inference techniques help scientists estimate treatment effects using observational data. In particular, propensity score matching pairs individuals so that the “treatment” and “control” groups are balanced on measured covariates. After matching, data visualizations make it easier to assess the quality of the matches before estimating effects. This blog post is a tutorial on implementing propensity score matching and creating data visualizations to assess covariate balance, that is, to check visually whether the matched groups are balanced with respect to measured covariates.
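
As a rough illustration of what "balanced on measured covariates" can mean numerically, the sketch below computes a standardized mean difference (SMD) for one covariate before and after matching. It is written in Python rather than R, uses one common definition of the SMD (other definitions fix the denominator to the unmatched sample), and the age values are hypothetical, not taken from the post.

```python
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Difference in group means divided by the pooled standard deviation.

    Uses the sample variance of each group; this is one common
    definition, assumed here for illustration.
    """
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical ages, made up for this example.
treated_age = [30, 40, 50]
control_age_before = [20, 30, 40]   # all controls, before matching
control_age_after = [31, 39, 50]    # matched controls only

smd_before = standardized_mean_difference(treated_age, control_age_before)
smd_after = standardized_mean_difference(treated_age, control_age_after)

print(f"SMD before matching: {smd_before:.2f}")
print(f"SMD after matching:  {smd_after:.2f}")
```

An SMD closer to zero after matching suggests better balance on that covariate; balance plots typically display this quantity for every covariate at once.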

Sand Mining - Plugging a Critical Data Gap

May 14, 2024
by Suraj Nair. Excessive sand mining is causing a global ecological crisis. In this blog post, I explain why sand mining is one of the most pressing challenges facing the planet and why persistent data gaps hinder accountability and monitoring. I also discuss one of my ongoing research projects, in which we combine freely available satellite imagery and machine learning models to build open-source sand mine detection tools that can plug some of these data gaps.

On the Transformative Power of Seeing Others

May 7, 2024
by Daniel Lobo. Daniel Lobo, a PhD student in Sociology at UC Berkeley, discusses his journey from growing up in the urban working class to attending Harvard College and UC Berkeley. He credits his success to mentors who were able to see him in a way that he could not see himself. This gift, the power to see others for who they are and who they could be, animates his research and teaching, including on the NSF-IUSE project.

Enhancing Research Transparency Inspired by Grounded Theory

April 30, 2024
by Farnam Mohebi. Grounded theory, a powerful tool for qualitative analysis, can enhance data science research by improving transparency and impact. Researchers can create a vivid record of their process by meticulously documenting the entire research journey, including the decisions they make and the corresponding rationale behind them, from initial data exploration to developing and refining theories. Embracing grounded theory principles, such as iterative coding and constant comparison, can help data scientists build robust, data-driven theories while ensuring transparency throughout the research process. This approach makes research more replicable and understandable and invites others to engage with the work, fostering collaboration and constructive critique, ultimately elevating the value and reach of their findings.

Conceptual Mirrors: Reflecting on LLMs' Interpretations of Ideas

April 23, 2024
by María Martín López. As large language models begin to ingrain themselves in our daily lives, we must leverage cognitive psychology to explore the understanding that these algorithms have of our world and the people they interact with. LLMs give us new insights into how conceptual representations are formed given the limited data modalities they have access to. Is language enough for these models to conceptualize the world? If so, what conceptualizations do they have of us?

Tactics for Text Mining non-Roman Scripts

April 15, 2024
by Hilary Faxon, Ph.D. & Win Moe. Non-Roman scripts pose particular challenges for text mining. Here, we reflect on a project that used text mining alongside qualitative coding to understand the politicization of online content following Myanmar’s 2021 military coup.

Transparency in Experimental Political Science Research

April 9, 2024
by Kamya Yadav. As experiments become more common in political science research, there are concerns about research transparency, particularly around reporting results from studies that contradict or do not find evidence for proposed theories (commonly called “null results”). To encourage publication of null results, political scientists have turned to pre-registering their experiments, be they online survey experiments or large-scale experiments conducted in the field. What does pre-registration look like, and how can it help during data analysis and publication?

Introduction to Propensity Score Matching with MatchIt

April 1, 2024
by Alex Ramiller. When working with observational (i.e., non-experimental) data, it is often challenging to establish the existence of causal relationships between interventions and outcomes. Propensity Score Matching (PSM) provides a powerful tool for causal inference with observational data, enabling the creation of comparable groups that allow us to directly measure the impact of an intervention. This blog post introduces MatchIt, a software package that provides all of the necessary tools for conducting Propensity Score Matching in R, and provides step-by-step instructions on how to conduct and evaluate matches.
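
To make "creating comparable groups" a little more concrete, here is a minimal sketch of greedy 1:1 nearest-neighbor matching on propensity scores, without replacement. It is plain Python, not MatchIt's API; the function name, the optional `caliper` argument, and the scores are all hypothetical, and the propensity scores are assumed to have been estimated already (for example, with a logistic regression).

```python
def greedy_nearest_neighbor_match(treated_scores, control_scores, caliper=None):
    """Pair each treated unit with the closest unused control unit.

    treated_scores / control_scores: dicts mapping unit id -> propensity score.
    caliper: optional maximum allowed score distance for a match.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control_scores)  # copy, so the input is not modified
    pairs = []
    for t_id, t_score in treated_scores.items():
        if not available:
            break  # no controls left to match
        # Closest remaining control by absolute score distance.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if caliper is not None and abs(available[c_id] - t_score) > caliper:
            continue  # nearest control is too far away; leave unmatched
        pairs.append((t_id, c_id))
        del available[c_id]  # matching without replacement
    return pairs


# Hypothetical propensity scores, made up for this example.
treated = {"t1": 0.80, "t2": 0.55}
control = {"c1": 0.50, "c2": 0.78, "c3": 0.20}

matches = greedy_nearest_neighbor_match(treated, control)
strict_matches = greedy_nearest_neighbor_match(treated, control, caliper=0.03)
print(matches)
print(strict_matches)
```

Greedy matching depends on the order in which treated units are processed, which is one reason match quality should be evaluated afterward rather than assumed.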

Computational Social Science in a Social World: Challenges and Opportunities

March 26, 2024
by José Aveldanes. The rise of AI, Machine Learning, and Data Science signals the need for a significant shift in social science research. Computational Social Science enables us to go beyond traditional methods such as Ordinary Least Squares, which struggle to address the complexities of social phenomena, particularly when modeling nonlinear relationships and managing high-dimensional data. This paradigm shift requires that we embrace these new tools to understand social life, and it necessitates understanding their methodological and ethical challenges, including bias and representation. The integration of these technologies into social science research calls for a collaborative approach among social scientists, technologists, and policymakers to navigate the associated risks and possibilities of these new tools.

Using Big Data for Development Economics

March 18, 2024
by Leïla Njee Bugha. The proliferation of new sources of data emerging from 20th- and 21st-century technologies such as social media, the internet, and mobile phones offers new opportunities for development economics research. Where such research was limited or impeded by existing data gaps or limited statistical capacity, big data can be used as a stopgap to help accurately quantify economic activity and inform policymaking in many different fields of research. Reduced cost and improved reliability are some key benefits of using big data for development economics, but as with all research designs, it requires thoughtful consideration of potential risks and harms.