Data Science

Transparency in Experimental Political Science Research

April 9, 2024
by Kamya Yadav. As experimental studies become more common in political science research, concerns have grown about research transparency, particularly around reporting results from studies that contradict or do not find evidence for proposed theories (commonly called “null results”). To encourage publication of null results, political scientists have turned to pre-registering their experiments, whether online survey experiments or large-scale experiments conducted in the field. What does pre-registration look like, and how can it help during data analysis and publication?

Introduction to Propensity Score Matching with MatchIt

April 1, 2024
by Alex Ramiller. When working with observational (i.e. non-experimental) data, it is often challenging to establish the existence of causal relationships between interventions and outcomes. Propensity Score Matching (PSM) provides a powerful tool for causal inference with observational data, enabling the creation of comparable groups that allow us to directly measure the impact of an intervention. This blog post introduces MatchIt – a software package that provides all of the necessary tools for conducting Propensity Score Matching in R – and provides step-by-step instructions on how to conduct and evaluate matches.

Computational Social Science in a Social World: Challenges and Opportunities

March 26, 2024
by José Aveldanes. The rise of AI, Machine Learning, and Data Science is a harbinger of the need for a significant shift in social science research. Computational Social Science enables us to go beyond traditional methods such as Ordinary Least Squares, which struggle with the complexities of social phenomena, particularly in modeling nonlinear relationships and managing high-dimensional data. This paradigm shift requires that we embrace these new tools to understand social life, and it requires understanding the methodological and ethical challenges they bring, including bias and representation. Integrating these technologies into social science research calls for a collaborative approach among social scientists, technologists, and policymakers to navigate the associated risks and possibilities.

Using Big Data for Development Economics

March 18, 2024
by Leïla Njee Bugha. The proliferation of new sources of data emerging from 20th- and 21st-century technologies such as social media, the internet, and mobile phones offers new opportunities for development economics research. Where such research was limited or impeded by existing data gaps or limited statistical capacity, big data can be used as a stopgap and help accurately quantify economic activity and inform policymaking in many different fields of research. Reduced cost and improved reliability are some key benefits of using big data for development economics, but as with all research designs, it requires thoughtful consideration of potential risks and harms.

Design Your Observational Study with the Joint Variable Importance Plot

March 12, 2024
by Lauren Liao. Observational studies used for causal inference often have a natural imbalance in the data. Luckily, other variables that can help with adjustment are often measured alongside the intervention and outcome. However, deciding which variables should be prioritized for adjustment is not trivial, since not all variables are equally important to the intervention or the outcome. I recommend using the joint variable importance plot during the observational study design phase to visualize which variables should be prioritized. This post provides a gentle guide on how to do so and why it is important.

A Basic Introduction to Hierarchical Linear Modeling

March 4, 2024
by Mingfeng Xue. Hierarchical Linear Modeling (HLM) is an extension of linear models that offers an approach to analyzing data structures with nested levels. This blog post explains HLM's advantages over traditional linear regression models, particularly in handling clustered data and multilevel predictors. Using an example from educational research, the post walks through model implementation and interpretation. It shows how HLM accommodates both independent variables from different levels and hierarchically structured data, providing insights into their impacts on the outcome variable. Recommended resources help readers go further with HLM techniques.

What Are Vowels Made Of? Graphing a Classic Dataset with R

February 13, 2024
by Anna Björklund. Vowels are all around us. Mainstream US English has around twelve unique vowels. How can our brains tell these sounds apart? This blog post will help you answer this question by plotting vowel data from a classic American English dataset by Peterson and Barney (1952).

How can we use big data from iNaturalist to address important questions in Entomology?

February 26, 2024
by Leah Lee. Large-scale data on insect diversity across space and time can be used to answer important questions in Entomology. Open-source, open-access citizen science platforms like iNaturalist generate huge amounts of data on species diversity and distribution at accelerating rates. However, unstructured citizen science data contain inherent biases and need to be used with care. One way to validate big data from iNaturalist is to cross-check it against systematically collected data, such as museum specimens.

Creating the Ultimate Sweet

January 30, 2024
by Emma Turtelboom. What is the best Halloween candy? In this blog post, we will identify attributes of popular sweets and create a model to understand how these attributes influence the popularity of the sweet. We’ll discuss alternative model approaches and potential drawbacks, as well as caveats to interpreting the predictions of our model.

Addison Pickrell

IUSE Undergraduate Advisory Board
Mathematics
Sociology

Addison is an aspiring mathematician and social scientist (Class of '27). He loves collecting books he'll never read, is an open-source and open-access advocate, and is an aspiring community organizer and systems disrupter. Ask him about community-based participatory action research (CBPAR), critical pedagogy, applied mathematics, and social science.