Data Science

Design Your Observational Study with the Joint Variable Importance Plot

March 12, 2024
by Lauren Liao. In causal inference with observational studies, there is often a natural imbalance in the data. Luckily, additional variables are often measured alongside and can be helpful for adjustment. However, deciding which variables to prioritize for adjustment is not trivial, since not all variables are equally important to the intervention or the outcome. I recommend using the joint variable importance plot during the observational study design phase to visualize which variables should be prioritized. This post provides a gentle guide on how to do so and why it is important.

A Basic Introduction to Hierarchical Linear Modeling

March 4, 2024
by Mingfeng Xue. Hierarchical Linear Modeling (HLM) is an extension of linear models that offers an approach to analyzing data with nested levels. This blog explains HLM's advantages over traditional linear regression models, particularly in handling clustered data and multilevel predictors. Illustrated with an example from educational research, the blog demonstrates model implementation and interpretation steps. It shows how HLM accommodates both independent variables from different levels and hierarchically structured data, providing insights into their impacts on the outcome variable. Recommended resources help readers go further with HLM techniques.
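For readers who want a concrete picture of "nested levels", here is a minimal sketch of a two-level random-intercept model. It is an illustration under assumptions, not necessarily the model used in the post: the subscripts (student i in school j), the predictors x and w, and the coefficient names are all hypothetical.

```latex
% Level 1 (e.g., students i within schools j): outcome depends on a student-level predictor x
y_{ij} = \beta_{0j} + \beta_{1} x_{ij} + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})

% Level 2 (schools): each school's intercept depends on a school-level predictor w
\beta_{0j} = \gamma_{00} + \gamma_{01} w_{j} + u_{0j},
\qquad u_{0j} \sim \mathcal{N}(0, \tau^{2})

% Combined form: fixed effects plus a school-specific random intercept
y_{ij} = \gamma_{00} + \gamma_{01} w_{j} + \beta_{1} x_{ij} + u_{0j} + \varepsilon_{ij}
```

The school-level term u_{0j} is what separates this from an ordinary regression: observations in the same school share it, so they are no longer treated as independent.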

What Are Vowels Made Of? Graphing a Classic Dataset with R

February 13, 2024
by Anna Björklund. Vowels are all around us. Mainstream US English has around twelve unique vowels. How can our brains tell these sounds apart? This blog post will help you answer this question by plotting vowel data from a classic American English dataset by Peterson and Barney (1952).

How can we use big data from iNaturalist to address important questions in Entomology?

February 26, 2024
by Leah Lee. Large-scale geographic data on insect diversity over time can be used to answer important questions in Entomology. Open-source, open-access citizen science platforms like iNaturalist generate huge amounts of data on species diversity and distribution at accelerating rates. However, unstructured citizen science data contain inherent biases and need to be used with care. One way to validate big data from iNaturalist is to cross-check it against systematically collected data, such as museum specimens.

Creating the Ultimate Sweet

January 30, 2024
by Emma Turtelboom. What is the best Halloween candy? In this blog post, we will identify attributes of popular sweets and create a model to understand how these attributes influence the popularity of the sweet. We’ll discuss alternative model approaches and potential drawbacks, as well as caveats to interpreting the predictions of our model.

Addison Pickrell

IUSE Undergraduate Advisory Board
Mathematics
Sociology

Addison is an aspiring mathematician and social scientist (Class of '27). He loves collecting books he'll never read, is an open-source and open-access advocate, and an aspiring community organizer and systems disrupter. Ask him about community-based participatory action research (CBPAR), critical pedagogy, applied mathematics, and social science.

Measuring Migration: Old Challenges, New Opportunities

January 16, 2024
by Suraj Nair. In the 21st century, patterns of human migration are being reshaped by various forces, including economic opportunity, conflict, and anthropogenic climate change. Understanding these (and other) drivers of migration is key to designing and implementing policies that better promote human development. In this blog post, I discuss some well-known challenges in measuring migration and then give a brief overview of my ongoing research, which demonstrates the opportunities that non-traditional data sources offer for new insights into migration patterns across the world.

Tracking Urban Expansion Through Satellite Imagery

December 12, 2023
by Leïla Njee Bugha. Among its many uses, remote sensing can prove especially useful for documenting changes and trends in eras or settings where traditional data sources are either nonexistent or infrequently collected. This is the case when one wants to study urban expansion in sub-Saharan countries over the past 20 years. Classification methods can further remedy the lack of land cover data from earlier time periods. Using easily accessible satellite imagery from Google Earth Engine, I provide here an example combining remote sensing with classification to detect changes in land cover in Nigeria since 2000 due to urban expansion.

The More Things Change the More They Stay the Same?

December 18, 2023
by Tonya D. Lindsey, Ph.D. Think about how often you hear someone gripe about the deterioration of society and then blame the Internet or social media. This blog suggests that the things we are exposed to virtually are not new but instead present us with more, and more frequent, opportunities to reflect on perennial social problems and find solutions, even as we better understand ourselves as individuals in a global community.

Social Sciences D-Lab Celebrates a Decade of Innovation and Welcomes New Leadership

December 4, 2023
by Claudia von Vacano, Ph.D. On November 8, 2023, the Social Sciences D-Lab celebrated its 10th anniversary and introduced its new faculty director, Demography Professor Dennis Feehan. The event also celebrated D-Lab’s accomplishments with the outgoing faculty director, Sociology Professor Dave Harding. Social Sciences Dean Raka Ray offered warm congratulations to the leadership and staff, underscoring the importance of promoting equity in computational social sciences and of supporting qualitative methods research.