Data Visualization

Sahiba Chopra

Data Science Fellow 2024-2025
Haas

I'm a PhD student in the Management and Organizations (Macro) group at Berkeley Haas. I have a diverse professional background, primarily as a data scientist across numerous industries, including fintech, cleantech, and media. I hold a BA in Economics from the University of Maryland, an MS in Applied Economics from the University of San Francisco, and an MS in Business Administration from UC Berkeley.

My research focuses on the intersection of inequality, technology, and the labor market. I am particularly interested in understanding how to reduce inequality in...

Mingyu Yuan

Data Science for Social Justice Senior Fellow 2024
Linguistics

I am a Ph.D. candidate in Linguistics, with a focus on phonetics and phonology, specifically speech production in neuro-atypical populations. I use methods from Natural Language Processing in my day-to-day research.

Violet Davis

Data Science for Social Justice Senior Fellow 2024
MIDS

I am a Master's student studying Data Science at the School of Information. My research involves computational social science projects focused on social justice and equity.

Skyler Yumeng Chen

Data Science for Social Justice Fellow 2024
Haas School of Business

Skyler is a Ph.D. student in Behavioral Marketing at the Haas School of Business. Her research centers on consumer behavior and judgment and decision-making, with a keen interest in both experimental methods and data science techniques. She holds a B.A. in Economics and a B.S. in Data Science from New York University Shanghai.

Grace Hu

Data Science for Social Justice Fellow 2024
Bioengineering

Grace is a third-year Bioengineering PhD candidate in the joint UC Berkeley-UCSF Graduate Program. Her research lies at the nexus of computational design and 3D-bioprinting to advance tissue engineering for regenerative medicine. She previously studied Materials Science and Engineering (B.S.) and Computer Science (M.S.) at Stanford University, where she investigated printable batteries to power an ultra-affordable scanning electron microscope and explored computer science education research by developing AI models to augment teaching ability.

In her free time she...

Propensity Score Matching for Causal Inference: Creating Data Visualizations to Assess Covariate Balance in R

June 10, 2024
by Sharon Green. Although some people consider randomized experiments the gold standard, in many cases it would be highly unethical to assign individuals to harmful exposures in order to measure their effects. Modern causal inference techniques help scientists estimate treatment effects from observational data instead. In particular, propensity score matching matches individuals so that the “treatment” and “control” groups are balanced on measured covariates. After implementing propensity score matching, data visualizations make it easier to assess the quality of the matches before estimating effects. This blog post is a tutorial for implementing propensity score matching and creating data visualizations to assess covariate balance, that is, to check visually whether the matched individuals are balanced with respect to measured covariates.

Transparency in Experimental Political Science Research

April 9, 2024
by Kamya Yadav. As experiments become more common in political science research, there are concerns about research transparency, particularly around reporting results from studies that contradict or do not find evidence for proposed theories (commonly called “null results”). To encourage publication of studies with null results, political scientists have turned to pre-registering their experiments, whether online survey experiments or large-scale experiments conducted in the field. What does pre-registration look like, and how can it help during data analysis and publication?

Design Your Observational Study with the Joint Variable Importance Plot

March 12, 2024
by Lauren Liao. In observational studies used for causal inference, there is often a natural imbalance in the data. Luckily, additional variables that can help with adjustment are often measured as well. However, deciding which variables to prioritize for adjustment is not trivial, since not all variables are equally important to the intervention or the outcome. I recommend using the joint variable importance plot during the observational study design phase to visualize which variables should be prioritized. This post provides a gentle guide on how to do so and why it is important.

What Are Vowels Made Of? Graphing a Classic Dataset with R

February 13, 2024
by Anna Björklund. Vowels are all around us. Mainstream US English has around twelve unique vowels. How can our brains tell these sounds apart? This blog post will help you answer this question by plotting vowel data from a classic American English dataset by Peterson and Barney (1952).

Creating the Ultimate Sweet

January 30, 2024
by Emma Turtelboom. What is the best Halloween candy? In this blog post, we will identify attributes of popular sweets and create a model to understand how these attributes influence a sweet's popularity. We’ll discuss alternative modeling approaches and their potential drawbacks, as well as caveats to interpreting the predictions of our model.