Statistics

A Practical Guide to Shift-Share Instruments (and What I Learned Replicating the China Shock)

November 26, 2025
by Jiayu Lai. Shift-share instruments are among the most widely used tools in applied economics, appearing in labor, trade, immigration, and policy evaluation research. But despite their popularity, many researchers still use them as black boxes — and risk invalid instruments as a result. In this blog post, I unpack how shift-share IVs actually work, why their validity depends on both the “shifts” and the “shares,” and what practical steps researchers should take to check assumptions. I also walk through how I used the Borusyak–Hull–Jaravel (2022, 2025) framework to reproduce the seminal Autor, Dorn, and Hanson (2013) China shock analysis.

John Louis-Strakes Lopez

Postdoctoral Scholar
Berkeley School of Education

John Louis-Strakes Lopez is a Data Science Education postdoctoral scholar. He recently received his PhD in Education from the University of California, Irvine. John’s work looks at student epistemological development within data science contexts. He is also interested in designing and studying artificial intelligence and playful learning technologies for learning. John serves as a co-chair for the International Learning Sciences Student Association.

Beyond work, you will find John reading at a local coffee shop or eating a warm bowl of Pho.

Jonathan Pedroza (JP)

Postdoctoral Scholar
Berkeley School of Education

JP is a postdoctoral scholar in Data Science Education. He received his PhD in Prevention Science from the University of Oregon. His research interests include examining risk and protective factors of health disparities in Latina/o/x/e populations and investigating educational outcomes in underrepresented student populations. JP uses a social-ecological framework to address his research interests.

Previously, he has served as an adjunct lecturer at Cal Poly Pomona teaching research methods and statistics, as well as a data scientist at the University of Kansas' Accessible Teaching...

In Silico Approach to Mining Viral Sequences from Bulk RNA-Seq Data

October 28, 2025
by Carly Karrick. Viruses play important roles in evolution and influence ecosystems and host health. However, isolating and studying them can be difficult. In lieu of using resource-intensive methods to concentrate viruses into a “virome,” bulk sequencing methods include data from all biological entities present in a sample. In this tutorial, we explore an approach to mine viral sequences from publicly available bulk RNA-Seq data. The output from this analysis paves the way for future statistical analyses comparing viral communities in different contexts. This approach can be applied to other datasets, including studies of human health.

A brief primer on Hidden Markov Models

April 25, 2022
by Amy Van Scoyoc. For many data science problems, there is a need to estimate unknown information from a sequence of observed events. There are many ways to tackle these types of sequential input problems. In the data science world, there is a tendency to use machine learning approaches to search for relations in the dataset. But in many cases, we don’t have enough data or the sequences are too long to train RNNs effectively. In such cases, simpler is better. Enter the Hidden Markov Model.

Forecasting Social Outcomes with Deep Neural Networks

October 7, 2025
by Paige Park. Our capacity to accurately predict social outcomes is increasing. Deep neural networks and artificial intelligence are crucial technologies pushing this progress along. As these tools reshape how social prediction is done, social scientists should feel comfortable engaging with them and meaningfully contributing to the conversation. But many social scientists are still unfamiliar with and sometimes even skeptical of deep learning. This tutorial is designed to help close that knowledge gap. We’ll walk step-by-step through training a simple neural network for a social prediction task: forecasting population-level mortality rates.

Maksymilian Jasiak

Data Science & AI Fellow 2025-2026
Civil and Environmental Engineering

Maksymilian Jasiak is a PhD student in GeoSystems Engineering at the University of California, Berkeley. His research focuses on Distributed Fiber Optic Sensing (DFOS) for lifeline infrastructure monitoring. His work aims to advance critical infrastructure security and resilience. He holds an MS in GeoSystems Engineering from the University of California, Berkeley and a BS in Civil Engineering from the University of Illinois Urbana-Champaign.

Scarlet Sands-Bliss

Data Science & AI Fellow 2025-2026, Domain Consultant, Research IT
School of Public Health

Scarlet Bliss is an MS/PhD student in Epidemiology in the School of Public Health. Her work focuses on mixed methods approaches to characterizing and preventing the spread of antimicrobial resistance and other enteric pathogens via the environment. She has experience in statistical analysis and public health bioinformatics. She is interested in the ethical use of big data as it relates to epidemiologic research.

Sarah Daniel

Data Science & AI Fellow 2025-2026
Political Science

Sarah Daniel is a PhD candidate in Political Science, specializing in urban politics in Sub-Saharan Africa, with a particular focus on East Africa. Her research examines how neighborhood communities organize for collective action to improve service delivery, reduce inequality, and enhance political representation.

Joyce Chen

Data Science & AI Fellow 2025-2026
College of Engineering

Joyce is a PhD candidate in Transportation Engineering. Her research focuses on assessing the safety and network impacts of autonomous vehicles. She has teaching experience in statistics and programming. Prior to Berkeley, Joyce obtained her Bachelor of Science in Computer Science from the University of Michigan and worked as a software engineer at various companies.