Data Science

Filtering, Visualizing, and Interpreting Spatial Time Series Data

December 17, 2025
by Maksymilian Jasiak. Spatial time series (consecutive measurements across space and time) are often difficult to interpret, especially when many signals overlap. However, have no fear! Filtering and visualization can make spatial time series data easier to interpret and understand.

Seeing Behavior in Everyday Data

December 10, 2025
by Skyler Chen. This post discusses how my training in data science changed the way I think about behavioral research. I share how simply exploring everyday datasets and noticing small, unexpected patterns can spark new research questions, and how archival data and experiments each offer distinct yet complementary insights into how people make judgments and decisions. I also highlight the growing set of tools that help us understand behavior in richer ways.

Digitization of Historical Maps in the Age of AI

December 3, 2025
by Elena Stacy. Researchers today increasingly have access to a wealth of tools to streamline or automate labor-intensive data processing and generation tasks. When it comes to mapping, progress has been slower. This blog details the author's experience tackling the digitization of a historical map in the age of AI.

A Practical Guide to Shift-Share Instruments (and What I Learned Replicating the China Shock)

November 26, 2025
by Jiayu Lai. Shift-share instruments are among the most widely used tools in applied economics, appearing in labor, trade, immigration, and policy evaluation research. But despite their popularity, many researchers still use them as black boxes — and risk invalid instruments as a result. In this blog post, I unpack how shift-share IVs actually work, why their validity depends on both the “shifts” and the “shares,” and what practical steps researchers should take to check assumptions. I also walk through how I used the Borusyak–Hull–Jaravel (2022, 2025) framework to reproduce the seminal Autor, Dorn, and Hanson (2013) China shock analysis.
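As a rough illustration of the "shifts" and "shares" mentioned above, a shift-share instrument is commonly written as a share-weighted sum of shocks, z_l = sum_n s_ln * g_n. The sketch below is a generic textbook construction, not the post's replication code; the function name and the toy numbers are hypothetical.

```python
import numpy as np


def shift_share_instrument(shares, shifts):
    """Return z_l = sum_n shares[l, n] * shifts[n] for every location l.

    shares: (locations x industries) exposure weights, e.g. employment shares.
    shifts: (industries,) industry-level shocks, e.g. import growth.
    """
    shares = np.asarray(shares, dtype=float)
    shifts = np.asarray(shifts, dtype=float)
    if shares.ndim != 2 or shifts.ndim != 1 or shares.shape[1] != shifts.shape[0]:
        raise ValueError("shares must be (L, N) and shifts must be (N,)")
    return shares @ shifts


# Hypothetical toy data: 2 locations, 3 industries.
shares = np.array([
    [0.5, 0.5, 0.0],  # location 0 has no exposure to industry 2
    [0.2, 0.3, 0.5],  # location 1 is mostly exposed to industry 2
])
shifts = np.array([2.0, 4.0, 10.0])  # one shock per industry

instrument = shift_share_instrument(shares, shifts)
print(instrument)
```

Because each location's instrument mixes the same industry shocks with different weights, checks on validity have to consider both where the shocks come from and how the shares were measured.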

Sahiba Chopra

Data Science Fellow 2024-2025
Haas School of Business

I'm a PhD student in the Management and Organizations (Macro) group at Berkeley Haas. I have a diverse professional background, primarily as a data scientist across numerous industries, including fintech, cleantech, and media. I hold a BA in Economics from the University of Maryland, an MS in Applied Economics from the University of San Francisco, and an MS in Business Administration from UC Berkeley.

My research focuses on the intersection of inequality, technology, and the labor market. I am particularly interested in understanding how to reduce inequality in...

A Participant-Centered, GIS-Based Approach to Improving Contextual Measurement

November 19, 2025
by Sarah Daniel. Researchers increasingly recognize that neighborhoods profoundly shape life outcomes, yet measuring them remains challenging. A common approach uses administrative boundaries, such as census tracts, as proxies for neighborhoods, but this method presents three key challenges. First, administrative boundaries may fail to capture residents’ lived experiences, a limitation that is particularly concerning in marginalized communities; second, they can misrepresent contextual effects; and third, they may produce inconsistent findings. To address these issues, I advocate for the use of self-defined neighborhood boundaries as an alternative measure. I compare GIS- and non-GIS-based methods and propose that GIS-based methods offer the strongest potential for more valid measurement.

Beyond the Hype: How We Built AI Tools That Actually Support Learning

November 12, 2025
by Weiying Li. What does genuine partnership look like when building AI for education? Working with middle school teachers and computer scientists, we co-designed AI dialogs in which teachers help refine what the AI recognizes as valuable thinking. Through iterative refinement, teachers identified precursor ideas and observations that predicted future learning, and refined the guidance design in the dialog. Our AI dialog sees learning the way teachers do, built through genuine collaboration in which model development, learning sciences theories, and teachers' classroom expertise work together from the start, not just at the end.

In Silico Approach to Mining Viral Sequences from Bulk RNA-Seq Data

October 28, 2025
by Carly Karrick. Viruses play important roles in evolution and influence ecosystems and host health. However, isolating and studying them can be difficult. Instead of relying on resource-intensive methods that concentrate viruses into a “virome,” researchers can use bulk sequencing methods, which include data from all biological entities present in a sample. In this tutorial, we explore an approach to mine viral sequences from publicly available bulk RNA-Seq data. The output from this analysis paves the way for future statistical analyses comparing viral communities in different contexts. This approach can be applied to other datasets, including studies of human health.

A brief primer on Hidden Markov Models

April 25, 2022
by Amy Van Scoyoc. For many data science problems, there is a need to estimate unknown information from a sequence of observed events. There are many ways to tackle these types of sequential input problems. In the data science world, there is a tendency to use machine learning approaches to search for relations in the dataset. But in many cases, we don’t have enough data or the sequences are too long to train RNNs effectively. In such cases, simpler is better. Enter the Hidden Markov Model.
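To make "estimating unknown information from a sequence of observed events" concrete, here is a minimal sketch of Viterbi decoding, one standard way to recover the most likely hidden-state sequence in a Hidden Markov Model. It is a generic illustration and not necessarily the approach taken in the post; the two-state health example and all of its probabilities are hypothetical toy values.

```python
import numpy as np


def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a sequence of observation indices.

    start_p[i]    : probability of starting in state i
    trans_p[i, j] : probability of moving from state i to state j
    emit_p[i, k]  : probability that state i emits observation k
    """
    obs = np.asarray(obs)
    # Work in log space to avoid underflow on long sequences.
    with np.errstate(divide="ignore"):
        log_start = np.log(np.asarray(start_p, dtype=float))
        log_trans = np.log(np.asarray(trans_p, dtype=float))
        log_emit = np.log(np.asarray(emit_p, dtype=float))

    n_steps, n_states = len(obs), log_start.shape[0]
    score = np.empty((n_steps, n_states))           # best log-prob ending in each state
    back = np.zeros((n_steps, n_states), dtype=int)  # best predecessor of each state

    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, n_steps):
        # cand[i, j]: best path ending in state i at t-1, then moving to state j.
        cand = score[t - 1][:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, obs[t]]

    # Trace the best path backwards from the best final state.
    path = [int(score[-1].argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Hypothetical toy model: states 0 = healthy, 1 = fever;
# observations 0 = normal, 1 = cold, 2 = dizzy.
start_p = [0.6, 0.4]
trans_p = [[0.7, 0.3],
           [0.4, 0.6]]
emit_p = [[0.5, 0.4, 0.1],
          [0.1, 0.3, 0.6]]

path = viterbi([0, 1, 2], start_p, trans_p, emit_p)
print(path)  # [0, 0, 1]: healthy, healthy, fever
```

The model has only a handful of parameters, which is why it can remain usable when there is too little data to train a recurrent network.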

How to Get Involved in Computing Research as an Undergrad at UC Berkeley

October 15, 2025
by Abby O'Neill. Are you an undergrad interested in getting involved in CS/DS research? This blog post gives some advice for navigating the Berkeley research landscape. It includes mentions of structured programs like DARE, URAP, and Data Science Discovery, as well as cold emailing strategies and using office hours effectively. The main takeaway: Know your why, don't filter yourself out, and focus on finding people and projects that align with your goals.