Data Science

Research Paper Management with Notion and Zotero

January 14, 2026
by Joyce Chen. Managing research across countless tabs, notes, and PDFs can quickly become overwhelming. By integrating Zotero and Notion, you can create a workflow where papers, ideas, and writing come together, turning scattered research into a workspace that is as cohesive as possible.

Finley Golightly

IT Support & Helpdesk Supervisor
Applied Mathematics

Finley has been with D-Lab since Fall 2020, formerly as part of the UTech Management team before joining as full-time staff in Fall 2023. They love the learning environment of D-Lab and their favorite part of the job is their co-workers! In their free time, they enjoy reading, boxing, listening to music, and playing Dungeons & Dragons. Feel free to stop by the front desk to ask them any questions or just to chat!

Filtering, Visualizing, and Interpreting Spatial Time Series Data

December 17, 2025
by Maksymilian Jasiak. Spatial time series (consecutive measurements across space and time) are often difficult to interpret, especially when there are many overlapping signals. However, have no fear! Filtering and visualization can help you better interpret and understand spatial time series data.

Seeing Behavior in Everyday Data

December 10, 2025
by Skyler Chen. This post discusses how my training in data science changed the way I think about behavioral research. I share how simply exploring everyday datasets and noticing small, unexpected patterns can spark new research questions, and how archival data and experiments each offer distinct yet complementary insights into how people make judgments and decisions. I also highlight the growing set of tools that help us understand behavior in richer ways.

Digitization of Historical Maps in the Age of AI

December 3, 2025
by Elena Stacy. Researchers today increasingly have access to a wealth of tools to streamline or automate labor-intensive data processing and generation tasks. When it comes to mapping, progress has been slower. This blog post details the author's experience tackling the digitization of a historical map in the age of AI.

A Practical Guide to Shift-Share Instruments (and What I Learned Replicating the China Shock)

November 26, 2025
by Jiayu Lai. Shift-share instruments are among the most widely used tools in applied economics, appearing in labor, trade, immigration, and policy evaluation research. But despite their popularity, many researchers still use them as black boxes — and risk invalid instruments as a result. In this blog post, I unpack how shift-share IVs actually work, why their validity depends on both the “shifts” and the “shares,” and what practical steps researchers should take to check assumptions. I also walk through how I used the Borusyak–Hull–Jaravel (2022, 2025) framework to reproduce the seminal Autor, Dorn, and Hanson (2013) China shock analysis.

Sahiba Chopra

Data Science Fellow 2024-2025
Haas School of Business

I'm a PhD student in the Management and Organizations (Macro) group at Berkeley Haas. I have a diverse professional background, primarily as a data scientist across numerous industries, including fintech, cleantech, and media. I hold a BA in Economics from the University of Maryland, an MS in Applied Economics from the University of San Francisco, and an MS in Business Administration from UC Berkeley.

My research focuses on the intersection of inequality, technology, and the labor market. I am particularly interested in understanding how to reduce inequality in...

A Participant-Centered, GIS-Based Approach to Improving Contextual Measurement

November 19, 2025
by Sarah Daniel. Researchers increasingly recognize that neighborhoods profoundly shape life outcomes, yet measuring them remains challenging. A common approach uses administrative boundaries, such as census tracts, as proxies for neighborhoods, but this method presents three key challenges. First, administrative boundaries may fail to capture residents’ lived experiences, a limitation that is particularly concerning in marginalized communities; second, they can misrepresent contextual effects; and third, they may produce inconsistent findings. To address these issues, I advocate for the use of self-defined neighborhood boundaries as an alternative measure. I compare GIS- and non-GIS-based methods and propose that GIS-based methods offer the strongest potential for more valid measurement.

Beyond the Hype: How We Built AI Tools That Actually Support Learning

November 12, 2025
by Weiying Li. What does genuine partnership look like when building AI for education? Working with middle school teachers and computer scientists, we co-designed AI dialogs in which teachers are valued contributors who refine what the AI recognizes as valuable thinking. Through iterative refinement, teachers identified precursor ideas and observations that predicted future learning, and refined the guidance design in the dialog. Our AI dialog sees learning the way teachers do because it was built through genuine collaboration in which model development, learning sciences theories, and teachers' classroom expertise work together from the start, not just at the end.

In Silico Approach to Mining Viral Sequences from Bulk RNA-Seq Data

October 28, 2025
by Carly Karrick. Viruses play important roles in evolution and influence ecosystems and host health. However, isolating and studying them can be difficult. Rather than relying on resource-intensive methods to concentrate viruses into a “virome,” researchers can turn to bulk sequencing methods, which include data from all biological entities present in a sample. In this tutorial, we explore an approach to mine viral sequences from publicly available bulk RNA-Seq data. The output from this analysis paves the way for future statistical analyses comparing viral communities in different contexts. This approach can be applied to other datasets, including studies of human health.