Quantitative Analysis

Sahiba Chopra

Data Science Fellow 2024-2025
Haas School of Business

I'm a PhD student in the Management and Organizations (Macro) group at Berkeley Haas. I have a diverse professional background, primarily as a data scientist across numerous industries, including fintech, cleantech, and media. I hold a BA in Economics from the University of Maryland, an MS in Applied Economics from the University of San Francisco, and an MS in Business Administration from UC Berkeley.

My research focuses on the intersection of inequality, technology, and the labor market. I am particularly interested in understanding how to reduce inequality in...

Why Data Disaggregation Matters: Exploring the Diversity of Asian American Economic Outcomes Using Public Use Microdata Sample (PUMS) Data

February 11, 2025
by Taesoo Song. Asian Americans are often overlooked in discussions of racial inequality due to their high average socioeconomic attainment. Many academic and policy researchers treat Asians as a single racial category in their analysis. However, this broad categorization can mask significant within-group disparities, leaving many disadvantaged individuals without access to vital resources and policy support. Song emphasizes the importance of data disaggregation in revealing Asian American inequalities, particularly in areas like income and homeownership, and demonstrates how breaking down these categories can lead to more targeted and effective policy solutions.

Field Experiments in Corporations

January 28, 2025
by Yue Lin. How do social science researchers conduct field experiments with private actors? Yue Lin provides a brief overview of recent developments in political economy and management strategy, with a focus on running field experiments within private corporations. Unlike conventional targets such as individuals and government agencies, private companies are an emerging sweet spot for scholars to test important theories on topics such as sustainability, censorship, and market behavior. After comparing the strengths and weaknesses of this powerful yet nascent method, Lin brainstorms some practical solutions to improve the success rate of field experimental studies. She aims to introduce a new methodological tool in an emerging research field and shed some light on improving experimental quality while adhering to ethical standards.

Which Coin Should I Flip? The Multi-Arm Bandit

February 4, 2025
by Bruno Smaniotto. Consider the following game: You are given the option to choose between two coins to flip. These coins are possibly biased, so the probability of getting Heads for each coin might differ from 50/50. Each time that you flip Heads, you win one dollar. There are a total of 10 rounds. Which coin should you flip at each round? In this blog post, we will analyze this problem through the lens of a famous decision-making algorithm called the Multi-Arm Bandit, exploring how to structure the problem mathematically and how it can be solved for particular examples.
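To make the game concrete, here is a minimal sketch of one common heuristic for it, Thompson sampling. This is not necessarily the approach taken in the post; the heads probabilities (0.4 and 0.7), the function name, and the uniform prior are all assumptions chosen for illustration.

```python
import random


def play_two_coin_bandit(true_probs, rounds=10, seed=0):
    """Play the coin game with Thompson sampling (an illustrative choice of strategy).

    true_probs holds each coin's heads probability. The player never sees
    these values; they are only used to simulate the flips.
    """
    rng = random.Random(seed)
    n_coins = len(true_probs)
    # Start from a uniform Beta(1, 1) belief about each coin's heads probability.
    heads = [0] * n_coins
    tails = [0] * n_coins
    winnings = 0
    for _ in range(rounds):
        # Draw one plausible heads probability per coin from its current belief.
        draws = [rng.betavariate(heads[i] + 1, tails[i] + 1) for i in range(n_coins)]
        # Flip the coin whose draw is highest.
        choice = max(range(n_coins), key=lambda i: draws[i])
        if rng.random() < true_probs[choice]:
            heads[choice] += 1
            winnings += 1  # one dollar per Heads
        else:
            tails[choice] += 1
    return winnings, heads, tails


# Hypothetical coins: the first lands Heads 40% of the time, the second 70%.
print(play_two_coin_bandit([0.4, 0.7]))
```

Early rounds tend to spread flips across both coins, and later rounds lean toward whichever coin has looked better so far. With only 10 rounds there is little time to learn, which is part of what makes the problem interesting.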

Causal Effect Estimation in Observational Field Studies of Thermal Comfort

April 1, 2025
by Ruiji Sun. We introduce and apply regression discontinuity to thermal comfort field studies, which are typically observational. The method exploits a policy threshold in China, where the winter district heating policy is based on cities' geographical locations relative to the Huai River. Using the regression discontinuity method, we quantify the causal effects of the treatment (district heating) on the physical indoor environments and subjective responses of building occupants. In contrast, using conventional correlational analysis, we demonstrate that the correlation between indoor operative temperature and thermal sensation votes does not accurately reflect the causal relationship between the two. This highlights the importance of causal inference methods in thermal comfort field studies and in other observational studies in building science where the regression discontinuity method might apply.
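As a rough illustration of the idea, the sketch below estimates a sharp regression discontinuity on simulated data by fitting a separate line on each side of a cutoff and comparing the two fits at the cutoff. Everything here is an assumption made for the example: the data are synthetic, the running variable stands in for distance north of a policy boundary, and the true jump of 4 degrees is invented. It is not the study's data or its exact estimator.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rd_estimate(running, outcome, cutoff=0.0, bandwidth=5.0):
    """Jump in the outcome at the cutoff, from one linear fit per side."""
    pairs = list(zip(running, outcome))
    # Center the running variable so each intercept is the fitted value at the cutoff.
    left = [(r - cutoff, y) for r, y in pairs if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in pairs if cutoff <= r <= cutoff + bandwidth]
    left_at_cutoff, _ = fit_line(*zip(*left))
    right_at_cutoff, _ = fit_line(*zip(*right))
    return right_at_cutoff - left_at_cutoff


def simulate_heating_data(n=400, seed=0):
    """Synthetic data: indoor temperature falls with distance north, plus a jump where heating starts."""
    rng = random.Random(seed)
    running = [rng.uniform(-5, 5) for _ in range(n)]  # hypothetical distance north of the boundary
    outcome = [
        16 - 0.3 * r + (4.0 if r >= 0 else 0.0) + rng.gauss(0, 1)  # invented true jump of 4.0
        for r in running
    ]
    return running, outcome


running, outcome = simulate_heating_data()
print(round(sharp_rd_estimate(running, outcome), 2))
```

The printed estimate should land near the simulated jump of 4. A real analysis would also need to choose the bandwidth carefully and check that nothing else changes at the boundary.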

Measuring Vowels Without Relying on Sex-Based Assumptions

April 8, 2025
by Amber Galvano. This tutorial builds on my previous post on Python for acoustic analysis, this time focusing on measuring vocal tract resonances without relying on sex-based assumptions. I demonstrate how to process audio files and vowel annotations using an adaptive method that optimizes the acoustic analysis across a recording. Instead of fixing parameters based on generalized vocal tract length correlations, this approach varies them within a defined range for greater accuracy. This not only enhances measurement precision but also avoids requiring (or assuming) speakers’ sex in data collection. Finally, I show how to filter for outliers and create high-quality vowel space visualizations.

Finley Golightly

IT Support & Helpdesk Supervisor
Applied Mathematics

Finley joined D-Lab as full-time staff, launching their career in data science after graduating from UC Berkeley with a bachelor's degree in Applied Math.

They have been with D-Lab since Fall 2020, formerly as part of the UTech Management team before joining as full-time staff in Fall 2023. They love the learning environment of D-Lab and their favorite part of the job is their co-workers! In their free time, they enjoy reading, boxing, listening to music, and playing Dungeons & Dragons. Feel free to stop by the front desk to ask them any questions or...

Python Web Scraping

March 5, 2025, 10:00am
In this workshop, we cover how to scrape data from the web using Python. Web scraping involves downloading a webpage's source code and sifting through the material to extract desired data.
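The "sifting" step can be sketched with Python's standard library alone. The example below parses a small inline HTML snippet, so no network access is needed, and pulls out each link's text and target. The sample page and the class and function names are made up for illustration, and the workshop itself may use different libraries.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (link text, href) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; anchors without href are skipped.
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((text, self._current_href))
            self._current_href = None


# Stand-in for a downloaded page. In practice the source would come from an
# HTTP request, for example with urllib.request.urlopen.
SAMPLE_HTML = """
<html><body>
  <h1>Workshops</h1>
  <ul>
    <li><a href="/workshops/web-scraping">Python Web Scraping</a></li>
    <li><a href="/workshops/web-apis">Python Web APIs</a></li>
  </ul>
</body></html>
"""


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


for text, href in extract_links(SAMPLE_HTML):
    print(text, "->", href)
```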

Python Web APIs

March 3, 2025, 10:00am
In this workshop, we cover how to extract data from the web with APIs using Python. APIs are often official services offered by companies and other entities, which allow you to directly query their servers in order to retrieve their data. Platforms like The New York Times, Twitter and Reddit offer APIs to retrieve data.
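In outline, working with a web API usually means building a query URL, sending the request, and parsing the structured response (often JSON). The sketch below shows the first and last steps offline. The endpoint `https://api.example.com/v1/articles`, its parameters, and the response shape are all hypothetical; a real API documents its own.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only to illustrate the shape of a query.
BASE_URL = "https://api.example.com/v1/articles"


def build_query_url(base_url, **params):
    """Append URL-encoded query parameters to the endpoint."""
    return f"{base_url}?{urlencode(params)}"


def parse_titles(response_text):
    """Pull article titles out of a JSON response body (assumed structure)."""
    payload = json.loads(response_text)
    return [item["title"] for item in payload["results"]]


# Canned response standing in for what the server would send back.
CANNED_RESPONSE = '{"results": [{"title": "First article"}, {"title": "Second article"}]}'

print(build_query_url(BASE_URL, q="data science", page=1))
print(parse_titles(CANNED_RESPONSE))
```

Real APIs typically also require an API key and enforce rate limits, both of which are described in each provider's documentation.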

R Machine Learning with tidymodels: Parts 1-2

February 24, 2025, 3:00pm
Machine learning often evokes images of Skynet, self-driving cars, and computerized homes. However, these ideas are less science fiction than tangible phenomena predicated on description, classification, prediction, and pattern recognition in data. During this two-part workshop, we will discuss basic features of supervised machine learning algorithms, including k-nearest neighbor, linear regression, decision tree, random forest, boosting, and ensembling, using the tidymodels framework. To social scientists, such methods might be critical for investigating evolutionary relationships, global health patterns, voter turnout in local elections, or individual psychological diagnoses.