Data Science

Which Coin Should I Flip? The Multi-Arm Bandit

February 4, 2025
by Bruno Smaniotto. Consider the following game: You are given the option to choose between two coins to flip. These coins are possibly biased, so the probability of getting Heads for each coin might differ from 50/50. Each time you flip Heads, you win one dollar. There are a total of 10 rounds. Which coin should you flip at each round? In this blog post, we will analyze this problem through the lens of a famous decision-making framework called the Multi-Arm Bandit, exploring how to structure the problem mathematically and how it can be solved for particular examples.
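To make the two-coin game above concrete, here is a minimal Python sketch that plays it with Thompson sampling, one common bandit heuristic. This is an illustration under stated assumptions and not necessarily the approach taken in the post: the function name `play_two_coin_bandit`, the uniform Beta(1, 1) starting beliefs, and the example biases are all hypothetical choices.

```python
import random


def play_two_coin_bandit(p_heads, rounds=10, seed=0):
    """Play the coin game with Thompson sampling (an assumed heuristic).

    p_heads: true Heads probability of each coin (unknown to the player).
    Returns (total winnings in dollars, list of coins chosen each round).
    """
    rng = random.Random(seed)
    n_coins = len(p_heads)
    heads = [0] * n_coins  # Heads observed so far, per coin
    tails = [0] * n_coins  # Tails observed so far, per coin
    winnings = 0
    choices = []

    for _ in range(rounds):
        # Draw a plausible bias for each coin from its Beta posterior,
        # starting from a uniform Beta(1, 1) prior (an assumption).
        samples = [
            rng.betavariate(1 + heads[i], 1 + tails[i]) for i in range(n_coins)
        ]
        # Flip the coin whose sampled bias is highest.
        coin = max(range(n_coins), key=lambda i: samples[i])
        choices.append(coin)

        # Simulate the flip using the coin's true (hidden) bias.
        if rng.random() < p_heads[coin]:
            heads[coin] += 1
            winnings += 1  # one dollar per Heads
        else:
            tails[coin] += 1

    return winnings, choices


if __name__ == "__main__":
    # Hypothetical biases: coin 0 lands Heads 40% of the time, coin 1 70%.
    total, picked = play_two_coin_bandit([0.4, 0.7])
    print("winnings:", total, "coins flipped:", picked)
```

Sampling from the posterior, rather than always flipping the coin with the best record so far, is what lets the player keep exploring the less-tried coin while still favoring the one that looks better. With only 10 rounds, a heuristic like this is not guaranteed to be optimal.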

Field Experiments in Corporations

January 28, 2025
by Yue Lin. How do social science researchers conduct field experiments with private actors? Yue Lin provides a brief overview of recent developments in political economy and management strategy, with a focus on conducting field experiments within private corporations. Unlike conventional targets such as individuals and government agencies, private companies are an emerging sweet spot for scholars to test important theories on topics such as sustainability, censorship, and market behavior. After comparing the strengths and weaknesses of this powerful yet nascent method, Lin brainstorms some practical solutions to improve the success rate of field experimental studies. She aims to introduce a new methodological tool in a young research field and shed some light on improving experimental quality while adhering to ethical standards.

Teaching Data Science as a Tool for Empowerment

February 18, 2025
by Elijah Mercer. Data literacy is a powerful tool for empowerment, especially for historically marginalized communities. Through Data Cafecito at Roadmap to Peace and helping teach Data 4AC at UC Berkeley, Elijah Mercer helps bridge the gap between data, advocacy, and justice. Data Cafecito fosters culturally responsive data practices for Latinx-serving organizations, while Data 4AC challenges students to critically analyze data’s role in systemic inequities. Drawing from his experience in education, Mercer uses interactive teaching methods to make data accessible and meaningful. By centering storytelling and community-driven insights, he aims to equip individuals with the skills to use data for social change.

Looking Ahead: How Adolescents’ Consideration of Future Consequences Shapes Their Developmental Outcomes

March 25, 2025
by Elaine Luo. Adolescents constantly balance immediate impulses with long-term goals. Our research explored how adolescents differ in their tendency to think about immediate versus future consequences, and how these differences relate to academic performance, stress, and perceived life chances. Using Latent Profile Analysis, we identified three distinct groups: Indifferent (low consideration overall), Future-Focused (prioritizing future outcomes), and Dual-Focused (high consideration of both immediate and future outcomes). Results indicated that Dual-Focused adolescents had higher academic achievement, whereas the Future-Focused group perceived the most positive life prospects. We also discuss practical implications and future research directions for supporting balanced decision-making among adolescents.

Measuring Vowels Without Relying on Sex-Based Assumptions

April 8, 2025
by Amber Galvano. This tutorial builds on my previous post on Python for acoustic analysis, this time focusing on measuring vocal tract resonances without relying on sex-based assumptions. I demonstrate how to process audio files and vowel annotations using an adaptive method that optimizes the acoustic analysis across a recording. Instead of fixing parameters based on generalized vocal tract length correlations, this approach varies them within a defined range for greater accuracy. This not only enhances measurement precision but also avoids requiring (or assuming) speakers’ sex in data collection. Finally, I show how to filter for outliers and create high-quality vowel space visualizations.

Causal Effect Estimation in Observational Field Studies of Thermal Comfort

April 1, 2025
by Ruiji Sun. We introduce and apply regression discontinuity to thermal comfort field studies, which are typically observational. The method exploits a policy threshold in China, where the winter district heating policy is based on cities' geographical locations relative to the Huai River. Using the regression discontinuity method, we quantify the causal effects of the treatment (district heating) on the physical indoor environments and subjective responses of building occupants. In contrast, using conventional correlational analysis, we demonstrate that the correlation between indoor operative temperature and thermal sensation votes does not accurately reflect the causal relationship between the two. This highlights the importance of causal inference methods in thermal comfort field studies and in other observational studies in building science where the regression discontinuity method might apply.
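As a rough illustration of the regression discontinuity idea described above, the Python sketch below fits a separate straight line on each side of a cutoff and reports the jump between the two fits at the cutoff. Everything here is an assumption made for illustration: the data are synthetic and not from the study, the running variable is a hypothetical "distance north of the policy line", and the function names, bandwidth, and built-in effect size are invented.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(running, outcome, cutoff=0.0, bandwidth=5.0):
    """Sharp regression discontinuity: the jump in the outcome at the cutoff.

    Fits one line to units just below the cutoff and one to units at or
    above it, then compares the two predictions exactly at the cutoff.
    """
    pairs = list(zip(running, outcome))
    # Center the running variable so each intercept is the fit at the cutoff.
    left = [(r - cutoff, y) for r, y in pairs if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in pairs if cutoff <= r <= cutoff + bandwidth]
    left_at_cutoff, _ = fit_line(*zip(*left))
    right_at_cutoff, _ = fit_line(*zip(*right))
    return right_at_cutoff - left_at_cutoff


def simulate_cities(n=2000, true_jump=4.0, seed=0):
    """Synthetic data only: outcome trends smoothly, plus a jump at 0."""
    rng = random.Random(seed)
    running, outcome = [], []
    for _ in range(n):
        r = rng.uniform(-5.0, 5.0)          # hypothetical distance from the line
        treated = 1.0 if r >= 0 else 0.0    # treatment assigned by location
        y = 18.0 + 0.2 * r + true_jump * treated + rng.gauss(0.0, 1.0)
        running.append(r)
        outcome.append(y)
    return running, outcome


if __name__ == "__main__":
    running, outcome = simulate_cities()
    print("estimated jump at the cutoff:", rd_estimate(running, outcome))
```

The estimate is read as a causal effect only under the usual assumption that units just on either side of the cutoff are otherwise comparable. A real analysis would also need to choose the bandwidth carefully and report uncertainty, which this sketch omits.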

The Evolving Landscape of Web Scraping on Social Media Platforms

March 11, 2025
by Nanqin Ying. As social media platforms enforce stricter policies against unauthorized data collection, businesses and researchers must adapt to new API-based access models. This shift limits large-scale web scraping, impacting industries reliant on social media insights. The transition to paid API access and stringent compliance measures raises concerns about accessibility, cost, and ethical data collection. This article explores the evolving regulatory landscape, the enforcement of API restrictions, and how organizations can legally and ethically navigate data access in a world where scraping is becoming increasingly difficult. Understanding these changes is crucial for staying compliant while maintaining valuable insights from social media data.

Digitizing Inclusion: FinTech’s Promise and Pitfalls in the Global South

April 22, 2025
by Victoria Hollingshead. FinTech promises to revolutionize financial inclusion by harnessing data science to reach populations historically excluded from formal financial systems. By analyzing digital footprints, mobile payments, and behavioral data, startups and financial institutions have the potential to improve customer-lender interactions and revolutionize screening and monitoring techniques. While FinTech shows promise of enabling financial access, it also raises critical questions: how do we implement financial inclusion without reproducing the structures of the past? And more rhetorically, can financiers be the arbiters of financial inclusion, if their intrinsic role is to stratify and exclude?

Sharing Just Enough: The Magic Behind Gaining Privacy while Preserving Utility

April 15, 2025
by Sohail Khan. Netflix knows what you like, but does it need to know your politics too? We often face a frustrating choice: share our data and be tracked, or protect our privacy and lose personalization. But what if there was a third option? This article begins by introducing the concept of the privacy-utility trade-off, then explores the methods behind strategic data distortion, a technique that lets you subtly tweak your data to block sensitive inferences (like political views) while still maintaining useful recommendations. Finally, it looks ahead and advocates for a future where users, not platforms, shape the rules, reclaiming control of their own privacy.

Qualtrics Fundamentals: Parts 1-2

April 14, 2025, 1:00pm
In this two-part workshop, we provide an introduction to using Qualtrics. In the first part, we'll cover how to use the platform and its features to create, distribute, and analyze surveys. In the second part, we'll discuss best practices for survey design.