Data Science

Causal Effect Estimation in Observational Field Studies of Thermal Comfort

April 1, 2025
by Ruiji Sun. We introduce and apply regression discontinuity to thermal comfort field studies, which are typically observational. The method exploits policy thresholds in China, where winter district heating is provided based on a city's geographical location relative to the Huai River. Using regression discontinuity, we quantify the causal effects of the treatment (district heating) on the physical indoor environment and on the subjective responses of building occupants. In contrast, using conventional correlational analysis, we demonstrate that the correlation between indoor operative temperature and thermal sensation votes does not accurately reflect the causal relationship between the two. This highlights the importance of causal inference methods in thermal comfort field studies and in other observational studies in building science where regression discontinuity might apply.
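
As a rough illustration of the regression discontinuity idea described above, the following Python sketch fits a separate straight line on each side of a cutoff and reads the treatment effect off the gap between the two fits at the cutoff. It is a minimal sketch on synthetic data, not the author's analysis: the variable names (latitude_offset, indoor_temp), the bandwidth, and the simulated jump of 3 degrees are all assumptions made for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(running, outcome, cutoff=0.0, bandwidth=5.0):
    """Sharp RD estimate: gap between the two fitted lines at the cutoff."""
    # Center the running variable so each intercept is the fit at the cutoff.
    left = [(r - cutoff, y) for r, y in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    intercept_left, _ = fit_line(*zip(*left))
    intercept_right, _ = fit_line(*zip(*right))
    return intercept_right - intercept_left


# Synthetic data (assumption): positive offsets lie on the heated side.
rng = random.Random(0)
TRUE_JUMP = 3.0  # simulated effect of heating on indoor temperature
latitude_offset = [rng.uniform(-5.0, 5.0) for _ in range(2000)]
indoor_temp = [
    18.0 - 0.4 * d + (TRUE_JUMP if d >= 0 else 0.0) + rng.gauss(0.0, 1.0)
    for d in latitude_offset
]

estimate = rd_estimate(latitude_offset, indoor_temp)
print(f"Estimated jump at the cutoff: {estimate:.2f}")
```

Because the outcome trends smoothly with the running variable on both sides, a plain comparison of heated and unheated means would mix that trend into the estimate. Comparing the two fitted lines exactly at the cutoff is what isolates the jump.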

Info Session: D-Lab Data Science & AI Fellowship (2025-2026)

April 17, 2025, 3:00pm
The D-Lab is seeking applications for the 2025-2026 cohort of Data Science & AI Fellows. This info session will give you an in-depth look at the D-Lab DSAI Fellowship and an opportunity to ask questions about the program as you prepare your application.

R SQL Fundamentals

April 28, 2025, 3:00pm
In this workshop, we provide an introduction to using SQL to query and retrieve data from relational databases in R. First, we’ll cover what relational databases and SQL are. Then, we’ll use different packages in R to navigate relational databases using SQL.
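
To give a flavor of what querying a relational database with SQL looks like, here is a minimal sketch. The workshop itself works in R; this example uses Python's built-in sqlite3 module only to keep it self-contained, and the SQL statements are the point. The table name (measurements), its columns, and the rows are hypothetical toy values.

```python
import sqlite3

# An in-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical toy table: one row per temperature reading.
conn.execute("CREATE TABLE measurements (city TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO measurements (city, temp_c) VALUES (?, ?)",
    [("A", 20.0), ("A", 22.0), ("B", 15.0)],
)

# Retrieve one summary row per city: number of readings and mean temperature.
rows = conn.execute(
    """
    SELECT city, COUNT(*) AS n_readings, AVG(temp_c) AS mean_temp
    FROM measurements
    GROUP BY city
    ORDER BY city
    """
).fetchall()

for city, n_readings, mean_temp in rows:
    print(city, n_readings, mean_temp)

conn.close()
```

The same SELECT statement could be sent to a database from R; only the surrounding connection code would differ.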

Looking Ahead: How Adolescents’ Consideration of Future Consequences Shapes Their Developmental Outcomes

March 25, 2025
by Elaine Luo. Adolescents constantly balance immediate impulses with long-term goals. Our research explored how adolescents differ in their tendency to think about immediate versus future consequences, and how these differences relate to academic performance, stress, and perceived life chances. Using Latent Profile Analysis, we identified three distinct groups: Indifferent (low consideration overall), Future-Focused (prioritizing future outcomes), and Dual-Focused (high consideration of both immediate and future outcomes). Results indicated that the Dual-Focused adolescents had higher academic achievement, whereas the Future-Focused group perceived the most positive life prospects. We also discuss practical implications and future research directions for supporting balanced decision-making among adolescents.

The Evolving Landscape of Web Scraping on Social Media Platforms

March 11, 2025
by Nanqin Ying. As social media platforms enforce stricter policies against unauthorized data collection, businesses and researchers must adapt to new API-based access models. This shift limits large-scale web scraping, impacting industries reliant on social media insights. The transition to paid API access and stringent compliance measures raises concerns about accessibility, cost, and ethical data collection. This article explores the evolving regulatory landscape, the enforcement of API restrictions, and how organizations can legally and ethically navigate data access in a world where scraping is becoming increasingly difficult. Understanding these changes is crucial for staying compliant while maintaining valuable insights from social media data.

Python GPT Fundamentals

March 4, 2025, 10:00am
This workshop offers a general introduction to GPT (Generative Pre-trained Transformer) models. No technical background is required. We will explore the transformer architecture upon which GPT models are built, how transformer models encode natural language into embeddings, and how GPT predicts text.
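
The last step mentioned above, predicting text, can be sketched in a few lines: a model assigns a score to every token in its vocabulary, the scores are turned into probabilities with a softmax, and the next token is chosen from that distribution. The four-word vocabulary and the scores below are made-up assumptions for illustration; a real GPT model computes such scores with a transformer over a vocabulary of many thousands of tokens.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vocabulary and scores for the word after "the cat".
vocab = ["cat", "dog", "sat", "mat"]
scores = [1.0, 0.5, 3.0, 0.1]

probs = softmax(scores)

# Greedy decoding: pick the single most probable token.
next_token = vocab[probs.index(max(probs))]

for token, p in zip(vocab, probs):
    print(f"{token}: {p:.3f}")
print("Predicted next token:", next_token)
```

Greedy selection is only one option; text generators often sample from the probabilities instead, which yields more varied output.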

Teaching Truth, Resisting Erasure: Disability Politics in a Changing America

February 25, 2025
by Jane (Mango) Angar. Disability is a social construct shaped by systemic exclusion rather than an inherent impairment. Society predominantly views disability through medical and economic lenses, leading to discrimination and marginalization. Disability rights have been hard-won through activism, yet disabled individuals still face poverty, social isolation, and violence. Recent policy rollbacks threaten disability protections, requiring vigilance from educators and advocates. Historical patterns show that marginalized groups are often the first targets of oppressive regimes. Teaching history with truth and resilience is an act of resistance. Activism, awareness, and collective action remain crucial in defending disability rights and promoting social justice.

Qualtrics Fundamentals (90 minutes)

May 16, 2025, 10:00am
Qualtrics is a powerful online tool available to Berkeley community members that can be used for a range of data collection activities. Primarily, Qualtrics is designed to make web surveys easy to write, test, and implement, but the software can be used for data entry, training, quality control, evaluation, market research, pre/post-event feedback, and other uses with some creativity.

LLMs for Exploratory Research

March 20, 2025, 10:00am
In a fast-evolving artificial intelligence landscape, LLMs such as GPT have become buzzwords. In the research community, their advantages and pitfalls are hotly debated. In this workshop, we will explore different chatbots powered by LLMs, beyond just ChatGPT. Our main goal will be to understand how researchers can use LLMs to conduct early-stage (or exploratory) research. Throughout the workshop, we will discuss best practices for prompt engineering and heuristics for evaluating the suitability of an LLM's output for our research purposes. Though the workshop primarily focuses on early-stage research, we will briefly discuss use cases for LLMs in later stages of research, such as data analysis and writing.

Python Fundamentals: Parts 1-4

May 5, 2025, 12:00pm
This four-part interactive workshop series is a complete introduction to programming in Python for people with little or no previous programming experience. By the end of the series, you will be able to apply your knowledge of basic principles of programming and data manipulation to a real-world social science application.