Regression Analysis

Andrea Lukas

UTech Manager
Computer Science
Data Science

Hi everyone! I'm Andrea Lukas, a 3rd-year student majoring in Computer Science and Data Science at UC Berkeley. I'm passionate about UI/UX design and AI-centered human-computer interaction, and I'm actively involved in Computational Cognition research using Large Language Models (LLMs). As the Manager at D-Lab, I'm excited to contribute to the team by optimizing operations and fostering collaboration.

Outside of my academic and professional work, I’m an active member of Berkeley's Dance Community, where I participate in various teams. I also enjoy discovering new matcha spots and...

Leveraging Large Language Models for Analyzing Judicial Disparities in China

October 8, 2024
by Nanqin Ying. This study analyzes over 50 million judicial decisions from China’s Supreme People’s Court to examine disparities in legal representation and their impact on sentencing across provinces. Focusing on 290,000 drug-related cases, it employs large language models to differentiate between private attorneys and public defenders and assess their sentencing outcomes. The methodology combines advanced text processing with statistical analysis, using clustering to categorize cases by province and representation, and regression models to isolate the effect of legal representation from factors like drug quantity and regional policies. Findings reveal significant regional disparities in legal access driven by economic conditions, highlighting the need for reforms in China’s legal aid system to ensure equitable representation for marginalized groups and promote transparent judicial data for systemic improvements.
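The idea of using regression to separate the effect of legal representation from drug quantity can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the variable names (`log_qty`, `private`, `sentence`) and the effect size are hypothetical, and plain ordinary least squares stands in for whatever models the study actually used.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Synthetic data (not from the study): log drug quantity per case.
log_qty = rng.normal(0.0, 1.0, size=n)
# Assumed confounding: larger cases are more likely to have a private attorney.
private = (log_qty + rng.normal(0.0, 1.0, size=n) > 0).astype(float)
# Assumed true effect of a private attorney on sentence length, in months.
true_effect = -2.0
sentence = 36.0 + 10.0 * log_qty + true_effect * private + rng.normal(0.0, 3.0, size=n)

# Naive comparison: difference in mean sentence, ignoring drug quantity.
naive = sentence[private == 1].mean() - sentence[private == 0].mean()

# Adjusted estimate: OLS with drug quantity included as a control.
X = np.column_stack([np.ones(n), private, log_qty])
coef, *_ = np.linalg.lstsq(X, sentence, rcond=None)
adjusted = coef[1]

print(f"naive difference:     {naive:.2f} months")
print(f"adjusted coefficient: {adjusted:.2f} months")
```

In this synthetic setup the naive difference is distorted by drug quantity, while the adjusted coefficient lands close to the assumed effect. Real court data would need further controls, such as the regional policies the summary mentions.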

Finley Golightly

IT Support & Helpdesk Supervisor
Applied Mathematics

Finley joined D-Lab as full-time staff, launching their career in Data Science after graduating with a Bachelor's degree in Applied Math from UC Berkeley.

They have been with D-Lab since Fall 2020, formerly as part of the UTech Management team before joining as full-time staff in Fall 2023. They love the learning environment of D-Lab and their favorite part of the job is their co-workers! In their free time, they enjoy reading, boxing, listening to music, and playing Dungeons & Dragons. Feel free to stop by the front desk to ask them any questions or...

Anna Björklund

Senior Data Science Fellow 2024-2025, Data Science Fellow 2023-2024
Linguistics

I am a fifth-year PhD student in the Department of Linguistics with an areal interest in the Wintuan languages, traditionally spoken in the northern Sacramento Valley and now undergoing revitalization. My primary research interests are in leveraging archival recordings for the phonetic analysis of these under-documented languages, as well as designing tools to assist in their revitalization. I have worked as a linguistic consultant for the Paskenta Band of Nomlaki Indians since 2020 and the Wintu Tribe of Northern California since 2022. I received my MA in linguistics from UC...

Leah Lee

Senior Data Science Fellow 2024-2025, Data Science Fellow 2023-2024
Integrative Biology

I am a PhD candidate in the department of Integrative Biology. My research interest is at the intersection of biomechanics, entomology, and physiology. Currently I am studying how beetles use their shield-like forewings called elytra for flight, thermoregulation, and protection. Prior to UC Berkeley, I worked as a research assistant at Korea Institute of Ocean Science and Technology (KIOST), studying algae phylogenetics. I received my B.A. in Biology and Mathematics from Swarthmore College.

Alex Ramiller

Senior Data Science Fellow 2024-2025, Data Science Fellow 2023-2024
City and Regional Planning

I am a PhD Candidate in City and Regional Planning. My research focuses on the use of large administrative datasets to study residential mobility, neighborhood change, and housing access. I received a Master's in Geography from the University of Washington and a Bachelor's in Economics and Geography from Macalester College. I have also consulted on analytical projects for several organizations including the San Francisco Federal Reserve Bank, PolicyLink, and the City of Seattle.

Farnam Mohebi

Data Science Fellow 2023-2024, Data Science for Social Justice Senior Fellow 2024
Haas School of Business

I am a PhD student at the Haas School of Business, University of California, Berkeley, and a researcher in the Department of Radiation Oncology at the University of California, San Francisco, having previously earned my MD and MPH degrees. My research focuses on the intersection of professionals and emerging technologies, drawing from the fields of medical sociology, organizational theory, and science and technology studies. I am particularly fascinated by the evolving relationship between physicians and artificial intelligence, the phenomenon of physician influencers, and the social...

Valeria Ramírez Castañeda

Data Science for Social Justice Fellow (2024-2025)
Integrative Biology

Valeria Ramírez Castañeda is a Colombian biologist currently pursuing a PhD in the Department of Integrative Biology at the University of California, Berkeley. She completed her undergraduate degree in Biology at the National University of Colombia and earned a master's degree in Ecology and Evolution, as well as another in Science Communication. During her PhD, she is studying the interactions between snakes and frogs and how these influence the evolution of toxin resistance in snakes. She is also collaborating on and leading projects regarding the consequences of English in science and the...

Causal Thinking in Thermal Comfort

September 17, 2024
by Ruiji Sun. We demonstrate the importance of causal thinking by comparing two linear regression approaches used in thermal comfort research: Approach (a), which regresses thermal sensation votes (y-axis) on indoor temperature (x-axis), and Approach (b), which does the reverse, regressing indoor temperature (y-axis) on thermal sensation votes (x-axis). From a correlational perspective, they may appear interchangeable, but causal thinking reveals substantial and practical differences between them. Using the same data, we found that Approach (b) yields a comfort zone 10 °C narrower than the one conventionally derived using Approach (a). This finding has important implications for occupant comfort and building energy efficiency. We highlight the importance of integrating causal thinking into correlation-based statistical methods, especially given the increasing volume of data in the built environment.
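Why the two approaches are not interchangeable can be seen in a small sketch on synthetic data. The assumptions are mine, not the author's: the data are simulated, the function name `comfort_zone_widths` is hypothetical, and the comfort zone is taken to be the temperature band where the fitted vote lies within ±0.5. Because the two fitted slopes multiply to the squared correlation, the band from Approach (b) can never be wider than the one from Approach (a).

```python
import numpy as np

def comfort_zone_widths(temp, vote, limit=0.5):
    """Width in °C of the temperature band where the vote lies within ±limit."""
    # Approach (a): regress vote on temperature, then invert the fitted line.
    slope_a, _ = np.polyfit(temp, vote, 1)
    width_a = 2 * limit / abs(slope_a)
    # Approach (b): regress temperature on vote, then read temperatures directly.
    slope_b, _ = np.polyfit(vote, temp, 1)
    width_b = 2 * limit * abs(slope_b)
    return width_a, width_b

rng = np.random.default_rng(0)
# Synthetic indoor temperatures (°C) and noisy thermal sensation votes.
temp = rng.uniform(18.0, 30.0, size=500)
vote = 0.3 * (temp - 24.0) + rng.normal(0.0, 1.0, size=500)

width_a, width_b = comfort_zone_widths(temp, vote)
r = np.corrcoef(temp, vote)[0, 1]

print(f"Approach (a) comfort zone width: {width_a:.2f} °C")
print(f"Approach (b) comfort zone width: {width_b:.2f} °C")
print(f"width ratio (b)/(a): {width_b / width_a:.3f}, r^2: {r**2:.3f}")
```

The ratio of the two widths equals the squared correlation, so the noisier the votes, the larger the gap between the two approaches. The specific 10 °C difference reported above depends on the study's own data and is not reproduced here.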

Theo Snow

Availability: By appointment only

Consulting Areas: Python, R, SQL, SAS, Databases & SQL, Data Manipulation and Cleaning, Data Science, Data Visualization, Geospatial Data, Maps & Spatial Analysis, Machine Learning, Mixed Methods, Qualitative methods, Surveys, Sampling & Interviews, Regression Analysis, Means Tests, Software Output Interpretation, Other, Excel, Git or GitHub, RStudio, RStudio Cloud, Tableau