Programming Languages

Python Introduction to Machine Learning: Parts 1-2

September 27, 2021, 2:00pm
This workshop introduces students to scikit-learn, the popular machine learning library in Python, as well as TPOT, the AutoML library built on top of scikit-learn. The focus will be on scikit-learn syntax and the tools available for applying machine learning algorithms to datasets. No theory instruction will be provided.

Python Fundamentals: Parts 1-4

October 26, 2021, 2:30pm
This four-part, interactive workshop series is a complete introduction to programming in Python for people with little or no previous programming experience. By the end of the series, you will be able to apply your knowledge of basic principles of programming and data manipulation to a real-world social science application.

R Fundamentals: Parts 1-4

October 6, 2021, 11:00am
This four-part introductory series teaches R from scratch with clear introductions, concise examples, and support documents. You will learn how to download and install the open-source RStudio software, understand data and basic manipulations, import and subset data, explore and visualize data, and understand the basics of automation in the form of loops and functions. After completing this workshop, you will have the foundation to create, organize, and use workflows for your own research.

Forecasting Social Outcomes with Deep Neural Networks

October 7, 2025
by Paige Park. Our capacity to accurately predict social outcomes is increasing. Deep neural networks and artificial intelligence are crucial technologies pushing this progress along. As these tools reshape how social prediction is done, social scientists should feel comfortable engaging with them and meaningfully contributing to the conversation. But many social scientists are still unfamiliar with and sometimes even skeptical of deep learning. This tutorial is designed to help close that knowledge gap. We’ll walk step-by-step through training a simple neural network for a social prediction task: forecasting population-level mortality rates.

Amber Galvano

Senior Data Science Fellow 2025-2026, Data Science Fellow 2024-2025
Linguistics

I am a fourth-year PhD student in Linguistics, with a focus in sociophonetics and phonology. In my research, I'm interested in how understudied speech communities (Andalusians, southern Spain; Lobi and Tonko Limba, West Africa) and often-relegated aspects of social identity (sexuality, gender normativity) can inform new approaches to theory and methodology and how we conceptualize the interfaces between linguistic subfields.

I'm also involved in language documentation/revitalization work for Lobi and the development of automated phonetic methods, particularly for...

Python Fundamentals: Parts 1-4

May 5, 2025, 12:00pm
This four-part, interactive workshop series is a complete introduction to programming in Python for people with little or no previous programming experience. By the end of the series, you will be able to apply your knowledge of basic principles of programming and data manipulation to a real-world social science application.

Nikita Samarin

Data Science Fellow 2021-2022
Electrical Engineering and Computer Science (EECS)

Nikita Samarin is a doctoral student in Computer Science in the Department of Electrical Engineering and Computer Sciences (EECS) at the University of California, Berkeley, advised by Serge Egelman and David Wagner. His research focuses on computer security and privacy from an interdisciplinary perspective, combining approaches from human-computer interaction, behavioral sciences, and legal studies. Samarin is a member of the Berkeley Lab for Usable and Experimental Security (BLUES) and an affiliated graduate researcher at the Center for Long-Term Cybersecurity (CLTC) and the...

Monica Donegan

Data Science Fellow 2022-2023
Environmental Science, Policy, and Management

Monica is a third-year Ph.D. candidate in the Environmental Science, Policy, and Management program. She uses computational tools to study the evolution and ecology of agricultural plant pathogens. Previously, she worked on a data science team at a biotech company in Boston.

Ruiji Sun

Data Science Fellow 2024-2025
Center for the Built Environment

Ruiji Sun is currently a Ph.D. candidate in Building Science at UC Berkeley. He is also a GSR at the Center for the Built Environment (CBE). His dissertation focuses on causal inference in the built environment. Other areas of his research include indoor environmental quality, personalized environmental control systems, and building energy modeling.

He obtained his M.S. degree from Carnegie Mellon University and double-majored in Mechanical Engineering (HVAC) and Architecture at Xi’an University of Architecture and Technology, China. Ruiji also served as a board...

Measuring Vowels Without Relying on Sex-Based Assumptions

April 8, 2025
by Amber Galvano. This tutorial builds on my previous post on Python for acoustic analysis, this time focusing on measuring vocal tract resonances without relying on sex-based assumptions. I demonstrate how to process audio files and vowel annotations using an adaptive method that optimizes the acoustic analysis across a recording. Instead of fixing parameters based on generalized vocal tract length correlations, this approach varies them within a defined range for greater accuracy. This not only enhances measurement precision but also avoids requiring (or assuming) speakers’ sex in data collection. Finally, I show how to filter for outliers and create high-quality vowel space visualizations.