Data Manipulation and Cleaning

Sahiba Chopra

Data Science Fellow 2024-2025
Haas

I'm a PhD student in the Management and Organizations (Macro) group at Berkeley Haas. I have a diverse professional background, primarily as a data scientist across numerous industries, including fintech, cleantech, and media. I hold a BA in Economics from the University of Maryland, an MS in Applied Economics from the University of San Francisco, and an MS in Business Administration from UC Berkeley.

My research focuses on the intersection of inequality, technology, and the labor market. I am particularly interested in understanding how to reduce inequality in...

John Salchak

Data Science Fellow 2024-2025
Political Science

I am a third-year Ph.D. student in the Department of Political Science, where I conduct empirical research on international and civil conflict. My work assesses the effects of U.S. security force assistance on partner state military design as well as the effects of foreign military interventions on subsequent dispute initiation between states.

I hold an MA from the University of Chicago and a BA from George Washington University.

Amber Galvano

Data Science Fellow 2024-2025, Data Science for Social Justice Fellow 2024
Linguistics
Linguistics (sociophonetics, phonology)

I am a fourth-year PhD student in Linguistics, with a focus on sociophonetics and phonology. My research asks how understudied speech communities (Andalusians, southern Spain; Lobi and Tonko Limba, West Africa) and often-relegated aspects of social identity (sexuality, gender normativity) can inform new approaches to theory and methodology, as well as how we conceptualize the interfaces between linguistic subfields.

I'm also involved in language documentation/revitalization work for Lobi and the development of automated phonetic methods, particularly for...

Bruno Smaniotto

Data Science Fellow 2024-2025
Economics

I'm originally from Brazil, but I have been living in Berkeley for the last 5 years working towards my PhD in Economics. My main areas of interest are Behavioral Economics and Macroeconomics, especially their intersection, but I'm excited about learning and working on empirical applications in different fields.

Mingyu Yuan

Data Science for Social Justice Senior Fellow 2024
Linguistics

I am a Ph.D. candidate in Linguistics, with a focus on phonetics and phonology, specifically speech production in neuro-atypical populations. I use methods from Natural Language Processing in my day-to-day research.

Stephanie Andrews

Data Science for Social Justice Senior Fellow 2024
Info & Data Science MIDS

Stephanie Andrews is currently studying data science in the MIDS program, having previously majored in Social Welfare as an undergraduate at Cal. After graduating, she worked as an advocate for survivors of gender-based violence, as a public policy analyst focusing on anti-trafficking initiatives, and as a software engineer for progressive and social impact organizations. She is now conducting research with the Human Rights Center's Investigations Lab, using OSINT and data science methods to investigate human rights violations.

Violet Davis

Data Science for Social Justice Senior Fellow 2024
MIDS

I am a Master's student studying Data Science in the School of Information. My research involves computational social science projects focused on social justice and equity.

R Data Wrangling and Manipulation

November 5, 2021, 1:00pm
It is often said that 80% of the time in data analysis is spent cleaning and preparing data for exploration, visualization, and analysis. This R workshop will introduce the dplyr and tidyr packages to make data wrangling and manipulation easier. Participants will learn how to use these packages to subset and reshape data sets, do calculations across groups of data, clean data, and perform other useful tasks.

Python Data Wrangling and Manipulation with Pandas

February 8, 2024, 2:00pm
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It supports practical, real-world data analysis in Python. In this workshop, we'll work with example data and go through the various steps you might need to prepare data for analysis.
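
As a rough illustration of the kind of preparation steps described above, here is a minimal sketch in Pandas. The data, the column names (site, year, count), and the prepare helper are hypothetical and are not taken from the workshop materials; they only show one plausible sequence of cleaning, grouping, and reshaping.

```python
import pandas as pd

# Hypothetical example data (assumption: not from the workshop).
# It contains one exact duplicate row, one missing value, and years stored as text.
raw = pd.DataFrame(
    {
        "site": ["A", "A", "B", "B", "B"],
        "year": ["2020", "2021", "2020", "2021", "2021"],
        "count": [10, None, 7, 3, 3],
    }
)


def prepare(df):
    """Return a cleaned copy of df (hypothetical helper for illustration)."""
    out = df.drop_duplicates().copy()        # remove exact duplicate rows
    out["year"] = out["year"].astype(int)    # convert text years to integers
    out = out.dropna(subset=["count"])       # drop rows with a missing count
    out["count"] = out["count"].astype(int)  # counts are whole numbers
    return out.reset_index(drop=True)


clean = prepare(raw)

# Calculation across groups: total count per site.
totals = clean.groupby("site")["count"].sum()

# Reshape from long to wide: one row per site, one column per year.
wide = clean.pivot(index="site", columns="year", values="count")

print(clean)
print(totals)
print(wide)
```

Dropping rows with missing values is only one option; depending on the analysis, filling or flagging them may be more appropriate.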

R Data Wrangling and Manipulation: Parts 1-2

May 24, 2022, 1:00pm
It is often said that 80% of the time in data analysis is spent cleaning and preparing data for exploration, visualization, and analysis. This R workshop will introduce the dplyr and tidyr packages to make data wrangling and manipulation easier. Participants will learn how to use these packages to subset and reshape data sets, do calculations across groups of data, clean data, and perform other useful tasks.