Data Manipulation and Cleaning

Sahiba Chopra

Data Science Fellow 2024-2025
Haas School of Business

I'm a PhD student in the Management and Organizations (Macro) group at Berkeley Haas. I have a diverse professional background, primarily as a data scientist across numerous industries, including fintech, cleantech, and media. I hold a BA in Economics from the University of Maryland, an MS in Applied Economics from the University of San Francisco, and an MS in Business Administration from UC Berkeley.

My research focuses on the intersection of inequality, technology, and the labor market. I am particularly interested in understanding how to reduce inequality in...

In Silico Approach to Mining Viral Sequences from Bulk RNA-Seq Data

October 28, 2025
by Carly Karrick. Viruses play important roles in evolution and influence ecosystems and host health. However, isolating and studying them can be difficult. In lieu of using resource-intensive methods to concentrate viruses into a “virome,” bulk sequencing methods include data from all biological entities present in a sample. In this tutorial, we explore an approach to mine viral sequences from publicly available bulk RNA-Seq data. The output from this analysis paves the way for future statistical analyses comparing viral communities in different contexts. This approach can be applied to other datasets, including studies of human health.

Python Data Wrangling and Manipulation with Pandas

October 19, 2021, 10:00am
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It supports practical, real-world data analysis in Python. In this workshop, we'll work with example data and go through the various steps you might need to prepare data for analysis.

R Advanced Data Wrangling: Parts 1-2

October 5, 2021, 2:00pm
Advanced Data Wrangling aims to help students learn powerful data wrangling tools and techniques in R, with less pain and more fun. This workshop will show how R can make your data wrangling process faster, more reliable, and more interpretable.

Sohail Khan

Senior Data Science Fellow 2025-2026, Data Science Fellow 2024-2025
School of Information

Hey everyone, I’m Sohail, a first-year Master’s student studying Data Science at the I-School. I am interested in the intersection of Computer Science, Data Science, and Cognitive Psychology, and in using these tools to understand, discover, and drive the development of assistive technologies.

I have experience building with brain-computer interfaces and developing distributed data processing applications, and I am currently working on a large-scale archival project aimed at preserving the history and memory of resistance movements through an embedding-based...

Jane (Mango) Angar

Senior Data Science Fellow 2025-2026, Data Science Fellow 2024-2025
Political Science

Hi! I am a PhD candidate in the Political Science Department at UC Berkeley. My dissertation traces the emergence of disability rights groups in Africa, focusing on Zambia and Malawi, and examines factors influencing their effectiveness. I use mixed methods, including archival work, field interviews, participant observation, and surveys for data collection.

My data analysis techniques include text analysis, social network analysis, means tests, and regressions. In my free time, I enjoy moderately difficult hikes, walks along the beach with my dog, Princess, and...

Taesoo Song

Senior Data Science Fellow 2025-2026, Data Science Fellow 2024-2025
City and Regional Planning

Taesoo is a Ph.D. candidate in the City and Regional Planning department at the University of California, Berkeley. He studies the nexus of housing policy, neighborhood change, and residential outcomes for low-income and minority households.

His dissertation aims to reassess the prevailing narrative that Asian Americans face minimal barriers in the housing market using quantitative and qualitative methods. Taesoo has worked with the Terner Center for Housing Innovation and the Urban Displacement Project at UC Berkeley, as well as the Seoul Institute in South Korea.

Amber Galvano

Senior Data Science Fellow 2025-2026, Data Science Fellow 2024-2025
Linguistics

I am a fourth-year PhD student in Linguistics, with a focus in sociophonetics and phonology. In my research, I'm interested in how understudied speech communities (Andalusians, southern Spain; Lobi and Tonko Limba, West Africa) and often-relegated aspects of social identity (sexuality, gender normativity) can inform new approaches to theory and methodology and how we conceptualize the interfaces between linguistic subfields.

I'm also involved in language documentation/revitalization work for Lobi and the development of automated phonetic methods, particularly for...

Bruno Smaniotto

Senior Data Science Fellow 2025-2026, Data Science Fellow 2024-2025
Economics

I'm originally from Brazil, but I have been living in Berkeley for the last five years working towards my PhD in Economics. My main areas of interest are Behavioral Economics and Macroeconomics, mostly their intersection, but I'm excited about learning and working on empirical applications in different fields.