Data Manipulation and Cleaning

Filtering, Visualizing, and Interpreting Spatial Time Series Data

December 17, 2025
by Maksymilian Jasiak. Spatial time series (consecutive measurements across space and time) are often difficult to interpret, especially when there are many overlapping signals. However, have no fear! Filtering and visualization can help you better interpret and understand spatial time series data.

Digitization of Historical Maps in the Age of AI

December 3, 2025
by Elena Stacy. Researchers today increasingly have access to a wealth of tools to streamline or automate labor-intensive data processing and generation tasks. When it comes to mapping, progress has been slower. This blog details the author's experience tackling the digitization of a historical map in the age of AI.

Sahiba Chopra

Data Science Fellow 2024-2025
Haas School of Business

I'm a PhD student in the Management and Organizations (Macro) group at Berkeley Haas. I have a diverse professional background, primarily as a data scientist across numerous industries, including fintech, cleantech, and media. I hold a BA in Economics from the University of Maryland, an MS in Applied Economics from the University of San Francisco, and an MS in Business Administration from UC Berkeley.

My research focuses on the intersection of inequality, technology, and the labor market. I am particularly interested in understanding how to reduce inequality in...

In Silico Approach to Mining Viral Sequences from Bulk RNA-Seq Data

October 28, 2025
by Carly Karrick. Viruses play important roles in evolution and influence ecosystems and host health. However, isolating and studying them can be difficult. In lieu of using resource-intensive methods to concentrate viruses into a “virome,” bulk sequencing methods include data from all biological entities present in a sample. In this tutorial, we explore an approach to mine viral sequences from publicly available bulk RNA-Seq data. The output from this analysis paves the way for future statistical analyses comparing viral communities in different contexts. This approach can be applied to other datasets, including studies of human health.

Python Data Wrangling and Manipulation with Pandas

October 19, 2021, 10:00am
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It enables practical, real-world data analysis in Python. In this workshop, we'll work with example data and go through the various steps you might need to prepare data for analysis.

R Advanced Data Wrangling: Parts 1-2

October 5, 2021, 2:00pm
Advanced Data Wrangling aims to help students learn powerful tools and techniques for wrangling data in R with less pain and more fun. This workshop will show how R can make your data wrangling process faster, more reliable, and more interpretable.

Sohail Khan

Senior Data Science Fellow 2025-2026, Data Science Fellow 2024-2025
School of Information

Hey everyone, I’m Sohail - a first-year Master’s student studying Data Science at the I-School. I am interested in the intersection between Computer Science, Data Science, and Cognitive Psychology and using these tools to understand, discover, and drive the development of assistive technologies.

I have experience building with brain-computer interfaces and developing distributed data processing applications, and I am currently working on a large-scale archival project aimed at preserving the history and memory of resistance movements through an embedding-based...

Taesoo Song

Senior Data Science Fellow 2025-2026, Data Science Fellow 2024-2025
City and Regional Planning

Taesoo is a Ph.D. candidate in the City and Regional Planning department at the University of California, Berkeley. He studies the nexus of housing policy, neighborhood change, and residential outcomes for low-income and minority households.

His dissertation aims to reassess the prevailing narrative that Asian Americans face minimal barriers in the housing market using quantitative and qualitative methods. Taesoo has worked with the Terner Center for Housing Innovation and the Urban Displacement Project at UC Berkeley, as well as the Seoul Institute in South Korea.

Jane (Mango) Angar

Senior Data Science Fellow 2025-2026, Data Science Fellow 2024-2025
Political Science

Hi! I am a PhD candidate in the Political Science Department at UC Berkeley. My dissertation traces the emergence of disability rights groups in Africa, focusing on Zambia and Malawi, and examines factors influencing their effectiveness. I use mixed methods, including archival work, field interviews, participant observation, and surveys for data collection.

My data analysis techniques include text analysis, social network analysis, means tests, and regressions. In my free time, I enjoy moderately difficult hikes, walks along the beach with my dog, Princess, and...