Data Manipulation and Cleaning

Python Web APIs

October 22, 2024, 2:00pm
In this workshop, we cover how to extract data from the web with APIs using Python. APIs are often official services offered by companies and other entities, which allow you to directly query their servers in order to retrieve their data. Platforms like The New York Times, Twitter and Reddit offer APIs to retrieve data.
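As a rough illustration of that request-and-response cycle, the sketch below uses only Python's standard library to build a query URL, send a GET request, and pull fields out of a JSON reply. The endpoint `https://api.example.com/v1/articles`, its parameters, and the `results`/`title` fields are hypothetical placeholders. Each real API (The New York Times, Reddit, and so on) documents its own base URL, parameter names, authentication keys, and rate limits.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint: real APIs each define their own base URL,
# parameter names, and authentication rules.
BASE_URL = "https://api.example.com/v1/articles"


def build_query_url(base_url, params):
    """Attach URL-encoded query parameters to an API endpoint."""
    return f"{base_url}?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON body (requires network access)."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_titles(payload):
    """Pull the 'title' field from each record in a hypothetical 'results' list."""
    return [item["title"] for item in payload.get("results", [])]
```

For example, `fetch_json(build_query_url(BASE_URL, {"q": "beetles"}))` would return the decoded JSON if such an endpoint existed. The third-party `requests` library is a common alternative that wraps the same steps.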

Leah Lee

Senior Data Science Fellow 2024-2025, Data Science Fellow 2023-2024
Integrative Biology

I am a PhD candidate in the department of Integrative Biology. My research interest is at the intersection of biomechanics, entomology, and physiology. Currently I am studying how beetles use their shield-like forewings called elytra for flight, thermoregulation, and protection. Prior to UC Berkeley, I worked as a research assistant at Korea Institute of Ocean Science and Technology (KIOST), studying algae phylogenetics. I received my B.A. in Biology and Mathematics from Swarthmore College.

Alex Ramiller

Senior Data Science Fellow 2024-2025, Data Science Fellow 2023-2024
City and Regional Planning

I am a PhD Candidate in City and Regional Planning. My research focuses on the use of large administrative datasets to study residential mobility, neighborhood change, and housing access. I received a Master in Geography from the University of Washington and a Bachelor's in Economics and Geography from Macalester College. I have also consulted on analytical projects for several organizations, including the Federal Reserve Bank of San Francisco, PolicyLink, and the City of Seattle.

Farnam Mohebi

Data Science Fellow 2023-2024, Data Science for Social Justice Senior Fellow 2024
Haas School of Business

I am a PhD student at the Haas School of Business, University of California, Berkeley, and a researcher in the Department of Radiation Oncology at the University of California, San Francisco, having previously earned my MD and MPH degrees. My research focuses on the intersection of professionals and emerging technologies, drawing from the fields of medical sociology, organizational theory, and science and technology studies. I am particularly fascinated by the evolving relationship between physicians and artificial intelligence, the phenomenon of physician influencers, and the social...

Valeria Ramírez Castañeda

Data Science for Social Justice Fellow (2024-2025)
Integrative Biology

Valeria Ramírez Castañeda is a Colombian biologist currently pursuing a PhD in the Department of Integrative Biology at the University of California, Berkeley. She completed her undergraduate degree in Biology at the National University of Colombia and earned a master's degree in Ecology and Evolution, as well as another in Science Communication. During her PhD, she is studying the interactions between snakes and frogs and how these interactions influence the evolution of toxin resistance in snakes. She is also collaborating on and leading projects regarding the consequences of English in science and the...

R Machine Learning with tidymodels: Parts 1-2

October 14, 2024, 1:00pm
Machine learning often evokes images of Skynet, self-driving cars, and computerized homes. However, these ideas are less science fiction than tangible phenomena predicated on description, classification, prediction, and pattern recognition in data. During this two-part workshop, we will discuss basic features of supervised machine learning algorithms, including k-nearest neighbor, linear regression, decision tree, random forest, boosting, and ensembling, using the tidymodels framework. For social scientists, such methods might be critical for investigating evolutionary relationships, global health patterns, voter turnout in local elections, or individual psychological diagnoses.
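To make the first of those algorithms concrete, here is a minimal k-nearest-neighbor classifier in plain Python. It is not tidymodels code (tidymodels is an R framework); it only sketches the idea: find the k training points closest to a new observation and take a majority vote of their labels. The toy data and function names are illustrative assumptions.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training indices by distance to the query point and keep the first k.
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # most_common(1) returns [(label, count)]; ties go to the label seen first.
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labeled "a" and "b".
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]
```

Here `knn_predict(train_X, train_y, (1.1, 1.0))` votes among the three "a" points. In practice, features are usually rescaled first so that no single variable dominates the distance.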

Python Data Wrangling and Manipulation with Pandas

October 10, 2024, 2:00pm
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It enables practical, real-world data analysis in Python. In this workshop, we'll work with example data and go through the various steps you might need to prepare data for analysis.
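A small sketch of the kind of preparation step such a workshop covers: normalizing text labels, converting numbers stored as strings, and dropping incomplete rows with pandas. The column names and values are made up for illustration, not taken from the workshop materials.

```python
import pandas as pd

# Hypothetical messy input: inconsistent labels, numbers stored as text, a missing value.
raw = pd.DataFrame({
    "site": [" north", "South ", "north", None],
    "count": ["12", "7", "n/a", "3"],
})

# Normalize string labels: strip surrounding whitespace and lowercase.
clean = raw.assign(site=raw["site"].str.strip().str.lower())

# Convert text to numbers; unparseable entries such as "n/a" become NaN.
clean["count"] = pd.to_numeric(clean["count"], errors="coerce")

# Drop rows missing either field.
clean = clean.dropna(subset=["site", "count"])
```

After these steps, `clean` keeps only the two complete rows, with sites `north` and `south` and numeric counts.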

Python Data Wrangling and Manipulation with Pandas

September 27, 2024, 9:00am
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It enables practical, real-world data analysis in Python. In this workshop, we'll work with example data and go through the various steps you might need to prepare data for analysis.

R Data Wrangling and Manipulation: Parts 1-2

October 1, 2024, 1:00pm
It is often said that 80% of data analysis time is spent cleaning and preparing data for exploration, visualization, and analysis. This R workshop will introduce the dplyr and tidyr packages, which make data wrangling and manipulation easier. Participants will learn how to use these packages to subset and reshape data sets, perform calculations across groups of data, clean data, and carry out other useful tasks.
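The dplyr and tidyr operations named above have rough counterparts in pandas. For readers who work in Python, the sketch below shows reshaping (comparable to `tidyr::pivot_longer`), subsetting (comparable to `dplyr::filter`), and a grouped calculation (comparable to `dplyr::group_by` with `summarise`) on a hypothetical table. It is an analogy, not the R workshop's code.

```python
import pandas as pd

# Hypothetical wide table: one row per site, one column per year.
wide = pd.DataFrame({
    "site": ["north", "south", "east"],
    "y2023": [10, 4, 7],
    "y2024": [12, 6, 5],
})

# Reshape wide -> long (roughly what tidyr::pivot_longer does).
long_df = wide.melt(id_vars="site", var_name="year", value_name="count")

# Subset rows (roughly dplyr::filter).
recent = long_df[long_df["year"] == "y2024"]

# Calculate across groups (roughly dplyr::group_by() + summarise()).
totals = long_df.groupby("year", as_index=False)["count"].sum()
```

The long format produced by `melt` is what makes the grouped sum a one-liner: each year becomes a value in a single column rather than a separate column.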

Stephanie Andrews

Availability: By appointment only

Consulting Areas: Python, SQL, HTML / CSS, JavaScript, APIs, Databases & SQL, Data Manipulation and Cleaning, Data Science, Data Sources, Data Visualization, Digital Humanities, Machine Learning, Natural Language Processing, Software Tools, Text Analysis, Web Scraping, Bash or Command Line, Excel, Git or GitHub, Tableau