Data Manipulation and Cleaning

Finley Golightly

IT Support & Helpdesk Supervisor
Applied Mathematics

Finley joined D-Lab as full-time staff, launching their career in Data Science after graduating with a Bachelor's degree in Applied Math from UC Berkeley.

They have been with D-Lab since Fall 2020, formerly as part of the UTech Management team before joining as full-time staff in Fall 2023. They love the learning environment of D-Lab and their favorite part of the job is their co-workers! In their free time, they enjoy reading, boxing, listening to music, and playing Dungeons & Dragons. Feel free to stop by the front desk to ask them any questions or...

Leah Lee

Senior Data Science Fellow 2024-2025, Data Science Fellow 2023-2024
Integrative Biology

I am a PhD candidate in the department of Integrative Biology. My research interest is at the intersection of biomechanics, entomology, and physiology. Currently I am studying how beetles use their shield-like forewings called elytra for flight, thermoregulation, and protection. Prior to UC Berkeley, I worked as a research assistant at Korea Institute of Ocean Science and Technology (KIOST), studying algae phylogenetics. I received my B.A. in Biology and Mathematics from Swarthmore College.

Alex Ramiller

Senior Data Science Fellow 2024-2025, Data Science Fellow 2023-2024
City and Regional Planning

I am a PhD candidate in City and Regional Planning. My research focuses on the use of large administrative datasets to study residential mobility, neighborhood change, and housing access. I received a Master's in Geography from the University of Washington and a Bachelor's in Economics and Geography from Macalester College. I have also consulted on analytical projects for several organizations including the San Francisco Federal Reserve Bank, PolicyLink, and the City of Seattle.

Farnam Mohebi

Data Science Fellow 2023-2024, Data Science for Social Justice Senior Fellow 2024
Haas School of Business

I am a PhD student at the Haas School of Business, University of California, Berkeley, and a researcher in the Department of Radiation Oncology at the University of California, San Francisco, having previously earned my MD and MPH degrees. My research focuses on the intersection of professionals and emerging technologies, drawing from the fields of medical sociology, organizational theory, and science and technology studies. I am particularly fascinated by the evolving relationship between physicians and artificial intelligence, the phenomenon of physician influencers, and the social...

Valeria Ramírez Castañeda

Data Science for Social Justice Fellow (2024-2025)
Integrative Biology

Valeria Ramírez Castañeda is a Colombian biologist currently pursuing a PhD in the Department of Integrative Biology at the University of California, Berkeley. She completed her undergraduate degree in Biology at the National University of Colombia and earned a master's degree in Ecology and Evolution, as well as another in Science Communication. During her PhD, she is studying the interactions between snakes and frogs and how these influence the evolution of toxin resistance in snakes. She is also collaborating on and leading projects regarding the consequences of English in science and the...

Python Web Scraping

October 24, 2024, 2:00pm
In this workshop, we cover how to scrape data from the web using Python. Web scraping involves downloading a webpage's source code and sifting through the material to extract desired data.
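As a rough illustration of the "sift through the source to extract data" step, here is a minimal sketch using only Python's standard library. The HTML snippet, the `LinkExtractor` class, and the `extract_links` helper are all hypothetical and are not taken from the workshop materials; a real scraper would first download the page source and would often use a dedicated parsing library.

```python
from html.parser import HTMLParser

# Hypothetical page source. In practice this string would be downloaded,
# for example with urllib.request.urlopen(url).read().decode().
SAMPLE_HTML = """
<html><body>
  <h1>Workshops</h1>
  <ul>
    <li><a href="/pandas">Python Data Wrangling</a></li>
    <li><a href="/scraping">Python Web Scraping</a></li>
  </ul>
</body></html>
"""


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return every (href, text) pair found in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
for href, text in links:
    print(href, text)
```

Parsing from a string keeps the sketch runnable offline; swapping in a downloaded page only changes where the HTML comes from.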

Python Web APIs

October 22, 2024, 2:00pm
In this workshop, we cover how to extract data from the web with APIs using Python. APIs are often official services offered by companies and other entities, which allow you to directly query their servers in order to retrieve their data. Platforms like The New York Times, Twitter and Reddit offer APIs to retrieve data.
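To make the idea of querying a server concrete, the sketch below shows the two halves of a typical API call: building a request URL from query parameters and pulling fields out of a JSON response. The endpoint `https://api.example.com/v1/articles`, the parameter names, and the response layout are assumptions made up for illustration; they do not correspond to any real platform's API.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.com/v1/articles"


def build_request_url(base_url, **params):
    """Attach query parameters to the endpoint URL."""
    return f"{base_url}?{urlencode(params)}"


# Hypothetical response body. A real one would come from sending a
# request to the URL built above and reading the returned text.
SAMPLE_RESPONSE = (
    '{"results": [{"title": "A", "year": 2024}, '
    '{"title": "B", "year": 2023}]}'
)


def extract_titles(response_text):
    """Parse the JSON text and return the title of each result."""
    payload = json.loads(response_text)
    return [item["title"] for item in payload["results"]]


url = build_request_url(BASE_URL, q="housing", page=1)
titles = extract_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```

Real APIs usually also require an API key and enforce rate limits, which this sketch leaves out.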

R Machine Learning with tidymodels: Parts 1-2

October 14, 2024, 1:00pm
Machine learning often evokes images of Skynet, self-driving cars, and computerized homes. However, these ideas are less science fiction than they are tangible phenomena predicated on description, classification, prediction, and pattern recognition in data. During this two-part workshop, we will discuss basic features of supervised machine learning algorithms including k-nearest neighbor, linear regression, decision tree, random forest, boosting, and ensembling using the tidymodels framework. To social scientists, such methods might be critical for investigating evolutionary relationships, global health patterns, voter turnout in local elections, or individual psychological diagnoses.

Python Data Wrangling and Manipulation with Pandas

October 10, 2024, 2:00pm
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It supports practical, real-world data analysis in Python. In this workshop, we'll work with example data and go through the various steps you might need to prepare data for analysis.
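A few of the preparation steps such a workshop might touch on can be sketched as follows. The toy table, its column names, and the `clean` helper are invented for illustration and are not the workshop's example data; the sketch assumes pandas is installed.

```python
import pandas as pd

# Hypothetical messy input: stray whitespace, inconsistent capitalization,
# numbers stored as strings, a duplicated row, and a missing value.
raw = pd.DataFrame({
    "Name ": [" ana", "Ben", "Ben", "cy "],
    "Score": ["90", "85", "85", None],
})


def clean(df):
    """Return a tidied copy of the input table."""
    out = df.copy()
    # Normalize column labels: trim whitespace, lowercase.
    out.columns = out.columns.str.strip().str.lower()
    # Normalize text values.
    out["name"] = out["name"].str.strip().str.title()
    # Convert scores to numbers; unparseable or missing values become NaN.
    out["score"] = pd.to_numeric(out["score"], errors="coerce")
    # Drop exact duplicate rows, then rows without a score.
    out = out.drop_duplicates().dropna(subset=["score"])
    return out.reset_index(drop=True)


tidy = clean(raw)
print(tidy)
```

Working on a copy leaves `raw` untouched, which makes it easy to compare the table before and after cleaning.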

Python Data Wrangling and Manipulation with Pandas

September 27, 2024, 9:00am
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It supports practical, real-world data analysis in Python. In this workshop, we'll work with example data and go through the various steps you might need to prepare data for analysis.