Data Science

Reduce, Reuse, Recycle: Practical strategies for working with large datasets

October 12, 2022

When the size of your datasets starts to approach the size of your computer’s available memory, even the simplest data wrangling tasks can become frustrating. Suddenly, reading in a CSV file or calculating a simple average becomes time-consuming or impossible. For students and researchers, additional computing resources can be costly or simply unavailable. Here are some principles and strategies for reducing the overhead of your dataset while keeping the momentum going. The code mainly focuses on reading CSV files (a very common data format) into Python...
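One common way to keep a task like "calculate a simple average" within memory limits is to stream the file row by row instead of loading it all at once. The sketch below uses only Python's standard `csv` module. It is an illustration under stated assumptions, not necessarily the approach the post itself takes. The function name `streaming_mean`, the column name `income`, and the file name `data.csv` are all hypothetical.

```python
import csv
import io
from typing import TextIO


def streaming_mean(handle: TextIO, column: str) -> float:
    """Compute the mean of one numeric column without loading the whole file.

    csv.DictReader yields one row at a time, so memory use stays roughly
    constant regardless of how many rows the file contains.
    """
    total = 0.0
    count = 0
    for row in csv.DictReader(handle):
        value = row[column]
        if value == "":  # skip missing values instead of failing on them
            continue
        total += float(value)
        count += 1
    if count == 0:
        raise ValueError(f"no numeric values found in column {column!r}")
    return total / count


if __name__ == "__main__":
    # A tiny in-memory stand-in for a large file. With a real (hypothetical)
    # file you would write:
    #   with open("data.csv", newline="") as f:
    #       print(streaming_mean(f, "income"))
    sample = io.StringIO("id,income\n1,10\n2,20\n3,\n4,30\n")
    print(streaming_mean(sample, "income"))  # → 20.0
```

The trade-off is speed for memory. A pure-Python loop is slower than a vectorized library call, but it never holds more than one row in memory at a time.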

Bo Yun Park, Ph.D.

Postdoc
D-Lab

I am a Postdoctoral Scholar in the D-Lab at the University of California, Berkeley. My research lies at the intersection of political, cultural, and transnational sociology. I am particularly interested in the dynamics of social inclusion and exclusion, social change, technology, and digital politics. My dissertation investigated how political strategists in France and the United States craft narratives of political leadership for presidential candidates in the digital age. I received my Ph.D. in Sociology from Harvard University, where I was affiliated with the Institute for Quantitative Social...

Yiyi He

Consultant
Landscape Architecture & Environmental Planning

Yiyi He is a Ph.D. candidate in the College of Environmental Design at the University of California, Berkeley. She received her bachelor’s degree in City and Regional Planning from Nanjing University and her master’s degree in Environmental Planning from UC Berkeley. She is currently working as an AI Resident at GoogleX. Prior to this, she worked as a consultant for the Global Facility for Disaster Reduction and Recovery at the World Bank, and as a researcher for the Center for Catastrophic Risk Management and the Federal Aviation Administration Consortium in Aviation Operations Research. Her...

Ella Belfer

Consultant
Energy and Resources Group

Ella is a PhD student in the Energy and Resources Group. Her research examines water governance in a changing climate, drawing on geospatial techniques. Her past work includes applying topic modelling to climate change adaptation research and inductively coding semi-structured interviews.

Alex Bruefach

Discovery Graduate Fellow
Materials Science and Engineering

Alex is a PhD Candidate in materials science and engineering developing image processing and machine learning techniques for extracting information from electron microscopy datasets. Her primary focus is understanding what information is transferred from various feature representations of images. She has extensive experience collaborating across boundaries and is passionate about brainstorming innovative approaches to challenging data science problems!

Shusheng Li

UTech Management
Data Science
Economics

Shusheng is a fourth-year undergraduate student studying Data Science and Economics. He is currently part of the UTech Management team at D-Lab. Shusheng loves playing all types of sports because it's a great way to stay fit and spend time with friends. Working at the UTech front desk, Shusheng loves helping others and directing them to the right resources.

Frances Leung

Data Science Fellow
School of Information

Frances Leung is a master’s student at the UC Berkeley School of Information, where she focuses her studies on information and data science. She has a keen interest in leveraging data-driven insights to better understand consumer behaviors and the world around us. In her professional work as a management consultant, she advises retailers and consumer businesses on digital transformation and on creating web and mobile experiences that delight consumers through a human-centered approach. Frances holds a Master of Business Administration from York University, Schulich School...

Josh Everts

School of Information

I'm a Master's student at the Berkeley School of Information in the MIMS program, studying Data Science. I am especially interested in applying the statistical and computational methods of Data Science to problems in the natural sciences and transportation. To this end, I am currently helping with the data analysis of a spectroscopy experiment at SLAC National Accelerator Laboratory. Outside of academic work, I enjoy improving my cooking skills, biking, and learning about history and geography.

Bobo Kwok

UTech
Data Science

I am an undergraduate student studying Data Science with an emphasis in Applied Mathematics & Modeling. I enjoy storytelling through data visuals and learning new visualization tools.

What is MLOps? An Introduction to the World of Machine Learning Operations

May 10, 2022
More than ever, AI and machine learning (ML) are integral parts of our lives and are tightly coupled with most of the products we use daily. We use AI/ML in almost everything we can think of, from advertising to social media to simply going about our daily lives! Given how widely these tools and models are used, ML systems need to mature in the same way that software development did in the early 2000s, when it became a disciplined practice with established standards for development, maintainability, and reliability. The field focused on developing such practices is still loosely defined under many different titles (e.g., machine learning engineering, applied data science), but it is most commonly known as MLOps, or Machine Learning Operations.