Data Science

Reduce, Reuse, Recycle: Practical strategies for working with large datasets

October 12, 2022

When the size of your datasets starts to approach the size of your computer’s available memory, even the simplest data wrangling tasks can become frustrating. Suddenly, reading in a .csv file or calculating a simple average becomes time-consuming or impossible. For students and researchers, additional computing resources can be costly or simply unavailable. Here are some principles and strategies for reducing the overhead of your dataset while keeping the momentum going. The code mainly focuses on reading CSV files (a very common data format) into Python...

Bo Yun Park, Ph.D.

Postdoc
D-Lab

I am a Postdoctoral Scholar in the D-Lab at the University of California, Berkeley. My research lies at the intersection of political, cultural, and transnational sociology. I am particularly interested in dynamics of social inclusion and exclusion, social change, technology, and digital politics. My dissertation investigated how political strategists in France and the United States craft narratives of political leadership for presidential candidates in the digital age. I received my Ph.D. in Sociology at Harvard University, where I was affiliated with the Institute for Quantitative Social...

Alex Bruefach

Discovery Graduate Fellow
Materials Science and Engineering

Alex is a PhD Candidate in materials science and engineering developing image processing and machine learning techniques for extracting information from electron microscopy datasets. Her primary focus is understanding what information is transferred from various feature representations of images. She has extensive experience collaborating across boundaries and is passionate about brainstorming innovative approaches to challenging data science problems!

Ella Belfer

Consultant
Energy and Resources Group

Ella is a PhD student in the Energy and Resources Group. Her research examines water governance in a changing climate, drawing on geo-spatial techniques. Her past work includes applications of topic modelling in climate change adaptation research, and inductive coding of semi-structured interviews.

Shusheng Li

UTech Management
Data Science
Economics

Shusheng is a fourth-year undergraduate student studying Data Science and Economics and is currently part of the UTech Management team at D-Lab. He loves playing all types of sports because it's a great way to stay fit and spend time with friends. Working at the UTech front desk, Shusheng loves helping others and directing them to the right resources.

Josh Everts

School of Information

I'm a Master's student at the Berkeley School of Information in the MIMS program, studying Data Science. I am especially interested in applying the statistical and computational methods of Data Science to problems within the natural sciences and transportation. To this end, I am currently helping with the data analysis of a spectroscopy experiment at SLAC National Lab. Outside of academic work, I enjoy improving my cooking skills, biking, and learning about history and geography.

Bobo Kwok

UTech
Data Science

I am an undergraduate student studying Data Science with an emphasis in Applied Mathematics & Modeling. I enjoy storytelling through data visuals and learning new visualization tools.

What is MLOps? An Introduction to the World of Machine Learning Operations

May 10, 2022

More than ever, AI and machine learning (ML) are integral parts of our lives and are tightly coupled with the majority of the products we use on a daily basis. We use AI/ML in almost everything we can think of, from advertising to social media to simply going about our daily lives! Given how prevalent these tools and models are, it is essential that ML systems follow the path that IT systems and software took in the early 2000s, when their development, maintainability, and reliability became a disciplined practice. The field focused on developing such practices is currently loosely defined under many different titles (e.g., machine learning engineering, applied data science), but is most commonly known as MLOps, or Machine Learning Operations.

Scrollytelling through a look at food prices around the world

May 2, 2022

You have gathered the needed data to support your research, check. You have made some hypotheses about what you hope to conclude, check. You have spent time cleaning the data and organizing it in a manner that permits further exploration, check. You have sliced and diced the data with your favorite data exploration software packages or techniques and created some data visualizations that you feel confident about, quadruple check! You are now armed with insights that you hope to showcase to the world. What’s next? In this article, I would like to share some tips for creating a...

A brief primer on Hidden Markov Models

April 25, 2022

For many data science problems, there is a need to estimate unknown information from a sequence of observed events. You may want to know, for instance, whether a person is angry or happy, given a sequence of brain scans taken while they play a video game. Or you may be digitizing an ancient text but, due to water damage, can’t tell what one word in the sequence says. Or, in my case (I’m a wildlife biologist), you may want to infer whether an animal is sleeping or eating at any given moment using a sequence of animal GPS locations.

Now, there are...