Data Science

Twitter Text Analysis: A Friendly Introduction

October 25, 2022

Read part 2 here.

Introduction

Text analysis techniques, including sentiment analysis, topic modeling, and named entity recognition, are increasingly used to probe patterns in a variety of text-based documents, such as books and social media posts. This blog post introduces Twitter text analysis, but is not intended to cover all of the aforementioned topics. The tutorial is broken down into two parts. In this very first post, I...

Introducing “A Three-Step Guide to Training Computational Social Science Ph.D. Students for Academic and Non-Academic Careers”

December 6, 2022


As D-Lab alumni, we are excited to introduce our pre-print “A...

Disaggregating Race and Ethnicity Categories in Census Data

November 1, 2022

The collection of race and ethnicity data by the United States Census Bureau has a long, complex, and problematic history. The Census claims that their racial categories generally reflect a social definition of race recognized in America, adhering to guidelines set by the U.S. Office of Management and Budget. In 1900, the Census recognized five racial categories: White, Black, Chinese, Japanese, and American Indian. Today, the Census collects more...

Getting Started with Surveys

October 18, 2022

Surveys can be an extremely useful tool for gathering information from individuals and groups. They are used across disciplines and industries to help researchers learn more about populations and gain actionable insights. I’ve done extensive survey research in nonprofit and industry settings, using these data to improve programs, make changes to technology, design communications plans, make content acquisition decisions, and much more. In this blog post, I’ll focus on one of the most important parts of survey research — planning....

Reduce, Reuse, Recycle: Practical strategies for working with large datasets

October 12, 2022

When the size of your dataset starts to approach the size of your computer’s available memory, even the simplest data wrangling tasks can become frustrating. Suddenly, reading in a .csv or calculating a simple average becomes time-consuming or impossible. For students and researchers, additional computing resources can be costly or simply unavailable. Here are some principles and strategies for reducing the overhead of your dataset while keeping the momentum going. The code mainly focuses on reading csv files - a very common data format - into Python...
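As a rough illustration of the kind of strategy the teaser above describes, here is a minimal sketch that averages one column of a csv file while holding only a single row in memory at a time. It is not taken from the post itself: the function name `streaming_mean`, the column name `score`, and the in-memory sample data are all hypothetical, and only Python's standard `csv` module is assumed.

```python
import csv
import io


def streaming_mean(lines, column):
    """Average one numeric column, reading the csv one row at a time."""
    reader = csv.DictReader(lines)  # yields rows lazily, not all at once
    total = 0.0
    count = 0
    for row in reader:
        value = row[column]
        if value == "":  # skip missing values rather than failing
            continue
        total += float(value)
        count += 1
    # Return NaN when the column has no usable values.
    return total / count if count else float("nan")


# Hypothetical sample data standing in for a large file on disk.
sample = io.StringIO("id,score\n1,2.0\n2,4.0\n3,\n4,6.0\n")
mean_score = streaming_mean(sample, "score")
print(mean_score)
```

With a real file, the same function could be called on an open file handle, for example `open(path, newline="")`, so the whole file never has to fit in memory.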

Bo Yun Park, Ph.D.

Postdoc
D-Lab

I am a Postdoctoral Scholar in the D-Lab at the University of California, Berkeley. My research lies at the intersection of political, cultural, and transnational sociology. I am particularly interested in dynamics of social inclusion and exclusion, social change, technology, and digital politics. My dissertation investigated how political strategists in France and the United States craft narratives of political leadership for presidential candidates in the digital age. I received my Ph.D. in Sociology at Harvard University, where I was affiliated with the Institute for Quantitative Social...

Alex Bruefach

Discovery Graduate Fellow
Materials Science and Engineering

Alex is a PhD Candidate in materials science and engineering developing image processing and machine learning techniques for extracting information from electron microscopy datasets. Her primary focus is understanding what information is transferred from various feature representations of images. She has extensive experience collaborating across boundaries and is passionate about brainstorming innovative approaches to challenging data science problems!

Ella Belfer

Consultant
Energy and Resources Group

Ella is a PhD student in the Energy and Resources Group. Her research examines water governance in a changing climate, drawing on geo-spatial techniques. Her past work includes applications of topic modelling in climate change adaptation research, and inductive coding of semi-structured interviews.

Shusheng Li

UTech Management
Data Science
Economics

Shusheng is a fourth-year undergraduate student studying Data Science and Economics. He is currently part of the UTech Management team at D-Lab. Shusheng loves playing all types of sports because it's a great way to stay fit and be together with friends. Working at the UTech front desk, Shusheng loves helping others and directing them to the right resources.

Josh Everts

School of Information

I'm a Master's student at the Berkeley School of Information in the MIMS program, studying Data Science. I am especially interested in applying the statistical and computational methods of Data Science to problems within the natural sciences and transportation. To this end, I am currently helping with the data analysis of a spectroscopy experiment at SLAC National Lab. Outside of academic work, I enjoy improving my cooking skills, biking, and learning about history and geography.