Data Science

Exploring Population Data with IPUMS

November 8, 2022

Last month, demographer and historian Steve Ruggles was awarded a prestigious MacArthur Foundation Fellowship for his work developing IPUMS—a harmonized database of individual and family responses to large-scale domestic and international surveys. With some samples going as far back as the 18th century, IPUMS can offer key insights into changing demographics, norms, and decision-making over...

Twitter Text Analysis: A Friendly Introduction

October 25, 2022

Read part 2 here.

Introduction

Text analysis techniques, including sentiment analysis, topic modeling, and named entity recognition, have been increasingly used to probe patterns in a variety of text-based documents, such as books, social media posts, and others. This blog post introduces Twitter text analysis, but is not intended to cover all of the aforementioned topics. The tutorial is broken down into two parts. In this very first post, I...

Introducing “A Three-Step Guide to Training Computational Social Science Ph.D. Students for Academic and Non-Academic Careers”

December 6, 2022

As D-Lab alumni, we are excited to introduce our pre-print “A...

Disaggregating Race and Ethnicity Categories in Census Data

November 1, 2022

The collection of race and ethnicity data by the United States Census Bureau has a long, complex, and problematic history. The Census claims that its racial categories generally reflect a social definition of race recognized in America, adhering to guidelines set by the U.S. Office of Management and Budget. In 1900, the Census recognized five racial categories: White, Black, Chinese, Japanese, and American Indian. Today, the Census collects more...

Getting Started with Surveys

October 18, 2022

Surveys can be an extremely useful tool for gathering information from individuals and groups. They are used across disciplines and industries to help researchers learn more about populations and gain actionable insights. I’ve done extensive survey research in nonprofit and industry settings, using these data to improve programs, make changes to technology, design communications plans, make content acquisition decisions, and much more. In this blog post, I’ll focus on one of the most important parts of survey research — planning....

Reduce, Reuse, Recycle: Practical strategies for working with large datasets

October 12, 2022

When the size of your datasets starts to approach the size of your computer’s available memory, even the simplest data wrangling tasks can become frustrating. Suddenly, reading in a .csv or calculating a simple average becomes time-consuming or impossible. For students and researchers, additional computing resources can be costly or simply unavailable. Here are some principles and strategies for reducing the overhead of your dataset while keeping the momentum going. The code mainly focuses on reading csv files - a very common data format - into Python...
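As a rough illustration of the kind of strategy this post points to, the sketch below computes an average from a csv file one row at a time, so the whole file never has to fit in memory. It is not taken from the post itself: the function name `streaming_mean` and the `income` column are hypothetical, and it uses only Python's standard library.

```python
import csv
import io


def streaming_mean(lines, column):
    """Return the mean of a numeric column, reading one row at a time.

    `lines` is any iterable of csv text lines (for example an open file).
    Rows where the column is missing or empty are skipped.
    """
    total = 0.0
    count = 0
    for row in csv.DictReader(lines):
        value = row.get(column)
        if value is None or value.strip() == "":
            continue  # skip missing values
        total += float(value)
        count += 1
    # Return NaN rather than raising if no usable values were found.
    return total / count if count else float("nan")


# Small in-memory stand-in for a large file (hypothetical data). With a
# real file, pass open("your_file.csv", newline="") instead.
sample = io.StringIO("id,income\n1,10\n2,20\n3,\n4,30\n")
print(streaming_mean(sample, "income"))  # prints 20.0
```

Only a running total and a count are held in memory, so the approach scales with the number of rows on disk rather than with available RAM. The trade-off is that it supports only statistics that can be accumulated in a single pass.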

Bo Yun Park, Ph.D.

Postdoc
D-Lab

I am a Postdoctoral Scholar in the D-Lab at the University of California, Berkeley. My research lies at the intersection of political, cultural, and transnational sociology. I am particularly interested in dynamics of social inclusion and exclusion, social change, technology, and digital politics. My dissertation investigated how political strategists in France and the United States craft narratives of political leadership for presidential candidates in the digital age. I received my Ph.D. in Sociology at Harvard University, where I was affiliated with the Institute for Quantitative Social...

Yiyi He

Consultant
Landscape Architecture & Environmental Planning

Yiyi He is a Ph.D. candidate in the College of Environmental Design at the University of California, Berkeley. She received her bachelor’s degree in City and Regional Planning from Nanjing University and her master’s degree in Environmental Planning from UC Berkeley. She is currently working as an AI Resident at GoogleX. Prior to this, she worked as a consultant for the Global Facility for Disaster Reduction and Recovery at the World Bank and a researcher for the Center for Catastrophic Risk Management and Federal Aviation Administration Consortium in Aviation Operations Research. Her...

Ella Belfer

Consultant
Energy and Resources Group

Ella is a PhD student in the Energy and Resources Group. Her research examines water governance in a changing climate, drawing on geo-spatial techniques. Her past work includes applications of topic modelling in climate change adaptation research, and inductive coding of semi-structured interviews.

Alex Bruefach

Discovery Graduate Fellow
Materials Science and Engineering

Alex is a PhD Candidate in materials science and engineering developing image processing and machine learning techniques for extracting information from electron microscopy datasets. Her primary focus is understanding what information is transferred from various feature representations of images. She has extensive experience collaborating across boundaries and is passionate about brainstorming innovative approaches to challenging data science problems!