Data Science

Democratizing Our Data

August 26, 2021, 10:00am

There is enormous interest in building a better understanding of how evidence and data can inform policy. Two new approaches make it possible to share and use data across states and agencies. One is technical: the Administrative Data Research Facility, which provides a secure environment in which education, training, and workforce data can be shared across agencies and states. The other is human: the Applied Data Analytics training program, which trains government agency staff to combine and use the data to serve their agency missions. More than 650 participants from more than 150 agencies have taken part, producing new products and new networks in the process. This presentation discusses the approach sponsored by the California Department of Social Services, jointly with the Department of Education and the Economic Development Department. The D-Lab worked with the Coleridge Initiative to successfully combine the two approaches. The presentation will also address the broader vision of how approaches like this can serve to democratize data for the United States.
See event details for participation information.

The Importance of Design Plans for Data Science

April 20, 2021

Since becoming a Data Fellow at the D-Lab, I have had the opportunity to assist many talented social scientists through the D-Lab’s Consulting service. A regular consulting request is to help with the research design for a new project. These requests are understandable: for empirical researchers, the quality of the research design can make or break a project. In this post, I suggest a few benefits of writing a skeleton design plan before writing any code whatsoever.

One of the exciting aspects...

Handling Missing Data

May 4, 2021

I recently started working with a set of eviction data for a project on housing precarity at the Urban Displacement Project. As I began exploring the dataset, I was excited to find that it appeared to contain a wealth of historical data we could use to train a robust model for predicting eviction rates in urban neighborhoods. However, my excitement soon faded when a standard check for missing data revealed that many of the observations lacked values for precisely the variable we aimed to predict. I was now faced with the problem of what to do about this...