Exploratory Data Analysis in Social Science Research

November 14, 2023
by Kamya Yadav. Causal inference has become the dominant endeavor for many political scientists, often at the expense of good research questions and theory building. Returning to descriptive inference – the process of describing the world as it exists – can help formulate research questions worth asking and theory that is grounded in reality. Exploratory data analysis is one method of conducting descriptive inference. It can help social science researchers find empirical patterns and puzzles that motivate their research questions, test correlations between variables, and engage with the existing literature on a topic. In this blog post, I walk through results from exploratory data analysis I conducted for my dissertation project on political ambition of women.

Using Forest Plots to Report Regression Estimates: A Useful Data Visualization Technique

October 17, 2023
by Sharon Green. Regression models help us understand relationships between two or more variables. In many cases, results are summarized in tables that present coefficients, standard errors, and p-values. Reading these can be a slog. Figures such as forest plots can help us communicate results more effectively and may lead to a better understanding of the data. This blog post is a tutorial on two different approaches to creating high-quality and reproducible forest plots, one using ggplot2 and one using the forestplot package.

FSRDC 2023 Annual Meeting and Research Conference

October 2, 2023
by Renee Starowicz. Renee Starowicz, Co-Executive Director of the Berkeley Federal Statistical Research Data Center, summarizes the takeaways from the 2023 Annual Federal Statistical Research Data Center Business Meeting and Annual Conference. After a brief overview of the Berkeley FSRDC, she describes the national directors' priorities for collaboration to improve transparency and outreach to diverse researchers. She also highlights other key topics of conversation at this year’s meeting.

Can Machine Learning Models Predict Reality TV Winners? The Case of Survivor

March 14, 2023
by Kelly Quinn. Reality television shows are notorious for tipping the scales to favor certain players they want to see win, but could producers also be spoiling the results in the process? Drawing on data about Survivor, I attempt to predict the likelihood of a contestant making it far into the game based on editing and production decisions, as well as demographic information. This post describes the model used to classify player outcomes and other potential ways to leverage data about reality TV shows for prediction.

Introduction to Field Experiments and Randomized Controlled Trials

July 24, 2023
by Leena Bhai. This blog post provides an introduction to field experimentation and its significance in understanding cause and effect. It explains how randomized experiments provide an unbiased method for determining what works. It delves into essential features of experiments such as intervention, excludability, and non-interference. It then works through a fictional example of a randomized controlled trial of the efficacy of an experimental drug, Covi-Mapp.
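To illustrate what "unbiased" means here, the sketch below fixes a set of hypothetical potential outcomes, re-randomizes treatment many times, and averages the difference-in-means estimates. The outcome values, the constant effect of 2.0, and the function names are assumptions for illustration; they are not taken from the post's Covi-Mapp example.

```python
import random
import statistics

def draw_potential_outcomes(n, true_effect, seed):
    """Hypothetical potential outcomes: Y(0) varies by subject, Y(1) = Y(0) + effect."""
    rng = random.Random(seed)
    y0 = [rng.gauss(10, 3) for _ in range(n)]
    y1 = [y + true_effect for y in y0]
    return y0, y1

def randomized_estimate(y0, y1, rng):
    """Randomly treat exactly half the subjects and return the difference in means."""
    n = len(y0)
    treated_ids = set(rng.sample(range(n), n // 2))
    # Non-interference and excludability are assumed: each subject's observed
    # outcome depends only on that subject's own assignment.
    treated = [y1[i] for i in range(n) if i in treated_ids]
    control = [y0[i] for i in range(n) if i not in treated_ids]
    return statistics.mean(treated) - statistics.mean(control)

y0, y1 = draw_potential_outcomes(n=200, true_effect=2.0, seed=1)
rng = random.Random(42)
# Each single estimate is noisy, but their average over many randomizations
# sits close to the true average treatment effect of 2.0.
estimates = [randomized_estimate(y0, y1, rng) for _ in range(1000)]
print(round(statistics.mean(estimates), 2))
```

Any one trial can miss the true effect by chance; unbiasedness is a property of the estimator across possible random assignments, which is what the loop approximates.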

Finley Golightly

D-Lab Staff
Applied Mathematics

Finley joined D-Lab as full-time staff, launching their career in Data Science after graduating with a Bachelor's degree in Applied Math from UC Berkeley.

They have been with D-Lab since Fall 2020, first as part of the UTech Management team and now as full-time staff. They love the learning environment of D-Lab, and their favorite part of the job is their co-workers! In their free time, they enjoy reading, boxing, listening to music, and playing Dungeons & Dragons. Feel free to stop by the front desk to ask them any questions or just to chat...

Christopher Paciorek, Ph.D.

Research Computing Consultant, Adjunct Professor
Department of Statistics
Research IT

Chris Paciorek is an adjunct professor in the Department of Statistics, as well as the Statistical Computing Consultant in the Department's Statistical Computing Facility (SCF) and in the Econometrics Laboratory (EML) of the Economics Department. He is also a user support consultant for Berkeley Research Computing. He teaches and presents workshops on statistical computing topics, with a focus on R.

A brief primer on Hidden Markov Models

April 25, 2022

For many data science problems, there is a need to estimate unknown information from a sequence of observed events. You may want to know, for instance, whether a person is angry or happy, given a sequence of brain scans taken while they play a video game. Or you may be digitizing an ancient text but, due to water damage, can’t tell what one word in the sequence says. Or in my case (I’m a wildlife biologist), you may want to infer whether an animal is sleeping or eating at any given moment from a sequence of its GPS locations.

Now, there are...
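As a hedged illustration of the kind of inference described above (not necessarily the method the full post goes on to use), here is a minimal Viterbi decoder for a two-state hidden Markov model of the animal example. The states, the binned observations ("short" or "long" step lengths between GPS fixes), and every probability are hypothetical values chosen for illustration.

```python
# Hypothetical two-state HMM: hidden behavior, observed step length.
STATES = ("sleeping", "eating")
START = {"sleeping": 0.6, "eating": 0.4}
TRANS = {
    "sleeping": {"sleeping": 0.8, "eating": 0.2},
    "eating": {"sleeping": 0.3, "eating": 0.7},
}
EMIT = {
    "sleeping": {"short": 0.9, "long": 0.1},
    "eating": {"short": 0.4, "long": 0.6},
}

def viterbi(observations):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s]: probability of the most likely path ending in state s at time t
    best = [{s: START[s] * EMIT[s][observations[0]] for s in STATES}]
    back = [{}]  # back[t][s]: predecessor of s on that most likely path
    for obs in observations[1:]:
        prev = best[-1]
        scores, pointers = {}, {}
        for s in STATES:
            # Pick the previous state that maximizes the path probability into s.
            p_best = max(STATES, key=lambda p: prev[p] * TRANS[p][s])
            scores[s] = prev[p_best] * TRANS[p_best][s] * EMIT[s][obs]
            pointers[s] = p_best
        best.append(scores)
        back.append(pointers)
    # Start from the most probable final state and follow the pointers backward.
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return path[::-1]

print(viterbi(["short", "short", "long", "long"]))
# → ['sleeping', 'sleeping', 'eating', 'eating']
```

For long sequences, a production implementation would work with log probabilities to avoid numerical underflow.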

Explaining the 80-20 Rule with the Pareto Distribution

March 15, 2022

Introduction to Pareto

While not as well-known as the bell-shaped Normal (Gaussian) distribution, the Pareto distribution is a powerful tool for modeling a variety of real-life phenomena. It is named after the Italian economist Vilfredo Pareto (1848-1923), who developed the distribution in the 1890s as a way to describe the allocation of wealth in society. He famously observed that 80% of society’s wealth was controlled by 20% of its population, a concept now known as the “Pareto Principle” or the “80-20 Rule”.
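As a worked sketch of how the 80-20 rule connects to the distribution, assume a Pareto distribution with shape parameter alpha > 1 (the parameter and this derivation are standard, but they are not given in the excerpt). Under that assumption, the richest fraction p of the population holds a share p^(1 - 1/alpha) of total wealth, so an exact 80-20 split corresponds to alpha = log 5 / log 4, roughly 1.16. The function names below are hypothetical.

```python
import math

def top_share(p, alpha):
    """Share of total wealth held by the richest fraction p under Pareto(alpha), alpha > 1."""
    return p ** (1 - 1 / alpha)

def alpha_for_rule(p=0.2, share=0.8):
    """Shape parameter at which the richest fraction p holds the given share."""
    # Solve p ** (1 - 1/alpha) = share for alpha.
    return 1 / (1 - math.log(share) / math.log(p))

alpha = alpha_for_rule()
print(round(alpha, 3))                   # → 1.161
print(round(top_share(0.2, alpha), 3))   # → 0.8
```

Smaller values of alpha (closer to 1) concentrate wealth even more heavily at the top; larger values spread it more evenly.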


Michael Sholinbeck

Public Health Librarian
Bioscience, Natural Resources & Public Health Library

Michael has worked at the UC Berkeley Library since 2001, and is currently the Public Health Librarian and Liaison to the School of Optometry at the Bioscience, Natural Resources & Public Health Library. Michael coordinates public health instruction at the library, and is responsible for the public health collection. Michael has an MLIS from San Jose State University, an MS in Geography from Oregon State University, and a BA in Geography from UC Berkeley. When not at work, he lives out his fantasy of being a rock and roll drummer.