Visualization

Searching for Other Solar Systems

November 21, 2023
by Emma Turtelboom. Over the last three decades, we have discovered more than 5,000 exoplanets, planets outside of our Solar System. With these observations, we can try to answer many open questions about the universe. For example, how unique is the Solar System? How do planets form? Is there life elsewhere in the Milky Way? By querying the NASA Exoplanet Archive, we can compare multi-planet systems with the Solar System and see how similar (or dissimilar!) they are.
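The post's own query code is not reproduced here. As a minimal Python sketch, assuming the archive's public TAP service at `https://exoplanetarchive.ipac.caltech.edu/TAP/sync` and its `pscomppars` table (one row per planet, where `sy_pnum` is the number of known planets in the system and `pl_orbsma` is the orbital semi-major axis in AU), a request for multi-planet systems could be built like this. The helper names are hypothetical, and the table and column names should be checked against the archive's current documentation.

```python
import csv
import io
import urllib.parse
import urllib.request

# Synchronous TAP endpoint of the NASA Exoplanet Archive (assumed current).
TAP_SYNC_URL = "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"


def build_multiplanet_query(min_planets: int = 2) -> str:
    """ADQL query for planets in systems with at least `min_planets` known planets.

    Table `pscomppars` and columns `hostname`, `pl_name`, `sy_pnum`, and
    `pl_orbsma` are assumptions about the archive schema.
    """
    return (
        "select hostname, pl_name, sy_pnum, pl_orbsma "
        "from pscomppars "
        f"where sy_pnum >= {int(min_planets)} "
        "order by hostname, pl_orbsma"
    )


def build_request_url(query: str) -> str:
    """Encode an ADQL query as a TAP sync request that returns CSV."""
    params = urllib.parse.urlencode({"query": query, "format": "csv"})
    return f"{TAP_SYNC_URL}?{params}"


def fetch_rows(url: str) -> list:
    """Download the CSV response and parse it into one dict per planet (needs network access)."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Example (requires network access):
# rows = fetch_rows(build_request_url(build_multiplanet_query()))
```

Sorting by host star and then by semi-major axis lists each system's planets from the inside out, which makes it easy to line them up against the Solar System's eight planets.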

Exploratory Data Analysis in Social Science Research

November 14, 2023
by Kamya Yadav. Causal inference has become the dominant endeavor for many political scientists, often at the expense of good research questions and theory building. Returning to descriptive inference – the process of describing the world as it exists – can help formulate research questions worth asking and theory that is grounded in reality. Exploratory data analysis is one method of conducting descriptive inference. It can help social science researchers find empirical patterns and puzzles that motivate their research questions, test correlations between variables, and engage with the existing literature on a topic. In this blog post, I walk through results from the exploratory data analysis I conducted for my dissertation project on the political ambition of women.
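Two of the most basic descriptive steps mentioned above, comparing group averages and checking how strongly two variables move together, can be sketched in plain Python. This is an illustration under assumptions, not the analysis from the dissertation: the function names are hypothetical, and each record is assumed to be a dictionary with one key per survey variable.

```python
from collections import defaultdict
from math import sqrt
from statistics import fmean


def group_means(records, group_key, value_key):
    """Mean of `value_key` within each level of `group_key`."""
    groups = defaultdict(list)
    for row in records:
        groups[row[group_key]].append(row[value_key])
    return {level: fmean(values) for level, values in groups.items()}


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)
```

For example, `group_means(rows, "gender", "ambition_score")` would summarize a hypothetical ambition score by gender, and `pearson_r` gives a first look at whether two such measures are correlated before any causal modeling.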

Mapping Census Data with tidycensus

November 6, 2023
by Alex Ramiller. The U.S. Census Bureau provides a rich source of publicly available data for a wide variety of research applications. However, the traditional process of downloading these data from the Census Bureau website is slow, cumbersome, and inefficient. The R package "tidycensus" provides researchers with a tool to overcome these challenges, offering a streamlined way to quickly download numerous datasets directly from the Census API (Application Programming Interface). This blog post provides a basic workflow for the tidycensus package, from installing the package and identifying variables to efficiently downloading and mapping census data.
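tidycensus is an R package, and its own functions (for example `get_acs()`) are covered in the post. For readers working in Python, the sketch below instead calls the Census API that tidycensus wraps, assuming the ACS 5-year endpoint at `https://api.census.gov/data/<year>/acs/acs5` and using the variable code `B01003_001E` (total population) as an example. The function names are hypothetical, and mapping is not covered.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the Census Bureau's data API (assumed; see the API user guide).
CENSUS_BASE = "https://api.census.gov/data"


def build_acs5_url(year, variables, geography="state:*", api_key=None):
    """Build a request URL for ACS 5-year estimates.

    `variables` is a list such as ["NAME", "B01003_001E"]; `geography` uses the
    API's `for=` syntax, e.g. "state:*" for every state. A key is optional for
    light use but recommended for repeated requests.
    """
    params = {"get": ",".join(variables), "for": geography}
    if api_key:
        params["key"] = api_key
    return f"{CENSUS_BASE}/{year}/acs/acs5?{urllib.parse.urlencode(params)}"


def rows_to_records(rows):
    """Convert the API's JSON array (header row first) into one dict per geography."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def fetch_acs5(year, variables, geography="state:*", api_key=None):
    """Download and parse one request (requires network access)."""
    with urllib.request.urlopen(build_acs5_url(year, variables, geography, api_key)) as resp:
        return rows_to_records(json.load(resp))


# Example (requires network access):
# states = fetch_acs5(2021, ["NAME", "B01003_001E"])
```

Values typically come back as strings, so numeric columns need converting (e.g. `int(row["B01003_001E"])`) before analysis.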

Hate Speech

The hate speech measurement project began in early 2017 at UC Berkeley's D-Lab. Our research project applies data science techniques such as machine learning to track changes in hate speech over time and across social media platforms. After three years of work, we have now published our groundbreaking method, which measures hate speech precisely while mitigating the influence of human bias.

Python Fundamentals: Parts 1-3

December 4, 2023, 10:00am
This three-part interactive workshop series is your complete introduction to programming in Python for people with little or no previous programming experience. By the end of the series, you will be able to apply your knowledge of basic principles of programming and data manipulation to a real-world social science application.

Introduction to Item Response Theory

October 24, 2023
by Mingfeng Xue. Measurements (e.g., tests, surveys, questionnaires) are inevitably subject to various sources of error. Among the many psychometric theories, item response theory (IRT) stands out for its capacity for detailed item-level analysis and its potential to reduce some measurement error. This post first discusses the limitations of conventional sum and average scores, which motivate IRT models, and then introduces a basic form of the Rasch model, including its mathematical expression, its underlying assumptions, some of its advantages, and relevant software packages. Some code is also provided.
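In the standard dichotomous Rasch model (the post's notation may differ), the probability that person $n$ answers item $i$ correctly depends only on the difference between the person's ability $\theta_n$ and the item's difficulty $b_i$:

$$P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}$$

The Python sketch below computes this probability and estimates one person's ability by Newton-Raphson maximum likelihood, assuming the item difficulties are already known. It is an illustration, not the post's code; real analyses would use a dedicated IRT package that estimates items and persons together.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """P(correct) under the dichotomous Rasch model: logistic(theta - b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, difficulties, tol=1e-10, max_iter=100):
    """Maximum-likelihood ability for one person, treating item difficulties as known.

    Newton-Raphson on the log-likelihood: the gradient is sum(x - p) and the
    second derivative is -sum(p * (1 - p)), so each step adds gradient / information.
    All-correct or all-incorrect patterns have no finite MLE and are rejected.
    """
    score = sum(responses)
    if score == 0 or score == len(responses):
        raise ValueError("no finite MLE for an all-0 or all-1 response pattern")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = score - sum(probs)  # depends on responses only through the sum score
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        theta += step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")
```

Because the gradient depends on the responses only through the raw sum score, two people with the same total on the same items receive the same ability estimate, which reflects the fact that the sum score is a sufficient statistic for $\theta$ under the Rasch model.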

Using Forest Plots to Report Regression Estimates: A Useful Data Visualization Technique

October 17, 2023
by Sharon Green. Regression models help us understand relationships between two or more variables. In many cases, results are summarized in tables that present coefficients, standard errors, and p-values. Reading these can be a slog. Figures such as forest plots can help us communicate results more effectively and may lead to a better understanding of the data. This blog post is a tutorial on two different approaches to creating high-quality and reproducible forest plots, one using ggplot2 and one using the forestplot package.
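The post's tutorials use R (ggplot2 and the forestplot package). As a rough Python analogue, the sketch below draws the same kind of figure with matplotlib, assuming you already have coefficient names, estimates, and standard errors from a fitted model. The function names are hypothetical, and the intervals are normal-approximation (Wald) intervals rather than anything taken from the post.

```python
def confidence_intervals(estimates, std_errors, z=1.96):
    """Wald-type intervals: estimate ± z * SE (z = 1.96 gives roughly 95% coverage)."""
    return [(est - z * se, est + z * se) for est, se in zip(estimates, std_errors)]


def forest_plot(labels, estimates, std_errors, path="forest_plot.png", z=1.96):
    """Save a forest plot: one row per coefficient with its point estimate and interval."""
    # Imported here so the interval helper works even where matplotlib is not installed.
    import matplotlib
    matplotlib.use("Agg")  # draw straight to a file, no display needed
    import matplotlib.pyplot as plt

    intervals = confidence_intervals(estimates, std_errors, z)
    # errorbar expects distances from each point: row 0 = below, row 1 = above.
    xerr = [
        [est - lo for est, (lo, _) in zip(estimates, intervals)],
        [hi - est for est, (_, hi) in zip(estimates, intervals)],
    ]
    positions = list(range(len(labels)))

    fig, ax = plt.subplots(figsize=(6, 0.5 * len(labels) + 1))
    ax.errorbar(estimates, positions, xerr=xerr, fmt="o", capsize=3, color="black")
    ax.axvline(0, linestyle="--", color="grey")  # reference line for "no effect"
    ax.set_yticks(positions)
    ax.set_yticklabels(labels)
    ax.invert_yaxis()  # first coefficient on top, matching a regression table
    ax.set_xlabel(f"Estimate (± {z} × SE)")
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return path
```

Calling `forest_plot(names, estimates, std_errors)` with your own model output writes `forest_plot.png`. Building the error bars from explicit lower and upper bounds means other interval types, such as bootstrap intervals, could be swapped in without changing the plotting code.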

Python Intermediate: Parts 1-3

October 9, 2023, 1:00pm
This three-part interactive workshop series teaches intermediate Python programming to people with previous programming experience equivalent to our Python Fundamentals workshop. By the end of the series, you will be able to apply your knowledge of basic principles of programming and data manipulation to a real-world social science application.

Stata Fundamentals: Parts 1-3

October 10, 2023, 2:00pm
This workshop is a three-part introductory series that will teach you Stata from scratch with clear introductions, concise examples, and support documents. You will learn how to download and install the Stata software, understand data and basic data manipulation, import and subset data, explore and visualize data, and understand the basics of automation in the form of loops and functions. After completing this workshop, you will have the foundational understanding needed to create, organize, and use workflows for your own research.

Python Fundamentals: Parts 1-3

October 24, 2023, 2:00pm
This three-part interactive workshop series is your complete introduction to programming in Python for people with little or no previous programming experience. By the end of the series, you will be able to apply your knowledge of basic principles of programming and data manipulation to a real-world social science application.