Data Manipulation and Cleaning

What Are Vowels Made Of? Graphing a Classic Dataset with R

February 13, 2024
by Anna Björklund. Vowels are all around us. Mainstream US English has around twelve unique vowels. How can our brains tell these sounds apart? This blog post will help you answer this question by plotting vowel data from a classic American English dataset by Peterson and Barney (1952).

How can we use big data from iNaturalist to address important questions in Entomology?

February 26, 2024
by Leah Lee. Large-scale data on insect diversity across space and time can be used to answer important questions in entomology. Open-source, open-access citizen science platforms like iNaturalist generate huge amounts of data on species diversity and distribution at accelerating rates. However, unstructured citizen science data contain inherent biases and need to be used with care. One way to validate big data from iNaturalist is to cross-check it against systematically collected data, such as museum specimens.

Creating the Ultimate Sweet

January 30, 2024
by Emma Turtelboom. What is the best Halloween candy? In this blog post, we will identify attributes of popular sweets and create a model to understand how these attributes influence a sweet's popularity. We’ll discuss alternative modeling approaches and potential drawbacks, as well as caveats in interpreting our model's predictions.

Addison Pickrell

IUSE Undergraduate Advisory Board
Mathematics
Sociology

Addison is an aspiring mathematician and social scientist (Class of '27). He loves collecting books he'll never read, is an open-source and open-access advocate, and is an aspiring community organizer and systems disrupter. Ask him about community-based participatory action research (CBPAR), critical pedagogy, applied mathematics, and social science.

Tonya D. Lindsey, Ph.D.

Data Science Fellow
Institute of Governmental Studies (IGS)

Tonya D. Lindsey is a visiting scholar at the Institute of Governmental Studies and the project director of CRB Nexus: Where Policy Meets Research, an initiative of the California Research Bureau (CRB) at the California State Library. As project director of CRB Nexus, she is developing a community of practice space for California’s policy staff and public scholars. As a CRB senior researcher she uses her expertise in research methods to analyze a wide variety of policy questions at the request of legislators, the governor’s office, and their staff. She received her PhD in sociology...

Exploratory Data Analysis in Social Science Research

November 14, 2023
by Kamya Yadav. Causal inference has become the dominant endeavor for many political scientists, often at the expense of good research questions and theory building. Returning to descriptive inference (the process of describing the world as it exists) can help formulate research questions worth asking and theory that is grounded in reality. Exploratory data analysis is one method of conducting descriptive inference. It can help social science researchers find empirical patterns and puzzles that motivate their research questions, test correlations between variables, and engage with the existing literature on a topic. In this blog post, I walk through results from the exploratory data analysis I conducted for my dissertation project on women's political ambition.

Introduction to Item Response Theory

October 24, 2023
by Mingfeng Xue. Measurements (e.g., tests, surveys, questionnaires) are inevitably subject to various sources of error. Among the many psychometric theories, item response theory (IRT) stands out for its capacity for detailed analysis at the item level and its potential to reduce some measurement errors. This post first discusses the limitations of conventional sum and average scores, which motivate IRT models, and then introduces a basic form of the Rasch model, including the model's expressions, its underlying assumptions, some of its advantages, and software packages for fitting it. Some code is also provided.

Using Forest Plots to Report Regression Estimates: A Useful Data Visualization Technique

October 17, 2023
by Sharon Green. Regression models help us understand relationships between two or more variables. In many cases, results are summarized in tables that present coefficients, standard errors, and p-values. Reading these can be a slog. Figures such as forest plots can help us communicate results more effectively and may lead to a better understanding of the data. This blog post is a tutorial on two different approaches to creating high-quality and reproducible forest plots, one using ggplot2 and one using the forestplot package.

James Hall

Consultant
Department of Statistics

James Hall is a graduate student in the Statistics MA program at the University of California, Berkeley. He is a husband and father to three awesome kids. Originally from Baltimore, MD, James earned his bachelor's degree in Mathematics at the United States Military Academy at West Point, NY, in 2011, and served as a U.S. Army officer. He has served as a leader at multiple levels within large organizations, with a professional focus on visualizing and communicating complex analysis to decision makers. James’ experience and coursework give him expertise in navigating different statistical methods,...

Wadzanai Makomva

Discovery Graduate Fellow
School of Information

Wadzanai is a graduate student at the School of Information and is part of the MIMS program. She has a strong interest in the integration of data science, technology, and developmental surveillance techniques. She has prior experience working as a quantitative analyst in project management consulting within a professional services firm, in public health, and most recently in sustainable construction materials. Wadzanai is particularly interested in increasing access to STEM subjects and fields for underprivileged women of color on the African continent, particularly her home...