R

Design Your Observational Study with the Joint Variable Importance Plot

March 12, 2024
by Lauren Liao. When evaluating causal inference in observational studies, there is often a natural imbalance in the data. Luckily, other variables are often measured alongside the intervention and outcome and can be helpful for adjustment. However, deciding which variables should be prioritized for adjustment is not trivial, since not all variables are equally important to the intervention or the outcome. I recommend using the joint variable importance plot during the observational study design phase to visualize which variables should be prioritized. This post provides a gentle guide on how to do so and why it is important.

A Basic Introduction to Hierarchical Linear Modeling

March 4, 2024
by Mingfeng Xue. Hierarchical Linear Modeling (HLM) is an extension of linear models that offers an approach to analyzing data structures with nested levels. This blog post explains HLM's advantages over traditional linear regression models, particularly in handling clustered data and multilevel predictors. Using an example from educational research, it demonstrates the steps of model implementation and interpretation. It shows how HLM accommodates both independent variables from different levels and hierarchically structured data, providing insights into their impacts on the outcome variable. Recommended resources further help readers master HLM techniques.

What Are Vowels Made Of? Graphing a Classic Dataset with R

February 13, 2024
by Anna Björklund. Vowels are all around us. Mainstream US English has around twelve unique vowels. How can our brains tell these sounds apart? This blog post will help you answer this question by plotting vowel data from a classic American English dataset by Peterson and Barney (1952).

Reine Ngnonsse

IUSE Undergraduate Advisory Board
Genetics and Plant Biology

Reine Ngnonsse, an enthusiast for math and technology, delved into tutoring math at a community college through the EOPS program. At UC Berkeley, while pursuing Genetics and Plant Biology, she explored R programming in a CRISPR project. As an intern at Health Career Connection, Reine expanded her coding skills in Python, R, and Tableau, igniting a passion for programming. With exposure to Python and JavaScript, she can't wait to merge mathematical prowess with coding finesse for innovative solutions.

Addison Pickrell

IUSE Undergraduate Advisory Board
Mathematics
Sociology

Addison is an aspiring mathematician and social scientist (Class of '27). He loves collecting books he'll never read, is an open-source and open-access advocate, and an aspiring community organizer and systems disrupter. Ask him about community-based participatory action research (CBPAR), critical pedagogy, applied mathematics, and social science.

Larissa Benjamin

Doctor of Public Health Student
Public Health

Larissa Benjamin is a second-year Doctor of Public Health (DrPH) student at UC Berkeley. Her research uses a mixed-methods approach to explore the structural determinants of cardiovascular disease inequities in the rural Southeastern United States, also called the “Stroke Belt.” She is particularly curious about how regional history, geography, and structural racism shape inequitable rural neighborhood risk environments. Larissa earned a BS in Evolutionary Anthropology and English from the University of Michigan, and an MPH at UC Berkeley in Health and Social Behavior with a...

Mapping Census Data with tidycensus

November 6, 2023
by Alex Ramiller. The U.S. Census Bureau provides a rich source of publicly available data for a wide variety of research applications. However, the traditional process of downloading these data from the census website is slow, cumbersome, and inefficient. The R package “tidycensus” provides researchers with a tool to overcome these challenges, enabling a streamlined process for quickly downloading numerous datasets directly from the census API (Application Programming Interface). This blog post provides a basic workflow for the use of the tidycensus package, from installing the package and identifying variables to efficiently downloading and mapping census data.

Introduction to Item Response Theory

October 24, 2023
by Mingfeng Xue. Measurements (e.g., tests, surveys, questionnaires) inevitably involve various sources of error. Among many psychometric theories, item response theory (IRT) stands out for its capability for detailed analysis at the item level and its potential to reduce some measurement error. This post first discusses the limitations of conventional sum and average scores, which give rise to IRT models, and then introduces a basic form of the Rasch model, including expressions of the model, the assumptions underlying it, some of its advantages, and software packages. Some code is also provided.

María Martín López

Data Science Fellow
Psychology

María Martín López is a PhD student in the Cognition area within the Department of Psychology. Her research relates to cognitive computational and quantitative models of individual differences in behaviors, thoughts, and emotions. She is particularly interested in how we can create and leverage novel algorithms to understand, measure, and predict processes relating to externalizing psychopathology (e.g. impulsivity, aggression, substance use). She answers these questions using a range of computational and quantitative models including AI, NLP, SEM, time series analysis, multi-level...

Using Forest Plots to Report Regression Estimates: A Useful Data Visualization Technique

October 17, 2023
by Sharon Green. Regression models help us understand relationships between two or more variables. In many cases, results are summarized in tables that present coefficients, standard errors, and p-values. Reading these can be a slog. Figures such as forest plots can help us communicate results more effectively and may lead to a better understanding of the data. This blog post is a tutorial on two different approaches to creating high-quality and reproducible forest plots, one using ggplot2 and one using the forestplot package.