Software Tools

R Geospatial Fundamentals: Vector Data, Parts 1-2

March 7, 2022, 10:00am
Geospatial data are an important component of data visualization and analysis in the social sciences, humanities, and elsewhere. The R programming language is a great platform for exploring these data and integrating them into your research. This workshop focuses on fundamental operations for reading, writing, manipulating, and mapping vector data, which encode location as points, lines, and polygons.

R Machine Learning with tidymodels: Parts 1-2

February 27, 2024, 10:00am
Machine learning often evokes images of Skynet, self-driving cars, and computerized homes. However, these ideas are less science fiction than tangible phenomena predicated on description, classification, prediction, and pattern recognition in data. During this two-part workshop, we will discuss basic features of supervised machine learning algorithms including k-nearest neighbor, linear regression, decision tree, random forest, boosting, and ensembling using the tidymodels framework. To social scientists, such methods might be critical for investigating evolutionary relationships, global health patterns, voter turnout in local elections, or individual psychological diagnoses.

R Data Wrangling and Manipulation: Parts 1-2

September 12, 2022, 2:00pm
It is said that 80% of data analysis is spent on the process of cleaning and preparing the data for exploration, visualization, and analysis. This R workshop will introduce the dplyr and tidyr packages to make data wrangling and manipulation easier. Participants will learn how to use these packages to subset and reshape data sets, do calculations across groups of data, clean data, and perform other useful tasks.

Propensity Score Matching for Causal Inference: Creating Data Visualizations to Assess Covariate Balance in R

June 10, 2024
by Sharon Green. Although some people consider randomized experiments the gold standard, in many cases it would be highly unethical to assign individuals to harmful exposures to measure their effects. Modern causal inference techniques help scientists estimate treatment effects using observational data. In particular, propensity score matching does so by matching individuals so that the “treatment” and “control” groups are balanced on measured covariates. After implementing propensity score matching, data visualizations make it easier to assess the quality of the matches before estimating effects. This blog post is a tutorial for implementing propensity score matching and creating data visualizations to assess covariate balance, that is, visually assessing whether the matched individuals are balanced with respect to measured covariates.

Hugh Kadhem

Mathematics

Hugh Kadhem is a Ph.D. student in Applied Mathematics, with broad research interests in computational quantum physics and high-performance scientific computing.

Git for Research Transparency and Reproducibility Training (RT2)

June 6, 2024, 3:15pm
This is a custom Git workshop for the 2024 Research Transparency and Reproducibility Training (RT2).

Conceptual Mirrors: Reflecting on LLMs' Interpretations of Ideas

April 23, 2024
by María Martín López. As large language models begin to engrain themselves in our daily lives, we must leverage cognitive psychology to explore the understanding that these algorithms have of our world and the people they interact with. LLMs give us new insights into how conceptual representations are formed given the limitations of the data modalities they have access to. Is language enough for these models to conceptualize the world? If so, what conceptualizations do they have of us?

Tactics for Text Mining non-Roman Scripts

April 15, 2024
by Hilary Faxon, Ph.D. & Win Moe. Non-Roman scripts pose particular challenges for text mining. Here, we reflect on a project that used text mining alongside qualitative coding to understand the politicization of online content following Myanmar’s 2021 military coup.

Introduction to Propensity Score Matching with MatchIt

April 1, 2024
by Alex Ramiller. When working with observational (i.e. non-experimental) data, it is often challenging to establish the existence of causal relationships between interventions and outcomes. Propensity Score Matching (PSM) provides a powerful tool for causal inference with observational data, enabling the creation of comparable groups that allow us to directly measure the impact of an intervention. This blog post introduces MatchIt – a software package that provides all of the necessary tools for conducting Propensity Score Matching in R – and provides step-by-step instructions on how to conduct and evaluate matches.

Tracking Urban Expansion Through Satellite Imagery

December 12, 2023
by Leïla Njee Bugha. Among its many uses, remote sensing can prove especially useful for documenting changes and trends in eras or settings where traditional data sources are either nonexistent or infrequently collected. This is the case when one wants to study urban expansion in sub-Saharan countries over the past 20 years. To further remedy the lack of data on land cover from earlier time periods, classification methods can be used as well. Using easily accessible satellite imagery from Google Earth Engine, I provide here an example combining remote sensing with classification to detect changes in land cover in Nigeria since 2000 due to urban expansion.