Systemic racism is a driving factor in unequal health outcomes, but it is rarely studied in top medical journals (see a 2021 analysis by Krieger et al.). This project, a collaboration between the UC Berkeley D-Lab and the American Medical Association's Center for Health Equity, aims to measure progress in acknowledging, studying, and dismantling racism by creating tools to track racism-related narratives in influential medical research.
We use a diverse set of qualitative and quantitative methods to conceptualize and measure these racism narratives. Our work began with extensive theorizing about racism narratives, which produced a hierarchy of broad categories and fine-grained subtypes. We then used machine learning to measure those narratives, beginning with an unsupervised strategy that combines topic modeling (seeded and contextual) with custom word embeddings and multiword expression extraction.
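To make the multiword expression step concrete, here is a minimal sketch of one common way to find candidate phrases: scoring adjacent word pairs by pointwise mutual information (PMI). This is an illustration only. The project's actual extraction method is not described here, and the function name `score_bigrams`, the `min_count` threshold, and the toy sentences are all assumptions made for the example.

```python
import math
from collections import Counter


def score_bigrams(docs, min_count=2):
    """Return {(w1, w2): PMI} for adjacent word pairs seen at least min_count times.

    Hypothetical helper for illustration; not the project's pipeline.
    """
    unigrams = Counter()
    bigrams = Counter()
    for doc in docs:
        tokens = doc.lower().split()  # naive whitespace tokenization
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unstable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy sentences invented for illustration; not project data.
docs = [
    "structural racism shapes health outcomes",
    "structural racism is rarely studied",
    "health outcomes differ across groups",
]
scores = score_bigrams(docs)
candidates = sorted(scores, key=scores.get, reverse=True)
```

Pairs that score highly co-occur more often than their individual word frequencies would predict, so they are reasonable candidates to treat as single units before topic modeling or embedding training. A real corpus would need proper tokenization and a higher frequency threshold.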
Our team is now developing a labeling instrument and a large-scale annotation procedure so that we can move toward a supervised-learning measurement approach that integrates item response theory.