Text Analysis

Racism Narratives in Medical Literature

Systemic racism is a driving factor in unequal health outcomes, but it is rarely the subject of study in top medical journals (see a 2021 analysis by Krieger et al.). This project, a collaboration between the UC Berkeley D-Lab and the American Medical Association's Center for Health Equity, aims to measure progress in acknowledging, studying, and dismantling racism by creating tools to track racism-related narratives in influential medical research.

Erin Manalo-Pedro

Research Fellow
Community Health Sciences (UCLA)

Erin Manalo-Pedro is a Ph.D. student in the Department of Community Health Sciences at the UCLA Fielding School of Public Health with a minor in education. She focuses her racial health equity research on curriculum, the health workforce, and political interventions for communities of color. Drawing from Public Health Critical Race Praxis and Pinayism, she aims to use methods such as natural language processing and counter-storytelling to document the subtleties of structural racism and resistance from marginalized groups.

To guide her interdisciplinary approach, Erin leverages
...

Katherine Wolf

Adjunct Fellow
Environmental Science, Policy, and Management

Doctoral student in Rachel Morello-Frosch's laboratory in the Department of Environmental Science, Policy, and Management working at the intersection of environmental epidemiology, environmental justice, and causal inference. Particularly interested in developing quantitative methods to investigate the operation of social power in environmental monitoring regimes in the United States.

Spencer Le

Data Peer Consultant, UTech
Computer Science
Data Science

I am a senior majoring in Computer Science and minoring in Data Science. I love crunching down big data and analyzing it in order to help solve real-life issues. In my free time, I like jamming out to music, drawing, studying history, and posting on my foodstagram. If you have any questions regarding Computer Science or Data Science, please stop by!

Twitter data extraction with Selenium

March 1, 2022

Introduction

With online communities and social networks serving as important sites for computational social science research, Twitter has quickly become a popular data source for researchers (Frey et al., 2020; Kusen et al., 2017; Rao et al., 2010; Ru et al., 2021). This blog post will demonstrate one way to extract Twitter data without using the Twitter API. This is especially useful for researchers who are new to exploring the use of Twitter data in their research, looking to develop a baseline corpus for a research question they are newly...

PoliPy: A Python Library for Scraping and Analyzing Privacy Policies

February 8, 2022

In light of recent scandals involving the misuse and improper handling of personal data by large corporations, advocacy groups and regulators alike have given increased attention to the issue of consumer privacy [e.g., 1, 2, 3, 4, 5]. National and local governments have been enacting privacy legislation that requires companies to minimize the amount of data they collect, deters the collection of sensitive data, limits the purposes for which the data are used, and critically, gives users more transparency into data collection and use.

As part...

Jennifer Kaplan

Consultant
French

Jennifer is a first-year graduate student in the Romance Languages & Literatures program here at Berkeley. She has experience conducting ethnographic fieldwork and is passionate about qualitative research methods.

Text Analysis for Public Health

October 5, 2021

October 5th, 2021 - another day in the global pandemic. Average Joes are busy tweeting about it, politicians give interviews on the latest plans, and newspapers publish article after article on vaccination levels, case counts, and the booster shot. That’s a ton of information. So much, in fact, that it would be pretty nice to have some computer-assisted help to sort through it. Enter stage right: text analysis. Just what is it, and in the midst of COVID-19, how can it be used to advance public health?

Text analysis is a family of analytic techniques used to identify patterns and meaning in unstructured text, that is, text that a computer can’t readily understand. Aka, most qualitative data. And there is a lot of that sort of data floating around. We’re talking tweets, Reddit posts, and emails, but also electronic health records (EHRs), books, and even academic research. You’ll probably agree that in that list alone, there’s a lot of valuable data!

Ilya Akdemir

Data Science Fellow
School of Law

Ilya is a JSD candidate at UC Berkeley School of Law. His research focuses on natural language processing and machine learning applications that are motivated by both theoretical and practical questions in the legal domain.