My Summer Exploring Data Science for Social Justice: Learnings, Tensions & Recommendations

September 5, 2023

This summer, I joined the Data Science for Social Justice workshop at UC Berkeley. The workshop, hosted by D-Lab and sponsored by the Berkeley Graduate Division, offered a deep dive into Python and natural language processing (including TF-IDF, sentiment analysis, and word embeddings) with a lens toward leveraging data science for social justice. In this post, I describe my team's project on abortion, discuss tensions I feel with computational social science, and end with recommendations.

My Team Project: Abortion needs before/after Roe v. Wade was overturned

I joined a team [1] that examined a subreddit on abortion, using computational analysis to answer key questions about abortion access before versus after Roe v. Wade (RvW) was overturned on June 24, 2022. Reddit is a social media platform and online community made up of subreddits, each focused on a particular topic or theme. The Abortion subreddit (/r/abortion) is described as a “space to seek support and resources and to share abortion stories.” It defines itself as pro-abortion: users who post anti-abortion views or suggest adoption are banned from the subreddit. It has over 41,000 members, about 43,000 submissions, and over 100,000 comments. It’s an active community that dates back to 2010, with new posts every day.

Our team had two research questions (RQs):

RQ1: What types of support are people requesting before and after the overturning of RvW?
RQ2: Are there changes in the number and content of posts discussing medical vs. surgical abortion before vs. after overturning of RvW?

To answer our research questions, we restricted our analysis to posts made in the 6 months before and the 6 months after Roe v. Wade was overturned on June 24, 2022, and focused on the US experience.
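The before/after split described above can be sketched in a few lines of Python. This is an illustration rather than the team's actual code: the exact window edges (December 24, 2021 and December 24, 2022) and the choice to count the day of the decision as "post" are assumptions, and `label_period` is a hypothetical helper name.

```python
from datetime import date

# Date Roe v. Wade was overturned (from the post).
OVERTURN = date(2022, 6, 24)
# Assumed window edges: six calendar months on either side.
WINDOW_START = date(2021, 12, 24)
WINDOW_END = date(2022, 12, 24)


def label_period(posted_on):
    """Return 'pre', 'post', or None if the post falls outside the window."""
    if WINDOW_START <= posted_on < OVERTURN:
        return "pre"
    # Assumption: posts made on the day of the decision count as 'post'.
    if OVERTURN <= posted_on <= WINDOW_END:
        return "post"
    return None
```

Posts labeled `None` would simply be dropped before any further analysis.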

What types of support are people requesting before and after the overturning of RvW?

For RQ1, our motivation was to understand how the overturning of RvW has affected people’s needs when considering an abortion. Are people turning to Reddit more, and what kinds of support are they asking for (e.g., financial, travel, emotional)? To answer the question, we employed two computational approaches: (1) topic modeling, to identify the main topics discussed in the subreddit and those related to the types of support people seek, and (2) TF-IDF, to identify relevant posts for further quantitative and qualitative review. We visualized our results using word clouds, bar charts, and maps.
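To make the TF-IDF step concrete, here is a minimal sketch that ranks posts by how strongly a term such as "help" characterizes them. It is not the team's pipeline: it uses only the Python standard library, a deliberately simple tokenizer, and the plain `tf * log(N / df)` weighting (library implementations often add smoothing). The function names and the three example posts are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes; a very simple tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_scores(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def rank_posts(docs, term):
    """Indices of docs containing `term`, highest TF-IDF weight first."""
    scores = tfidf_scores(docs)
    hits = [(s[term], i) for i, s in enumerate(scores) if term in s]
    return [i for _, i in sorted(hits, key=lambda pair: -pair[0])]


# Invented example posts, not real data from the subreddit.
posts = [
    "Need help finding a clinic, can anyone help with travel costs?",
    "Sharing my experience with the pills, cramps lasted two days.",
    "Any help paying for travel to another state?",
]
print(rank_posts(posts, "help"))  # [0, 2]
```

The top-ranked posts for a term are the ones a reviewer would read first in the qualitative pass.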

I led the computational approach for this research question. Overall, the topic modeling did not reveal major differences between the periods before and after RvW was overturned. In both periods, topics tended to cover:

help requests and pills,
feelings regarding whether to get an abortion,
experiences of abortion,
questions on procedures (logistical and physical),
pills and physical reactions,
access to prescriptions,
and surgical options.

An interesting result, however, was that the word ‘help’ appeared more often across topics. Upon further analysis, we found a 26% increase in people asking for help after the overturning of RvW. Specifically, there was a 26% increase in people seeking help related to abortion pills and a 47% increase in people seeking advice. In addition to illustrating increased demand for pills, these findings indicate that people sought help or advice more often after the overturning of RvW. Qualitative review of the data also showed post-RvW topics leaning toward questions about finding clinics, travel, and financial support for travel. Posts from before RvW was overturned also discussed the challenges of finding pills and doctors to support abortions.

Given that the word ‘travel’ showed up when we dug deeper into the word ‘help’ in our data, we examined its frequency. Discussions of travel roughly doubled (pre-RvW n = 96; post-RvW n = 198, a 106% increase), and the word “state” within the context of travel nearly tripled (n = 21 and n = 62 respectively, a 195% increase), indicating more questions or discussions about which states to travel to for abortion access.
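For readers who want to check the arithmetic, percent change is (post - pre) / pre × 100. The short sketch below applies it to the four counts reported above; the function and variable names are only illustrative.

```python
def percent_change(before, after):
    """Percent change from `before` to `after`."""
    return (after - before) / before * 100


travel_pre, travel_post = 96, 198
state_pre, state_post = 21, 62

print(f"travel: {percent_change(travel_pre, travel_post):.0f}% increase")  # travel: 106% increase
print(f"state: {percent_change(state_pre, state_post):.0f}% increase")    # state: 195% increase
```

These are changes in raw counts. Dividing each count by the number of posts in its period would give a rate that accounts for overall growth in posting.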

Are there changes in the number and content of posts discussing medical vs. surgical abortion before vs. after overturning of RvW?

RQ2 explored changes in the number and content of posts discussing medical (medication) vs. surgical abortion before vs. after the overturning of Roe v. Wade. For this, we used word embeddings, visualized with t-SNE, to look at clusters related to “medical” and “surgical,” and then used VADER to compare sentiment in posts about medication and surgical abortion before vs. after the overturning of Roe v. Wade. Overall, we found an increase in all posts (+32% in the 6 months after RvW was overturned), with a larger increase in medication-related posts (+41%) and a smaller increase in surgical posts (+32%). As for sentiment, the average was slightly more positive after the overturning of Roe v. Wade among posts discussing medication abortion, and slightly more negative among posts discussing surgical abortion. These results are consistent with the RQ1 findings of increased attention to and interest in medication abortion.

Figure 1. Sentiment analysis before vs. after RvW overturned
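A comparison like the one in Figure 1 comes down to scoring each post and averaging the scores within each (abortion method, period) group. The sketch below shows that grouping step under stated assumptions: the post fields `method`, `period`, and `text` and the helper name `mean_sentiment_by_group` are hypothetical, and the scorer is passed in as a function so the example runs without extra packages. The commented lines show how VADER's compound score could be plugged in if the vaderSentiment package is installed.

```python
from collections import defaultdict
from statistics import mean


def mean_sentiment_by_group(posts, score_fn):
    """Average score_fn(text) for each (method, period) pair."""
    grouped = defaultdict(list)
    for post in posts:
        key = (post["method"], post["period"])
        grouped[key].append(score_fn(post["text"]))
    return {key: mean(values) for key, values in grouped.items()}


# With vaderSentiment installed, the scorer could be VADER's compound score
# (ranges from -1, most negative, to +1, most positive):
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   analyzer = SentimentIntensityAnalyzer()
#   score_fn = lambda text: analyzer.polarity_scores(text)["compound"]
```

Comparing the "pre" and "post" averages for each method gives the kind of before/after contrast described above.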

Tensions with computational social science

I continue to grapple with several challenges in using computational methods for social science, particularly when employing machine learning and large language models in international research. Namely, computational social science (CSS) currently relies on language models with a Western basis: tools such as sentiment analyzers, and the data these models learn from, are often built around "standard" American English and Western contexts. Given that I plan to use CSS in my own PhD research with big data that is not in "standard" American English, this is something I will need to work through. In particular, some of the sentiment analysis and topic modeling methods I employ may be less accurate. I will be transparent about these limitations of CSS methods, while also reflecting on and making clear whose voices are represented in the data I am assessing. Further, I plan to triangulate my CSS findings with surveys and qualitative analysis in order to mitigate these issues.

Recommendations for other people using CSS or data science for social justice

Be clear about the various limitations of CSS and the different types of bias embedded in many of the models employed (e.g., the data they learn from is often from the West, mostly from men, and mostly from White people).
Don’t forget your own positionality, and be explicit about it in relation to your research questions.
Continue efforts that focus on core and foundational issues that perpetuate injustice alongside computational efforts.
… and ChatGPT is helpful with the coding process!

Endnotes

[1] My team included: Elena Ojeda, UCB Department of Economics; Stephanie Veazie, UCB Division of Epidemiology; and Vanessa Navarro Rodríguez, UCB Department of Political Science. We identify as women and are based in California at UC Berkeley (considered a progressive / liberal university in a state where abortion is legal and protected). Our personal stories and backgrounds differ in terms of where we come from and our own journeys (but we all identify as pro-choice and consider abortion access an essential component of women's health care).