Text Analysis

My Summer Exploring Data Science for Social Justice: Learnings, Tensions & Recommendations

September 5, 2023
by Genevieve Smith
This summer I joined the D-Lab-hosted Data Science for Social Justice workshop at UC Berkeley, diving into Python (including TF-IDF, sentiment analysis, word embeddings, and more) with a lens toward leveraging data science for social justice. My team explored an abortion-related subreddit and used computational analysis to answer key questions about abortion access before versus after Roe v. Wade was overturned. Computational social science is incredibly powerful, but I continue to grapple with tensions, particularly as they relate to employing machine learning and large language models in international research. I end with key recommendations for CSS practitioners.

Ini Umosen

Consulting Drop-In Hours: Tue 9am-11am

Consulting Areas: R, Stata, LaTeX, Data Manipulation and Cleaning, Data Science, Data Visualization, R Programming, Text Analysis, Web Scraping, Regression Analysis, RStudio, RStudio Cloud

Quick tip: the fastest way to speak to a consultant is to first ...

Jailynne Estevez

Consultant
Info & Data Science MIDS

Jailynne Estevez is a data analyst and a prospective Master of Information and Data Science (MIDS) candidate at UC Berkeley. With a bachelor's degree in public policy, she brings a diverse skill set to her pursuits, demonstrating aptitude in data analysis and programming.

Ini Umosen

Consultant
Economics

Ini is a PhD candidate in the Department of Economics. She studies topics in labor economics and the economics of education using applied econometrics methods. Current work in progress includes evaluating the impact of school choice systems and investigating gender and racial bias on gig platforms. She is a former Graduate Research Fellow at the California Policy Lab. She has also been a tutor for econometrics, labor economics, and macroeconomics.

Python Text Analysis Fundamentals: Parts 1-2

September 25, 2023, 2:00pm
This two-part workshop series will prepare participants to move forward with research that uses text analysis, with a special focus on humanities and social science applications.

Sanjana Gajendran

Consultant
MIMS

I'm a second-year MIMS student with a focus on data science and natural language processing. During summer 2023, I interned at Genentech as a Data Science Intern.

Python Text Analysis Fundamentals: Parts 1-2

June 20, 2023, 9:00am
This two-part workshop series will prepare participants to move forward with research that uses text analysis, with a special focus on humanities and social science applications.
See event details for participation information.

Twitter Text Analysis: A Friendly Introduction

October 25, 2022

Read part 2 here.

Introduction

Text analysis techniques, including sentiment analysis, topic modeling, and named entity recognition, are increasingly used to probe patterns in a variety of text-based documents, such as books and social media posts. This blog post introduces Twitter text analysis but is not intended to cover all of the aforementioned topics. The tutorial is broken down into two parts. In this very first post, I...

Python Text Analysis: Topic Modeling

March 29, 2023, 2:00pm
In this part, we study unsupervised learning of text data. This is a standalone workshop that builds on the two-part text analysis series.

Python Text Analysis: Word Embeddings

April 5, 2023, 2:00pm
How can we use neural networks to create meaningful representations of words? The bag-of-words model is limited in its ability to characterize text because it does not use word context.
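To make the bag-of-words limitation concrete, here is a minimal sketch in plain Python (not taken from the workshop materials). It assumes simple lowercase, whitespace tokenization, and the helper names `bag_of_words` and `to_vector` are hypothetical. Two sentences with opposite meanings end up with identical count vectors, because only word frequencies survive and word order and context are discarded.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    # Hypothetical helper: lowercase, split on whitespace, count tokens.
    # The order in which tokens appeared is thrown away.
    return Counter(text.lower().split())

def to_vector(counts: Counter, vocabulary: list) -> list:
    # Hypothetical helper: project counts onto a fixed vocabulary.
    # Counter returns 0 for words that never appeared.
    return [counts[word] for word in vocabulary]

s1 = "The dog bit the man"
s2 = "The man bit the dog"

bow1, bow2 = bag_of_words(s1), bag_of_words(s2)
vocab = sorted(set(bow1) | set(bow2))

v1 = to_vector(bow1, vocab)
v2 = to_vector(bow2, vocab)

print(vocab)     # ['bit', 'dog', 'man', 'the']
print(v1, v2)    # [1, 1, 1, 2] [1, 1, 1, 2]
print(v1 == v2)  # True: same vector, different meaning
```

Word embeddings address this by learning each word's representation from the contexts it appears in, rather than treating words as independent counts.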