Text Analysis

Python Text Analysis Fundamentals: Parts 1-3

September 21, 2021, 10:00am
This three-part workshop series will prepare participants to move forward with research that uses text analysis, with a special focus on humanities and social science applications.

Tactics for Text Mining non-Roman Scripts

April 15, 2024
by Hilary Faxon, Ph.D. & Win Moe. Non-Roman scripts pose particular challenges for text mining. Here, we reflect on a project that used text mining alongside qualitative coding to understand the politicization of online content following Myanmar’s 2021 military coup.

Nicolas Nunez-Sahr

Consultant
Statistics

I lived in Santiago, Chile until I graduated from high school, and then moved to the US for undergrad at Stanford, where I obtained a Bachelor’s degree from the Statistics Department. I then worked as a Data Scientist in an NLP startup that was based in Bend, OR, which analyzed news articles. I love playing soccer, volleyball, table tennis, flute, guitar, Latin music, and meeting new people. I want to get better at mountain biking, whitewater kayaking, chess and computer vision. I find nature astounding, and love finding sources of inspiration.

Addison Pickrell

IUSE Undergraduate Advisory Board
Mathematics
Sociology

Addison is an aspiring mathematician and social scientist (Class of '27). He loves collecting books he'll never read, is an open-source and open-access advocate, and an aspiring community organizer and systems disrupter. Ask him about community-based participatory action research (CBPAR), critical pedagogy, applied mathematics, and social science.

Hate Speech

The hate speech measurement project began in early 2017 at UC Berkeley’s D-Lab. Our research project applies data science techniques such as machine learning to track changes in hate speech over time and across social media platforms. After three years, we have now published our groundbreaking method that measures hate speech with precision while mitigating the influence of human bias. Read the manuscript here.

María Martín López

Data Science Fellow
Psychology

María Martín López is a PhD student in the Cognition area within the Department of Psychology. Her research relates to cognitive computational and quantitative models of individual differences in behaviors, thoughts, and emotions. She is particularly interested in how we can create and leverage novel algorithms to understand, measure, and predict processes relating to externalizing psychopathology (e.g. impulsivity, aggression, substance use). She answers these questions using a range of computational and quantitative models including AI, NLP, SEM, time series analysis, multi-level...

Twitter Text Analysis: A Friendly Introduction, Part 2

March 7, 2023
by Mingyu Yuan. This blog post is the second part of “Twitter Text Analysis”. The goal is to use language models such as BERT to build a classifier on tweets. Word embeddings, train/test splitting, model implementation, and model evaluation are introduced in this post.

Why We Need Digital Hermeneutics

July 13, 2023
by Tom van Nuenen. Tom van Nuenen discusses the sixth iteration of his course, Digital Hermeneutics, at Berkeley. The class teaches the practices of data science and text analysis in the context of hermeneutics, the study of interpretation. In the course, students analyze texts from Reddit communities, focusing on how these communities make sense of the world. This task combines both close and distant readings of texts, as students employ computational tools to find broader patterns and themes. The article reflects on the rise of AI language models like ChatGPT, and how these machines interpret human interpretations. The popularity and profitability of language models present an issue for the future of open research, due to the monetization of social media data.

Unlock the Joy and Power of Reading in Language Learning

August 21, 2023
by Bowen Wang-Kildegaard. I share my story of how reading for pleasure transformed my English speaking and writing skills. This experience inspired my passion to promote the joy and power of reading to all language learners. Using natural language processing techniques, I dive into the Language Learning subreddit, revealing a trend: Learners are often highly anxious about output practices, but are generally positive about input methods like reading and listening. I then distill complex language learning theories into actionable language learning tips, emphasizing the value of extensive reading for pleasure, pointing to potential methods like using ChatGPT for customization of reading materials, and advocating for joy in the learning journey.

My Summer Exploring Data Science for Social Justice: Learnings, Tensions & Recommendations

September 5, 2023
by Genevieve Smith. This summer I joined the D-Lab-hosted Data Science for Social Justice workshop at UC Berkeley, diving into Python (including TF-IDF, sentiment analysis, word embeddings, and more) with a lens towards leveraging data science for social justice. My team explored a Reddit channel on abortion and used computational analysis to answer key questions related to abortion access before versus after Roe v. Wade was overturned. Computational social science is incredibly powerful, but I continue to grapple with tensions, particularly as they relate to employing machine learning and large language models in international research, and I end with key recommendations for CSS practitioners.