Text Analysis

Hate Speech

The hate speech measurement project began in early 2017 at UC Berkeley’s D-Lab. Our research project applies data science techniques such as machine learning to track changes in hate speech over time and across social media platforms. After three years, we have now published our groundbreaking method that measures hate speech with precision while mitigating the influence of human bias. Read the manuscript here.

Jailynne Estevez

Consulting Drop-In Hours: Fri 3pm-5pm

Consulting Areas: Python, SQL, Stata, HTML / CSS, Javascript, Google AppScripts, Databases & SQL, Data Manipulation and Cleaning, Data Science, Data Sources, Data Visualization, Python Programming, Surveys, Sampling & Interviews, Text Analysis, , Bash or Command Line, Excel, Git or Github, Stata

Quick-tip: the fastest way to speak to a consultant is to first ...

María Martín López

Data Science Fellow

María Martín López is a PhD student in the Cognition area within the Department of Psychology. Her research relates to cognitive computational and quantitative models of individual differences in behaviors, thoughts, and emotions. She is particularly interested in how we can create and leverage novel algorithms to understand, measure, and predict processes relating to externalizing psychopathology (e.g. impulsivity, aggression, substance use). She answers these questions using a range of computational and quantitive models including AI, NLP, SEM, time series analysis, multi-level...

Python Text Analysis: Topic Modeling

October 16, 2023, 2:00pm
In this part, we study unsupervised learning of text data. This is a stand alone work that builds from the two-part text analysis series.

Python Text Analysis: Word Embeddings

October 25, 2023, 2:00pm
How can we use neural networks to create meaningful representations of words? The bag-of-words is limited in its ability to characterize text, because it does not utilize word context.

Chirag Manghani

Consulting Drop-In Hours: Fri 2pm-4pm

Consulting Areas: Python, R, SQL, Stata, SAS, LaTeX, HTML / CSS, Javascript, C++, APIs, Cloud & HPC Computing, Cybersecurity & Data Security, Databases & SQL, Data Manipulation and Cleaning, Data Science, Data Sources, Data Visualization, Deep Learning, Machine Learning, Natural Language Processing, Python Programming, R Programming, Software Tools, Text Analysis, Web Scraping, Regression Analysis, Software Output Interpretation, Bash or Command Line, Excel, Git or Github, Qualtrics, RStudio, RStudio...

Twitter Text Analysis: A Friendly Introduction, Part 2

March 7, 2023
by Mingyu Yuan. This blog post is the second part of “Twitter Text Analysis”. The goal is to use language models such as BERT to build a classifier on tweets. Word embedding, training and test splitting, model implementation, and model evaluation are introduced in this model.

Why We Need Digital Hermeneutics

July 13, 2023
by Tom van Nuenen. Tom van Nuenen discusses the sixth iteration of his course named Digital Hermeneutics at Berkeley. The class teaches the practices of data science and text analysis in the context of hermeneutics, the study of interpretation. In the course, students analyze texts from Reddit communities, focusing on how these communities make sense of the world. This task combines both close and distant readings of texts, as students employ computational tools to find broader patterns and themes. The article reflects on the rise of AI language models like ChatGPT, and how these machines interpret human interpretations. The popularity and profitability of language models presents an issue for the future of open research, due to the monetization of social media data.

Unlock the Joy and Power of Reading in Language Learning

August 21, 2023
by Bowen Wang-Kildegaard. I share my story of how reading for pleasure transformed my English speaking and writing skills. This experience inspired my passion to promote the joy and power of reading to all language learners. Using natural language processing techniques, I dive into the Language Learning subreddit, revealing a trend: Learners are often highly anxious about output practices, but are generally positive about input methods like reading and listening. I then distill complex language learning theories into actionable language learning tips, emphasizing the value of extensive reading for pleasure, pointing to potential methods like using ChatGPT for customization of reading materials, and advocating for joy in the learning journey.

My Summer Exploring Data Science for Social Justice: Learnings, Tensions & Recommendations

September 5, 2023
by Genevieve Smith. This summer I joined the D-Lab hosted Data Science for Social Justice workshop at UC Berkeley diving into Python – including TF-IDF, sentiment analysis, word embeddings, and more – with a lens towards leveraging data science for social justice. My team explored a Reddit channel on abortion and used computational analysis to answer key questions related to abortion access from before versus after Roe vs. Wade was overturned. Computational social science is incredibly powerful, but I continue to grapple with tensions particularly as it relates to employing machine learning and large language in international research, and end with key recommendations for CSS practitioners.