Please visit our Hate Speech Research website for more information. 

According to the Pew Research Center, 41% of American adults have experienced online hate speech and harassment, and 66% have witnessed it. D-Lab’s Online Hate Index research project investigates hate speech as a social and linguistic phenomenon. The project is grounded in prior domestic and international policy and law and in academic research, and it is conducted in collaboration with advocacy organizations. Our hate speech research innovates on several fronts:

Detailed survey instrument.

We developed and tested a survey instrument whose specific questions draw on the academic literature on hate speech, in order to clearly distinguish hate speech from other offensive language. The items that we have developed are both highly reliable, meaning that different labellers tend to agree, and valid, meaning that the items capture the same concept.

Crowdsourcing of labellers.

We first worked with ten students who were diverse in terms of race/ethnicity, gender, religion, linguistic background, nationality, and sexuality in order to decode and discern hateful comments targeted at a wide variety of groups of people. Subsequently, we developed a system to use workers from Amazon Mechanical Turk. In this way, we were able to scale comment labelling by orders of magnitude.

Hate speech lexicon.

As part of our data sampling process, we are creating an expansive lexicon of hate speech words and phrases, improving on existing sources such as Hatebase.

Enhanced Natural Language Processing and machine learning methods.

Our team has experimented with Natural Language Processing and machine learning methodologies to improve the accuracy of the models and to allow us to compare across platforms and over time.


D-Lab's Online Hate Index Research in the Press

This work has received both financial support and media exposure. Most recently, the Berkeley Institute for Data Science recognized this research with a $30,000 grant. Sadly, recent events have brought hate speech to the forefront. One piece by Rachael Myrow, “Why It's So Hard to Scrub Hate Speech Off Social Media,” featured on The California Report on KQED, discussed the connections between hate speech and hate acts and the often elusive language used by offenders. A longer audio piece by Rachael Myrow and Devin Katayama, “Silicon Valley Is Trying To Prevent Hate Speech. Is It Working?”, is also available. Previously, Brittan Heller, an online human rights advocate, wrote a piece featured in Wired Magazine, “What Mark Zuckerberg Gets Wrong--and Right--About Hate Speech.”


If you would like to know more about our hate speech research, don’t hesitate to contact me, Claudia von Vacano, at cvonvacano at berkeley dot edu.


Claudia von Vacano

Dr. Claudia von Vacano is the Executive Director of the Social Sciences D-Lab and Digital Humanities. She is deeply committed to supporting the success of marginalized students, including women, racial/ethnic minorities, first-generation college-going students, and speakers of English as a second language, and she has worked extensively with these groups at various stages of the educational pipeline. Dr. von Vacano has created outreach and intervention strategies through the UC Office of the President. She is currently the project director of a $3 million NSF Improving Undergraduate STEM Education (IUSE) initiative under the leadership of Faculty Director David J. Harding, with cross-university governance that includes the new Associate Provost of Data Science and Information and Dean of the Information School. She is the P.I. of an online hate speech research project, supported financially by the Anti-Defamation League and Google Jigsaw, that employs item response theory (IRT) and deep learning. She is also co-P.I., with Karen Chapple, City Planning Chair in the College of Environmental Design, of a Chan Zuckerberg Initiative grant to provide professional development for housing professionals in the San Francisco Bay Area. Each year, through the D-Lab and Digital Humanities at Berkeley, Dr. von Vacano oversees programs that include 300 computational and data-intensive workshops and 1,200 consultations. She co-developed the core curriculum for the Digital Humanities Summer-only Minor and Certificate program at UC Berkeley. She is the lead developer of the SAGE Campus online course “Introduction to Applied Data Science Methods for Social Scientists.”