Please visit: https://hatespeech.berkeley.edu/

Since the launch of Donald Trump’s presidential campaign, reports of hate speech targeting various minority groups have risen dramatically. Although this surge has been widely reported, it remains difficult to quantify the magnitude of the problem or even to classify hate speech properly.

To overcome these challenges and others, this study identifies and examines online incidents of hate speech using a replicable research methodology. We develop a theoretically informed codebook and hand-label hate speech in approximately 9,000 online comments sourced from Reddit in June and July 2015 as well as October and November 2016. We subsequently apply supervised machine learning algorithms to this labelled text to differentiate hate speech from non-hate speech.

These predictive models can be applied to new text on Facebook, Twitter, The New York Times, and a variety of other platforms, to scalably and automatically identify hate speech. The system offers the potential to track changing patterns of hate speech over time for the creation of an “online hate index.”
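
The workflow described above, training a classifier on hand-labelled comments and then applying it to new text, can be illustrated with a minimal sketch. The project description does not say which algorithms or features are used, so the multinomial Naive Bayes model here is only a stand-in, and the class name, label names, and toy comments (with SLUR as a placeholder token) are all hypothetical rather than drawn from the study's data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokens; a real pipeline would do more.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-alpha smoothing (illustrative stand-in)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # labelled comments per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are skipped
                    count = self.word_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labelled comments; SLUR is a placeholder, not real data.
train_texts = [
    "those people are SLUR and should leave",
    "SLUR people do not belong here",
    "great game last night",
    "thanks for sharing this article",
    "the weather is nice today",
]
train_labels = ["hate", "hate", "not_hate", "not_hate", "not_hate"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)

# Apply the trained model to new, unlabelled text.
new_comments = ["they are SLUR", "thanks for the great article"]
predictions = [model.predict(c) for c in new_comments]
```

Tracking hate speech over time would then amount to running a trained model over dated batches of comments and recording the share flagged in each period; the sketch stops short of that step.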

Anti-Defamation League
Nora Broege, Chris Kennedy, Alexander Sahn, Claudia von Vacano (PI)