Hate Speech Research

Deep Learning with Item Response Theory

The hate speech measurement project

Since the launch of Donald Trump’s presidential campaign, reports of hate speech targeting various minority groups have risen dramatically (Ansari 2016). Although this surge is well-reported (ADL 2017; SPLC 2016), it remains difficult to quantify the magnitude of the problem or even properly classify hate speech (Silva et al. 2016). Keyword searches and dictionary methods are often imprecise and overly blunt tools for detecting the nuance and complexity of hate speech.

To overcome these challenges, the study identifies and examines online incidents of hate speech using a replicable research methodology. The Hate Speech Team developed a theoretically informed codebook and manually labeled hate speech in approximately 9,000 comments sourced from online platforms such as Reddit.

The Hate Speech Team applied supervised machine learning algorithms to the manually labeled dataset to differentiate hate speech from non-hate speech. The study investigated traditional feature engineering in natural language processing (n-grams, part-of-speech tagging, etc.) combined with standard machine learning algorithms (naive Bayes, random forest, gradient boosting, support vector machines, etc.).
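As a rough illustration of how n-gram features and a naive Bayes classifier fit together, here is a minimal sketch in plain Python. It is not the team's implementation: the `ngrams` helper, the `NaiveBayes` class, the label names, and the six toy comments are all hypothetical and invented for this example, and a real study would train on the full labeled dataset with a tested library rather than hand-written code.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return word n-grams (unigrams up to n_max-grams) for one comment."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # comments per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for feat in ngrams(text):
                if feat in self.vocab:  # skip n-grams never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up comments (NOT drawn from the project's dataset).
train_texts = [
    "those people are vermin and should leave",
    "people like them are vermin",
    "they should all leave our country",
    "i enjoyed the article thanks for sharing",
    "thanks for the thoughtful article",
    "great discussion i enjoyed reading it",
]
train_labels = ["hate", "hate", "hate", "not_hate", "not_hate", "not_hate"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("those people are vermin"))          # hate
print(model.predict("thanks i enjoyed the discussion"))  # not_hate
```

Any of the other algorithms listed above could take the place of naive Bayes here; the n-gram extraction step would stay the same while the classifier changes.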

These predictive models can be applied to new text on other platforms, such as Facebook, Twitter, and The New York Times, to identify hate speech automatically and at scale. The system offers the potential to track changing patterns of hate speech over time and to support the creation of an “online hate index.”

Read the manuscript here