Using machine learning to slow the spread of hate speech.


You can’t stop what you don’t understand.

The first key to countering hate speech is to have a clear definition of what it is.

We adapted Dr. Gregory Stanton’s 10 Stages of Genocide to create a structure for hate speech identification. Originally presented in a briefing to the U.S. Department of State in 1996, the report helped us understand the process of classification and dehumanization.

We changed it by condensing the stages and removing some that aren’t relevant to Twitter (e.g. extermination). We also added contemporary phenomena found in social media (e.g. coded language). 


Our hate speech classifications

Mode                  Output
5. Intention          Incitement to genocide
                      Incitement to general violence
                      Incitement to specific violence
                      Incitement to degrade and discriminate
4. Polarization       Inculpation of target group
                      Historical negationism
                      Promotion of known hate groups
                      Exclusion of target group
3. Dehumanization     Propagation of stereotype
                      Derogatory language against target group
2. Classification     Target group comparison
                      Target group identification
1. Coded Language     Innuendo signaling in-group/out-group nationalism
                      Innuendo implicating a target group
                      Innuendo excluding a target group

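
As an illustration only, a classification table like this one could be encoded as a simple lookup structure. The sketch below is not drawn from the actual platform: the names `TAXONOMY` and `mode_for` are hypothetical, and the only data it contains is the list of modes and outputs given above.

```python
from typing import Tuple

# Illustrative encoding of the classification table.
# TAXONOMY and mode_for are hypothetical names, not part of any real API.
TAXONOMY = {
    5: ("Intention", [
        "Incitement to genocide",
        "Incitement to general violence",
        "Incitement to specific violence",
        "Incitement to degrade and discriminate",
    ]),
    4: ("Polarization", [
        "Inculpation of target group",
        "Historical negationism",
        "Promotion of known hate groups",
        "Exclusion of target group",
    ]),
    3: ("Dehumanization", [
        "Propagation of stereotype",
        "Derogatory language against target group",
    ]),
    2: ("Classification", [
        "Target group comparison",
        "Target group identification",
    ]),
    1: ("Coded Language", [
        "Innuendo signaling in-group/out-group nationalism",
        "Innuendo implicating a target group",
        "Innuendo excluding a target group",
    ]),
}


def mode_for(output: str) -> Tuple[int, str]:
    """Return (level, mode name) for an output label, ignoring case."""
    wanted = output.strip().lower()
    for level, (mode, outputs) in TAXONOMY.items():
        if wanted in (o.lower() for o in outputs):
            return level, mode
    raise KeyError(output)
```

Looking up an output label such as "Historical negationism" returns its level and mode, which keeps the numbering in the table available to any later grading step.
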
When there is such a volume, we have to ask ourselves what can we do? What can the Internet service providers do? What can vast segments of society do? So that we hold people accountable and create safe spaces online the way we expect those spaces to be in the real world.
— Oren Segal | Director of ADL's Center on Extremism

We teach machines to help us.

Machine learning lets us analyze thousands of tweets and return hate classifications within milliseconds. The flexibility of our platform lets us continually adapt our model to the constantly evolving terminology hate groups use on social media.


Step 1:
Build a Machine

We leverage the Natural Language Processing and Image Recognition APIs of enterprise-level AI platforms, so that we can digest and interpret messages as they are posted, in near real time.

Step 2:
Train the Machine

Our Machine needs to be good at sniffing out one thing: hate speech. So we need to feed it a stream of hate speech from social media to break down and learn from. We use Spredfast, an intelligent social listening platform, to moderate incoming messages and categorize them into streams of hate speech. Those streams are fed into our Machine on an ongoing basis so it can begin learning the linguistic nuances.


But even with artificial intelligence, there are challenges in identifying hate speech online.

Machines have trouble understanding the subjectivity and nuance of hate speech. See the examples below, all referencing "third world" in different ways.


[Figure: example tweets labeled "Not Hate Speech" and "Uber Hateful"]


Our solution was to train our A.I. to understand and grade hate by distinct hate speech categorizations.
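
To make the idea of grading by category concrete, here is a minimal sketch under stated assumptions. It assumes a classifier has already produced a confidence score per mode for a message; the function name `grade`, the 0.5 threshold, and the rule that the most severe flagged mode wins are all hypothetical choices for illustration, not a description of the actual system.

```python
from typing import Dict, Optional

# Mode levels from the classification table; a higher level is more severe.
MODE_LEVELS = {
    "Coded Language": 1,
    "Classification": 2,
    "Dehumanization": 3,
    "Polarization": 4,
    "Intention": 5,
}


def grade(scores: Dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Return the most severe mode whose score meets the threshold.

    `scores` maps mode names to classifier confidences in [0, 1].
    Returns None when no mode is flagged (the threshold is an assumption).
    """
    flagged = [
        mode for mode, score in scores.items()
        if mode in MODE_LEVELS and score >= threshold
    ]
    if not flagged:
        return None
    return max(flagged, key=MODE_LEVELS.__getitem__)
```

Under these assumptions, a message that scores highly for both Coded Language and Dehumanization is graded as Dehumanization, while a message with no score above the threshold is not flagged at all.
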


2017 Trends in hate speech categories


Key moments:

February: Travel Ban / Dehumanization
August: Charlottesville Protests / Polarization
September: DACA Debate and NFL Protests / Dehumanization
October: "It's Okay to Be White" Movement / Coded Language
November: #KatesWall / Dehumanization


Supervised machine learning.

We Counter Hate is a human-moderated platform.

Our machine learning platform is continuously finding hate speech for us to counter, and we continuously give it feedback on what it finds. This loop continually refines our framework, increasing the reliability with which it identifies the hate speech we "counter."
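
A human-moderated feedback loop of this kind can be sketched roughly as follows. This is an illustrative outline, not the platform's implementation: `FeedbackLoop`, `review`, and `agreement_rate` are hypothetical names, and the classifier is passed in as a plain function standing in for the real model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class FeedbackLoop:
    """Collects moderator decisions so they can be used for retraining."""

    classify: Callable[[str], str]  # stand-in for the real model (assumption)
    training_data: List[Tuple[str, str]] = field(default_factory=list)
    agreements: int = 0

    def review(self, message: str, moderator_label: str) -> bool:
        """Record the moderator's label; return True if the model agreed."""
        agreed = self.classify(message) == moderator_label
        self.agreements += int(agreed)
        # The moderator's label, not the model's, becomes training data.
        self.training_data.append((message, moderator_label))
        return agreed

    def agreement_rate(self) -> float:
        """Fraction of reviewed messages where model and moderator agreed."""
        total = len(self.training_data)
        return self.agreements / total if total else 0.0
```

Each reviewed message adds a human-verified example to the training set, and the agreement rate gives one rough signal of whether the model is becoming more reliable over time.
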