Using machine learning to slow the spread of hate speech.
You can’t stop what you don’t understand.
The first key to countering hate speech is to have a clear definition of what it is.
We adapted Dr. Gregory Stanton’s 10 Stages of Genocide to create a structure for identifying hate speech. Originally presented in a briefing to the U.S. Department of State in 1996, the report helped us understand the process of classification and dehumanization.
We condensed the stages and removed those that aren’t relevant to Twitter (e.g., extermination). We also added contemporary phenomena found on social media (e.g., coded language).
Our hate speech classifications
| 5. Intention | Incitement to genocide; Incitement to general violence; Incitement to specific violence; Incitement to degrade and discriminate |
| 4. Polarization | Inculpation of target group; Promotion of known hate groups; Exclusion of target group |
| 3. Dehumanization | Propagation of stereotype; Derogatory language against target group |
| 2. Classification | Target group comparison; Target group identification |
| 1. Coded Language | Innuendo signaling in-group/out-group nationalism; Innuendo implicating a target group; Innuendo excluding a target group |
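For readers who want to work with these classifications in code, here is a minimal sketch of the table as a lookup structure. The names `HATE_SPEECH_STAGES` and `stage_for` are hypothetical, and treating the stage number as a severity level (1 lowest, 5 highest) is our reading of the table, not something the platform's internals are known to do.

```python
from typing import Tuple

# Hypothetical encoding of the classification table above.
# Assumption: the stage number doubles as a severity level (5 = most severe).
HATE_SPEECH_STAGES = {
    5: ("Intention", [
        "Incitement to genocide",
        "Incitement to general violence",
        "Incitement to specific violence",
        "Incitement to degrade and discriminate",
    ]),
    4: ("Polarization", [
        "Inculpation of target group",
        "Promotion of known hate groups",
        "Exclusion of target group",
    ]),
    3: ("Dehumanization", [
        "Propagation of stereotype",
        "Derogatory language against target group",
    ]),
    2: ("Classification", [
        "Target group comparison",
        "Target group identification",
    ]),
    1: ("Coded Language", [
        "Innuendo signaling in-group/out-group nationalism",
        "Innuendo implicating a target group",
        "Innuendo excluding a target group",
    ]),
}


def stage_for(category: str) -> Tuple[int, str]:
    """Return (level, stage name) for a category, matched case-insensitively."""
    wanted = category.strip().lower()
    for level, (stage, categories) in HATE_SPEECH_STAGES.items():
        if wanted in (c.lower() for c in categories):
            return level, stage
    raise KeyError(f"unknown category: {category!r}")
```

Calling `stage_for("Propagation of stereotype")` looks up the stage that a single category belongs to.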
We teach machines to help us.
The power of machine learning is that it allows us to analyze thousands of tweets and return hate classifications within milliseconds. The flexibility of our platform allows us to continually adapt our model to constantly evolving terminologies used by hate groups on social media.
Build a Machine
We use enterprise-level AI platforms, with Natural Language Processing and Image Recognition APIs, to digest and interpret messages as they are posted, in near real time.
Train the Machine
Our Machine needs to be good at sniffing out one thing: hate speech. So we need to feed it a stream of hate speech from social media to break down and learn from. We use Spredfast, an intelligent social listening platform, to moderate incoming messages and categorize them into streams of hate speech. Those streams are fed into our Machine on an ongoing basis so it can begin learning the linguistic nuances.
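To illustrate what learning from labeled streams can look like, here is a toy sketch of a classifier that updates itself one message at a time. It is a small multinomial naive Bayes model written from scratch, which is an assumption chosen for illustration and not the enterprise platform described above. The class name, labels, and placeholder messages (`TARGET_GROUP`, `SLUR_A`) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of messages
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        return text.lower().split()

    def learn(self, text, label):
        """Update the counts with one labeled message from a stream."""
        words = self.tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def predict(self, text):
        """Return the label with the highest log-probability, or None if untrained."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical pre-categorized stream, with placeholders instead of real slurs.
stream = [
    ("world series game seven tonight", "not_hate"),
    ("third world series title in three years", "not_hate"),
    ("TARGET_GROUP are SLUR_A", "dehumanization"),
    ("TARGET_GROUP SLUR_A everywhere", "dehumanization"),
]

model = TinyNaiveBayes()
for text, label in stream:
    model.learn(text, label)

print(model.predict("SLUR_A TARGET_GROUP"))
print(model.predict("world series tonight"))
```

Because `learn` only updates counts, new messages can be fed in as they arrive without retraining from scratch, which is the property the ongoing stream relies on.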
But even with artificial intelligence, there are challenges in identifying hate speech online.
Machines have trouble understanding the subjectivity and nuance of hate speech. See the examples below, all referencing "third world" in different ways.
Not Hate Speech
Folks like to say baseball is boring but we're about to have the third World Series Game 7 in four years. This one needed to go the distance— Michael Lee (@MrMichaelLee) November 1, 2021
The year is 2029.
Yankees SS Kevin Maitan just won his forth-consecutive MVP award after leading his team to their third World Series title in as many years.
The Atlanta Braves finished 52-110, their sixth straight losing season.
Braves manager Larry Jones has been fired.— fdu (@FakeDanUggla) November 21, 2021
Hate Speech
Not a single white country. White people don't break our laws the way third-worlders do.
They Have To Go Back #DACA #mepolitics #MAGA 🇺🇸 pic.twitter.com/c274AqZawm— Maine First Media (@MaineFirstMedia) September 18, 2021
We defeated Hitler so we could pay for endless third worlders who gang rape our women while bearded “women” demand to be called women and Berlin could pay for a teaching manual that provides instruction for teachers on how to teach gender diversity issues to pre-school children— ☩ EMPRESS WIFE 👑 (@EmpresWife) February 10, 2022
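The examples above show why simple keyword matching is not enough. The sketch below uses a deliberately naive filter (the function name `naive_keyword_flag` is hypothetical) that flags any text containing the phrase "third world". It wrongly flags the baseball tweet and misses the hyphenated "third-worlders" used in the hateful ones.

```python
import re


def naive_keyword_flag(text, phrase="third world"):
    """Flag text containing the phrase, ignoring case. Deliberately naive."""
    return re.search(re.escape(phrase), text, flags=re.IGNORECASE) is not None


baseball = (
    "Folks like to say baseball is boring but we're about to have "
    "the third World Series Game 7 in four years."
)

# False positive: "third World Series" contains the phrase.
print(naive_keyword_flag(baseball))

# False negative: the hyphenated variant slips past the exact phrase match.
print(naive_keyword_flag("third-worlders"))
```

Widening the pattern to catch the hyphenated form would not fix the false positive, because the same words carry different meanings depending on context.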
Our solution was to train our A.I. to understand and grade hate according to distinct hate speech categories.
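As a rough sketch of what grading by category could look like, suppose a classifier returns one confidence score per stage. The function below picks the most severe stage whose score clears a threshold. The score format, the `0.5` threshold, and the rule of preferring the most severe stage are all assumptions made for illustration, not a description of the actual system.

```python
# Stage numbers and names follow the classification table in this article.
STAGES = {
    1: "Coded Language",
    2: "Classification",
    3: "Dehumanization",
    4: "Polarization",
    5: "Intention",
}


def grade(scores, threshold=0.5):
    """Return (level, stage) for the most severe stage at or above the threshold.

    `scores` maps stage level -> confidence in [0, 1] (hypothetical format).
    Returns None when no stage clears the threshold.
    """
    flagged = [level for level, score in scores.items() if score >= threshold]
    if not flagged:
        return None
    level = max(flagged)
    return level, STAGES[level]


# Hypothetical scores for a single message.
print(grade({1: 0.9, 3: 0.7, 5: 0.2}))
print(grade({1: 0.1, 2: 0.3}))
```

Grading per category, instead of producing a single hate or not-hate score, lets a message such as the baseball tweet score low on every stage even though it shares words with hateful messages.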
2017 Trends in hate speech categories
February: Travel Ban / Dehumanization
August: Charlottesville Protests / Polarization
September: DACA Debate and NFL Protests / Dehumanization
October: "It's Okay to be White" Movement / Coded Language
November: #KatesWall / Dehumanization
Supervised machine learning.
We Counter Hate is a human-moderated platform.
Our machine learning platform continuously finds hate speech for us to counter, and we continuously give it feedback on the results it returns. This loop steadily refines our framework, increasing the reliability of the hate speech we "counter."
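The loop described above can be sketched as a small review queue: the machine submits predictions, a human moderator confirms or corrects each one, and the confirmed labels become new training data. The class `ReviewQueue`, its method names, and the stand-in `moderator` function are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ReviewQueue:
    """Holds machine predictions until a human moderator reviews them."""

    pending: List[Tuple[str, str]] = field(default_factory=list)       # (text, predicted label)
    training_set: List[Tuple[str, str]] = field(default_factory=list)  # (text, confirmed label)

    def submit(self, text: str, predicted: str) -> None:
        """Queue one machine prediction for human review."""
        self.pending.append((text, predicted))

    def review(self, decide: Callable[[str, str], str]) -> int:
        """Apply the moderator's decision to every pending item.

        Confirmed labels are appended to the training set.
        Returns the number of predictions the moderator corrected.
        """
        corrections = 0
        for text, predicted in self.pending:
            confirmed = decide(text, predicted)
            if confirmed != predicted:
                corrections += 1
            self.training_set.append((text, confirmed))
        self.pending.clear()
        return corrections


def moderator(text: str, predicted: str) -> str:
    """Stand-in for a human decision: overrule flags on baseball messages."""
    return "not_hate" if "World Series" in text else predicted


queue = ReviewQueue()
queue.submit("third World Series Game 7", "dehumanization")  # machine false positive
queue.submit("placeholder coded message", "coded_language")
corrections = queue.review(moderator)
print(corrections, queue.training_set)
```

Tracking the number of corrections per review round gives a simple signal of whether the model's reliability is improving over time.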