PROVOKE : Toxicity trigger detection in conversations from the top 100 subreddits

Almerekhi, Hind; Kwak, Haewoon; Salminen, Joni; Jansen, Bernard J.

annif.suggestions	social media\|online communities\|toxicity\|Internet\|networks (societal phenomena)\|networking (making contacts)\|machine learning\|conversation\|network communication\|direct use\|en	en
annif.suggestions.links	http://www.yso.fi/onto/yso/p20774\|http://www.yso.fi/onto/yso/p23472\|http://www.yso.fi/onto/yso/p12637\|http://www.yso.fi/onto/yso/p20405\|http://www.yso.fi/onto/yso/p5570\|http://www.yso.fi/onto/yso/p20000\|http://www.yso.fi/onto/yso/p21846\|http://www.yso.fi/onto/yso/p14004\|http://www.yso.fi/onto/yso/p14112\|http://www.yso.fi/onto/yso/p7599	en
dc.contributor.author	Almerekhi, Hind
dc.contributor.author	Kwak, Haewoon
dc.contributor.author	Salminen, Joni
dc.contributor.author	Jansen, Bernard J.
dc.contributor.department	fi=Ei tutkimusalustaa\|en=No platform\|	-
dc.contributor.faculty	fi=Markkinoinnin ja viestinnän yksikkö\|en=School of Marketing and Communication\|	-
dc.contributor.orcid	https://orcid.org/0000-0003-3230-0561	-
dc.contributor.organization	fi=Vaasan yliopisto\|en=University of Vaasa\|
dc.date.accessioned	2023-01-18T11:43:53Z
dc.date.accessioned	2025-06-25T12:24:42Z
dc.date.available	2023-01-18T11:43:53Z
dc.date.issued	2022-12-11
dc.description.abstract	Promoting healthy discourse on community-based online platforms like Reddit can be challenging, especially when conversations show ominous signs of toxicity. Therefore, in this study, we find the turning points (i.e., toxicity triggers) making conversations toxic. Before finding toxicity triggers, we built and evaluated various machine learning models to detect toxicity from Reddit comments. Subsequently, we used our best-performing model, a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model that achieved an area under the receiver operating characteristic curve (AUC) score of 0.983 to detect toxicity. Next, we constructed conversation threads and used the toxicity prediction results to build a training set for detecting toxicity triggers. This procedure entailed using our large-scale dataset to refine toxicity triggers' definition and build a trigger detection dataset using 991,806 conversation threads from the top 100 communities on Reddit. Then, we extracted a set of sentiment shift, topical shift, and context-based features from the trigger detection dataset, using them to build a dual embedding biLSTM neural network that achieved an AUC score of 0.789. Our trigger detection dataset analysis showed that specific triggering keywords are common across all communities, like ‘racist’ and ‘women’. In contrast, other triggering keywords are specific to certain communities, like ‘overwatch’ in r/Games. Implications are that toxicity trigger detection algorithms can leverage generic approaches but must also tailor detections to specific communities.	-
dc.description.notification	© 2022 Wuhan University. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)	-
dc.description.reviewstatus	fi=vertaisarvioitu\|en=peerReviewed\|	-
dc.format.bitstream	true
dc.format.content	fi=kokoteksti\|en=fulltext\|	-
dc.format.extent	21	-
dc.identifier.olddbid	17610
dc.identifier.oldhandle	10024/15075
dc.identifier.uri	https://osuva.uwasa.fi/handle/11111/186
dc.identifier.urn	URN:NBN:fi-fe202301183527	-
dc.language.iso	eng	-
dc.publisher	Elsevier	-
dc.relation.doi	10.1016/j.dim.2022.100019	-
dc.relation.funder	Qatar National Research Fund	-
dc.relation.ispartofjournal	Data and Information Management	-
dc.relation.issn	2543-9251	-
dc.relation.issue	4	-
dc.relation.url	https://doi.org/10.1016/j.dim.2022.100019	-
dc.relation.volume	6	-
dc.rights	CC BY 4.0	-
dc.source.identifier	Scopus:85139638316	-
dc.source.identifier	https://osuva.uwasa.fi/handle/10024/15075
dc.subject	Online toxicity	-
dc.subject	Conversation threads	-
dc.subject	Reddit	-
dc.subject	Toxicity triggers	-
dc.subject	Neural networks	-
dc.subject.discipline	fi=Markkinointi\|en=Marketing\|	-
dc.subject.yso	social media	-
dc.title	PROVOKE : Toxicity trigger detection in conversations from the top 100 subreddits	-
dc.type.okm	fi=A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä\|en=A1 Peer-reviewed original journal article\|sv=A1 Originalartikel i en vetenskaplig tidskrift\|	-
dc.type.publication	article	-
dc.type.version	publishedVersion	-

Files

Name: Osuva_Almerekhi_Kwak_Salminen_Jansen_2022.pdf
Size: 2.99 MB
Format: Adobe Portable Document Format
Description: Article

Collections

Articles