TY - JOUR
T1 - Detecting bot behaviour in social media using digital DNA compression
AU - Pasricha, Nivranshu
AU - Hayes, Conor
N1 - Publisher Copyright: Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PY - 2019
Y1 - 2019
AB - A major challenge faced by online social networks such as Facebook and Twitter is the remarkable rise of fake and automated bot accounts over the last few years. Some of these accounts have been reported to engage in undesirable activities such as spamming, political campaigning and spreading falsehood on the platform. We present an approach to detect bot-like behaviour among Twitter accounts by analyzing their past tweeting activity. We build upon an existing technique of analysis of Twitter accounts called Digital DNA. Digital DNA models the behaviour of Twitter accounts by encoding the post history of a user account as a sequence of characters analogous to an actual DNA sequence. In our approach, we employ a lossless compression algorithm on these Digital DNA sequences and use the compression statistics as a measure of predictability in the behaviour of a group of Twitter accounts. We leverage the information conveyed by the compression statistics to visually represent the posting behaviour by a simple two-dimensional scatter plot and categorize the user accounts as bots and genuine users by using an off-the-shelf implementation of the logistic regression classification algorithm.
KW - Online Social Networks
KW - Social Media
KW - Twitter
UR - https://www.scopus.com/pages/publications/85081604711
M3 - Conference article
AN - SCOPUS:85081604711
SN - 1613-0073
VL - 2563
SP - 376
EP - 387
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, AICS 2019
Y2 - 5 December 2019 through 6 December 2019
ER -