TY - GEN
T1 - Classifying Fake and Real Neurally Generated News
AU - Govindaraju, Anitha
AU - Griffith, Josephine
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
AB - In this data era, with Natural Language Processing (NLP) techniques such as language modelling showing great progress, the idea of 'Automated Journalism', i.e., generating news articles with computer programs from existing news headlines or the body of a news article, is emerging. Such advancements bring progress but also certain disadvantages. Specifically, adversaries are using these techniques to create fake news articles called 'neural fake news'. Such news imitates the style and appearance of real news to generate targeted propaganda intended to confuse people. Humans find neural fake news more trustworthy than human-written disinformation [1]. The goal of this research is to classify various types of neurally generated news as real or fake based on its genuineness. In a real-world scenario, humans evaluate the genuineness of news by relying on a model of the world, i.e., by checking whether the content of the news matches the content from a reliable news source (e.g., Associated Press). In this work we use a Recurrent Neural Network (RNN), specifically a Siamese Bi-directional LSTM (BiLSTM), as a Semantic Textual Similarity (STS) model that compares real news with neural news to determine whether the neural news is fake. To train and test the model, three datasets were created: the first contains real news extracted from a common crawl; the second is a neural fake news dataset generated using language modelling techniques; the third is a neural real news dataset generated using textual data augmentation techniques. The Siamese BiLSTM model is found to accurately compute similarity scores between real news and neural news, allowing the neural news to be classified as neural real or neural fake news.
KW - Language model
KW - Machine generated news
KW - Neural fake news
KW - Siamese Bi-LSTM neural network
KW - STS
UR - https://www.scopus.com/pages/publications/85123856362
U2 - 10.1109/SweDS53855.2021.9638268
DO - 10.1109/SweDS53855.2021.9638268
M3 - Conference Publication
AN - SCOPUS:85123856362
T3 - Proceedings of the 2021 Swedish Workshop on Data Science, SweDS 2021
BT - Proceedings of the 2021 Swedish Workshop on Data Science, SweDS 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th Swedish Workshop on Data Science, SweDS 2021
Y2 - 2 December 2021 through 3 December 2021
ER -