Named entity recognition of persons' names in Arabic tweets

Omnia H. Zayed, Samhaa R. El-Beltagy

Research output: Contribution to a Journal (Peer & Non Peer); Conference article; peer-review

9 Citations (Scopus)

Abstract

The rise in Arabic usage within various social media platforms, notably Twitter, has led to a growing interest in building Arabic Natural Language Processing (NLP) applications capable of dealing with informal colloquial Arabic, as it is the most commonly used form of Arabic in social media. The unique characteristics of the Arabic language make the extraction of Arabic named entities a challenging task, and the nature of tweets adds new dimensions to it. Most previous research on Arabic named entity recognition (NER) has focused on extracting entities from the formal language, namely Modern Standard Arabic (MSA). However, the unstructured nature of the colloquial language used in tweets degrades the performance of NER systems developed for formal MSA text. In this paper, we focus on the task of recognizing Arabic persons' names. Specifically, we introduce an approach to extract Arabic persons' names from tweets without employing any morphological analysis or language-dependent features. The proposed approach combines a rule-based model with a statistical one. It uses unsupervised learning of patterns and clustered dictionaries as constraints to identify a person's name and resolve its ambiguity. Our approach outperforms the best reported result in the literature on the same test set, increasing the F-score by 19.6%.
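The abstract describes using dictionaries as constraints that decide when a token is a person's name and resolve ambiguity. The following is a minimal, hypothetical sketch of that dictionary-constraint idea only. It is not the authors' system. The dictionaries `FIRST_NAMES`, `FAMILY_NAMES`, `TRIGGERS`, and `AMBIGUOUS`, the function `extract_person_names`, and the greedy extension rule are all illustrative assumptions. The sketch omits the statistical model and the unsupervised pattern learning entirely. Like the described approach, it uses no morphological analysis: tokens are split on whitespace, so clitics and prefixes are not handled.

```python
from typing import List, Set, Tuple

# Hypothetical toy dictionaries (assumptions, not the paper's clustered dictionaries).
FIRST_NAMES: Set[str] = {"محمد", "أحمد", "أمل"}
FAMILY_NAMES: Set[str] = {"صلاح", "زويل"}
# Hypothetical context words that often precede a name: "the doctor", "the player".
TRIGGERS: Set[str] = {"الدكتور", "اللاعب"}
# Name tokens that are also common words, e.g. أمل also means "hope".
AMBIGUOUS: Set[str] = {"أمل"}


def extract_person_names(tokens: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans, end exclusive, accepted as person names."""
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(tokens):
        if tokens[i] in FIRST_NAMES:
            # Greedily extend the candidate over following known name tokens.
            j = i + 1
            while j < len(tokens) and (tokens[j] in FIRST_NAMES or tokens[j] in FAMILY_NAMES):
                j += 1
            has_trigger = i > 0 and tokens[i - 1] in TRIGGERS
            multi_token = j - i > 1
            # Constraint: an ambiguous single token is accepted only with supporting context.
            if tokens[i] not in AMBIGUOUS or has_trigger or multi_token:
                spans.append((i, j))
            i = j
        else:
            i += 1
    return spans
```

For example, in "عندي أمل كبير" ("I have great hope") the token أمل is rejected because it has neither a trigger word before it nor a following name token, whereas in "اللاعب أمل" the trigger اللاعب ("the player") lets the same token through, and "الدكتور أحمد زويل عالم" yields the two-token span covering أحمد زويل.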

Original language: English
Pages (from-to): 731-738
Number of pages: 8
Journal: International Conference Recent Advances in Natural Language Processing, RANLP
Volume: 2015-January
Publication status: Published - 2015
Externally published: Yes
Event: 10th International Conference on Recent Advances in Natural Language Processing, RANLP 2015 - Hissar, Bulgaria
Duration: 7 Sep 2015 - 9 Sep 2015
