TY - GEN
T1 - A hybrid approach for extracting Arabic persons’ names and resolving their ambiguity from Twitter
AU - Zayed, Omnia H.
AU - El-Beltagy, Samhaa R.
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - Tweets offer a novel way of communication that enables users all over the world to share real-time news and ideas. The massive number of tweets generated regularly by Arabic speakers has resulted in a growing interest in building Arabic named entity recognition (NER) systems that deal with informal colloquial Arabic. The unique characteristics of the Arabic language make Arabic NER a challenging task, which the informal nature of tweets further complicates. The majority of previous works addressing Arabic NER were concerned with formal modern standard Arabic (MSA). Moreover, taggers and parsers were often utilized to solve the ambiguity problem of Arabic persons’ names. Although previously developed approaches perform well on MSA text, they are not suited for colloquial Arabic. This paper introduces a hybrid approach to extract Arabic persons’ names from tweets, along with a way to resolve their ambiguity using context bigram patterns. The introduced approach attempts not to use any language-dependent resources. Evaluation of the presented approach shows a 7% improvement in the F-score over the best reported result in the literature.
AB - Tweets offer a novel way of communication that enables users all over the world to share real-time news and ideas. The massive number of tweets generated regularly by Arabic speakers has resulted in a growing interest in building Arabic named entity recognition (NER) systems that deal with informal colloquial Arabic. The unique characteristics of the Arabic language make Arabic NER a challenging task, which the informal nature of tweets further complicates. The majority of previous works addressing Arabic NER were concerned with formal modern standard Arabic (MSA). Moreover, taggers and parsers were often utilized to solve the ambiguity problem of Arabic persons’ names. Although previously developed approaches perform well on MSA text, they are not suited for colloquial Arabic. This paper introduces a hybrid approach to extract Arabic persons’ names from tweets, along with a way to resolve their ambiguity using context bigram patterns. The introduced approach attempts not to use any language-dependent resources. Evaluation of the presented approach shows a 7% improvement in the F-score over the best reported result in the literature.
UR - http://www.scopus.com/inward/record.url?scp=84948807603&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-19581-0_32
DO - 10.1007/978-3-319-19581-0_32
M3 - Conference Publication
AN - SCOPUS:84948807603
SN - 9783319195803
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 355
EP - 368
BT - Natural Language Processing and Information Systems - 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015, Proceedings
A2 - Handschuh, Siegfried
A2 - Freitas, André
A2 - Métais, Elisabeth
A2 - Biemann, Chris
A2 - Meziane, Farid
PB - Springer-Verlag
T2 - 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015
Y2 - 17 June 2015 through 19 June 2015
ER -