A hybrid approach for extracting Arabic persons’ names and resolving their ambiguity from Twitter

Omnia H. Zayed, Samhaa R. El-Beltagy

Research output: Chapter in Book or Conference Publication/Proceeding; Conference Publication; peer-review

3 Citations (Scopus)

Abstract

Tweets offer a novel way of communication that enables users all over the world to share real-time news and ideas. The massive number of tweets generated regularly by Arabic speakers has resulted in a growing interest in building Arabic named entity recognition (NER) systems that deal with informal colloquial Arabic. The unique characteristics of the Arabic language make Arabic NER a challenging task, which the informal nature of tweets further complicates. The majority of previous work on Arabic NER was concerned with formal Modern Standard Arabic (MSA). Moreover, taggers and parsers were often used to resolve the ambiguity of Arabic persons’ names. Although previously developed approaches perform well on MSA text, they are not suited to colloquial Arabic. This paper introduces a hybrid approach to extracting Arabic persons’ names from tweets, together with a way to resolve their ambiguity using context bigram patterns. The introduced approach attempts not to use any language-dependent resources. Evaluation of the presented approach shows a 7% improvement in F-score over the best reported result in the literature.
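To make the idea of context bigram patterns concrete, the following is a minimal sketch and not the authors' implementation. It assumes that an ambiguous token (one that can be either a person's name or an ordinary word) is judged by the word immediately before it and the word immediately after it, and that counts of such contexts were gathered from labeled examples. The function names, the voting rule, and the English-glossed toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


def context_features(tokens, i):
    """Return the two context bigram features around the candidate at index i."""
    prev_tok = tokens[i - 1] if i > 0 else START
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else END
    return [("prev", prev_tok), ("next", next_tok)]


def train(examples):
    """Count contexts seen with name and non-name uses of ambiguous tokens.

    Each example is (tokens, index_of_candidate, is_name).
    """
    name_counts, other_counts = Counter(), Counter()
    for tokens, i, is_name in examples:
        target = name_counts if is_name else other_counts
        target.update(context_features(tokens, i))
    return name_counts, other_counts


def is_person_name(tokens, i, name_counts, other_counts):
    """Vote: treat the candidate as a name only if name contexts outweigh the rest."""
    feats = context_features(tokens, i)
    name_score = sum(name_counts[f] for f in feats)
    other_score = sum(other_counts[f] for f in feats)
    return name_score > other_score


# Invented toy data: "amal" stands in for a token that is both a name and a common noun.
examples = [
    (["said", "amal", "yesterday"], 1, True),
    (["doctor", "amal", "arrived"], 1, True),
    (["no", "amal", "left"], 1, False),
    (["lost", "amal", "completely"], 1, False),
]
name_counts, other_counts = train(examples)

print(is_person_name(["said", "amal", "today"], 1, name_counts, other_counts))
print(is_person_name(["no", "amal", "today"], 1, name_counts, other_counts))
```

The sketch relies only on surface word contexts, which is in the spirit of avoiding language-dependent taggers and parsers; a candidate with no previously seen context is conservatively treated as not a name.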

Original language: English
Title of host publication: Natural Language Processing and Information Systems - 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015, Proceedings
Editors: Siegfried Handschuh, André Freitas, Elisabeth Métais, Chris Biemann, Farid Meziane
Publisher: Springer-Verlag
Pages: 355-368
Number of pages: 14
ISBN (Print): 9783319195803
Publication status: Published - 2015
Externally published: Yes
Event: 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015 - Passau, Germany
Duration: 17 Jun 2015 to 19 Jun 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9103
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015
Country/Territory: Germany
City: Passau
Period: 17/06/15 to 19/06/15
