An approach for extracting and disambiguating Arabic persons' names using clustered dictionaries and scored patterns

Omnia Zayed, Samhaa El-Beltagy, Osama Haggag

Research output: Chapter in Book or Conference Publication/Proceeding > Conference Publication > peer-review

5 Citations (Scopus)

Abstract

Building a system to extract Arabic named entities is a complex task due to the ambiguity and structure of Arabic text. Previous approaches to Arabic named entity recognition relied heavily on Arabic parsers and taggers combined with a huge set of gazetteers, and sometimes large training sets, to solve the ambiguity problem. While these approaches are applicable to modern standard Arabic (MSA) text, they cannot handle colloquial Arabic. With the rapid increase in online social media usage by Arabic speakers, it is important to build an Arabic named entity recognition system that deals with both colloquial Arabic and MSA text. This paper introduces an approach for extracting Arabic persons' names without utilizing any Arabic parsers or taggers. Evaluation of the presented approach shows that it achieves high precision and an acceptable level of recall on a benchmark dataset.

Original language: English
Title of host publication: Natural Language Processing and Information Systems - 18th International Conference on Applications of Natural Language to Information Systems, NLDB 2013, Proceedings
Pages: 201-212
Number of pages: 12
DOIs
Publication status: Published - 2013
Externally published: Yes
Event: 18th International Conference on Application of Natural Language to Information Systems, NLDB 2013 - Salford, United Kingdom
Duration: 19 Jun 2013 to 21 Jun 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7934 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Conference on Application of Natural Language to Information Systems, NLDB 2013
Country/Territory: United Kingdom
City: Salford
Period: 19/06/13 to 21/06/13
