Adapting Term Recognition to an Under-Resourced Language: the Case of Irish

Research output: Chapter in Book or Conference Publication/Proceeding › Conference Publication › peer-review

Abstract

Automatic Term Recognition (ATR) is an important method for the summarization and analysis of large corpora, and normally requires a significant amount of linguistic input, in particular the use of part-of-speech taggers. For an under-resourced language such as Irish, the resources necessary for this may be scarce or entirely absent. We evaluate two methods for the automatic extraction of terms, based on the small part-of-speech-tagged corpora that are available for Irish and on a large terminology list, and show that both methods can produce viable term extractors. We evaluate this with a newly constructed corpus that is the first available corpus for term extraction in Irish. Our results shine some light on the challenge of adapting natural language processing systems to under-resourced scenarios.
Original language: English (Ireland)
Title of host publication: Proceedings of the Celtic Language Technology Workshop 2019
Publisher: European Association for Machine Translation
Pages: 48-57
Publication status: Published - 1 Jan 2019

Authors (Note for portal: view the doc link for the full list of authors)

  • John P. McCrae and Adrian Doyle
