Enriching a terminology for under-resourced languages using knowledge graphs

Research output: Chapter in Book or Conference Publication/Proceeding > Conference Publication > peer-review

Abstract

Translated terminology for severely under-resourced languages is a vital tool for aid workers in humanitarian crises. However, there are generally no lexical resources for these languages that can be used for this purpose. Translators without Borders (TWB) is a non-profit organisation whose goal is to help people get vital information, whatever language they speak; this includes developing lexical resources for aid workers. To support the construction of these resources, TWB has worked with the ADAPT Centre to develop tools for building their crisis-response resources. In particular, we have enriched these resources by linking them with open lexical resources such as WordNet and Wikidata, and by deriving a novel extended corpus. This work has focused on resources for the languages most useful to aid workers working with Rohingya refugees, namely Rohingya, Chittagonian, Bengali and Burmese. All of these languages are under-resourced, and for Rohingya and Chittagonian only very limited lexical resources are available. For these languages, we have constructed some of the first corpora that will allow the automatic construction of lexical resources. We have also used the Naisc tool for monolingual dictionary linking to connect the existing English parts of the lexical resources with information from WordNet and Wikidata. This has provided a wealth of extra information, including images, alternative definitions, translations (in Bengali, Burmese and other languages) and many related terms that may guide TWB linguists and terminologists as they extend their resources. We present these results in an interface that allows lexicographers to browse the information extracted from the external resources and select what they wish to include in their resource.
We present results on the quality of the links inferred by the Naisc system, as well as a qualitative analysis of the tool's effectiveness in developing the TWB glossaries.
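To make the idea of monolingual dictionary linking more concrete: each sense in one English dictionary (here, a glossary) is paired with the most similar sense that has the same headword in another English resource. The sketch below is a minimal baseline under stated assumptions. It scores candidates by word overlap (Jaccard similarity) between definitions and does not reproduce the Naisc system's own method. The `Sense` type, the sense identifiers, the toy definitions, the stop-word list and the 0.1 threshold are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "by", "for", "from", "in", "is", "of", "or", "that", "the", "to", "with"}


@dataclass(frozen=True)
class Sense:
    sense_id: str    # hypothetical identifier for one sense of a headword
    headword: str
    definition: str


def tokens(text: str) -> set[str]:
    """Lower-cased alphabetic tokens of a definition, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_senses(glossary: list[Sense], external: list[Sense],
                threshold: float = 0.1) -> dict[str, tuple[str, float]]:
    """Link each glossary sense to the same-headword external sense whose
    definition overlaps most, keeping only links scoring >= threshold."""
    links: dict[str, tuple[str, float]] = {}
    for entry in glossary:
        candidates = [s for s in external
                      if s.headword.lower() == entry.headword.lower()]
        if not candidates:
            continue  # headword absent from the external resource
        entry_tokens = tokens(entry.definition)
        score, best = max(((jaccard(entry_tokens, tokens(c.definition)), c)
                           for c in candidates), key=lambda pair: pair[0])
        if score >= threshold:
            links[entry.sense_id] = (best.sense_id, round(score, 3))
    return links


# Toy data, invented for illustration only.
glossary = [Sense("glossary:shelter-1", "shelter",
                  "a temporary structure that protects people from bad weather or danger")]
external = [
    Sense("ext:shelter-n-1", "shelter", "a structure that provides privacy and protection from danger"),
    Sense("ext:shelter-v-1", "shelter", "to provide housing for someone"),
]

print(link_senses(glossary, external))
# {'glossary:shelter-1': ('ext:shelter-n-1', 0.2)}
```

Jaccard overlap is used here only because it is easy to follow. Short definitions often share few words, so a practical linker would likely need richer similarity measures than this baseline.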
Original language: English (Ireland)
Title of host publication: Electronic lexicography in the 21st century (eLex 2021): Post-editing lexicography. Proceedings of the eLex 2021 conference
Place of publication: Online
Publication status: Published, 1 Jul 2021

Authors: McCrae, John P.; Ojha, Atul Kr.; Chakravarthi, Bharathi Raja; Kelly, Ian; Buffini, Patricia; Tang, Grace; Paquin, Eric and Locria, Manuel
