Reconciling Heterogeneous Descriptions of Language Resources

  • John P. McCrae
  • Philipp Cimiano
  • Luca Matteis
  • Roberto Navigli
  • Victor Rodríguez Doncel
  • Daniel Vila-Suero
  • Jorge Gracia
  • Andrejs Abele
  • Gabriela Vulcu
  • Paul Buitelaar

    Research output: Chapter in Book or Conference Publication/Proceeding › Conference Publication › peer-review

    13 Citations (Scopus)

    Abstract

    Language resources are a cornerstone of linguistic research and of the development of natural language processing tools, but discovering relevant resources remains a challenging task. This is because relevant metadata records are spread across different repositories, and it is currently impossible to query all these repositories in an integrated fashion, as they use different data models and vocabularies. In this paper we present a first attempt to collect and harmonize the metadata of different repositories, thus making them queryable and browsable in an integrated way. We use RDF and linked data technologies for this, and provide a first level of harmonization of the vocabularies used in the different resources by mapping them to standard RDF vocabularies, including Dublin Core and DCAT. Further, we present an approach that relies on NLP, and in particular on word sense disambiguation techniques, to harmonize resources by mapping the values of attributes (such as the type, license or intended use of a resource) into normalized values. Finally, as there are duplicate entries within the same repository as well as across different repositories, we also report results of detecting these duplicates.
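    The abstract describes mapping repository-specific metadata onto standard vocabularies such as Dublin Core and DCAT and then detecting duplicate entries. The following is a minimal Python sketch of those two steps under stated assumptions, not the authors' implementation. The repository names (`repo_a`, `repo_b`), their field names, the `urn:` subject scheme and the sample records are all hypothetical. Triples are plain tuples rather than objects from an RDF library, and duplicates are found by exact match on a normalized title, a deliberately simple stand-in for the detection the paper reports. Only the Dublin Core Terms and DCAT namespace URIs are the standard ones.

```python
import re
import unicodedata
from collections import defaultdict

# Standard namespaces: Dublin Core Terms and W3C DCAT.
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# Hypothetical per-repository field mappings (illustrative only).
FIELD_MAP = {
    "repo_a": {"resourceName": DCT + "title", "licence": DCT + "license",
               "resourceType": DCT + "type"},
    "repo_b": {"name": DCT + "title", "rights": DCT + "license",
               "keywords": DCAT + "keyword"},
}


def to_triples(repo, record_id, record):
    """Map one repository-specific record onto (subject, predicate, object) tuples."""
    subject = f"urn:{repo}:{record_id}"  # hypothetical subject naming scheme
    mapping = FIELD_MAP[repo]
    triples = []
    for field, value in record.items():
        predicate = mapping.get(field)
        if predicate is None:
            continue  # fields without a mapping are dropped in this sketch
        for v in value if isinstance(value, list) else [value]:
            triples.append((subject, predicate, v))
    return triples


def normalize_title(title):
    """Strip accents, lower-case, and replace each run of non [a-z0-9] characters with one space."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(c for c in text if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_duplicates(triples):
    """Group subjects whose normalized dct:title is identical (a simple stand-in)."""
    groups = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == DCT + "title":
            groups[normalize_title(obj)].add(subject)
    return [sorted(s) for s in groups.values() if len(s) > 1]


# Example with made-up records from two hypothetical repositories.
triples = []
triples += to_triples("repo_a", "1", {"resourceName": "Example Parallel Corpus",
                                      "licence": "CC-BY"})
triples += to_triples("repo_b", "42", {"name": "example parallel corpus.",
                                       "keywords": ["parallel", "speech"]})
triples += to_triples("repo_b", "43", {"name": "Another Treebank"})
print(find_duplicates(triples))  # [['urn:repo_a:1', 'urn:repo_b:42']]
```

    Matching on normalized titles only catches near-identical entries. The word-sense-disambiguation step the abstract mentions for normalizing values such as license or resource type is not modeled here.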

    Original language: English
    Title of host publication: Proceedings of the 4th Workshop on Linked Data in Linguistics
    Subtitle of host publication: Resources and Applications, LDL 2015 - collocated with 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2015
    Editors: Christian Chiarcos, John Philip McCrae, Petya Osenova, Philipp Cimiano, Nancy Ide
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 39-48
    Number of pages: 10
    ISBN (Electronic): 9781941643570
    Publication status: Published - 2015
    Event: 4th Workshop on Linked Data in Linguistics: Resources and Applications, LDL 2015 - Beijing, China
    Duration: 31 Jul 2015 → …

    Publication series

    Name: Proceedings of the 4th Workshop on Linked Data in Linguistics: Resources and Applications, LDL 2015 - collocated with 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2015

    Conference

    Conference: 4th Workshop on Linked Data in Linguistics: Resources and Applications, LDL 2015
    Country/Territory: China
    City: Beijing
    Period: 31/07/15 → …
