TED-MWE: a bilingual parallel corpus with MWE annotation

  • Johanna Monti
  • Federico Sangati
  • Mihael Arcan

Research output: Chapter in Book or Conference Publication/Proceeding › Conference Publication › peer-review

Abstract

The translation of multiword expressions (MWEs) by Machine Translation (MT) remains a major challenge: although MT has improved considerably in recent years, MWE mistranslations still occur very frequently. There is a need to develop large datasets, mainly parallel corpora, annotated with MWEs, since they are useful both for training statistical machine translation (SMT) systems and for evaluating MWE translation quality. This paper describes a methodology for annotating a parallel spoken corpus with MWEs. The dataset used for this experiment is an English-Italian corpus extracted from the TED spoken corpus and complemented by SMT output.
Original language: English (Ireland)
Title of host publication: Second Italian Conference on Computational Linguistics (CLiC-it 2015)
Publisher: Accademia University Press
Publication status: Published - 1 Jan 2015
