Abstract
The translation of multiword expressions (MWEs) remains a major challenge for machine translation (MT): although MT has improved considerably in recent years, MWE mistranslations still occur very frequently. There is a need for large data sets, mainly parallel corpora, annotated with MWEs, since they are useful both for training statistical machine translation (SMT) systems and for evaluating MWE translation quality. This paper describes a methodology for annotating a parallel spoken corpus with MWEs. The dataset used for this experiment is an English-Italian corpus extracted from the TED spoken corpus and complemented by SMT output.
| Original language | English (Ireland) |
|---|---|
| Title of host publication | Second Italian Conference on Computational Linguistics (CLiC-it 2015) |
| Publisher | Accademia University Press |
| Publication status | Published - 1 Jan 2015 |
| Title | TED-MWE: a bilingual parallel corpus with MWE annotation |