Abstract
Recent studies in Multimodal Machine Translation (MMT) have explored the use of visual information in a multimodal setting to analyze its redundancy with textual information. The aim of this work is to develop a more effective approach to incorporating relevant visual information into the translation process and to improve the overall performance of MMT models. This paper proposes an object-level filtering approach for MMT that is applied to object regions extracted from an image and filters out objects that are irrelevant to the image caption being translated. Using the filtered image helps the model consider only the relevant objects and their locations relative to each other. Different matching methods, including string matching and word embeddings, are employed to identify relevant objects. Gaussian blurring is used to soften irrelevant objects in the image and to evaluate the effect of object filtering on translation quality. The filtering approaches were evaluated on the Multi30K dataset for English to German, French, and Czech translation, using the BLEU, ChrF2, and TER metrics.
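The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes object regions arrive as hypothetical `(label, (x0, y0, x1, y1))` tuples from some detector, represents the image as a grayscale list of rows, covers only the string-matching variant (the word-embedding variant is not shown), and uses a hand-written separable Gaussian blur in place of an image library. All function names are illustrative.

```python
import math
import re


def relevant_labels(caption, labels):
    """Keep labels that appear as a caption token (or as a naive plural)."""
    tokens = set(re.findall(r"[a-z]+", caption.lower()))
    keep = []
    for label in labels:
        name = label.lower()
        if name in tokens or name + "s" in tokens:
            keep.append(label)
    return keep


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights over [-radius, radius]."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row or column, clamping indices at the borders."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out


def blur_image(image, kernel):
    """Separable Gaussian blur: rows first, then columns."""
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def filter_objects(image, objects, caption, sigma=2.0, radius=4):
    """Blur boxes whose label does not match the caption.

    Boxes are half-open pixel ranges: x0 <= x < x1 and y0 <= y < y1.
    """
    keep = set(relevant_labels(caption, [label for label, _ in objects]))
    blurred = blur_image(image, gaussian_kernel(sigma, radius))
    out = [list(row) for row in image]
    # First pass: copy blurred pixels into every irrelevant box.
    for label, (x0, y0, x1, y1) in objects:
        if label not in keep:
            for y in range(y0, y1):
                out[y][x0:x1] = blurred[y][x0:x1]
    # Second pass: restore relevant boxes so overlapping regions stay sharp.
    for label, (x0, y0, x1, y1) in objects:
        if label in keep:
            for y in range(y0, y1):
                out[y][x0:x1] = image[y][x0:x1]
    return out
```

Exact token matching is deliberately simplistic: it misses synonyms and multi-word labels, which is the kind of gap an embedding-based similarity measure is meant to close.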
| Original language | English (Ireland) |
|---|---|
| Title of host publication | In Proceedings of the 19th Machine Translation Summit Conference (MTSummit 2023) |
| Place of Publication | Macau, China |
| Publication status | Published - 1 Sep 2023 |
Authors
- Hatami, A; Buitelaar, P; Arcan, M