A Filtering Approach to Object Region Recognition in Multimodal Machine Translation

Research output: Chapter in Book or Conference Publication/Proceeding › Conference Publication › peer-review

Abstract

Recent studies in Multimodal Machine Translation (MMT) have explored the use of visual information in a multimodal setting to analyze its redundancy with textual information. The aim of this work is to develop a more effective approach to incorporating relevant visual information into the translation process and to improve the overall performance of MMT models. This paper proposes an object-level filtering approach for MMT: object regions extracted from an image are filtered to remove objects that are irrelevant to the image caption being translated. Using the filtered image helps the model consider only the relevant objects and their locations relative to each other. Different matching methods, including string matching and word embeddings, are employed to identify relevant objects. Gaussian blurring is used to soften irrelevant objects in the image and to evaluate the effect of object filtering on translation quality. The filtering approaches were evaluated on the Multi30K dataset for English-to-German, English-to-French, and English-to-Czech translation, using the BLEU, ChrF2, and TER metrics.
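The filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes object detections are already available as (label, box) pairs, covers only the string-matching variant (not word embeddings), and works on a grayscale image stored as a list of rows so that it needs no external libraries. All function names (`relevant_labels`, `blur_box`, `filter_image`) and the default `sigma` are hypothetical choices made for illustration.

```python
import math


def relevant_labels(caption, labels):
    """Return the detected labels that occur in the caption (naive string matching)."""
    tokens = {tok.strip(".,!?;:").lower() for tok in caption.split()}
    # Crude plural handling (an assumption of this sketch): 'dogs' also matches 'dog'.
    tokens |= {tok[:-1] for tok in tokens if tok.endswith("s")}
    return {label for label in labels if label.lower() in tokens}


def gaussian_kernel(sigma, radius):
    """Build a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_box(image, box, sigma=2.0):
    """Return a copy of the image with the pixels inside `box` Gaussian-blurred.

    `box` is (x0, y0, x1, y1), half-open: x0 <= x < x1 and y0 <= y < y1.
    Pixels outside the box are left unchanged; coordinates are clamped at
    the image border.
    """
    x0, y0, x1, y1 = box
    height, width = len(image), len(image[0])
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    offsets = range(-radius, radius + 1)

    # Horizontal pass, restricted to the box.
    horiz = [row[:] for row in image]
    for y in range(y0, y1):
        for x in range(x0, x1):
            horiz[y][x] = sum(
                k * image[y][min(max(x + d, 0), width - 1)]
                for d, k in zip(offsets, kernel)
            )

    # Vertical pass, restricted to the box; rows outside the box are read unblurred.
    out = [row[:] for row in horiz]
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = sum(
                k * horiz[min(max(y + d, 0), height - 1)][x]
                for d, k in zip(offsets, kernel)
            )
    return out


def filter_image(image, detections, caption, sigma=2.0):
    """Blur every detected region whose label is not mentioned in the caption."""
    keep = relevant_labels(caption, [label for label, _ in detections])
    for label, box in detections:
        if label not in keep:
            image = blur_box(image, box, sigma)
    return image
```

For example, `filter_image(image, [("dog", (0, 0, 2, 2)), ("tree", (2, 2, 6, 6))], "A dog runs.")` would leave the dog region untouched and blur the tree region. A simplification of this sketch is that an irrelevant box overlapping a relevant one blurs the shared pixels as well.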
Original language: English (Ireland)
Title of host publication: In Proceedings of the 19th Machine Translation Summit Conference (MTSummit 2023)
Place of Publication: Macau, China
Publication status: Published - 1 Sep 2023

Authors (Note for portal: view the doc link for the full list of authors)

  • Hatami, A; Buitelaar, P; Arcan, M
