TransModal | Translating from Multiple Modalities into Text

Summary
Recent years have witnessed the development of a wide range of computational methods that process and generate natural language text. Many of these have become familiar to mainstream computer users such as tools that retrieve documents matching a query, perform sentiment analysis, and translate between languages. Systems like Google Translate can instantly translate between any pair of over fifty human languages allowing users to read web content that wouldn't have otherwise been available. The accessibility of the web could be further enhanced with applications that translate within the same language, between different modalities, or different data formats. There are currently no standard tools for simplifying language, e.g., for low-literacy readers or second language learners. The web is rife with non-linguistic data (e.g., databases, images, source code) that cannot be searched since most retrieval tools operate over textual data. In this project we maintain that in order to render electronic data more accessible to individuals and computers alike, new types of models need to be developed. Our proposal is to provide a unified framework for translating from comparable corpora, i.e., collections consisting of data in the same or different modalities that address the same topic without being direct translations of each other. We will develop general and scalable models that can solve different translation tasks and learn the necessary intermediate representations of the units involved in an unsupervised manner without extensive feature engineering. Thanks to recent advances in deep learning, we will induce representations for different modalities, their interactions, and correspondence to natural language. Beyond addressing a fundamental aspect of the translation problem, the proposed research will lead to novel internet-based applications that simplify and summarize text, produce documentation for source code, and meaningful descriptions for images.
Unfold all
/
Fold all
More information & hyperlinks
Web resources: https://cordis.europa.eu/project/id/681760
Start date: 01-09-2016
End date: 31-08-2022
Total budget - Public funding: 1 900 777,50 Euro - 1 900 777,00 Euro
Cordis data

Original description

Recent years have witnessed the development of a wide range of computational methods that process and generate natural language text. Many of these have become familiar to mainstream computer users such as tools that retrieve documents matching a query, perform sentiment analysis, and translate between languages. Systems like Google Translate can instantly translate between any pair of over fifty human languages allowing users to read web content that wouldn't have otherwise been available. The accessibility of the web could be further enhanced with applications that translate within the same language, between different modalities, or different data formats. There are currently no standard tools for simplifying language, e.g., for low-literacy readers or second language learners. The web is rife with non-linguistic data (e.g., databases, images, source code) that cannot be searched since most retrieval tools operate over textual data. In this project we maintain that in order to render electronic data more accessible to individuals and computers alike, new types of models need to be developed. Our proposal is to provide a unified framework for translating from comparable corpora, i.e., collections consisting of data in the same or different modalities that address the same topic without being direct translations of each other. We will develop general and scalable models that can solve different translation tasks and learn the necessary intermediate representations of the units involved in an unsupervised manner without extensive feature engineering. Thanks to recent advances in deep learning, we will induce representations for different modalities, their interactions, and correspondence to natural language. Beyond addressing a fundamental aspect of the translation problem, the proposed research will lead to novel internet-based applications that simplify and summarize text, produce documentation for source code, and meaningful descriptions for images.

Status

CLOSED

Call topic

ERC-CoG-2015

Update Date

27-04-2024
Images
No images available.
Geographical location(s)
Structured mapping
Unfold all
/
Fold all
Horizon 2020
H2020-EU.1. EXCELLENT SCIENCE
H2020-EU.1.1. EXCELLENT SCIENCE - European Research Council (ERC)
ERC-2015
ERC-2015-CoG
ERC-CoG-2015 ERC Consolidator Grant