FoTran | Found in Translation – Natural Language Understanding with Cross-Lingual Grounding

Summary
Natural language understanding is the "holy grail" of computational linguistics and a long-term goal in research on artificial intelligence. Understanding human communication is difficult due to the various ambiguities in natural languages and the wide range of contextual dependencies required to resolve them. Discovering the semantics behind language input is necessary for proper interpretation in interactive tools, which requires an abstraction from language-specific forms to language-independent meaning representations. With this project, I propose a line of research that will focus on the development of novel data-driven models that can learn such meaning representations from the indirect supervision provided by human translations covering a substantial proportion of the world's linguistic diversity. A guiding principle is cross-lingual grounding: the effect of resolving ambiguities through translation. The beauty of this idea is the use of naturally occurring data instead of artificially created resources and costly manual annotations. The framework is based on deep learning and neural machine translation, and my hypothesis is that training on increasing amounts of linguistically diverse data improves the abstractions found by the model. Eventually, this will lead to universal sentence-level meaning representations, and we will test our ideas with multilingual machine translation and tasks that require semantic reasoning and inference.
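As a minimal illustration of the cross-lingual grounding idea (a toy sketch with hypothetical data, not the project's actual method): an ambiguous word like English "bank" receives different translations depending on its sense, so word-aligned parallel text provides a naturally occurring, translation-induced sense inventory without manual annotation.

```python
from collections import Counter

# Hypothetical word-aligned English-French pairs from a parallel corpus.
# "banque" marks the financial-institution sense, "rive" the riverside sense.
aligned_pairs = [
    ("bank", "banque"), ("bank", "banque"), ("bank", "rive"),
    ("bank", "banque"), ("bank", "rive"),
]

# Each distinct translation acts as a (noisy) sense label: counting the
# translation distribution recovers the senses and their frequencies.
sense_counts = Counter(fr for en, fr in aligned_pairs if en == "bank")
print(sense_counts.most_common())
```

Here the translations alone separate the two senses of "bank"; scaled up across many languages and sentence contexts, the same supervision signal is what the project's neural models exploit.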
More information & hyperlinks
Web resources: https://cordis.europa.eu/project/id/771113
Start date: 01-09-2018
End date: 31-03-2024
Total budget - Public funding: EUR 1,817,622.00 - EUR 1,817,622.00
Cordis data


Status

SIGNED

Call topic

ERC-2017-COG

Update Date

27-04-2024