ML-TEXTSUM | Multi-language text summarization

Summary
In our daily lives, we are submerged by huge amounts of text coming from different sources such as emails, news, and reports. The availability of unprecedented volumes of data represents both a challenge and an opportunity. On the one hand, it can lead to information overload, a phenomenon that limits one’s capacity to understand an issue and act in the presence of too much information. On the other hand, the effective harnessing of this information has undeniable economic potential. Furthermore, in the European context, special attention needs to be paid to multilingualism to guarantee global access to high-quality information.

The objective of this application is to develop ML-TEXTSUM, a system for efficient and accurate multilingual text summarization. That is, given a text document as input, the system will output a summary of the document in the same or in a different language. Building on recent breakthroughs in machine learning and natural language processing, I propose a novel architecture for ML-TEXTSUM that will be able to produce high-quality summaries while at the same time remaining modular enough that new languages can be added with minimal effort. The availability of such a system will allow citizens, regardless of their language, to better handle information overload and to gain access to critically distilled information (e.g., what is a certain newspaper’s opinion on the same topic this year? Are male and female athletes portrayed differently by the media?).

The project is characterized by the interplay of multiple disciplines: the proposed architecture requires mastering a combination of natural language processing and machine learning techniques. At the same time, the formidable scale of this system will require the development of novel distributed optimization methods. This interplay will be achieved thanks to my past and future collaborations, my solid background in optimization and machine learning, and the acquisition of new ad hoc skills.
More information & hyperlinks
Web resources: https://cordis.europa.eu/project/id/748900
Start date: 01-09-2017
End date: 31-08-2020
Total budget - Public funding: 265 840,20 Euro - 265 840,00 Euro
Cordis data

Status

CLOSED

Call topic

MSCA-IF-2016

Update Date

28-04-2024
Structured mapping
Horizon 2020
H2020-EU.1. EXCELLENT SCIENCE
H2020-EU.1.3. EXCELLENT SCIENCE - Marie Skłodowska-Curie Actions (MSCA)
H2020-EU.1.3.2. Nurturing excellence by means of cross-border and cross-sector mobility
H2020-MSCA-IF-2016
MSCA-IF-2016