UTTER | Unified Transcription and Translation for Extended Reality

Summary
The aim of UTTER is to leverage large language models to build the next generation of multimodal eXtended reality (XR) technologies for transcription, translation, summarisation, and minuting. We will make these technologies scalable, adaptable, contextualised, robust, explainable, and emotion-aware. We will increase the context-sensitivity of the technologies so that they can take into account the full history of a conversation as well as its wider context. We will introduce confidence-aware models, which are aware of their own limitations. We will develop explainable models, so that human users can understand why a model made the decisions it did. We will improve adaptation, so that domain-specific and language-specific models can be rolled out quickly. For these advances we will make use of pre-trained XR models, which optimally combine text and speech signals and are trained efficiently with adapters and prompting. We will also develop efficient methods to deploy such large and complex models, so that they can be put into production in an energy-efficient manner. Our use-case prototypes will cover (i) a personal assistant for meetings that can improve communication in the online world and (ii) an advanced customer service assistant to support global markets. These prototypes will be developed and tested throughout the project, with annual releases and evaluations. Through our cascaded grant programme and our release of tools facilitating the use of pre-trained XR models, we will enable the take-up and development of these technologies throughout Europe.
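To illustrate the adapter technique mentioned in the summary, the following is a minimal sketch of parameter-efficient adaptation in PyTorch: small bottleneck modules are trained on top of a frozen pre-trained backbone, so a new domain or language only requires training and shipping a tiny module rather than the full model. The names AdapterLayer and bottleneck_dim, and all dimensions, are illustrative assumptions, not UTTER project code:

# Illustrative sketch only; not UTTER project code.
import torch
import torch.nn as nn

class AdapterLayer(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project,
    with a residual connection so the frozen path stays intact."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter only learns a small correction.
        return x + self.up(self.act(self.down(x)))

# Freeze the pre-trained backbone; train only the adapter.
backbone = nn.TransformerEncoderLayer(d_model=768, nhead=12)
for p in backbone.parameters():
    p.requires_grad = False            # pre-trained weights stay fixed

adapter = AdapterLayer(hidden_dim=768)  # ~0.1M trainable parameters
x = torch.randn(4, 16, 768)             # (sequence, batch, hidden)
out = adapter(backbone(x))              # adapted representation

In this sketch only the adapter's parameters receive gradients, which is what makes quick roll-out of domain-specific and language-specific models feasible.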
More information & hyperlinks
Web resources: https://cordis.europa.eu/project/id/101070631
Start date: 01-10-2022
End date: 30-09-2025
Total budget: 4 074 791,00 Euro
Public funding: 4 070 321,00 Euro
Cordis data

Status

SIGNED

Call topic

HORIZON-CL4-2021-HUMAN-01-13

Update Date

09-02-2023
Structured mapping
Horizon Europe
  HORIZON.2 Global Challenges and European Industrial Competitiveness
    HORIZON.2.4 Digital, Industry and Space
      HORIZON.2.4.6 Next Generation Internet
        HORIZON-CL4-2021-HUMAN-01
          HORIZON-CL4-2021-HUMAN-01-13 eXtended Reality Modelling (RIA)