TalkingHeads: Audiovisual Speech Recognition in-the-wild

Summary
Audio-visual speech recognition is the problem of recognizing speech using both audio and video information. Speech perception is not purely auditory: listeners also rely on the visual patterns associated with mouth movement. This correlation between audio and visual information has occasionally been explored in the literature to develop automatic speech recognition systems that are more robust when the auditory environment is noisy (e.g. background noise, multiple speakers). However, audio-visual speech recognition has mainly been studied in controlled laboratory conditions. TalkingHeads proposes, for the first time, to address audio-visual speech recognition in unconstrained (in-the-wild) videos collected from real-world multimedia databases, together with a set of methodologies designed to work well in that in-the-wild setting.

TalkingHeads brings together a talented and experienced researcher (ER) with expertise in speech analysis (diarization and recognition) and the Supervisor, who has extensive research experience in Computer Vision for face analysis in-the-wild (recognition, detection, alignment and tracking, and facial expression analysis). TalkingHeads will establish the ER as an independent and internationally recognized researcher in the area of audio-visual fusion and speech recognition. Through TalkingHeads’ achievable work plan, the ER will attain a high level of research maturity by (a) complementing his expertise in speech analysis with extensive training in Computer Vision, (b) conducting research on a challenging problem (audio-visual speech recognition in-the-wild) with significant career opportunities in both academia and industry, (c) publishing in high-impact conferences and journals, (d) establishing a network of research collaborators, and (e) enhancing personal skills (e.g. supervisory experience, leadership and management skills).
More information & hyperlinks
Web resources: https://cordis.europa.eu/project/id/706668
Start date: 01-06-2016
End date: 31-05-2018
Total budget: 183 454,80 Euro
Public funding: 183 454,00 Euro
Cordis data

Status

CLOSED

Call topic

MSCA-IF-2015-EF

Update Date

28-04-2024
Structured mapping
Horizon 2020
H2020-EU.1. EXCELLENT SCIENCE
H2020-EU.1.3. EXCELLENT SCIENCE - Marie Skłodowska-Curie Actions (MSCA)
H2020-EU.1.3.2. Nurturing excellence by means of cross-border and cross-sector mobility
H2020-MSCA-IF-2015
MSCA-IF-2015-EF Marie Skłodowska-Curie Individual Fellowships (IF-EF)