SPEAKER DICE | Robust SPEAKER DIarization systems using Bayesian inferenCE and deep learning methods

Summary
The proposed project addresses Speaker Diarization (SD), commonly defined as the task of answering the question “who spoke when?” in a speech recording. The first objective of the proposal is to optimize the Bayesian approach to SD, which has shown promise for the task. Because Variational Bayes (VB) inference is very sensitive to initialization, we will develop new, fast ways of obtaining a good starting point. We will also explore alternative inference methods, such as collapsed VB or collapsed Gibbs sampling, and investigate alternative priors similar to those introduced for Bayesian speaker recognition models.

The second part of the proposal is motivated by the large performance gains that Deep Neural Networks (DNNs) have brought to other recognition tasks in recent years. In the context of SD, DNNs have been used in the computation of i-vectors, but their potential has never been explored for the other stages of SD. We will study ways of integrating DNNs into the different stages of SD systems.

The objectives of the proposal will be achieved through theoretical work, implementation, and careful testing on real speech data. The outcomes of the project are intended not only for scientific publication but are also eagerly awaited by the European speech data mining industry (for example, Czech Phonexia or Spanish Agnitio).

The project is proposed by an excellent female researcher, Dr. Mireia Diez, who completed her thesis in the GTTS group of the University of the Basque Country, one of the most important European labs working on speaker recognition and diarization. The proposed host is the Speech@FIT group of Brno University of Technology, with a 20-year track record of top speech data mining research. The proposed research training and the combined skills of Dr. Diez and the host institution have the potential to advance the state of the art in speaker diarization, provide the applicant with improved career opportunities, and benefit European industry.
More information & hyperlinks
Web resources: https://cordis.europa.eu/project/id/748097
Start date: 01-03-2017
End date: 28-02-2019
Total budget: 142 720,80 Euro
Public funding: 142 720,00 Euro
Cordis data

Status

CLOSED

Call topic

MSCA-IF-2016

Update Date

28-04-2024
Structured mapping
Horizon 2020
H2020-EU.1. EXCELLENT SCIENCE
H2020-EU.1.3. EXCELLENT SCIENCE - Marie Skłodowska-Curie Actions (MSCA)
H2020-EU.1.3.2. Nurturing excellence by means of cross-border and cross-sector mobility
H2020-MSCA-IF-2016
MSCA-IF-2016